bpf.vger.kernel.org archive mirror
* [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages
@ 2021-03-01  6:22 Muchun Song
  2021-03-01  6:22 ` [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page Muchun Song
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-01  6:22 UTC (permalink / raw)
  To: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	songmuchun, alex.shi, alexander.h.duyck, chris, richard.weiyang,
	vbabka, mathieu.desnoyers, posk, jannh, iamjoonsoo.kim,
	daniel.vetter, longman, walken, christian.brauner, ebiederm,
	keescook, krisman, esyr, surenb, elver
  Cc: linux-fsdevel, linux-kernel, netdev, bpf, cgroups, linux-mm,
	duanxiongchun

Since Roman's series "The new cgroup slab memory controller" was
applied, all slab objects are charged via the new obj_cgroup APIs.
These APIs introduce a struct obj_cgroup that is used instead of
struct mem_cgroup directly to charge slab objects, which prevents
long-living objects from pinning the original memory cgroup in memory.
But there are still some corner-case objects (e.g. allocations larger
than an order-1 page on SLUB) which are not charged via the obj_cgroup
API. Those objects (including pages allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to
the memory cgroup.
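
As a simplified sketch of the status quo (condensed from the code that
patch 3 removes), the kmem charge path stores the memcg pointer in
page->memcg_data and keeps a css reference until uncharge, which is
what pins the memcg:

	/* Pre-series kmem charge path, heavily simplified. */
	memcg = get_mem_cgroup_from_current();	/* takes a css reference */
	if (memcg && !mem_cgroup_is_root(memcg)) {
		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
		if (!ret)	/* the page pins the memcg until uncharge */
			page->memcg_data = (unsigned long)memcg |
					   MEMCG_DATA_KMEM;
	}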

For example, the kernel stack is charged as kmem pages because its
size can be greater than 2 pages (e.g. 16KB on x86_64 or arm64).
Suppose we create a thread whose stack is charged to memory cgroup A
and then move the thread from memory cgroup A to memory cgroup B.
Because the thread's kernel stack holds a reference to memory cgroup
A, the thread pins memory cgroup A in memory even after cgroup A is
removed. This scenario can be reproduced with the following script,
after which the system reports 500 more dying cgroups.

	#!/bin/bash

	cat /proc/cgroups | grep memory

	cd /sys/fs/cgroup/memory
	echo 1 > memory.move_charge_at_immigrate

	for i in {1..500}
	do
		mkdir kmem_test
		echo $$ > kmem_test/cgroup.procs
		sleep 3600 &
		echo $$ > cgroup.procs
		echo `cat kmem_test/cgroup.procs` > cgroup.procs
		rmdir kmem_test
	done

	cat /proc/cgroups | grep memory
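
The two "cat /proc/cgroups | grep memory" lines print the memory
controller's line before and after the loop. The num_cgroups column
still counts the dying cgroups, so (with hypothetical counts, header
line added for orientation) the output looks roughly like:

	#subsys_name	hierarchy	num_cgroups	enabled
	memory		2		3		1	<- before
	memory		2		503		1	<- after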

This patchset aims to make those kmem pages drop their reference to
the memory cgroup by using the obj_cgroup APIs. With the series
applied, the number of dying cgroups no longer grows when running the
test script above.

Patches 1-3 use the obj_cgroup APIs to charge kmem pages. The remote
memory cgroup charging API is a mechanism to charge kernel memory to a
given memory cgroup; patches 4-5 convert it to the obj_cgroup APIs as
well.

Muchun Song (5):
  mm: memcontrol: introduce obj_cgroup_{un}charge_page
  mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
    page
  mm: memcontrol: reparent the kmem pages on cgroup removal
  mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  mm: memcontrol: use object cgroup for remote memory cgroup charging

 fs/buffer.c                          |  10 +-
 fs/notify/fanotify/fanotify.c        |   6 +-
 fs/notify/fanotify/fanotify_user.c   |   2 +-
 fs/notify/group.c                    |   3 +-
 fs/notify/inotify/inotify_fsnotify.c |   8 +-
 fs/notify/inotify/inotify_user.c     |   2 +-
 include/linux/bpf.h                  |   2 +-
 include/linux/fsnotify_backend.h     |   2 +-
 include/linux/memcontrol.h           | 109 +++++++++++---
 include/linux/sched.h                |   6 +-
 include/linux/sched/mm.h             |  30 ++--
 kernel/bpf/syscall.c                 |  35 ++---
 kernel/fork.c                        |   4 +-
 mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
 mm/page_alloc.c                      |   4 +-
 15 files changed, 324 insertions(+), 175 deletions(-)

-- 
2.11.0



* [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page
  2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-01  6:22 ` Muchun Song
  2021-03-01  6:22 ` [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-01  6:22 UTC (permalink / raw)
  To: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	songmuchun, alex.shi, alexander.h.duyck, chris, richard.weiyang,
	vbabka, mathieu.desnoyers, posk, jannh, iamjoonsoo.kim,
	daniel.vetter, longman, walken, christian.brauner, ebiederm,
	keescook, krisman, esyr, surenb, elver
  Cc: linux-fsdevel, linux-kernel, netdev, bpf, cgroups, linux-mm,
	duanxiongchun

The unit for charging slab objects is bytes, while the unit for
charging kmem pages is PAGE_SIZE. If we wanted to reuse the existing
obj_cgroup APIs to charge kmem pages, we would have to pass a multiple
of PAGE_SIZE as the third parameter (size) to obj_cgroup_charge().
Because the charge size is always a whole number of pages, touching
the per-cpu objcg byte stock is pointless. Since we already know the
charge size, skip the objcg stock entirely and introduce
obj_cgroup_{un}charge_page() to charge or uncharge a kmem page
directly.

A later patch will reuse these helpers to charge/uncharge kmem pages.
This patch is just code movement without any functional change.
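
As a rough usage sketch (helper names are from the diff below; storing
the objcg in page->memcg_data only happens in a later patch of this
series):

	/* Charge 2^order pages to an objcg, bypassing the byte stock. */
	if (!obj_cgroup_charge_page(objcg, gfp, 1 << order))
		page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;

	/* ... and later undo the charge. */
	obj_cgroup_uncharge_page(objcg, 1 << order);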

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2db2aeac8a9e..2eafbae504ac 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3060,6 +3060,34 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+static inline void obj_cgroup_uncharge_page(struct obj_cgroup *objcg,
+					    unsigned int nr_pages)
+{
+	rcu_read_lock();
+	__memcg_kmem_uncharge(obj_cgroup_memcg(objcg), nr_pages);
+	rcu_read_unlock();
+}
+
+static int obj_cgroup_charge_page(struct obj_cgroup *objcg, gfp_t gfp,
+				  unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * __memcg_kmem_charge: charge a number of kernel pages to a memcg
  * @memcg: memory cgroup to charge
@@ -3184,11 +3212,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages) {
-			rcu_read_lock();
-			__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
-			rcu_read_unlock();
-		}
+		if (nr_pages)
+			obj_cgroup_uncharge_page(old, nr_pages);
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
@@ -3246,7 +3271,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
-	struct mem_cgroup *memcg;
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
@@ -3263,24 +3287,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 	 * refill_obj_stock(), called from this function or
 	 * independently later.
 	 */
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
 	nr_pages = size >> PAGE_SHIFT;
 	nr_bytes = size & (PAGE_SIZE - 1);
 
 	if (nr_bytes)
 		nr_pages += 1;
 
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	ret = obj_cgroup_charge_page(objcg, gfp, nr_pages);
 	if (!ret && nr_bytes)
 		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
 
-	css_put(&memcg->css);
 	return ret;
 }
 
-- 
2.11.0



* [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-01  6:22 ` [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page Muchun Song
@ 2021-03-01  6:22 ` Muchun Song
  2021-03-01 18:11   ` Shakeel Butt
  2021-03-01  6:22 ` [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal Muchun Song
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2021-03-01  6:22 UTC (permalink / raw)
  To: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	songmuchun, alex.shi, alexander.h.duyck, chris, richard.weiyang,
	vbabka, mathieu.desnoyers, posk, jannh, iamjoonsoo.kim,
	daniel.vetter, longman, walken, christian.brauner, ebiederm,
	keescook, krisman, esyr, surenb, elver
  Cc: linux-fsdevel, linux-kernel, netdev, bpf, cgroups, linux-mm,
	duanxiongchun

We want to reuse the obj_cgroup APIs to reparent kmem pages when the
memcg is offlined. To do this, we need to store an object cgroup
pointer in page->memcg_data for kmem pages.

Finally, page->memcg_data can have 3 different meanings.

  1) For the slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For the kmem pages (exclude the slab pages), page->memcg_data
     points to an object cgroup.

  3) For the user pages (e.g. the LRU pages), page->memcg_data points
     to a memory cgroup.

Currently we always get the memcg associated with a page via
page_memcg or page_memcg_rcu. page_memcg_check is special: it has to
be used in cases where it is not known whether a page has an
associated memory cgroup pointer or an object cgroups vector. Because
page->memcg_data of a kmem page will no longer point to a memory
cgroup after a later patch, page_memcg and page_memcg_rcu cannot be
applied to kmem pages. This patch introduces page_memcg_kmem to get
the memcg associated with kmem pages, and makes page_memcg and
page_memcg_rcu no longer apply to them.

In the end, there are 4 helpers to get the memcg associated with a
page. The usage is as follows.

  1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
     pages).

     - page_memcg()
     - page_memcg_rcu()

  2) Get the memory cgroup associated with a kmem page (exclude the slab
     pages).

     - page_memcg_kmem()

  3) Get the memory cgroup associated with a page when it is not known
     whether the page has an associated memory cgroup pointer or an
     object cgroups vector. Returns NULL for slab pages and uncharged
     pages; otherwise returns the memory cgroup for charged pages
     (e.g. kmem pages, LRU pages).

     - page_memcg_check()

In some places, page_memcg is used to check whether a page is charged.
Introduce a page_memcg_charged helper for this purpose.

This is a preparation for reparenting kmem pages: to support that, a
later patch only needs to adjust page_memcg_kmem and page_memcg_check.
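
A rough sketch of how the helpers divide up the work after this patch
(mirroring the uncharge_page() hunk below):

	if (!page_memcg_charged(page))	/* page is not charged at all */
		return;

	if (PageMemcgKmem(page))
		memcg = page_memcg_kmem(page);	/* kmem (non-slab) page */
	else
		memcg = page_memcg(page);	/* user/LRU page */

	/* When the page type is unknown (it may even be a slab page): */
	memcg = page_memcg_check(page);	/* NULL for slab/uncharged pages */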

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 56 +++++++++++++++++++++++++++++++++++++++-------
 mm/memcontrol.c            | 23 ++++++++++---------
 mm/page_alloc.c            |  4 ++--
 3 files changed, 63 insertions(+), 20 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..1d2c82464c8c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,14 +358,46 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+/* Return true for charged page, otherwise false. */
+static inline bool page_memcg_charged(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+
+	return !!memcg_data;
+}
+
 /*
- * page_memcg - get the memory cgroup associated with a page
+ * page_memcg_kmem - get the memory cgroup associated with a kmem page.
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have a proper
+ * memory cgroup pointer. It is only suitable for kmem pages which means
+ * PageMemcgKmem() returns true for this page.
+ */
+static inline struct mem_cgroup *page_memcg_kmem(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * page_memcg - get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  *
  * Any of the following ensures page and memcg binding stability:
  * - the page lock
@@ -378,27 +410,30 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 	unsigned long memcg_data = page->memcg_data;
 
 	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_FLAGS_MASK, page);
 
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)memcg_data;
 }
 
 /*
- * page_memcg_rcu - locklessly get the memory cgroup associated with a page
+ * page_memcg_rcu - locklessly get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_FLAGS_MASK, page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)memcg_data;
 }
 
 /*
@@ -1072,6 +1107,11 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 
 struct mem_cgroup;
 
+static inline bool page_memcg_charged(struct page *page)
+{
+	return false;
+}
+
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
 	return NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2eafbae504ac..bfd6efe1e196 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 			     int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
@@ -3170,12 +3171,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct mem_cgroup *memcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!page_memcg_charged(page))
 		return;
 
+	memcg = page_memcg_kmem(page);
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	page->memcg_data = 0;
@@ -6831,24 +6833,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
+	if (!page_memcg_charged(page))
 		return;
 
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
-	 * exclusive access to the page.
+	 * page memcg at this point, we have fully exclusive
+	 * access to the page.
 	 */
-
-	if (ug->memcg != page_memcg(page)) {
+	memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6881,7 +6884,7 @@ void mem_cgroup_uncharge(struct page *page)
 		return;
 
 	/* Don't touch page->lru of any random page, pre-check: */
-	if (!page_memcg(page))
+	if (!page_memcg_charged(page))
 		return;
 
 	uncharge_gather_clear(&ug);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f10966e3b4a5..bcb58ae15e24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page_memcg(page) |
+			page_memcg_charged(page) |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page_memcg(page)))
+	if (unlikely(page_memcg_charged(page)))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;
-- 
2.11.0



* [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal
  2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-01  6:22 ` [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page Muchun Song
  2021-03-01  6:22 ` [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
@ 2021-03-01  6:22 ` Muchun Song
  2021-03-01  6:22 ` [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM Muchun Song
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-01  6:22 UTC (permalink / raw)
  To: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	songmuchun, alex.shi, alexander.h.duyck, chris, richard.weiyang,
	vbabka, mathieu.desnoyers, posk, jannh, iamjoonsoo.kim,
	daniel.vetter, longman, walken, christian.brauner, ebiederm,
	keescook, krisman, esyr, surenb, elver
  Cc: linux-fsdevel, linux-kernel, netdev, bpf, cgroups, linux-mm,
	duanxiongchun

Currently, slab objects are already reparented to the parent memcg on
cgroup removal, but there are still some corner-case objects which are
not (e.g. allocations larger than an order-1 page on SLUB). Those
objects are allocated directly from the buddy allocator and charged to
the memcg as kmem via __memcg_kmem_charge_page(), so they are not
reparented on cgroup removal.

This patch reparents kmem pages on cgroup removal, which is simple
with the help of the obj_cgroup infrastructure. After this patch,
page->memcg_data points to an object cgroup for kmem pages.
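
Sketched from the hunks below: the memcg of a kmem page is now reached
through its objcg, whose ->memcg pointer is atomically switched to the
parent memcg on offline, so the page itself never pins a dying memcg:

	objcg = page_objcg(page);	/* page->memcg_data minus flag bits */

	rcu_read_lock();
	/*
	 * READ_ONCE(objcg->memcg); after reparenting this already
	 * resolves to the parent memcg.
	 */
	memcg = obj_cgroup_memcg(objcg);
	rcu_read_unlock();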

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h |  66 +++++++++++--------
 mm/memcontrol.c            | 155 ++++++++++++++++++++++++---------------------
 2 files changed, 124 insertions(+), 97 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1d2c82464c8c..27043478220f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -370,23 +370,15 @@ static inline bool page_memcg_charged(struct page *page)
 }
 
 /*
- * page_memcg_kmem - get the memory cgroup associated with a kmem page.
- * @page: a pointer to the page struct
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
  *
- * Returns a pointer to the memory cgroup associated with the kmem page,
- * or NULL. This function assumes that the page is known to have a proper
- * memory cgroup pointer. It is only suitable for kmem pages which means
- * PageMemcgKmem() returns true for this page.
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
  */
-static inline struct mem_cgroup *page_memcg_kmem(struct page *page)
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return READ_ONCE(objcg->memcg);
 }
 
 /*
@@ -462,6 +454,17 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		/*
+		 * The caller must ensure that the returned memcg won't be
+		 * released: e.g. acquire the rcu_read_lock or css_set_lock.
+		 */
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -520,6 +523,24 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+/*
+ * page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroup. It's only safe to call this function
+ * against kmem pages (PageMemcgKmem() returns true).
+ */
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
@@ -530,6 +551,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	return NULL;
 }
+
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	return NULL;
+}
 #endif
 
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
@@ -748,18 +774,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bfd6efe1e196..39cb8c5bf8b2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -856,10 +856,16 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
 	struct mem_cgroup *memcg;
-	pg_data_t *pgdat = page_pgdat(page);
+	pg_data_t *pgdat;
 	struct lruvec *lruvec;
 
-	memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);
+	if (PageMemcgKmem(head)) {
+		__mod_lruvec_kmem_state(page_to_virt(head), idx, val);
+		return;
+	}
+
+	pgdat = page_pgdat(head);
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
@@ -1056,24 +1062,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 		return current->active_memcg;
 }
 
-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	memcg = active_memcg();
-	if (memcg) {
-		/* current->active_memcg must hold a ref. */
-		if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
-			memcg = root_mem_cgroup;
-		else
-			memcg = current->active_memcg;
-	}
-	rcu_read_unlock();
-
-	return memcg;
-}
-
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -1088,20 +1076,6 @@ static __always_inline bool memcg_kmem_bypass(void)
 }
 
 /**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (memcg_kmem_bypass())
-		return NULL;
-
-	if (unlikely(active_memcg()))
-		return get_active_memcg();
-
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
-/**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
  * @prev: previously returned memcg, NULL on first invocation
@@ -3148,18 +3122,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_page(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3171,17 +3145,18 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
 	if (!page_memcg_charged(page))
 		return;
 
-	memcg = page_memcg_kmem(page);
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	VM_BUG_ON_PAGE(!PageMemcgKmem(page), page);
+
+	objcg = page_objcg(page);
+	obj_cgroup_uncharge_page(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6798,8 +6773,12 @@ struct uncharge_gather {
 	struct mem_cgroup *memcg;
 	unsigned long nr_pages;
 	unsigned long pgpgout;
-	unsigned long nr_kmem;
 	struct page *dummy_page;
+
+#ifdef CONFIG_MEMCG_KMEM
+	struct obj_cgroup *objcg;
+	unsigned long nr_kmem;
+#endif
 };
 
 static inline void uncharge_gather_clear(struct uncharge_gather *ug)
@@ -6811,12 +6790,21 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
+#ifdef CONFIG_MEMCG_KMEM
+	if (ug->objcg) {
+		obj_cgroup_uncharge_page(ug->objcg, ug->nr_kmem);
+		/* drop reference from uncharge_kmem_page */
+		obj_cgroup_put(ug->objcg);
+	}
+#endif
+
+	if (!ug->memcg)
+		return;
+
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
 		if (do_memsw_account())
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
-		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
-			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
 	}
 
@@ -6826,26 +6814,40 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
-	/* drop reference from uncharge_page */
+	/* drop reference from uncharge_user_page */
 	css_put(&ug->memcg->css);
 }
 
-static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+#ifdef CONFIG_MEMCG_KMEM
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
 {
-	unsigned long nr_pages;
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg = page_objcg(page);
 
-	VM_BUG_ON_PAGE(PageLRU(page), page);
+	if (ug->objcg != objcg) {
+		if (ug->objcg) {
+			uncharge_batch(ug);
+			uncharge_gather_clear(ug);
+		}
+		ug->objcg = objcg;
 
-	if (!page_memcg_charged(page))
-		return;
+		/* pairs with obj_cgroup_put in uncharge_batch */
+		obj_cgroup_get(ug->objcg);
+	}
+
+	ug->nr_kmem += compound_nr(page);
+	page->memcg_data = 0;
+	obj_cgroup_put(ug->objcg);
+}
+#else
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
+{
+}
+#endif
+
+static void uncharge_user_page(struct page *page, struct uncharge_gather *ug)
+{
+	struct mem_cgroup *memcg = page_memcg(page);
 
-	/*
-	 * Nobody should be changing or seriously looking at
-	 * page memcg at this point, we have fully exclusive
-	 * access to the page.
-	 */
-	memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);
 	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
@@ -6856,18 +6858,30 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
 	}
+	ug->pgpgout++;
+	ug->dummy_page = page;
+
+	ug->nr_pages += compound_nr(page);
+	page->memcg_data = 0;
+	css_put(&ug->memcg->css);
+}
 
-	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
+static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+{
+	VM_BUG_ON_PAGE(PageLRU(page), page);
 
+	if (!page_memcg_charged(page))
+		return;
+
+	/*
+	 * Nobody should be changing or seriously looking at
+	 * page memcg at this point, we have fully exclusive
+	 * access to the page.
+	 */
 	if (PageMemcgKmem(page))
-		ug->nr_kmem += nr_pages;
+		uncharge_kmem_page(page, ug);
 	else
-		ug->pgpgout++;
-
-	ug->dummy_page = page;
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+		uncharge_user_page(page, ug);
 }
 
 /**
@@ -6910,8 +6924,7 @@ void mem_cgroup_uncharge_list(struct list_head *page_list)
 	uncharge_gather_clear(&ug);
 	list_for_each_entry(page, page_list, lru)
 		uncharge_page(page, &ug);
-	if (ug.memcg)
-		uncharge_batch(&ug);
+	uncharge_batch(&ug);
 }
 
 /**
-- 
2.11.0



* [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (2 preceding siblings ...)
  2021-03-01  6:22 ` [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal Muchun Song
@ 2021-03-01  6:22 ` Muchun Song
  2021-03-02  1:15   ` Roman Gushchin
  2021-03-01  6:22 ` [PATCH 5/5] mm: memcontrol: use object cgroup for remote memory cgroup charging Muchun Song
  2021-03-02  1:12 ` [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Roman Gushchin
  5 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2021-03-01  6:22 UTC (permalink / raw)
  To: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	songmuchun, alex.shi, alexander.h.duyck, chris, richard.weiyang,
	vbabka, mathieu.desnoyers, posk, jannh, iamjoonsoo.kim,
	daniel.vetter, longman, walken, christian.brauner, ebiederm,
	keescook, krisman, esyr, surenb, elver
  Cc: linux-fsdevel, linux-kernel, netdev, bpf, cgroups, linux-mm,
	duanxiongchun

The remote memcg charging API is a mechanism to charge kernel memory
to a given memcg, so we can move the infrastructure under
CONFIG_MEMCG_KMEM.

As a bonus, some functions and variables can be compiled out on
!CONFIG_MEMCG_KMEM builds.
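
For context, this is the remote charging pattern being moved (taken
from kernel/bpf/syscall.c as it looks before patch 5):

	old_memcg = set_active_memcg(map->memcg);
	ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
	set_active_memcg(old_memcg);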

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/sched.h    | 2 ++
 include/linux/sched/mm.h | 2 +-
 kernel/fork.c            | 2 +-
 mm/memcontrol.c          | 4 ++++
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ee46f5cab95b..c2d488eddf85 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1314,7 +1314,9 @@ struct task_struct {
 
 	/* Number of pages to reclaim on returning to userland: */
 	unsigned int			memcg_nr_pages_over_high;
+#endif
 
+#ifdef CONFIG_MEMCG_KMEM
 	/* Used by memcontrol for targeted memcg charge: */
 	struct mem_cgroup		*active_memcg;
 #endif
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1ae08b8462a4..64a72975270e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -294,7 +294,7 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 }
 #endif
 
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_MEMCG_KMEM
 DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 /**
  * set_active_memcg - Starts the remote memcg charging scope.
diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211..d66718bc82d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -942,7 +942,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->use_memdelay = 0;
 #endif
 
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_MEMCG_KMEM
 	tsk->active_memcg = NULL;
 #endif
 	return tsk;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 39cb8c5bf8b2..092dc4588b43 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -76,8 +76,10 @@ EXPORT_SYMBOL(memory_cgrp_subsys);
 
 struct mem_cgroup *root_mem_cgroup __read_mostly;
 
+#ifdef CONFIG_MEMCG_KMEM
 /* Active memory cgroup to use from an interrupt context */
 DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
+#endif
 
 /* Socket memory accounting disabled? */
 static bool cgroup_memory_nosocket;
@@ -1054,6 +1056,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_mm);
 
+#ifdef CONFIG_MEMCG_KMEM
 static __always_inline struct mem_cgroup *active_memcg(void)
 {
 	if (in_interrupt())
@@ -1074,6 +1077,7 @@ static __always_inline bool memcg_kmem_bypass(void)
 
 	return false;
 }
+#endif
 
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
-- 
2.11.0



* [PATCH 5/5] mm: memcontrol: use object cgroup for remote memory cgroup charging
  2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (3 preceding siblings ...)
  2021-03-01  6:22 ` [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM Muchun Song
@ 2021-03-01  6:22 ` Muchun Song
  2021-03-02  1:12 ` [PATCH 0/5] Use obj_cgroup APIs to charge kmem pages Roman Gushchin
  5 siblings, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-01  6:22 UTC (permalink / raw)
  To: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	songmuchun, alex.shi, alexander.h.duyck, chris, richard.weiyang,
	vbabka, mathieu.desnoyers, posk, jannh, iamjoonsoo.kim,
	daniel.vetter, longman, walken, christian.brauner, ebiederm,
	keescook, krisman, esyr, surenb, elver
  Cc: linux-fsdevel, linux-kernel, netdev, bpf, cgroups, linux-mm,
	duanxiongchun

We spent a lot of effort making slab accounting not hold a refcount
to the memory cgroup, so that a dying cgroup can be freed as soon as
possible after it is offlined.

But some users of remote memory cgroup charging (e.g. bpf and
fsnotify) still hold a refcount to a memory cgroup in order to charge
to it later. Since the slab core uses the obj_cgroup APIs for memory
cgroup charging, those users can hold a refcount to an obj_cgroup
instead. With this change, the remote memory charging infrastructure
no longer holds a refcount to the memory cgroup either.
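
The resulting pattern, condensed from the fsnotify hunks below: take a
reference on the objcg up front, open a charging scope around each
accounted allocation, and drop the reference on teardown:

	/* setup: reference the objcg instead of the memcg */
	group->objcg = get_obj_cgroup_from_current();

	/* allocation: charge __GFP_ACCOUNT memory to that objcg */
	old_objcg = set_active_obj_cgroup(group->objcg);
	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
	set_active_obj_cgroup(old_objcg);

	/* teardown */
	if (group->objcg)
		obj_cgroup_put(group->objcg);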

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 fs/buffer.c                          | 10 ++++--
 fs/notify/fanotify/fanotify.c        |  6 ++--
 fs/notify/fanotify/fanotify_user.c   |  2 +-
 fs/notify/group.c                    |  3 +-
 fs/notify/inotify/inotify_fsnotify.c |  8 ++---
 fs/notify/inotify/inotify_user.c     |  2 +-
 include/linux/bpf.h                  |  2 +-
 include/linux/fsnotify_backend.h     |  2 +-
 include/linux/memcontrol.h           | 15 ++++++++
 include/linux/sched.h                |  4 +--
 include/linux/sched/mm.h             | 28 +++++++--------
 kernel/bpf/syscall.c                 | 35 +++++++++----------
 kernel/fork.c                        |  2 +-
 mm/memcontrol.c                      | 66 ++++++++++++++++++++++++++++--------
 14 files changed, 121 insertions(+), 64 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 591547779dbd..cc99fcf66368 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -842,14 +842,16 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 	struct buffer_head *bh, *head;
 	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
 	long offset;
-	struct mem_cgroup *memcg, *old_memcg;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg, *old_objcg;
 
 	if (retry)
 		gfp |= __GFP_NOFAIL;
 
 	/* The page lock pins the memcg */
 	memcg = page_memcg(page);
-	old_memcg = set_active_memcg(memcg);
+	objcg = get_obj_cgroup_from_mem_cgroup(memcg);
+	old_objcg = set_active_obj_cgroup(objcg);
 
 	head = NULL;
 	offset = PAGE_SIZE;
@@ -868,7 +870,9 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 		set_bh_page(bh, page, offset);
 	}
 out:
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
+	if (objcg)
+		obj_cgroup_put(objcg);
 	return head;
 /*
  * In case anything failed, we just free everything we got.
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 1192c9953620..04d24acfffc7 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -530,7 +530,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir);
 	const struct path *path = fsnotify_data_path(data, data_type);
 	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	struct inode *child = NULL;
 	bool name_event = false;
 
@@ -580,7 +580,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		gfp |= __GFP_RETRY_MAYFAIL;
 
 	/* Whoever is interested in the event, pays for the allocation. */
-	old_memcg = set_active_memcg(group->memcg);
+	old_objcg = set_active_obj_cgroup(group->objcg);
 
 	if (fanotify_is_perm_event(mask)) {
 		event = fanotify_alloc_perm_event(path, gfp);
@@ -608,7 +608,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		event->pid = get_pid(task_tgid(current));
 
 out:
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
 	return event;
 }
 
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9e0c1afac8bd..055ca36d4e0e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -985,7 +985,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	group->fanotify_data.user = user;
 	group->fanotify_data.flags = flags;
 	atomic_inc(&user->fanotify_listeners);
-	group->memcg = get_mem_cgroup_from_mm(current->mm);
+	group->objcg = get_obj_cgroup_from_current();
 
 	group->overflow_event = fanotify_alloc_overflow_event();
 	if (unlikely(!group->overflow_event)) {
diff --git a/fs/notify/group.c b/fs/notify/group.c
index ffd723ffe46d..fac46b92c16f 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -24,7 +24,8 @@ static void fsnotify_final_destroy_group(struct fsnotify_group *group)
 	if (group->ops->free_group_priv)
 		group->ops->free_group_priv(group);
 
-	mem_cgroup_put(group->memcg);
+	if (group->objcg)
+		obj_cgroup_put(group->objcg);
 	mutex_destroy(&group->mark_mutex);
 
 	kfree(group);
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 1901d799909b..20835554819a 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -66,7 +66,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 	int ret;
 	int len = 0;
 	int alloc_len = sizeof(struct inotify_event_info);
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 
 	if (name) {
 		len = name->len;
@@ -81,12 +81,12 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 
 	/*
 	 * Whoever is interested in the event, pays for the allocation. Do not
-	 * trigger OOM killer in the target monitoring memcg as it may have
+	 * trigger OOM killer in the target monitoring objcg as it may have
 	 * security repercussion.
 	 */
-	old_memcg = set_active_memcg(group->memcg);
+	old_objcg = set_active_obj_cgroup(group->objcg);
 	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
 
 	if (unlikely(!event)) {
 		/*
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index c71be4fb7dc5..5b4de477fcac 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -649,7 +649,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
 	oevent->name_len = 0;
 
 	group->max_events = max_events;
-	group->memcg = get_mem_cgroup_from_mm(current->mm);
+	group->objcg = get_obj_cgroup_from_current();
 
 	spin_lock_init(&group->inotify_data.idr_lock);
 	idr_init(&group->inotify_data.idr);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cccaef1088ea..b6894e3cd095 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -158,7 +158,7 @@ struct bpf_map {
 	u32 btf_value_type_id;
 	struct btf *btf;
 #ifdef CONFIG_MEMCG_KMEM
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
 	u32 btf_vmlinux_value_type_id;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e5409b83e731..d0303f634da6 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -220,7 +220,7 @@ struct fsnotify_group {
 						 * notification list is too
 						 * full */
 
-	struct mem_cgroup *memcg;	/* memcg to charge allocations */
+	struct obj_cgroup *objcg;	/* objcg to charge allocations */
 
 	/* groups can define private fields here or use the void *private */
 	union {
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 27043478220f..96e63ec7274a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1642,6 +1642,7 @@ static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg,
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
 void __memcg_kmem_uncharge_page(struct page *page, int order);
 
+struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg);
 struct obj_cgroup *get_obj_cgroup_from_current(void);
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
@@ -1692,6 +1693,20 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
 struct mem_cgroup *mem_cgroup_from_obj(void *p);
 
 #else
+static inline
+struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg)
+{
+	return NULL;
+}
+
+static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
+{
+	return NULL;
+}
+
+static inline void obj_cgroup_put(struct obj_cgroup *objcg)
+{
+}
 
 static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp,
 					 int order)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c2d488eddf85..75d5b571edcb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1317,8 +1317,8 @@ struct task_struct {
 #endif
 
 #ifdef CONFIG_MEMCG_KMEM
-	/* Used by memcontrol for targeted memcg charge: */
-	struct mem_cgroup		*active_memcg;
+	/* Used by memcontrol for targeted object cgroup charge: */
+	struct obj_cgroup		*active_objcg;
 #endif
 
 #ifdef CONFIG_BLK_CGROUP
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 64a72975270e..e713f4290914 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -295,36 +295,34 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 #endif
 
 #ifdef CONFIG_MEMCG_KMEM
-DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
+DECLARE_PER_CPU(struct obj_cgroup *, int_active_objcg);
 /**
- * set_active_memcg - Starts the remote memcg charging scope.
- * @memcg: memcg to charge.
+ * set_active_obj_cgroup - Starts the remote object cgroup charging scope.
+ * @objcg: object cgroup to charge.
  *
- * This function marks the beginning of the remote memcg charging scope. All the
- * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
- * given memcg.
+ * This function marks the beginning of the remote object cgroup charging scope.
+ * All the __GFP_ACCOUNT allocations till the end of the scope will be charged
+ * to the given object cgroup.
  *
  * NOTE: This function can nest. Users must save the return value and
  * reset the previous value after their own charging scope is over.
  */
-static inline struct mem_cgroup *
-set_active_memcg(struct mem_cgroup *memcg)
+static inline struct obj_cgroup *set_active_obj_cgroup(struct obj_cgroup *objcg)
 {
-	struct mem_cgroup *old;
+	struct obj_cgroup *old;
 
 	if (in_interrupt()) {
-		old = this_cpu_read(int_active_memcg);
-		this_cpu_write(int_active_memcg, memcg);
+		old = this_cpu_read(int_active_objcg);
+		this_cpu_write(int_active_objcg, objcg);
 	} else {
-		old = current->active_memcg;
-		current->active_memcg = memcg;
+		old = current->active_objcg;
+		current->active_objcg = objcg;
 	}
 
 	return old;
 }
 #else
-static inline struct mem_cgroup *
-set_active_memcg(struct mem_cgroup *memcg)
+static inline struct obj_cgroup *set_active_obj_cgroup(struct obj_cgroup *objcg)
 {
 	return NULL;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c859bc46d06c..1b078eddf083 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -390,37 +390,38 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
+static void bpf_map_save_objcg(struct bpf_map *map)
 {
-	map->memcg = get_mem_cgroup_from_mm(current->mm);
+	map->objcg = get_obj_cgroup_from_current();
 }
 
-static void bpf_map_release_memcg(struct bpf_map *map)
+static void bpf_map_release_objcg(struct bpf_map *map)
 {
-	mem_cgroup_put(map->memcg);
+	if (map->objcg)
+		obj_cgroup_put(map->objcg);
 }
 
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node)
 {
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	void *ptr;
 
-	old_memcg = set_active_memcg(map->memcg);
+	old_objcg = set_active_obj_cgroup(map->objcg);
 	ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
 
 	return ptr;
 }
 
 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 {
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	void *ptr;
 
-	old_memcg = set_active_memcg(map->memcg);
+	old_objcg = set_active_obj_cgroup(map->objcg);
 	ptr = kzalloc(size, flags | __GFP_ACCOUNT);
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
 
 	return ptr;
 }
@@ -428,22 +429,22 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags)
 {
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	void __percpu *ptr;
 
-	old_memcg = set_active_memcg(map->memcg);
+	old_objcg = set_active_obj_cgroup(map->objcg);
 	ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT);
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
 
 	return ptr;
 }
 
 #else
-static void bpf_map_save_memcg(struct bpf_map *map)
+static void bpf_map_save_objcg(struct bpf_map *map)
 {
 }
 
-static void bpf_map_release_memcg(struct bpf_map *map)
+static void bpf_map_release_objcg(struct bpf_map *map)
 {
 }
 #endif
@@ -454,7 +455,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
-	bpf_map_release_memcg(map);
+	bpf_map_release_objcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 }
@@ -877,7 +878,7 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_sec;
 
-	bpf_map_save_memcg(map);
+	bpf_map_save_objcg(map);
 
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
diff --git a/kernel/fork.c b/kernel/fork.c
index d66718bc82d5..5a800916ad8d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -943,7 +943,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 
 #ifdef CONFIG_MEMCG_KMEM
-	tsk->active_memcg = NULL;
+	tsk->active_objcg = NULL;
 #endif
 	return tsk;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 092dc4588b43..024a0f377eb7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -77,8 +77,8 @@ EXPORT_SYMBOL(memory_cgrp_subsys);
 struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_MEMCG_KMEM
-/* Active memory cgroup to use from an interrupt context */
-DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
+/* Active object cgroup to use from an interrupt context */
+DEFINE_PER_CPU(struct obj_cgroup *, int_active_objcg);
 #endif
 
 /* Socket memory accounting disabled? */
@@ -1057,18 +1057,18 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 EXPORT_SYMBOL(get_mem_cgroup_from_mm);
 
 #ifdef CONFIG_MEMCG_KMEM
-static __always_inline struct mem_cgroup *active_memcg(void)
+static __always_inline struct obj_cgroup *active_obj_cgroup(void)
 {
 	if (in_interrupt())
-		return this_cpu_read(int_active_memcg);
+		return this_cpu_read(int_active_objcg);
 	else
-		return current->active_memcg;
+		return current->active_objcg;
 }
 
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
-	if (unlikely(active_memcg()))
+	if (unlikely(active_obj_cgroup()))
 		return false;
 
 	/* Memcg to charge can't be determined. */
@@ -2971,26 +2971,47 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	return page_memcg_check(page);
 }
 
-__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
+__always_inline
+struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg)
 {
 	struct obj_cgroup *objcg = NULL;
+
+	rcu_read_lock();
+	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
+		objcg = rcu_dereference(memcg->objcg);
+		if (objcg && obj_cgroup_tryget(objcg))
+			break;
+		objcg = NULL;
+	}
+	rcu_read_unlock();
+
+	return objcg;
+}
+
+__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
+{
+	struct obj_cgroup *objcg;
 	struct mem_cgroup *memcg;
 
 	if (memcg_kmem_bypass())
 		return NULL;
 
 	rcu_read_lock();
-	if (unlikely(active_memcg()))
-		memcg = active_memcg();
-	else
-		memcg = mem_cgroup_from_task(current);
+	objcg = active_obj_cgroup();
+	if (unlikely(objcg)) {
+		/* remote object cgroup must hold a reference. */
+		obj_cgroup_get(objcg);
+		goto out;
+	}
 
+	memcg = mem_cgroup_from_task(current);
 	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
 		objcg = rcu_dereference(memcg->objcg);
 		if (objcg && obj_cgroup_tryget(objcg))
 			break;
 		objcg = NULL;
 	}
+out:
 	rcu_read_unlock();
 
 	return objcg;
@@ -5296,16 +5317,33 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	return ERR_PTR(error);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static inline struct obj_cgroup *memcg_obj_cgroup(struct mem_cgroup *memcg)
+{
+	return memcg ? memcg->objcg : NULL;
+}
+#else
+static inline struct obj_cgroup *memcg_obj_cgroup(struct mem_cgroup *memcg)
+{
+	return NULL;
+}
+#endif
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
-	struct mem_cgroup *memcg, *old_memcg;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *old_objcg;
 	long error = -ENOMEM;
 
-	old_memcg = set_active_memcg(parent);
+	/*
+	 * The @parent cannot be offlined, so @parent->objcg cannot be freed
+	 * under us.
+	 */
+	old_objcg = set_active_obj_cgroup(memcg_obj_cgroup(parent));
 	memcg = mem_cgroup_alloc();
-	set_active_memcg(old_memcg);
+	set_active_obj_cgroup(old_objcg);
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);
 
-- 
2.11.0



* Re: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-01  6:22 ` [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
@ 2021-03-01 18:11   ` Shakeel Butt
  2021-03-01 19:09     ` Johannes Weiner
  2021-03-02  3:03     ` Muchun Song
  0 siblings, 2 replies; 18+ messages in thread
From: Shakeel Butt @ 2021-03-01 18:11 UTC (permalink / raw)
  To: Muchun Song
  Cc: Alexander Viro, Jan Kara, Amir Goldstein, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, john.fastabend, kpsingh, Ingo Molnar,
	Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, bristot, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Alex Shi, alexander.h.duyck, Chris Down, Wei Yang,
	Vlastimil Babka, Mathieu Desnoyers, Peter Oskolkov, Jann Horn,
	Joonsoo Kim, daniel.vetter, Waiman Long, Michel Lespinasse,
	Christian Brauner, Eric W. Biederman, Kees Cook, krisman, esyr,
	Suren Baghdasaryan, Marco Elver, linux-fsdevel, LKML, netdev,
	bpf, Cgroups, Linux MM, duanxiongchun

On Sun, Feb 28, 2021 at 10:25 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> We want to reuse the obj_cgroup APIs to reparent kmem pages when the
> memcg is offlined. To do this, we need to store an object cgroup
> pointer in page->memcg_data for kmem pages.
>
> Finally, page->memcg_data can have 3 different meanings.
>
>   1) For the slab pages, page->memcg_data points to an object cgroups
>      vector.
>
>   2) For the kmem pages (exclude the slab pages), page->memcg_data
>      points to an object cgroup.
>
>   3) For the user pages (e.g. the LRU pages), page->memcg_data points
>      to a memory cgroup.
>
> Currently we always get the memcg associated with a page via
> page_memcg or page_memcg_rcu. page_memcg_check is special: it has to
> be used in cases where it is not known whether a page has an
> associated memory cgroup pointer or an object cgroups vector. Because
> page->memcg_data of a kmem page will no longer point to a memory
> cgroup after a later patch, page_memcg and page_memcg_rcu cannot be
> applied to kmem pages. This patch introduces page_memcg_kmem to get
> the memcg associated with kmem pages, and makes page_memcg and
> page_memcg_rcu no longer apply to them.
>
> In the end, there are 4 helpers to get the memcg associated with a
> page. The usage is as follows.
>
>   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
>      pages).
>
>      - page_memcg()
>      - page_memcg_rcu()

Can you rename these to page_memcg_lru[_rcu] to make it explicit that
they are for LRU pages?

>
>   2) Get the memory cgroup associated with a kmem page (exclude the slab
>      pages).
>
>      - page_memcg_kmem()
>
>   3) Get the memory cgroup associated with a page. It has to be used in
>      cases when it's not known if a page has an associated memory cgroup
>      pointer or an object cgroups vector. Returns NULL for slab pages or
>      uncharged pages, otherwise, returns memory cgroup for charged pages
>      (e.g. kmem pages, LRU pages).
>
>      - page_memcg_check()
>
> In some places, we use page_memcg to check whether the page is charged.
> Now we introduce the page_memcg_charged helper to do this.
>
> This is a preparation for reparenting the kmem pages. To support
> reparenting kmem pages, we just need to adjust page_memcg_kmem and
> page_memcg_check in a later patch.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
[snip]
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
>                              int val)
>  {
>         struct page *head = compound_head(page); /* rmap on tail pages */
> -       struct mem_cgroup *memcg = page_memcg(head);
> +       struct mem_cgroup *memcg;
>         pg_data_t *pgdat = page_pgdat(page);
>         struct lruvec *lruvec;
>
> +       memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);

Should page_memcg_check() be used here?

>         /* Untracked pages have no memcg, no lruvec. Update only the node */
>         if (!memcg) {
>                 __mod_node_page_state(pgdat, idx, val);
> @@ -3170,12 +3171,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>   */
>  void __memcg_kmem_uncharge_page(struct page *page, int order)
>  {
> -       struct mem_cgroup *memcg = page_memcg(page);
> +       struct mem_cgroup *memcg;
>         unsigned int nr_pages = 1 << order;
>
> -       if (!memcg)
> +       if (!page_memcg_charged(page))
>                 return;
>
> +       memcg = page_memcg_kmem(page);
>         VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
>         __memcg_kmem_uncharge(memcg, nr_pages);
>         page->memcg_data = 0;
> @@ -6831,24 +6833,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
>         unsigned long nr_pages;
> +       struct mem_cgroup *memcg;
>
>         VM_BUG_ON_PAGE(PageLRU(page), page);
>
> -       if (!page_memcg(page))
> +       if (!page_memcg_charged(page))
>                 return;
>
>         /*
>          * Nobody should be changing or seriously looking at
> -        * page_memcg(page) at this point, we have fully
> -        * exclusive access to the page.
> +        * page memcg at this point, we have fully exclusive
> +        * access to the page.
>          */
> -
> -       if (ug->memcg != page_memcg(page)) {
> +       memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);

Same, should page_memcg_check() be used here?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-01 18:11   ` Shakeel Butt
@ 2021-03-01 19:09     ` Johannes Weiner
  2021-03-02  3:49       ` [External] " Muchun Song
  2021-03-02  3:03     ` Muchun Song
  1 sibling, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2021-03-01 19:09 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Muchun Song, Alexander Viro, Jan Kara, Amir Goldstein,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, john.fastabend,
	kpsingh, Ingo Molnar, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, bristot, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Roman Gushchin, Alex Shi,
	alexander.h.duyck, Chris Down, Wei Yang, Vlastimil Babka,
	Mathieu Desnoyers, Peter Oskolkov, Jann Horn, Joonsoo Kim,
	daniel.vetter, Waiman Long, Michel Lespinasse, Christian Brauner,
	Eric W. Biederman, Kees Cook, krisman, esyr, Suren Baghdasaryan,
	Marco Elver, linux-fsdevel, LKML, netdev, bpf, Cgroups, Linux MM,
	duanxiongchun

Muchun, can you please reduce the CC list to mm/memcg folks only for
the next submission? I think probably 80% of the current recipients
don't care ;-)

On Mon, Mar 01, 2021 at 10:11:45AM -0800, Shakeel Butt wrote:
> On Sun, Feb 28, 2021 at 10:25 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > We want to reuse the obj_cgroup APIs to reparent the kmem pages when
> > the memcg is offlined. To do this, we should store an object cgroup
> > pointer in page->memcg_data for the kmem pages.
> >
> > Finally, page->memcg_data can have 3 different meanings.
> >
> >   1) For the slab pages, page->memcg_data points to an object cgroups
> >      vector.
> >
> >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> >      points to an object cgroup.
> >
> >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> >      to a memory cgroup.
> >
> > Currently we always get the memcg associated with a page via page_memcg
> > or page_memcg_rcu. page_memcg_check is special: it has to be used in
> > cases when it is not known whether a page has an associated memory cgroup
> > pointer or an object cgroups vector. Because a later patch makes the
> > page->memcg_data of the kmem pages no longer point to a memory cgroup,
> > page_memcg and page_memcg_rcu can no longer be applied to the kmem
> > pages. In this patch, we introduce page_memcg_kmem to get the memcg
> > associated with the kmem pages, and make page_memcg and page_memcg_rcu
> > no longer apply to the kmem pages.
> >
> > In the end, there are 4 helpers to get the memcg associated with a
> > page. The usage is as follows.
> >
> >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> >      pages).
> >
> >      - page_memcg()
> >      - page_memcg_rcu()
> 
> Can you rename these to page_memcg_lru[_rcu] to make it explicit that
> they are for LRU pages?

The next patch removes page_memcg_kmem() again to replace it with
page_objcg(). That should (luckily) remove the need for this
distinction and keep page_memcg() simple and obvious.

It would be better to not introduce page_memcg_kmem() in the first
place in this patch, IMO.
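
A rough sketch of that direction, assuming the series keeps tagging
page->memcg_data with the existing MEMCG_DATA_KMEM bit; the actual
page_objcg() in the next version may differ:

	/*
	 * Sketch of a page_objcg() helper for kmem pages. Assumes
	 * page->memcg_data holds an obj_cgroup pointer tagged with
	 * MEMCG_DATA_KMEM; not quoted from the posted patches.
	 */
	static inline struct obj_cgroup *page_objcg(struct page *page)
	{
		unsigned long memcg_data = page->memcg_data;

		/* Slab pages carry an objcgs vector, not a single objcg. */
		VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
		VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);

		return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
	}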

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
  2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to change kmem pages Muchun Song
                   ` (4 preceding siblings ...)
  2021-03-01  6:22 ` [PATCH 5/5] mm: memcontrol: use object cgroup for remote memory cgroup charging Muchun Song
@ 2021-03-02  1:12 ` Roman Gushchin
  2021-03-02  2:50   ` [External] " Muchun Song
  5 siblings, 1 reply; 18+ messages in thread
From: Roman Gushchin @ 2021-03-02  1:12 UTC (permalink / raw)
  To: Muchun Song
  Cc: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, alex.shi,
	alexander.h.duyck, chris, richard.weiyang, vbabka,
	mathieu.desnoyers, posk, jannh, iamjoonsoo.kim, daniel.vetter,
	longman, walken, christian.brauner, ebiederm, keescook, krisman,
	esyr, surenb, elver, linux-fsdevel, linux-kernel, netdev, bpf,
	cgroups, linux-mm, duanxiongchun

Hi Muchun!

On Mon, Mar 01, 2021 at 02:22:22PM +0800, Muchun Song wrote:
> Since Roman's series "The new cgroup slab memory controller" was applied,
> all slab objects are charged via the new obj_cgroup APIs. These new APIs
> introduce a struct obj_cgroup instead of using struct mem_cgroup directly
> to charge slab objects, which prevents long-lived objects from pinning the
> original memory cgroup in memory. But there are still some corner-case
> objects (e.g. allocations larger than an order-1 page on SLUB) which are
> not charged via the obj_cgroup API. Those objects (including the pages
> which are allocated from the buddy allocator directly) are charged as kmem
> pages, which still hold a reference to the memory cgroup.

Yes, this is a good idea: large kmallocs should be treated the same
way as small ones.
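
(For anyone unfamiliar with the corner case: on SLUB, kmalloc sizes above
KMALLOC_MAX_CACHE_SIZE bypass the kmem caches entirely. A simplified sketch
of the split, not the literal include/linux/slab.h code:)

	/*
	 * Simplified sketch of the SLUB kmalloc() size split; the real
	 * inline in include/linux/slab.h handles more cases.
	 */
	static __always_inline void *kmalloc_sketch(size_t size, gfp_t flags)
	{
		if (size > KMALLOC_MAX_CACHE_SIZE)
			/*
			 * Served by the buddy allocator and charged as
			 * kmem pages, which pin a mem_cgroup reference.
			 */
			return kmalloc_large(size, flags);

		/*
		 * Served from kmem caches and charged per object via
		 * obj_cgroup since the slab controller rework.
		 */
		return __kmalloc(size, flags);
	}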

> 
> E.g. we know that the kernel stack is charged as kmem pages because the
> size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
> or arm64). Suppose we create a thread (whose stack is charged to memory
> cgroup A) and then move it from memory cgroup A to memory cgroup B.
> Because the kernel stack of the thread holds a reference to memory
> cgroup A, the thread can pin memory cgroup A in memory even if we
> remove cgroup A. The following script demonstrates this scenario; after
> running it, we can see that the system has added 500 dying cgroups.
> 
> 	#!/bin/bash
> 
> 	cat /proc/cgroups | grep memory
> 
> 	cd /sys/fs/cgroup/memory
> 	echo 1 > memory.move_charge_at_immigrate
> 
> 	for i in {1..500}
> 	do
> 		mkdir kmem_test
> 		echo $$ > kmem_test/cgroup.procs
> 		sleep 3600 &
> 		echo $$ > cgroup.procs
> 		echo `cat kmem_test/cgroup.procs` > cgroup.procs
> 		rmdir kmem_test
> 	done
> 
> 	cat /proc/cgroups | grep memory

Well, moving processes between cgroups has always created a lot of issues
and corner cases, and this one is definitely not the worst. So this problem
looks a bit artificial, unless I'm missing something. But if it doesn't
introduce any new performance costs and doesn't make the code more complex,
I have nothing against it.

Btw, can you please run a spell-checker on the commit logs? There are many
typos (starting with the title of the series, I guess), which make the patchset
look less appealing.

Thank you!

> 
> This patchset aims to make those kmem pages drop the reference to the
> memory cgroup by using the obj_cgroup APIs. With this applied, we can see
> that the number of dying cgroups does not increase when we run the above
> test script.
>
> Patches 1-3 use the obj_cgroup APIs to charge kmem pages. The remote
> memory cgroup charging APIs are a mechanism to charge kernel memory to a
> given memory cgroup, so I also convert them to the obj_cgroup APIs.
> Patches 4-5 do this.
> 
> Muchun Song (5):
>   mm: memcontrol: introduce obj_cgroup_{un}charge_page
>   mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
>     page
>   mm: memcontrol: reparent the kmem pages on cgroup removal
>   mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
>   mm: memcontrol: use object cgroup for remote memory cgroup charging
> 
>  fs/buffer.c                          |  10 +-
>  fs/notify/fanotify/fanotify.c        |   6 +-
>  fs/notify/fanotify/fanotify_user.c   |   2 +-
>  fs/notify/group.c                    |   3 +-
>  fs/notify/inotify/inotify_fsnotify.c |   8 +-
>  fs/notify/inotify/inotify_user.c     |   2 +-
>  include/linux/bpf.h                  |   2 +-
>  include/linux/fsnotify_backend.h     |   2 +-
>  include/linux/memcontrol.h           | 109 +++++++++++---
>  include/linux/sched.h                |   6 +-
>  include/linux/sched/mm.h             |  30 ++--
>  kernel/bpf/syscall.c                 |  35 ++---
>  kernel/fork.c                        |   4 +-
>  mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
>  mm/page_alloc.c                      |   4 +-
>  15 files changed, 324 insertions(+), 175 deletions(-)
> 
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  2021-03-01  6:22 ` [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM Muchun Song
@ 2021-03-02  1:15   ` Roman Gushchin
  2021-03-02  3:43     ` Shakeel Butt
  2021-03-02  4:12     ` [External] " Muchun Song
  0 siblings, 2 replies; 18+ messages in thread
From: Roman Gushchin @ 2021-03-02  1:15 UTC (permalink / raw)
  To: Muchun Song
  Cc: viro, jack, amir73il, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, hannes, mhocko, vdavydov.dev, akpm, shakeelb, alex.shi,
	alexander.h.duyck, chris, richard.weiyang, vbabka,
	mathieu.desnoyers, posk, jannh, iamjoonsoo.kim, daniel.vetter,
	longman, walken, christian.brauner, ebiederm, keescook, krisman,
	esyr, surenb, elver, linux-fsdevel, linux-kernel, netdev, bpf,
	cgroups, linux-mm, duanxiongchun

On Mon, Mar 01, 2021 at 02:22:26PM +0800, Muchun Song wrote:
> The remote memcg charging APIs are a mechanism to charge kernel memory
> to a given memcg, so we can move the infrastructure into the scope of
> CONFIG_MEMCG_KMEM.

This is not a good idea, because there is nothing kmem-specific
in the idea of remote charging, and we will definitely see cases
where user memory is charged to a process different from the current one.
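
For reference, the remote charging pattern in question, in the shape already
used in-tree (e.g. by bpf map allocations); remote_kmalloc() is a made-up
wrapper name for illustration:

	/*
	 * Memory allocated with __GFP_ACCOUNT inside the scope is
	 * charged to @memcg instead of the current task's cgroup.
	 * The wrapper name is illustrative only.
	 */
	static void *remote_kmalloc(struct mem_cgroup *memcg, size_t size)
	{
		struct mem_cgroup *old_memcg;
		void *ptr;

		old_memcg = set_active_memcg(memcg);
		ptr = kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);
		set_active_memcg(old_memcg);

		return ptr;
	}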

> 
> As a bonus, on a !CONFIG_MEMCG_KMEM build some functions and variables
> can be compiled out.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/sched.h    | 2 ++
>  include/linux/sched/mm.h | 2 +-
>  kernel/fork.c            | 2 +-
>  mm/memcontrol.c          | 4 ++++
>  4 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ee46f5cab95b..c2d488eddf85 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1314,7 +1314,9 @@ struct task_struct {
>  
>  	/* Number of pages to reclaim on returning to userland: */
>  	unsigned int			memcg_nr_pages_over_high;
> +#endif
>  
> +#ifdef CONFIG_MEMCG_KMEM
>  	/* Used by memcontrol for targeted memcg charge: */
>  	struct mem_cgroup		*active_memcg;
>  #endif
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 1ae08b8462a4..64a72975270e 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -294,7 +294,7 @@ static inline void memalloc_nocma_restore(unsigned int flags)
>  }
>  #endif
>  
> -#ifdef CONFIG_MEMCG
> +#ifdef CONFIG_MEMCG_KMEM
>  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
>  /**
>   * set_active_memcg - Starts the remote memcg charging scope.
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d66cd1014211..d66718bc82d5 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -942,7 +942,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	tsk->use_memdelay = 0;
>  #endif
>  
> -#ifdef CONFIG_MEMCG
> +#ifdef CONFIG_MEMCG_KMEM
>  	tsk->active_memcg = NULL;
>  #endif
>  	return tsk;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 39cb8c5bf8b2..092dc4588b43 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -76,8 +76,10 @@ EXPORT_SYMBOL(memory_cgrp_subsys);
>  
>  struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
> +#ifdef CONFIG_MEMCG_KMEM
>  /* Active memory cgroup to use from an interrupt context */
>  DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> +#endif
>  
>  /* Socket memory accounting disabled? */
>  static bool cgroup_memory_nosocket;
> @@ -1054,6 +1056,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
>  }
>  EXPORT_SYMBOL(get_mem_cgroup_from_mm);
>  
> +#ifdef CONFIG_MEMCG_KMEM
>  static __always_inline struct mem_cgroup *active_memcg(void)
>  {
>  	if (in_interrupt())
> @@ -1074,6 +1077,7 @@ static __always_inline bool memcg_kmem_bypass(void)
>  
>  	return false;
>  }
> +#endif
>  
>  /**
>   * mem_cgroup_iter - iterate over memory cgroup hierarchy
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
  2021-03-02  1:12 ` [PATCH 0/5] Use obj_cgroup APIs to change kmem pages Roman Gushchin
@ 2021-03-02  2:50   ` Muchun Song
  0 siblings, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-02  2:50 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: viro, Jan Kara, amir73il, Alexei Starovoitov, Daniel Borkmann,
	andrii, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, kpsingh, mingo, Peter Zijlstra, juri.lelli,
	Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, mgorman, bristot, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Shakeel Butt, Alex Shi,
	Chris Down, richard.weiyang, Vlastimil Babka, mathieu.desnoyers,
	posk, Jann Horn, Joonsoo Kim, Daniel Vetter, longman,
	Michel Lespinasse, Christian Brauner, Eric W. Biederman,
	Kees Cook, krisman, esyr, Suren Baghdasaryan, Marco Elver,
	linux-fsdevel, LKML, Networking, bpf, Cgroups,
	Linux Memory Management List, Xiongchun duan

On Tue, Mar 2, 2021 at 9:12 AM Roman Gushchin <guro@fb.com> wrote:
>
> Hi Muchun!
>
> On Mon, Mar 01, 2021 at 02:22:22PM +0800, Muchun Song wrote:
> > Since Roman's series "The new cgroup slab memory controller" was applied,
> > all slab objects are charged via the new obj_cgroup APIs. These new APIs
> > introduce a struct obj_cgroup instead of using struct mem_cgroup directly
> > to charge slab objects, which prevents long-lived objects from pinning the
> > original memory cgroup in memory. But there are still some corner-case
> > objects (e.g. allocations larger than an order-1 page on SLUB) which are
> > not charged via the obj_cgroup API. Those objects (including the pages
> > which are allocated from the buddy allocator directly) are charged as kmem
> > pages, which still hold a reference to the memory cgroup.
>
> Yes, this is a good idea: large kmallocs should be treated the same
> way as small ones.
>
> >
> > E.g. we know that the kernel stack is charged as kmem pages because the
> > size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
> > or arm64). Suppose we create a thread (whose stack is charged to memory
> > cgroup A) and then move it from memory cgroup A to memory cgroup B.
> > Because the kernel stack of the thread holds a reference to memory
> > cgroup A, the thread can pin memory cgroup A in memory even if we
> > remove cgroup A. The following script demonstrates this scenario; after
> > running it, we can see that the system has added 500 dying cgroups.
> >
> >       #!/bin/bash
> >
> >       cat /proc/cgroups | grep memory
> >
> >       cd /sys/fs/cgroup/memory
> >       echo 1 > memory.move_charge_at_immigrate
> >
> >       for i in {1..500}
> >       do
> >               mkdir kmem_test
> >               echo $$ > kmem_test/cgroup.procs
> >               sleep 3600 &
> >               echo $$ > cgroup.procs
> >               echo `cat kmem_test/cgroup.procs` > cgroup.procs
> >               rmdir kmem_test
> >       done
> >
> >       cat /proc/cgroups | grep memory
>
> Well, moving processes between cgroups has always created a lot of issues
> and corner cases, and this one is definitely not the worst. So this problem
> looks a bit artificial, unless I'm missing something. But if it doesn't
> introduce any new performance costs and doesn't make the code more complex,
> I have nothing against it.

OK. I just wanted to show that large kmallocs are charged as kmem pages,
so I constructed this test case.

>
> Btw, can you please run a spell-checker on the commit logs? There are many
> typos (starting with the title of the series, I guess), which make the patchset
> look less appealing.

Sorry for my poor English. I will do that. Thanks for your suggestions.


>
> Thank you!
>
> >
> > This patchset aims to make those kmem pages drop the reference to the
> > memory cgroup by using the obj_cgroup APIs. With this applied, we can see
> > that the number of dying cgroups does not increase when we run the above
> > test script.
> >
> > Patches 1-3 use the obj_cgroup APIs to charge kmem pages. The remote
> > memory cgroup charging APIs are a mechanism to charge kernel memory to a
> > given memory cgroup, so I also convert them to the obj_cgroup APIs.
> > Patches 4-5 do this.
> >
> > Muchun Song (5):
> >   mm: memcontrol: introduce obj_cgroup_{un}charge_page
> >   mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
> >     page
> >   mm: memcontrol: reparent the kmem pages on cgroup removal
> >   mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
> >   mm: memcontrol: use object cgroup for remote memory cgroup charging
> >
> >  fs/buffer.c                          |  10 +-
> >  fs/notify/fanotify/fanotify.c        |   6 +-
> >  fs/notify/fanotify/fanotify_user.c   |   2 +-
> >  fs/notify/group.c                    |   3 +-
> >  fs/notify/inotify/inotify_fsnotify.c |   8 +-
> >  fs/notify/inotify/inotify_user.c     |   2 +-
> >  include/linux/bpf.h                  |   2 +-
> >  include/linux/fsnotify_backend.h     |   2 +-
> >  include/linux/memcontrol.h           | 109 +++++++++++---
> >  include/linux/sched.h                |   6 +-
> >  include/linux/sched/mm.h             |  30 ++--
> >  kernel/bpf/syscall.c                 |  35 ++---
> >  kernel/fork.c                        |   4 +-
> >  mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
> >  mm/page_alloc.c                      |   4 +-
> >  15 files changed, 324 insertions(+), 175 deletions(-)
> >
> > --
> > 2.11.0
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-01 18:11   ` Shakeel Butt
  2021-03-01 19:09     ` Johannes Weiner
@ 2021-03-02  3:03     ` Muchun Song
  2021-03-02  3:35       ` Shakeel Butt
  1 sibling, 1 reply; 18+ messages in thread
From: Muchun Song @ 2021-03-02  3:03 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexander Viro, Jan Kara, Amir Goldstein, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, kpsingh, Ingo Molnar,
	Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, bristot, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Alex Shi, Chris Down, Wei Yang, Vlastimil Babka,
	Mathieu Desnoyers, Peter Oskolkov, Jann Horn, Joonsoo Kim,
	Daniel Vetter, Waiman Long, Michel Lespinasse, Christian Brauner,
	Eric W. Biederman, Kees Cook, krisman, esyr, Suren Baghdasaryan,
	Marco Elver, linux-fsdevel, LKML, netdev, bpf, Cgroups, Linux MM,
	Xiongchun duan

On Tue, Mar 2, 2021 at 2:11 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Sun, Feb 28, 2021 at 10:25 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > We want to reuse the obj_cgroup APIs to reparent the kmem pages when
> > the memcg is offlined. To do this, we should store an object cgroup
> > pointer in page->memcg_data for the kmem pages.
> >
> > Finally, page->memcg_data can have 3 different meanings.
> >
> >   1) For the slab pages, page->memcg_data points to an object cgroups
> >      vector.
> >
> >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> >      points to an object cgroup.
> >
> >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> >      to a memory cgroup.
> >
> > Currently we always get the memcg associated with a page via page_memcg
> > or page_memcg_rcu. page_memcg_check is special: it has to be used in
> > cases when it is not known whether a page has an associated memory cgroup
> > pointer or an object cgroups vector. Because a later patch makes the
> > page->memcg_data of the kmem pages no longer point to a memory cgroup,
> > page_memcg and page_memcg_rcu can no longer be applied to the kmem
> > pages. In this patch, we introduce page_memcg_kmem to get the memcg
> > associated with the kmem pages, and make page_memcg and page_memcg_rcu
> > no longer apply to the kmem pages.
> >
> > In the end, there are 4 helpers to get the memcg associated with a
> > page. The usage is as follows.
> >
> >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> >      pages).
> >
> >      - page_memcg()
> >      - page_memcg_rcu()
>
> Can you rename these to page_memcg_lru[_rcu] to make it explicit that
> they are for LRU pages?

Yes. Will do. Thanks.

>
> >
> >   2) Get the memory cgroup associated with a kmem page (exclude the slab
> >      pages).
> >
> >      - page_memcg_kmem()
> >
> >   3) Get the memory cgroup associated with a page. It has to be used in
> >      cases when it's not known if a page has an associated memory cgroup
> >      pointer or an object cgroups vector. Returns NULL for slab pages or
> >      uncharged pages, otherwise, returns memory cgroup for charged pages
> >      (e.g. kmem pages, LRU pages).
> >
> >      - page_memcg_check()
> >
> > In some places, we use page_memcg to check whether the page is charged.
> > Now we introduce the page_memcg_charged helper to do this.
> >
> > This is a preparation for reparenting the kmem pages. To support
> > reparenting kmem pages, we just need to adjust page_memcg_kmem and
> > page_memcg_check in a later patch.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> [snip]
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
> >                              int val)
> >  {
> >         struct page *head = compound_head(page); /* rmap on tail pages */
> > -       struct mem_cgroup *memcg = page_memcg(head);
> > +       struct mem_cgroup *memcg;
> >         pg_data_t *pgdat = page_pgdat(page);
> >         struct lruvec *lruvec;
> >
> > +       memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);
>
> Should page_memcg_check() be used here?

Yeah, page_memcg_check() could be used here.
But page_memcg_check() contains a READ_ONCE(),
and we do not actually need READ_ONCE() here,
so I use page_memcg or page_memcg_kmem
directly. Thanks.
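
For reference, a condensed sketch of the difference being discussed, based
on the current memcontrol.h helpers; bodies trimmed and renamed for
illustration:

	/* Caller knows the page type: a plain load is sufficient. */
	static inline struct mem_cgroup *page_memcg_plain(struct page *page)
	{
		return (struct mem_cgroup *)page->memcg_data;
	}

	/*
	 * Caller does not know the page type: page->memcg_data may be
	 * an objcgs vector or carry flags, hence the READ_ONCE() and
	 * the masking.
	 */
	static inline struct mem_cgroup *page_memcg_checked(struct page *page)
	{
		unsigned long memcg_data = READ_ONCE(page->memcg_data);

		if (memcg_data & MEMCG_DATA_OBJCGS)	/* slab page */
			return NULL;

		return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
	}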

>
> >         /* Untracked pages have no memcg, no lruvec. Update only the node */
> >         if (!memcg) {
> >                 __mod_node_page_state(pgdat, idx, val);
> > @@ -3170,12 +3171,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >   */
> >  void __memcg_kmem_uncharge_page(struct page *page, int order)
> >  {
> > -       struct mem_cgroup *memcg = page_memcg(page);
> > +       struct mem_cgroup *memcg;
> >         unsigned int nr_pages = 1 << order;
> >
> > -       if (!memcg)
> > +       if (!page_memcg_charged(page))
> >                 return;
> >
> > +       memcg = page_memcg_kmem(page);
> >         VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> >         __memcg_kmem_uncharge(memcg, nr_pages);
> >         page->memcg_data = 0;
> > @@ -6831,24 +6833,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> >         unsigned long nr_pages;
> > +       struct mem_cgroup *memcg;
> >
> >         VM_BUG_ON_PAGE(PageLRU(page), page);
> >
> > -       if (!page_memcg(page))
> > +       if (!page_memcg_charged(page))
> >                 return;
> >
> >         /*
> >          * Nobody should be changing or seriously looking at
> > -        * page_memcg(page) at this point, we have fully
> > -        * exclusive access to the page.
> > +        * page memcg at this point, we have fully exclusive
> > +        * access to the page.
> >          */
> > -
> > -       if (ug->memcg != page_memcg(page)) {
> > +       memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);
>
> Same, should page_memcg_check() be used here?

Same as above.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-02  3:03     ` Muchun Song
@ 2021-03-02  3:35       ` Shakeel Butt
  2021-03-02  3:51         ` Muchun Song
  0 siblings, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2021-03-02  3:35 UTC (permalink / raw)
  To: Muchun Song
  Cc: Alexander Viro, Jan Kara, Amir Goldstein, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, kpsingh, Ingo Molnar,
	Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, bristot, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Alex Shi, Chris Down, Wei Yang, Vlastimil Babka,
	Mathieu Desnoyers, Peter Oskolkov, Jann Horn, Joonsoo Kim,
	Daniel Vetter, Waiman Long, Michel Lespinasse, Christian Brauner,
	Eric W. Biederman, Kees Cook, krisman, esyr, Suren Baghdasaryan,
	Marco Elver, linux-fsdevel, LKML, netdev, bpf, Cgroups, Linux MM,
	Xiongchun duan

On Mon, Mar 1, 2021 at 7:03 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Mar 2, 2021 at 2:11 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Sun, Feb 28, 2021 at 10:25 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > We want to reuse the obj_cgroup APIs to reparent the kmem pages when
> > > the memcg is offlined. To do this, we should store an object cgroup
> > > pointer in page->memcg_data for the kmem pages.
> > >
> > > Finally, page->memcg_data can have 3 different meanings.
> > >
> > >   1) For the slab pages, page->memcg_data points to an object cgroups
> > >      vector.
> > >
> > >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> > >      points to an object cgroup.
> > >
> > >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> > >      to a memory cgroup.
> > >
> > > Currently we always get the memcg associated with a page via page_memcg
> > > or page_memcg_rcu. page_memcg_check is special: it has to be used in
> > > cases when it is not known whether a page has an associated memory cgroup
> > > pointer or an object cgroups vector. Because a later patch makes the
> > > page->memcg_data of the kmem pages no longer point to a memory cgroup,
> > > page_memcg and page_memcg_rcu can no longer be applied to the kmem
> > > pages. In this patch, we introduce page_memcg_kmem to get the memcg
> > > associated with the kmem pages, and make page_memcg and page_memcg_rcu
> > > no longer apply to the kmem pages.
> > >
> > > In the end, there are 4 helpers to get the memcg associated with a
> > > page. The usage is as follows.
> > >
> > >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> > >      pages).
> > >
> > >      - page_memcg()
> > >      - page_memcg_rcu()
> >
> > Can you rename these to page_memcg_lru[_rcu] to make it explicit that
> > they are for LRU pages?
>
> Yes. Will do. Thanks.
>

Please follow Johannes' suggestion regarding page_memcg_kmem();
then there is no need to rename these.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  2021-03-02  1:15   ` Roman Gushchin
@ 2021-03-02  3:43     ` Shakeel Butt
  2021-03-02  3:58       ` Roman Gushchin
  2021-03-02  4:12     ` [External] " Muchun Song
  1 sibling, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2021-03-02  3:43 UTC (permalink / raw)
  To: Roman Gushchin, Dan Schatzberg
  Cc: Muchun Song, Johannes Weiner, Michal Hocko, Andrew Morton, esyr,
	linux-fsdevel, LKML, netdev, bpf, Cgroups, Linux MM

On Mon, Mar 1, 2021 at 5:16 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Mon, Mar 01, 2021 at 02:22:26PM +0800, Muchun Song wrote:
> > The remote memcg charging APIs are a mechanism to charge kernel memory
> > to a given memcg, so we can move the infrastructure into the scope of
> > CONFIG_MEMCG_KMEM.
>
> This is not a good idea, because there is nothing kmem-specific
> in the idea of remote charging, and we will definitely see cases
> where user memory is charged to a process different from the current one.
>

Indeed, and that reminds me: what happened to the "Charge loop device
i/o to issuing cgroup" series? That series was doing remote charging
for user pages.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-01 19:09     ` Johannes Weiner
@ 2021-03-02  3:49       ` Muchun Song
  0 siblings, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-02  3:49 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shakeel Butt, Alexander Viro, Jan Kara, Amir Goldstein,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	kpsingh, Ingo Molnar, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, bristot, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Roman Gushchin, Alex Shi,
	Chris Down, Wei Yang, Vlastimil Babka, Mathieu Desnoyers,
	Peter Oskolkov, Jann Horn, Joonsoo Kim, Daniel Vetter,
	Waiman Long, Michel Lespinasse, Christian Brauner,
	Eric W. Biederman, Kees Cook, krisman, esyr, Suren Baghdasaryan,
	Marco Elver, linux-fsdevel, LKML, netdev, bpf, Cgroups, Linux MM,
	Xiongchun duan

On Tue, Mar 2, 2021 at 3:09 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Muchun, can you please reduce the CC list to mm/memcg folks only for
> the next submission? I think probably 80% of the current recipients
> don't care ;-)

At first, I just used scripts/get_maintainer.pl to get the
CC list. I will reduce the CC list in the next version.
Thanks.

>
> On Mon, Mar 01, 2021 at 10:11:45AM -0800, Shakeel Butt wrote:
> > On Sun, Feb 28, 2021 at 10:25 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > We want to reuse the obj_cgroup APIs to reparent the kmem pages when
> > > the memcg is offlined. To do this, we should store an object cgroup
> > > pointer in page->memcg_data for the kmem pages.
> > >
> > > Finally, page->memcg_data can have 3 different meanings.
> > >
> > >   1) For the slab pages, page->memcg_data points to an object cgroups
> > >      vector.
> > >
> > >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> > >      points to an object cgroup.
> > >
> > >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> > >      to a memory cgroup.
> > >
> > > Currently we always get the memcg associated with a page via page_memcg
> > > or page_memcg_rcu. page_memcg_check is special: it has to be used in
> > > cases when it is not known whether a page has an associated memory cgroup
> > > pointer or an object cgroups vector. Because a later patch makes the
> > > page->memcg_data of the kmem pages no longer point to a memory cgroup,
> > > page_memcg and page_memcg_rcu can no longer be applied to the kmem
> > > pages. In this patch, we introduce page_memcg_kmem to get the memcg
> > > associated with the kmem pages, and make page_memcg and page_memcg_rcu
> > > no longer apply to the kmem pages.
> > >
> > > In the end, there are 4 helpers to get the memcg associated with a
> > > page. The usage is as follows.
> > >
> > >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> > >      pages).
> > >
> > >      - page_memcg()
> > >      - page_memcg_rcu()
> >
> > Can you rename these to page_memcg_lru[_rcu] to make it explicit that
> > they are for LRU pages?
>
> The next patch removes page_memcg_kmem() again to replace it with
> page_objcg(). That should (luckily) remove the need for this
> distinction and keep page_memcg() simple and obvious.
>
> It would be better to not introduce page_memcg_kmem() in the first
> place in this patch, IMO.

OK. I will follow your suggestion. Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-02  3:35       ` Shakeel Butt
@ 2021-03-02  3:51         ` Muchun Song
  0 siblings, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-02  3:51 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexander Viro, Jan Kara, Amir Goldstein, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, kpsingh, Ingo Molnar,
	Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, dietmar.eggemann, Steven Rostedt,
	Benjamin Segall, Mel Gorman, bristot, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Alex Shi, Chris Down, Wei Yang, Vlastimil Babka,
	Mathieu Desnoyers, Peter Oskolkov, Jann Horn, Joonsoo Kim,
	Daniel Vetter, Waiman Long, Michel Lespinasse, Christian Brauner,
	Eric W. Biederman, Kees Cook, krisman, esyr, Suren Baghdasaryan,
	Marco Elver, linux-fsdevel, LKML, netdev, bpf, Cgroups, Linux MM,
	Xiongchun duan

On Tue, Mar 2, 2021 at 11:36 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Mon, Mar 1, 2021 at 7:03 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Tue, Mar 2, 2021 at 2:11 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Sun, Feb 28, 2021 at 10:25 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > We want to reuse the obj_cgroup APIs to reparent the kmem pages when
> > > > the memcg is offlined. To do this, we should store an object cgroup
> > > > pointer in page->memcg_data for the kmem pages.
> > > >
> > > > Finally, page->memcg_data can have 3 different meanings.
> > > >
> > > >   1) For the slab pages, page->memcg_data points to an object cgroups
> > > >      vector.
> > > >
> > > >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> > > >      points to an object cgroup.
> > > >
> > > >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> > > >      to a memory cgroup.
> > > >
> > > > Currently we always get the memcg associated with a page via page_memcg
> > > > or page_memcg_rcu. page_memcg_check is special: it has to be used in
> > > > cases when it is not known whether a page has an associated memory cgroup
> > > > pointer or an object cgroups vector. Because a later patch makes the
> > > > page->memcg_data of the kmem pages no longer point to a memory cgroup,
> > > > page_memcg and page_memcg_rcu can no longer be applied to the kmem
> > > > pages. In this patch, we introduce page_memcg_kmem to get the memcg
> > > > associated with the kmem pages, and make page_memcg and page_memcg_rcu
> > > > no longer apply to the kmem pages.
> > > >
> > > > In the end, there are 4 helpers to get the memcg associated with a
> > > > page. The usage is as follows.
> > > >
> > > >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> > > >      pages).
> > > >
> > > >      - page_memcg()
> > > >      - page_memcg_rcu()
> > >
> > > Can you rename these to page_memcg_lru[_rcu] to make it explicit that
> > > they are for LRU pages?
> >
> > Yes. Will do. Thanks.
> >
>
> Please follow Johannes' suggestion regarding page_memcg_kmem();
> then there is no need to rename these.

OK.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  2021-03-02  3:43     ` Shakeel Butt
@ 2021-03-02  3:58       ` Roman Gushchin
  0 siblings, 0 replies; 18+ messages in thread
From: Roman Gushchin @ 2021-03-02  3:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Dan Schatzberg, Muchun Song, Johannes Weiner, Michal Hocko,
	Andrew Morton, esyr, linux-fsdevel, LKML, netdev, bpf, Cgroups,
	Linux MM

On Mon, Mar 01, 2021 at 07:43:27PM -0800, Shakeel Butt wrote:
> On Mon, Mar 1, 2021 at 5:16 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Mon, Mar 01, 2021 at 02:22:26PM +0800, Muchun Song wrote:
> > > The remote memcg charging APIs are a mechanism to charge kernel memory
> > > to a given memcg, so we can move the infrastructure into the scope of
> > > CONFIG_MEMCG_KMEM.
> >
> > This is not a good idea, because there is nothing kmem-specific
> > in the idea of remote charging, and we will definitely see cases
> > where user memory is charged to a process different from the current one.
> >
> 
> Indeed and which remind me: what happened to the "Charge loop device
> i/o to issuing cgroup" series? That series was doing remote charging
> for user pages.

Yeah, this is exactly what I had in mind. We're using it internally, and
as I remember there were no obstacles to upstreaming it either.
I'll ping Dan after the merge window.

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  2021-03-02  1:15   ` Roman Gushchin
  2021-03-02  3:43     ` Shakeel Butt
@ 2021-03-02  4:12     ` Muchun Song
  1 sibling, 0 replies; 18+ messages in thread
From: Muchun Song @ 2021-03-02  4:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexander Viro, Jan Kara, Amir Goldstein, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, kpsingh, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, dietmar.eggemann,
	Steven Rostedt, Benjamin Segall, Mel Gorman, bristot,
	Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Alex Shi, alexander.h.duyck, Chris Down, Wei Yang,
	Vlastimil Babka, Mathieu Desnoyers, Peter Oskolkov, Jann Horn,
	Joonsoo Kim, Daniel Vetter, Waiman Long, Michel Lespinasse,
	Christian Brauner, Eric W. Biederman, Kees Cook, krisman, esyr,
	Suren Baghdasaryan, Marco Elver, linux-fsdevel, LKML, Networking,
	bpf, Cgroups, Linux Memory Management List, Xiongchun duan

On Tue, Mar 2, 2021 at 9:15 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Mon, Mar 01, 2021 at 02:22:26PM +0800, Muchun Song wrote:
> > The remote memcg charging APIs are a mechanism to charge kernel memory
> > to a given memcg, so we can move the infrastructure into the scope of
> > CONFIG_MEMCG_KMEM.
>
> This is not a good idea, because there is nothing kmem-specific
> in the idea of remote charging, and we will definitely see cases
> where user memory is charged to a process different from the current one.

Got it. Thanks for the reminder.


>
> >
> > As a bonus, on a !CONFIG_MEMCG_KMEM build some functions and variables
> > can be compiled out.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/sched.h    | 2 ++
> >  include/linux/sched/mm.h | 2 +-
> >  kernel/fork.c            | 2 +-
> >  mm/memcontrol.c          | 4 ++++
> >  4 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index ee46f5cab95b..c2d488eddf85 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1314,7 +1314,9 @@ struct task_struct {
> >
> >       /* Number of pages to reclaim on returning to userland: */
> >       unsigned int                    memcg_nr_pages_over_high;
> > +#endif
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> >       /* Used by memcontrol for targeted memcg charge: */
> >       struct mem_cgroup               *active_memcg;
> >  #endif
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index 1ae08b8462a4..64a72975270e 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -294,7 +294,7 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> >  }
> >  #endif
> >
> > -#ifdef CONFIG_MEMCG
> > +#ifdef CONFIG_MEMCG_KMEM
> >  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> >  /**
> >   * set_active_memcg - Starts the remote memcg charging scope.
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index d66cd1014211..d66718bc82d5 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -942,7 +942,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> >       tsk->use_memdelay = 0;
> >  #endif
> >
> > -#ifdef CONFIG_MEMCG
> > +#ifdef CONFIG_MEMCG_KMEM
> >       tsk->active_memcg = NULL;
> >  #endif
> >       return tsk;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 39cb8c5bf8b2..092dc4588b43 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -76,8 +76,10 @@ EXPORT_SYMBOL(memory_cgrp_subsys);
> >
> >  struct mem_cgroup *root_mem_cgroup __read_mostly;
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> >  /* Active memory cgroup to use from an interrupt context */
> >  DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> > +#endif
> >
> >  /* Socket memory accounting disabled? */
> >  static bool cgroup_memory_nosocket;
> > @@ -1054,6 +1056,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
> >  }
> >  EXPORT_SYMBOL(get_mem_cgroup_from_mm);
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> >  static __always_inline struct mem_cgroup *active_memcg(void)
> >  {
> >       if (in_interrupt())
> > @@ -1074,6 +1077,7 @@ static __always_inline bool memcg_kmem_bypass(void)
> >
> >       return false;
> >  }
> > +#endif
> >
> >  /**
> >   * mem_cgroup_iter - iterate over memory cgroup hierarchy
> > --
> > 2.11.0
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-03-02 10:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to change kmem pages Muchun Song
2021-03-01  6:22 ` [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page Muchun Song
2021-03-01  6:22 ` [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
2021-03-01 18:11   ` Shakeel Butt
2021-03-01 19:09     ` Johannes Weiner
2021-03-02  3:49       ` [External] " Muchun Song
2021-03-02  3:03     ` Muchun Song
2021-03-02  3:35       ` Shakeel Butt
2021-03-02  3:51         ` Muchun Song
2021-03-01  6:22 ` [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal Muchun Song
2021-03-01  6:22 ` [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM Muchun Song
2021-03-02  1:15   ` Roman Gushchin
2021-03-02  3:43     ` Shakeel Butt
2021-03-02  3:58       ` Roman Gushchin
2021-03-02  4:12     ` [External] " Muchun Song
2021-03-01  6:22 ` [PATCH 5/5] mm: memcontrol: use object cgroup for remote memory cgroup charging Muchun Song
2021-03-02  1:12 ` [PATCH 0/5] Use obj_cgroup APIs to change kmem pages Roman Gushchin
2021-03-02  2:50   ` [External] " Muchun Song
