linux-kernel.vger.kernel.org archive mirror
* [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state()
@ 2021-08-16  7:21 Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data Chen Huang
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang

We found a NULL pointer dereference in __mod_lruvec_page_state():
  UIO driver:
        kmalloc(PAGE_SIZE)
  UIO user:
        mmap() and then read. But before the user reads the page, another
task may allocate an object from the same compound page and set the head
page's obj_cgroups, like this:
[   94.845687]  memcg_alloc_page_obj_cgroups+0x50/0xa0
[   94.846334]  slab_post_alloc_hook+0xc8/0x184
[   94.846852]  kmem_cache_alloc+0x148/0x2a4
[   94.847346]  __d_alloc+0x30/0x2e4
[   94.847809]  d_alloc+0x30/0xc0

Then, when the user reads the page, __mod_lruvec_page_state() picks up the
tagged obj_cgroups value from head->mem_cgroup, treats it as a memcg
pointer, and crashes with a NULL pointer dereference:

[   94.882699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
[   94.882773] Mem abort info:
[   94.882819]   ESR = 0x96000006
[   94.882953]   EC = 0x25: DABT (current EL), IL = 32 bits
[   94.883000]   SET = 0, FnV = 0
[   94.883043]   EA = 0, S1PTW = 0
[   94.883089] Data abort info:
[   94.883134]   ISV = 0, ISS = 0x00000006
[   94.883179]   CM = 0, WnR = 0
[   94.883402] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010c355000
[   94.883495] [0000000000000080] pgd=000000010c046003, p4d=000000010c046003, pud=000000010c368003, pmd=0000000000000000
[   94.884225] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[   94.884480] Modules linked in:
[   94.884788] CPU: 0 PID: 250 Comm: uio_user_mmap Tainted: G    B             5.10.0-07799-ged92fcf8d408-dirty #112
[   94.884837] Hardware name: linux,dummy-virt (DT)
[   94.885052] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[   94.885169] pc : __mod_lruvec_page_state+0x118/0x180
[   94.885249] lr : __mod_lruvec_page_state+0x118/0x180
[   94.885297] sp : ffff2872ce25fb40
[   94.885402] x29: ffff2872ce25fb40 x28: 0000000000000254
[   94.885572] x27: 0000000000000000 x26: ffff2872fe2d7c38
[   94.885724] x25: ffffa000242e7dc0 x24: 0000000000000001
[   94.885872] x23: 0000000000000012 x22: ffffa00022bcfc60
[   94.886030] x21: ffff2872fffeb380 x20: 0000000000000144
[   94.886169] x19: 0000000000000000 x18: 0000000000000000
[   94.886331] x17: 0000000000000000 x16: 0000000000000000
[   94.886476] x15: 0000000000000000 x14: 3078303a7865646e
[   94.886625] x13: 6920303030303030 x12: 1fffe50e5b713f20
[   94.886765] x11: ffff850e5b713f20 x10: 616d20303a746e75
[   94.886947] x9 : dfffa00000000000 x8 : 3266666666203d20
[   94.887095] x7 : ffff2872db89f903 x6 : 0000000000000000
[   94.887236] x5 : 0000000000000000 x4 : dfffa00000000000
[   94.887381] x3 : ffffa00021e6c5dc x2 : 0000000000000000
[   94.887515] x1 : 0000000000000008 x0 : 0000000000000000
[   94.887702] Call trace:
[   94.887840]  __mod_lruvec_page_state+0x118/0x180
[   94.887919]  page_add_file_rmap+0xa8/0xe0
[   94.887998]  alloc_set_pte+0x2c4/0x2d0
[   94.888074]  finish_fault+0x94/0xcc
[   94.888157]  handle_mm_fault+0x7c8/0x1094
[   94.888230]  do_page_fault+0x358/0x490
[   94.888300]  do_translation_fault+0x38/0x54
[   94.888370]  do_mem_abort+0x5c/0xe4
[   94.888435]  el0_da+0x3c/0x4c
[   94.888506]  el0_sync_handler+0xd8/0x14c
[   94.888573]  el0_sync+0x148/0x180
[   94.888963] Code: d2835101 8b0102b3 91020260 9400e8da (f9404260)
[   94.889860] ---[ end trace 1de53a0bd9084cde ]---
[   94.890244] Kernel panic - not syncing: Oops: Fatal exception
[   94.890620] SMP: stopping secondary CPUs
[   94.891117] Kernel Offset: 0x11c00000 from 0xffffa00010000000
[   94.891179] PHYS_OFFSET: 0xffffd78e40000000
[   94.891293] CPU features: 0x0660012,41002000
[   94.891365] Memory Limit: none
[   94.927552] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
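
Before this series, page->mem_cgroup and page->obj_cgroups share one word
in struct page, and readers such as __mod_lruvec_page_state() dereference
it blindly.  A minimal sketch of the failing read, built from the pre-patch
lines that patch 01 removes (the tagged value is spelled out for clarity):

        /*
         * Illustrative only: once a slab allocation from the same compound
         * page stores (vec | 0x1UL) into the shared field, this blind read
         * sees a non-NULL "memcg" and chases a bogus pointer.
         */
        struct mem_cgroup *memcg = head->mem_cgroup; /* really (vec | 0x1UL) */

        if (!memcg) {                   /* check passes: value is non-NULL */
                __mod_node_page_state(pgdat, idx, val);
                return;
        }
        lruvec = mem_cgroup_lruvec(memcg, pgdat);    /* bogus dereference */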

1. Roman Gushchin's four patches move the PageKmemcg flag into one of the
free bits of the page->mem_cgroup pointer.  They also formalize accesses
to page->mem_cgroup and page->obj_cgroups using new helpers, add several
checks, and remove a couple of obsolete functions.
Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com

2. Muchun Song's patchset makes kmem pages drop their reference to the
memory cgroup by using the obj_cgroup APIs.
Link: https://lkml.kernel.org/r/20210319163821.20704-1-songmuchun@bytedance.com

3. Wang Hai's patch is a bugfix for "mm: memcontrol/slab: Use helpers to
access slab page's memcg_data".
Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com

Muchun Song (6):
  mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
  mm: memcontrol: change ug->dummy_page only if memcg changed
  mm: memcontrol: use obj_cgroup APIs to charge kmem pages
	Conflict: commit c47d5032ed3002311a4188eae51f4641ec436beb is not merged in 5.10.y
  mm: memcontrol: inline __memcg_kmem_{un}charge() into
    obj_cgroup_{un}charge_pages()
  mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

Roman Gushchin (4):
  mm: memcontrol: Use helpers to read page's memcg data
	Conflict in split_page_memcg(), due to commit 002ea848d7fd3bdcb6281e75bdde28095c2cd549
  mm: memcontrol/slab: Use helpers to access slab page's memcg_data
  mm: Introduce page memcg flags
  mm: Convert page kmemcg type to a page memcg flag

Wang Hai (1):
  mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()

 fs/buffer.c                      |   2 +-
 fs/iomap/buffered-io.c           |   2 +-
 include/linux/memcontrol.h       | 320 +++++++++++++++++++++++++++--
 include/linux/mm.h               |  22 --
 include/linux/mm_types.h         |   5 +-
 include/linux/page-flags.h       |  11 +-
 include/trace/events/writeback.h |   2 +-
 kernel/fork.c                    |   7 +-
 mm/debug.c                       |   4 +-
 mm/huge_memory.c                 |   4 +-
 mm/memcontrol.c                  | 336 +++++++++++++++----------------
 mm/page_alloc.c                  |   8 +-
 mm/page_io.c                     |   6 +-
 mm/slab.h                        |  38 +---
 mm/workingset.c                  |   2 +-
 15 files changed, 493 insertions(+), 276 deletions(-)

-- 
2.18.0.huawei.25



* [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  8:34   ` Greg Kroah-Hartman
  2021-08-16  7:21 ` [PATCH 5.10.y 02/11] mm: memcontrol/slab: Use helpers to access slab page's memcg_data Chen Huang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Andrew Morton,
	Alexei Starovoitov

From: Roman Gushchin <guro@fb.com>

Patch series "mm: allow mapping accounted kernel pages to userspace", v6.

Currently a non-slab kernel page which has been charged to a memory cgroup
can't be mapped to userspace.  The underlying reason is simple: the
PageKmemcg flag is defined as a page type (like buddy, offline, etc), so
it takes a bit from the page->_mapcount counter.  Pages with a type set
can't be mapped to userspace.

But in general the kmemcg flag has nothing to do with mapping to
userspace.  It only means that the page has been accounted by the page
allocator, so it has to be properly uncharged on release.

Some bpf maps map vmalloc-based memory to userspace, and their memory
can't be accounted because of this implementation detail.

This patchset removes this limitation by moving the PageKmemcg flag into
one of the free bits of the page->mem_cgroup pointer.  It also formalizes
accesses to page->mem_cgroup and page->obj_cgroups using new helpers, adds
several checks, and removes a couple of obsolete functions.  As a result,
the code becomes more robust, with fewer open-coded bit tricks.

This patch (of 4):

Currently there are many open-coded reads of the page->mem_cgroup pointer,
as well as a couple of read helpers, which are barely used.

This creates an obstacle to reusing some bits of the pointer for storing
additional information.  In fact, we already do this for slab pages, where
the last bit indicates that the pointer has an attached vector of objcg
pointers instead of a regular memcg pointer.

This commit uses the 2 existing helpers and introduces a new one to
convert all read sides to calls of these helpers:
  struct mem_cgroup *page_memcg(struct page *page);
  struct mem_cgroup *page_memcg_rcu(struct page *page);
  struct mem_cgroup *page_memcg_check(struct page *page);

page_memcg_check() is intended to be used in cases when the page can be a
slab page and have a memcg pointer pointing at an objcg vector.  It checks
the lowest bit, and if it is set, returns NULL.  page_memcg() contains a
VM_BUG_ON_PAGE() check for the page not being a slab page.

To make sure nobody uses a direct access, struct page's
mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
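
For orientation, a hedged sketch of how the three read sides are meant to
be used (the call sites are illustrative, not taken from this patch):

        struct mem_cgroup *memcg;

        /* Binding is stable (page lock, LRU isolation, ...): plain read. */
        memcg = page_memcg(page);

        /* Lockless read; the caller must hold rcu_read_lock(). */
        rcu_read_lock();
        memcg = page_memcg_rcu(page);
        rcu_read_unlock();

        /*
         * The page may be a slab page whose field holds an objcg vector;
         * page_memcg_check() returns NULL when the low bit marks one.
         */
        memcg = page_memcg_check(page);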

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com

Conflicts:
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 fs/buffer.c                      |   2 +-
 fs/iomap/buffered-io.c           |   2 +-
 include/linux/memcontrol.h       | 114 ++++++++++++++++++++++++++---
 include/linux/mm.h               |  22 ------
 include/linux/mm_types.h         |   5 +-
 include/trace/events/writeback.h |   2 +-
 kernel/fork.c                    |   7 +-
 mm/debug.c                       |   4 +-
 mm/huge_memory.c                 |   4 +-
 mm/memcontrol.c                  | 122 ++++++++++++++-----------------
 mm/page_alloc.c                  |   4 +-
 mm/page_io.c                     |   6 +-
 mm/slab.h                        |   9 +--
 mm/workingset.c                  |   2 +-
 14 files changed, 184 insertions(+), 121 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 23f645657488..b56f99f82b5b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -657,7 +657,7 @@ int __set_page_dirty_buffers(struct page *page)
 		} while (bh != head);
 	}
 	/*
-	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * Lock out page's memcg migration to keep PageDirty
 	 * synchronized with per-memcg dirty page counters.
 	 */
 	lock_page_memcg(page);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 10cc7979ce38..16a1e82e3aeb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -650,7 +650,7 @@ iomap_set_page_dirty(struct page *page)
 		return !TestSetPageDirty(page);
 
 	/*
-	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * Lock out page's memcg migration to keep PageDirty
 	 * synchronized with per-memcg dirty page counters.
 	 */
 	lock_page_memcg(page);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c691b1ac95f8..e3abc814f01b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,6 +343,79 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+/*
+ * page_memcg - get the memory cgroup associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages.
+ *
+ * Any of the following ensures page and memcg binding stability:
+ * - the page lock
+ * - LRU isolation
+ * - lock_page_memcg()
+ * - exclusive reference
+ */
+static inline struct mem_cgroup *page_memcg(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	return (struct mem_cgroup *)page->memcg_data;
+}
+
+/*
+ * page_memcg_rcu - locklessly get the memory cgroup associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages.
+ */
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	return (struct mem_cgroup *)READ_ONCE(page->memcg_data);
+}
+
+/*
+ * page_memcg_check - get the memory cgroup associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function unlike page_memcg() can take any  page
+ * as an argument. It has to be used in cases when it's not known if a page
+ * has an associated memory cgroup pointer or an object cgroups vector.
+ *
+ * Any of the following ensures page and memcg binding stability:
+ * - the page lock
+ * - LRU isolation
+ * - lock_page_memcg()
+ * - exclusive reference
+ */
+static inline struct mem_cgroup *page_memcg_check(struct page *page)
+{
+	/*
+	 * Because page->memcg_data might be changed asynchronously
+	 * for slab pages, READ_ONCE() should be used here.
+	 */
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+	/*
+	 * The lowest bit set means that memcg isn't a valid
+	 * memcg pointer, but a obj_cgroups pointer.
+	 * In this case the page is shared and doesn't belong
+	 * to any specific memory cgroup.
+	 */
+	if (memcg_data & 0x1UL)
+		return NULL;
+
+	return (struct mem_cgroup *)memcg_data;
+}
+
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
 {
 	if (idx == MEMCG_PERCPU_B)
@@ -743,15 +816,19 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 static inline void __mod_memcg_page_state(struct page *page,
 					  int idx, int val)
 {
-	if (page->mem_cgroup)
-		__mod_memcg_state(page->mem_cgroup, idx, val);
+	struct mem_cgroup *memcg = page_memcg(page);
+
+	if (memcg)
+		__mod_memcg_state(memcg, idx, val);
 }
 
 static inline void mod_memcg_page_state(struct page *page,
 					int idx, int val)
 {
-	if (page->mem_cgroup)
-		mod_memcg_state(page->mem_cgroup, idx, val);
+	struct mem_cgroup *memcg = page_memcg(page);
+
+	if (memcg)
+		mod_memcg_state(memcg, idx, val);
 }
 
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
@@ -834,16 +911,17 @@ static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
+	struct mem_cgroup *memcg = page_memcg(head);
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!head->mem_cgroup) {
+	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
-	lruvec = mem_cgroup_lruvec(head->mem_cgroup, pgdat);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
 }
 
@@ -878,8 +956,10 @@ static inline void count_memcg_events(struct mem_cgroup *memcg,
 static inline void count_memcg_page_event(struct page *page,
 					  enum vm_event_item idx)
 {
-	if (page->mem_cgroup)
-		count_memcg_events(page->mem_cgroup, idx, 1);
+	struct mem_cgroup *memcg = page_memcg(page);
+
+	if (memcg)
+		count_memcg_events(memcg, idx, 1);
 }
 
 static inline void count_memcg_event_mm(struct mm_struct *mm,
@@ -946,6 +1026,22 @@ void split_page_memcg(struct page *head, unsigned int nr);
 
 struct mem_cgroup;
 
+static inline struct mem_cgroup *page_memcg(struct page *page)
+{
+	return NULL;
+}
+
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return NULL;
+}
+
+static inline struct mem_cgroup *page_memcg_check(struct page *page)
+{
+	return NULL;
+}
+
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return true;
@@ -1435,7 +1531,7 @@ static inline void mem_cgroup_track_foreign_dirty(struct page *page,
 	if (mem_cgroup_disabled())
 		return;
 
-	if (unlikely(&page->mem_cgroup->css != wb->memcg_css))
+	if (unlikely(&page_memcg(page)->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(page, wb);
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 289c26f055cd..fd5b9992d5d5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1497,28 +1497,6 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 #endif
 }
 
-#ifdef CONFIG_MEMCG
-static inline struct mem_cgroup *page_memcg(struct page *page)
-{
-	return page->mem_cgroup;
-}
-static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
-{
-	WARN_ON_ONCE(!rcu_read_lock_held());
-	return READ_ONCE(page->mem_cgroup);
-}
-#else
-static inline struct mem_cgroup *page_memcg(struct page *page)
-{
-	return NULL;
-}
-static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
-{
-	WARN_ON_ONCE(!rcu_read_lock_held());
-	return NULL;
-}
-#endif
-
 /*
  * Some inline functions in vmstat.h depend on page_zone()
  */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4eb38918da8f..6733abbe7846 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -201,10 +201,7 @@ struct page {
 	atomic_t _refcount;
 
 #ifdef CONFIG_MEMCG
-	union {
-		struct mem_cgroup *mem_cgroup;
-		struct obj_cgroup **obj_cgroups;
-	};
+	unsigned long memcg_data;
 #endif
 
 	/*
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 57d795365987..1efa463c4979 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -257,7 +257,7 @@ TRACE_EVENT(track_foreign_dirty,
 		__entry->ino		= inode ? inode->i_ino : 0;
 		__entry->memcg_id	= wb->memcg_css->id;
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
-		__entry->page_cgroup_ino = cgroup_ino(page->mem_cgroup->css.cgroup);
+		__entry->page_cgroup_ino = cgroup_ino(page_memcg(page)->css.cgroup);
 	),
 
 	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu",
diff --git a/kernel/fork.c b/kernel/fork.c
index 096945ef49ad..a668fc14f5a7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -404,9 +404,10 @@ static int memcg_charge_kernel_stack(struct task_struct *tsk)
 
 		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
 			/*
-			 * If memcg_kmem_charge_page() fails, page->mem_cgroup
-			 * pointer is NULL, and memcg_kmem_uncharge_page() in
-			 * free_thread_stack() will ignore this page.
+			 * If memcg_kmem_charge_page() fails, page's
+			 * memory cgroup pointer is NULL, and
+			 * memcg_kmem_uncharge_page() in free_thread_stack()
+			 * will ignore this page.
 			 */
 			ret = memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL,
 						     0);
diff --git a/mm/debug.c b/mm/debug.c
index ccca576b2899..8a40b3fefbeb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -182,8 +182,8 @@ void __dump_page(struct page *page, const char *reason)
 		pr_warn("page dumped because: %s\n", reason);
 
 #ifdef CONFIG_MEMCG
-	if (!page_poisoned && page->mem_cgroup)
-		pr_warn("page->mem_cgroup:%px\n", page->mem_cgroup);
+	if (!page_poisoned && page->memcg_data)
+		pr_warn("pages's memcg:%lx\n", page->memcg_data);
 #endif
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 594368f6134f..df4f0660a279 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -486,7 +486,7 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 #ifdef CONFIG_MEMCG
 static inline struct deferred_split *get_deferred_split_queue(struct page *page)
 {
-	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(compound_head(page));
 	struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));
 
 	if (memcg)
@@ -2784,7 +2784,7 @@ void deferred_split_huge_page(struct page *page)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(page);
 #ifdef CONFIG_MEMCG
-	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(compound_head(page));
 #endif
 	unsigned long flags;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 92bf987d0a41..dea907c83d40 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -533,7 +533,7 @@ struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page)
 {
 	struct mem_cgroup *memcg;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 
 	if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		memcg = root_mem_cgroup;
@@ -560,16 +560,7 @@ ino_t page_cgroup_ino(struct page *page)
 	unsigned long ino = 0;
 
 	rcu_read_lock();
-	memcg = page->mem_cgroup;
-
-	/*
-	 * The lowest bit set means that memcg isn't a valid
-	 * memcg pointer, but a obj_cgroups pointer.
-	 * In this case the page is shared and doesn't belong
-	 * to any specific memory cgroup.
-	 */
-	if ((unsigned long) memcg & 0x1UL)
-		memcg = NULL;
+	memcg = page_memcg_check(page);
 
 	while (memcg && !(memcg->css.flags & CSS_ONLINE))
 		memcg = parent_mem_cgroup(memcg);
@@ -1055,7 +1046,7 @@ EXPORT_SYMBOL(get_mem_cgroup_from_mm);
  */
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 {
-	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(page);
 
 	if (mem_cgroup_disabled())
 		return NULL;
@@ -1350,7 +1341,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
 		goto out;
 	}
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 	/*
 	 * Swapcache readahead pages are added to the LRU - and
 	 * possibly migrated - before they are charged.
@@ -2110,7 +2101,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 }
 
 /**
- * lock_page_memcg - lock a page->mem_cgroup binding
+ * lock_page_memcg - lock a page and memcg binding
  * @page: the page
  *
  * This function protects unlocked LRU pages from being moved to
@@ -2142,7 +2133,7 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
 	if (mem_cgroup_disabled())
 		return NULL;
 again:
-	memcg = head->mem_cgroup;
+	memcg = page_memcg(head);
 	if (unlikely(!memcg))
 		return NULL;
 
@@ -2150,7 +2141,7 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
 		return memcg;
 
 	spin_lock_irqsave(&memcg->move_lock, flags);
-	if (memcg != head->mem_cgroup) {
+	if (memcg != page_memcg(head)) {
 		spin_unlock_irqrestore(&memcg->move_lock, flags);
 		goto again;
 	}
@@ -2188,14 +2179,14 @@ void __unlock_page_memcg(struct mem_cgroup *memcg)
 }
 
 /**
- * unlock_page_memcg - unlock a page->mem_cgroup binding
+ * unlock_page_memcg - unlock a page and memcg binding
  * @page: the page
  */
 void unlock_page_memcg(struct page *page)
 {
 	struct page *head = compound_head(page);
 
-	__unlock_page_memcg(head->mem_cgroup);
+	__unlock_page_memcg(page_memcg(head));
 }
 EXPORT_SYMBOL(unlock_page_memcg);
 
@@ -2885,7 +2876,7 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 {
-	VM_BUG_ON_PAGE(page->mem_cgroup, page);
+	VM_BUG_ON_PAGE(page_memcg(page), page);
 	/*
 	 * Any of the following ensures page->mem_cgroup stability:
 	 *
@@ -2894,7 +2885,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	 * - lock_page_memcg()
 	 * - exclusive reference
 	 */
-	page->mem_cgroup = memcg;
+	page->memcg_data = (unsigned long)memcg;
 }
 
 #ifdef CONFIG_MEMCG_KMEM
@@ -2917,8 +2908,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 	if (!vec)
 		return -ENOMEM;
 
-	if (cmpxchg(&page->obj_cgroups, NULL,
-		    (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
+	if (cmpxchg(&page->memcg_data, 0, (unsigned long)vec | 0x1UL))
 		kfree(vec);
 	else
 		kmemleak_not_leak(vec);
@@ -2929,6 +2919,12 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 /*
  * Returns a pointer to the memory cgroup to which the kernel object is charged.
  *
+ * A passed kernel object can be a slab object or a generic kernel page, so
+ * different mechanisms for getting the memory cgroup pointer should be used.
+ * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
+ * can not know for sure how the kernel object is implemented.
+ * mem_cgroup_from_obj() can be safely used in such cases.
+ *
  * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
  * cgroup_mutex, etc.
  */
@@ -2941,17 +2937,6 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 	page = virt_to_head_page(p);
 
-	/*
-	 * If page->mem_cgroup is set, it's either a simple mem_cgroup pointer
-	 * or a pointer to obj_cgroup vector. In the latter case the lowest
-	 * bit of the pointer is set.
-	 * The page->mem_cgroup pointer can be asynchronously changed
-	 * from NULL to (obj_cgroup_vec | 0x1UL), but can't be changed
-	 * from a valid memcg pointer to objcg vector or back.
-	 */
-	if (!page->mem_cgroup)
-		return NULL;
-
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in
@@ -2969,8 +2954,14 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 		return NULL;
 	}
 
-	/* All other pages use page->mem_cgroup */
-	return page->mem_cgroup;
+	/*
+	 * page_memcg_check() is used here, because page_has_obj_cgroups()
+	 * check above could fail because the object cgroups vector wasn't set
+	 * at that moment, but it can be set concurrently.
+	 * page_memcg_check(page) will guarantee that a proper memory
+	 * cgroup pointer or NULL will be returned.
+	 */
+	return page_memcg_check(page);
 }
 
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
@@ -3107,7 +3098,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	if (memcg && !mem_cgroup_is_root(memcg)) {
 		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 		if (!ret) {
-			page->mem_cgroup = memcg;
+			page->memcg_data = (unsigned long)memcg;
 			__SetPageKmemcg(page);
 			return 0;
 		}
@@ -3123,7 +3114,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(page);
 	unsigned int nr_pages = 1 << order;
 
 	if (!memcg)
@@ -3131,7 +3122,7 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	__memcg_kmem_uncharge(memcg, nr_pages);
-	page->mem_cgroup = NULL;
+	page->memcg_data = 0;
 	css_put(&memcg->css);
 
 	/* slab pages do not have PageKmemcg flag set */
@@ -3289,15 +3280,15 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
  */
 void split_page_memcg(struct page *head, unsigned int nr)
 {
-	struct mem_cgroup *memcg = head->mem_cgroup;
-	int kmemcg = PageKmemcg(head);
+	struct mem_cgroup *memcg = page_memcg(head);
+	int kmemcg = PageKmemcg(head);
 	int i;
 
 	if (mem_cgroup_disabled() || !memcg)
 		return;
 
 	for (i = 1; i < nr; i++) {
-		head[i].mem_cgroup = memcg;
+		head[i].memcg_data = (unsigned long)memcg;
 		if (kmemcg)
 			__SetPageKmemcg(head + i);
 	}
@@ -4681,7 +4671,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 void mem_cgroup_track_foreign_dirty_slowpath(struct page *page,
 					     struct bdi_writeback *wb)
 {
-	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(page);
 	struct memcg_cgwb_frn *frn;
 	u64 now = get_jiffies_64();
 	u64 oldest_at = now;
@@ -5658,14 +5648,14 @@ static int mem_cgroup_move_account(struct page *page,
 
 	/*
 	 * Prevent mem_cgroup_migrate() from looking at
-	 * page->mem_cgroup of its source page while we change it.
+	 * page's memory cgroup of its source page while we change it.
 	 */
 	ret = -EBUSY;
 	if (!trylock_page(page))
 		goto out;
 
 	ret = -EINVAL;
-	if (page->mem_cgroup != from)
+	if (page_memcg(page) != from)
 		goto out_unlock;
 
 	pgdat = page_pgdat(page);
@@ -5718,13 +5708,13 @@ static int mem_cgroup_move_account(struct page *page,
 	/*
 	 * All state has been migrated, let's switch to the new memcg.
 	 *
-	 * It is safe to change page->mem_cgroup here because the page
+	 * It is safe to change page's memcg here because the page
 	 * is referenced, charged, isolated, and locked: we can't race
 	 * with (un)charging, migration, LRU putback, or anything else
-	 * that would rely on a stable page->mem_cgroup.
+	 * that would rely on a stable page's memory cgroup.
 	 *
 	 * Note that lock_page_memcg is a memcg lock, not a page lock,
-	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * to save space. As soon as we switch page's memory cgroup to a
 	 * new memcg that isn't locked, the above state can change
 	 * concurrently again. Make sure we're truly done with it.
 	 */
@@ -5733,7 +5723,7 @@ static int mem_cgroup_move_account(struct page *page,
 	css_get(&to->css);
 	css_put(&from->css);
 
-	page->mem_cgroup = to;
+	page->memcg_data = (unsigned long)to;
 
 	__unlock_page_memcg(from);
 
@@ -5799,7 +5789,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 		 * mem_cgroup_move_account() checks the page is valid or
 		 * not under LRU exclusion.
 		 */
-		if (page->mem_cgroup == mc.from) {
+		if (page_memcg(page) == mc.from) {
 			ret = MC_TARGET_PAGE;
 			if (is_device_private_page(page))
 				ret = MC_TARGET_DEVICE;
@@ -5843,7 +5833,7 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(!page || !PageHead(page), page);
 	if (!(mc.flags & MOVE_ANON))
 		return ret;
-	if (page->mem_cgroup == mc.from) {
+	if (page_memcg(page) == mc.from) {
 		ret = MC_TARGET_PAGE;
 		if (target) {
 			get_page(page);
@@ -6788,12 +6778,12 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 		/*
 		 * Every swap fault against a single page tries to charge the
 		 * page, bail as early as possible.  shmem_unuse() encounters
-		 * already charged pages, too.  page->mem_cgroup is protected
-		 * by the page lock, which serializes swap cache removal, which
-		 * in turn serializes uncharging.
+		 * already charged pages, too.  page and memcg binding is
+		 * protected by the page lock, which serializes swap cache
+		 * removal, which in turn serializes uncharging.
 		 */
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound_head(page)->mem_cgroup)
+		if (page_memcg(compound_head(page)))
 			goto out;
 
 		id = lookup_swap_cgroup_id(ent);
@@ -6889,21 +6879,21 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page->mem_cgroup)
+	if (!page_memcg(page))
 		return;
 
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page->mem_cgroup at this point, we have fully
+	 * page_memcg(page) at this point, we have fully
 	 * exclusive access to the page.
 	 */
 
-	if (ug->memcg != page->mem_cgroup) {
+	if (ug->memcg != page_memcg(page)) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page->mem_cgroup;
+		ug->memcg = page_memcg(page);
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6920,7 +6910,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	}
 
 	ug->dummy_page = page;
-	page->mem_cgroup = NULL;
+	page->memcg_data = 0;
 	css_put(&ug->memcg->css);
 }
 
@@ -6963,7 +6953,7 @@ void mem_cgroup_uncharge(struct page *page)
 		return;
 
 	/* Don't touch page->lru of any random page, pre-check: */
-	if (!page->mem_cgroup)
+	if (!page_memcg(page))
 		return;
 
 	uncharge_gather_clear(&ug);
@@ -7013,11 +7003,11 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 		return;
 
 	/* Page cache replacement: new page already charged? */
-	if (newpage->mem_cgroup)
+	if (page_memcg(newpage))
 		return;
 
 	/* Swapcache readahead pages can get replaced before being charged */
-	memcg = oldpage->mem_cgroup;
+	memcg = page_memcg(oldpage);
 	if (!memcg)
 		return;
 
@@ -7212,7 +7202,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 
 	/* Readahead page, never charged */
 	if (!memcg)
@@ -7233,7 +7223,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(oldid, page);
 	mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries);
 
-	page->mem_cgroup = NULL;
+	page->memcg_data = 0;
 
 	if (!mem_cgroup_is_root(memcg))
 		page_counter_uncharge(&memcg->memory, nr_entries);
@@ -7276,7 +7266,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return 0;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 
 	/* Readahead page, never charged */
 	if (!memcg)
@@ -7357,7 +7347,7 @@ bool mem_cgroup_swap_full(struct page *page)
 	if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return false;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 	if (!memcg)
 		return false;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0166558d3d64..ef19a693721f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1094,7 +1094,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page->mem_cgroup |
+			(unsigned long)page_memcg(page) |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1119,7 +1119,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page->mem_cgroup))
+	if (unlikely(page_memcg(page)))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;
diff --git a/mm/page_io.c b/mm/page_io.c
index 96479817ffae..21f3160d39a8 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -286,12 +286,14 @@ static inline void count_swpout_vm_event(struct page *page)
 static void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 {
 	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
 
-	if (!page->mem_cgroup)
+	memcg = page_memcg(page);
+	if (!memcg)
 		return;
 
 	rcu_read_lock();
-	css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
+	css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys);
 	bio_associate_blkg_from_css(bio, css);
 	rcu_read_unlock();
 }
diff --git a/mm/slab.h b/mm/slab.h
index 944e8b2040ae..ac43829b73d4 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -240,18 +240,17 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
 {
 	/*
-	 * page->mem_cgroup and page->obj_cgroups are sharing the same
+	 * Page's memory cgroup and obj_cgroups vector are sharing the same
 	 * space. To distinguish between them in case we don't know for sure
 	 * that the page is a slab page (e.g. page_cgroup_ino()), let's
 	 * always set the lowest bit of obj_cgroups.
 	 */
-	return (struct obj_cgroup **)
-		((unsigned long)page->obj_cgroups & ~0x1UL);
+	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
 }
 
 static inline bool page_has_obj_cgroups(struct page *page)
 {
-	return ((unsigned long)page->obj_cgroups & 0x1UL);
+	return page->memcg_data & 0x1UL;
 }
 
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
@@ -260,7 +259,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
 	kfree(page_obj_cgroups(page));
-	page->obj_cgroups = NULL;
+	page->memcg_data = 0;
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
diff --git a/mm/workingset.c b/mm/workingset.c
index 975a4d2dd02e..130348cbf40a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -257,7 +257,7 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
 	struct lruvec *lruvec;
 	int memcgid;
 
-	/* Page is fully exclusive and pins page->mem_cgroup */
+	/* Page is fully exclusive and pins page's memory cgroup pointer */
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
-- 
2.18.0.huawei.25



* [PATCH 5.10.y 02/11] mm: memcontrol/slab: Use helpers to access slab page's memcg_data
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 03/11] mm: Introduce page memcg flags Chen Huang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Andrew Morton,
	Alexei Starovoitov

From: Roman Gushchin <guro@fb.com>

To gather all direct accesses to struct page's memcg_data field in one
place, let's introduce 3 new helpers to use in the slab accounting code:

  struct obj_cgroup **page_objcgs(struct page *page);
  struct obj_cgroup **page_objcgs_check(struct page *page);
  bool set_page_objcgs(struct page *page, struct obj_cgroup **objcgs);

They are similar to the corresponding API for generic pages, except that
the setter can return false, indicating that the value has already been
set by a different thread.
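
As a usage sketch (declarations elided; compare the
memcg_alloc_page_obj_cgroups() hunk below), the allocation side frees its
vector when it loses the cmpxchg race inside set_page_objcgs():

        vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
                           page_to_nid(page));
        if (!vec)
                return -ENOMEM;

        if (!set_page_objcgs(page, vec))
                kfree(vec);     /* another thread already set the vector */
        else
                kmemleak_not_leak(vec);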

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20201027001657.3398190-3-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-3-guro@fb.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 include/linux/memcontrol.h | 64 ++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c            |  6 ++--
 mm/slab.h                  | 35 +++++----------------
 3 files changed, 75 insertions(+), 30 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e3abc814f01b..2805fe81f97d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -416,6 +416,70 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	return (struct mem_cgroup *)memcg_data;
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+/*
+ * page_objcgs - get the object cgroups vector associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroups vector associated with the page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroups vector. It's not safe to call this function
+ * against pages, which might have an associated memory cgroup: e.g.
+ * kernel stack pages.
+ */
+static inline struct obj_cgroup **page_objcgs(struct page *page)
+{
+	return (struct obj_cgroup **)(READ_ONCE(page->memcg_data) & ~0x1UL);
+}
+
+/*
+ * page_objcgs_check - get the object cgroups vector associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroups vector associated with the page,
+ * or NULL. This function is safe to use if the page can be directly associated
+ * with a memory cgroup.
+ */
+static inline struct obj_cgroup **page_objcgs_check(struct page *page)
+{
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+	if (memcg_data && (memcg_data & 0x1UL))
+		return (struct obj_cgroup **)(memcg_data & ~0x1UL);
+
+	return NULL;
+}
+
+/*
+ * set_page_objcgs - associate a page with a object cgroups vector
+ * @page: a pointer to the page struct
+ * @objcgs: a pointer to the object cgroups vector
+ *
+ * Atomically associates a page with a vector of object cgroups.
+ */
+static inline bool set_page_objcgs(struct page *page,
+					struct obj_cgroup **objcgs)
+{
+	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
+}
+#else
+static inline struct obj_cgroup **page_objcgs(struct page *page)
+{
+	return NULL;
+}
+
+static inline struct obj_cgroup **page_objcgs_check(struct page *page)
+{
+	return NULL;
+}
+
+static inline bool set_page_objcgs(struct page *page,
+					struct obj_cgroup **objcgs)
+{
+	return true;
+}
+#endif
+
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
 {
 	if (idx == MEMCG_PERCPU_B)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dea907c83d40..b9f49ddc5ced 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2908,7 +2908,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 	if (!vec)
 		return -ENOMEM;
 
-	if (cmpxchg(&page->memcg_data, 0, (unsigned long)vec | 0x1UL))
+	if (!set_page_objcgs(page, vec))
 		kfree(vec);
 	else
 		kmemleak_not_leak(vec);
@@ -2942,12 +2942,12 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	 * Memcg membership data for each individual object is saved in
 	 * the page->obj_cgroups.
 	 */
-	if (page_has_obj_cgroups(page)) {
+	if (page_objcgs_check(page)) {
 		struct obj_cgroup *objcg;
 		unsigned int off;
 
 		off = obj_to_index(page->slab_cache, page, p);
-		objcg = page_obj_cgroups(page)[off];
+		objcg = page_objcgs(page)[off];
 		if (objcg)
 			return obj_cgroup_memcg(objcg);
 
diff --git a/mm/slab.h b/mm/slab.h
index ac43829b73d4..571757eb4a8f 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -237,28 +237,12 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
-{
-	/*
-	 * Page's memory cgroup and obj_cgroups vector are sharing the same
-	 * space. To distinguish between them in case we don't know for sure
-	 * that the page is a slab page (e.g. page_cgroup_ino()), let's
-	 * always set the lowest bit of obj_cgroups.
-	 */
-	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
-}
-
-static inline bool page_has_obj_cgroups(struct page *page)
-{
-	return page->memcg_data & 0x1UL;
-}
-
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp);
 
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
-	kfree(page_obj_cgroups(page));
+	kfree(page_objcgs(page));
 	page->memcg_data = 0;
 }
 
@@ -329,7 +313,7 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 		if (likely(p[i])) {
 			page = virt_to_head_page(p[i]);
 
-			if (!page_has_obj_cgroups(page) &&
+			if (!page_objcgs(page) &&
 			    memcg_alloc_page_obj_cgroups(page, s, flags)) {
 				obj_cgroup_uncharge(objcg, obj_full_size(s));
 				continue;
@@ -337,7 +321,7 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 
 			off = obj_to_index(s, page, p[i]);
 			obj_cgroup_get(objcg);
-			page_obj_cgroups(page)[off] = objcg;
+			page_objcgs(page)[off] = objcg;
 			mod_objcg_state(objcg, page_pgdat(page),
 					cache_vmstat_idx(s), obj_full_size(s));
 		} else {
@@ -351,6 +335,7 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
 					void **p, int objects)
 {
 	struct kmem_cache *s;
+	struct obj_cgroup **objcgs;
 	struct obj_cgroup *objcg;
 	struct page *page;
 	unsigned int off;
@@ -364,7 +349,8 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
 			continue;
 
 		page = virt_to_head_page(p[i]);
-		if (!page_has_obj_cgroups(page))
+		objcgs = page_objcgs(page);
+		if (!objcgs)
 			continue;
 
 		if (!s_orig)
@@ -373,11 +359,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
 			s = s_orig;
 
 		off = obj_to_index(s, page, p[i]);
-		objcg = page_obj_cgroups(page)[off];
+		objcg = objcgs[off];
 		if (!objcg)
 			continue;
 
-		page_obj_cgroups(page)[off] = NULL;
+		objcgs[off] = NULL;
 		obj_cgroup_uncharge(objcg, obj_full_size(s));
 		mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
 				-obj_full_size(s));
@@ -386,11 +372,6 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
 }
 
 #else /* CONFIG_MEMCG_KMEM */
-static inline bool page_has_obj_cgroups(struct page *page)
-{
-	return false;
-}
-
 static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
 {
 	return NULL;
-- 
2.18.0.huawei.25



* [PATCH 5.10.y 03/11] mm: Introduce page memcg flags
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 02/11] mm: memcontrol/slab: Use helpers to access slab page's memcg_data Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 04/11] mm: Convert page kmemcg type to a page memcg flag Chen Huang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Andrew Morton,
	Alexei Starovoitov

From: Roman Gushchin <guro@fb.com>

The lowest bit in page->memcg_data is used to distinguish between a
struct mem_cgroup pointer and a pointer to an objcgs array.  All checks
and modifications of this bit are open-coded.

Let's formalize it using page memcg flags, defined in enum
page_memcg_data_flags.

Additional flags might be added later.
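
As that last sentence anticipates, extending the scheme only means adding
an enum entry and bumping the sentinel; patch 04 of this series does
exactly that with MEMCG_DATA_KMEM:

        enum page_memcg_data_flags {
                /* page->memcg_data is a pointer to an objcgs vector */
                MEMCG_DATA_OBJCGS = (1UL << 0),
                /* added by patch 04: non-slab kernel page accounting */
                MEMCG_DATA_KMEM = (1UL << 1),
                /* the next bit after the last actual flag */
                __NR_MEMCG_DATA_FLAGS  = (1UL << 2),
        };

        #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)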

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-4-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-4-guro@fb.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 include/linux/memcontrol.h | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2805fe81f97d..4a0feb9d4b82 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,6 +343,15 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+enum page_memcg_data_flags {
+	/* page->memcg_data is a pointer to an objcgs vector */
+	MEMCG_DATA_OBJCGS = (1UL << 0),
+	/* the next bit after the last actual flag */
+	__NR_MEMCG_DATA_FLAGS  = (1UL << 1),
+};
+
+#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+
 /*
  * page_memcg - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -404,13 +413,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	 */
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
-	/*
-	 * The lowest bit set means that memcg isn't a valid
-	 * memcg pointer, but a obj_cgroups pointer.
-	 * In this case the page is shared and doesn't belong
-	 * to any specific memory cgroup.
-	 */
-	if (memcg_data & 0x1UL)
+	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
 	return (struct mem_cgroup *)memcg_data;
@@ -429,7 +432,11 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
  */
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
-	return (struct obj_cgroup **)(READ_ONCE(page->memcg_data) & ~0x1UL);
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), page);
+
+	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -444,10 +451,10 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
-	if (memcg_data && (memcg_data & 0x1UL))
-		return (struct obj_cgroup **)(memcg_data & ~0x1UL);
+	if (!memcg_data || !(memcg_data & MEMCG_DATA_OBJCGS))
+		return NULL;
 
-	return NULL;
+	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -460,7 +467,8 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 static inline bool set_page_objcgs(struct page *page,
 					struct obj_cgroup **objcgs)
 {
-	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
+	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs |
+			MEMCG_DATA_OBJCGS);
 }
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
-- 
2.18.0.huawei.25



* [PATCH 5.10.y 04/11] mm: Convert page kmemcg type to a page memcg flag
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (2 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 03/11] mm: Introduce page memcg flags Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 05/11] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Chen Huang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Andrew Morton,
	Alexei Starovoitov

From: Roman Gushchin <guro@fb.com>

PageKmemcg flag is currently defined as a page type (like buddy, offline,
table and guard).  Semantically it means that the page was accounted as a
kernel memory by the page allocator and has to be uncharged on the
release.

As a side effect of defining the flag as a page type, the accounted page
can't be mapped to userspace (look at page_has_type() and comments above).
In particular, this blocks the accounting of vmalloc-backed memory used
by some bpf maps, because these maps do map the memory to userspace.

One option is to fix it by complicating the access to page->_mapcount,
which provides some free bits for page->page_type.

But it's way better to move this flag into page->memcg_data flags.
Indeed, the flag makes no sense without enabled memory cgroups and memory
cgroup pointer set in particular.

This commit replaces PageKmemcg() and __SetPageKmemcg() with
PageMemcgKmem() and an open-coded OR operation setting the memcg pointer
with the MEMCG_DATA_KMEM bit.  __ClearPageKmemcg() can simply be deleted,
as the whole memcg_data is zeroed at once.

As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will be
compiled out.
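
Condensed, the flag now travels with the memcg pointer; the charge and
free paths from the hunks below reduce to:

        /* charge side, see __memcg_kmem_charge_page() */
        page->memcg_data = (unsigned long)memcg | MEMCG_DATA_KMEM;

        /* free side, see free_pages_prepare(); on !CONFIG_MEMCG builds
         * PageMemcgKmem() is constant false and the check is compiled out */
        if (memcg_kmem_enabled() && PageMemcgKmem(page))
                __memcg_kmem_uncharge_page(page, order);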

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-5-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-5-guro@fb.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 include/linux/memcontrol.h | 37 +++++++++++++++++++++++++++++++++----
 include/linux/page-flags.h | 11 ++---------
 mm/memcontrol.c            | 16 +++++-----------
 mm/page_alloc.c            |  4 ++--
 4 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4a0feb9d4b82..1a357edd2a1e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -346,8 +346,10 @@ extern struct mem_cgroup *root_mem_cgroup;
 enum page_memcg_data_flags {
 	/* page->memcg_data is a pointer to an objcgs vector */
 	MEMCG_DATA_OBJCGS = (1UL << 0),
+	/* page has been accounted as a non-slab kernel page */
+	MEMCG_DATA_KMEM = (1UL << 1),
 	/* the next bit after the last actual flag */
-	__NR_MEMCG_DATA_FLAGS  = (1UL << 1),
+	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
 };
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
@@ -369,8 +371,12 @@ enum page_memcg_data_flags {
  */
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
+	unsigned long memcg_data = page->memcg_data;
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
-	return (struct mem_cgroup *)page->memcg_data;
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -387,7 +393,8 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 	VM_BUG_ON_PAGE(PageSlab(page), page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)READ_ONCE(page->memcg_data);
+	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
+				     ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -416,7 +423,21 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
-	return (struct mem_cgroup *)memcg_data;
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * PageMemcgKmem - check if the page has MemcgKmem flag set
+ * @page: a pointer to the page struct
+ *
+ * Checks if the page has MemcgKmem flag set. The caller must ensure that
+ * the page has an associated memory cgroup. It's not safe to call this function
+ * against some types of pages, e.g. slab pages.
+ */
+static inline bool PageMemcgKmem(struct page *page)
+{
+	VM_BUG_ON_PAGE(page->memcg_data & MEMCG_DATA_OBJCGS, page);
+	return page->memcg_data & MEMCG_DATA_KMEM;
 }
 
 #ifdef CONFIG_MEMCG_KMEM
@@ -435,6 +456,7 @@ static inline struct obj_cgroup **page_objcgs(struct page *page)
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
 	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
 
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
@@ -454,6 +476,8 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 	if (!memcg_data || !(memcg_data & MEMCG_DATA_OBJCGS))
 		return NULL;
 
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
+
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -1114,6 +1138,11 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	return NULL;
 }
 
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return true;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4f6ba9379112..fc0e1bd48e73 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -715,9 +715,8 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PAGE_MAPCOUNT_RESERVE	-128
 #define PG_buddy	0x00000080
 #define PG_offline	0x00000100
-#define PG_kmemcg	0x00000200
-#define PG_table	0x00000400
-#define PG_guard	0x00000800
+#define PG_table	0x00000200
+#define PG_guard	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -768,12 +767,6 @@ PAGE_TYPE_OPS(Buddy, buddy)
  */
 PAGE_TYPE_OPS(Offline, offline)
 
-/*
- * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
- * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
- */
-PAGE_TYPE_OPS(Kmemcg, kmemcg)
-
 /*
  * Marks pages in use as page tables.
  */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b9f49ddc5ced..abeaf5cede74 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3098,8 +3098,8 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	if (memcg && !mem_cgroup_is_root(memcg)) {
 		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg;
-			__SetPageKmemcg(page);
+			page->memcg_data = (unsigned long)memcg |
+				MEMCG_DATA_KMEM;
 			return 0;
 		}
 		css_put(&memcg->css);
@@ -3124,10 +3124,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	page->memcg_data = 0;
 	css_put(&memcg->css);
-
-	/* slab pages do not have PageKmemcg flag set */
-	if (PageKmemcg(page))
-		__ClearPageKmemcg(page);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6902,12 +6898,10 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	nr_pages = compound_nr(page);
 	ug->nr_pages += nr_pages;
 
-	if (!PageKmemcg(page)) {
-		ug->pgpgout++;
-	} else {
+	if (PageMemcgKmem(page))
 		ug->nr_kmem += nr_pages;
-		__ClearPageKmemcg(page);
-	}
+	else
+		ug->pgpgout++;
 
 	ug->dummy_page = page;
 	page->memcg_data = 0;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef19a693721f..8ec194271b91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1216,7 +1216,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 		 * Do not let hwpoison pages hit pcplists/buddy
 		 * Untie memcg state and reset page's owner
 		 */
-		if (memcg_kmem_enabled() && PageKmemcg(page))
+		if (memcg_kmem_enabled() && PageMemcgKmem(page))
 			__memcg_kmem_uncharge_page(page, order);
 		reset_page_owner(page, order);
 		return false;
@@ -1246,7 +1246,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	if (PageMappingFlags(page))
 		page->mapping = NULL;
-	if (memcg_kmem_enabled() && PageKmemcg(page))
+	if (memcg_kmem_enabled() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
 	if (check_free)
 		bad += check_free_page(page);
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 05/11] mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (3 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 04/11] mm: Convert page kmemcg type to a page memcg flag Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 06/11] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c Chen Huang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Michal Hocko,
	Vladimir Davydov, Xiongchun Duan, Andrew Morton, Linus Torvalds

From: Muchun Song <songmuchun@bytedance.com>

We know that the unit of slab object charging is bytes, while the unit of
kmem page charging is PAGE_SIZE.  If we want to reuse the obj_cgroup APIs
to charge the kmem pages, we should pass PAGE_SIZE (as the third
parameter) to obj_cgroup_charge().  Because the size is already PAGE_SIZE,
we can skip touching the objcg stock.  So obj_cgroup_{un}charge_pages()
are introduced to charge in units of pages.

In a later patch, we can also reuse these two helpers to charge or
uncharge a number of kernel pages to an object cgroup.  This is just
code movement without any functional changes.
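
As a rough sketch of where this is heading (this is the call pattern that
patch 08 of this series adopts in __memcg_kmem_charge_page(); it is not
part of this patch, and gfp/order/ret stand in for the caller's context):

	struct obj_cgroup *objcg;
	int ret = 0;

	objcg = get_obj_cgroup_from_current();
	if (objcg) {
		/* an exact PAGE_SIZE multiple: the per-cpu byte stock
		 * kept for sub-page slab objects is never touched */
		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
		if (ret)
			obj_cgroup_put(objcg);
	}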

Link: https://lkml.kernel.org/r/20210319163821.20704-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 mm/memcontrol.c | 63 +++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abeaf5cede74..d45c2c97d9b8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2888,6 +2888,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	page->memcg_data = (unsigned long)memcg;
 }
 
+static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	return memcg;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * The allocated objcg pointers array is not accounted directly.
@@ -3032,6 +3046,29 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+				      unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = get_mem_cgroup_from_objcg(objcg);
+	__memcg_kmem_uncharge(memcg, nr_pages);
+	css_put(&memcg->css);
+}
+
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+				   unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	memcg = get_mem_cgroup_from_objcg(objcg);
+	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * __memcg_kmem_charge: charge a number of kernel pages to a memcg
  * @memcg: memory cgroup to charge
@@ -3156,19 +3193,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages) {
-			struct mem_cgroup *memcg;
-
-			rcu_read_lock();
-retry:
-			memcg = obj_cgroup_memcg(old);
-			if (unlikely(!css_tryget(&memcg->css)))
-				goto retry;
-			rcu_read_unlock();
-
-			__memcg_kmem_uncharge(memcg, nr_pages);
-			css_put(&memcg->css);
-		}
+		if (nr_pages)
+			obj_cgroup_uncharge_pages(old, nr_pages);
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
@@ -3226,7 +3252,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
-	struct mem_cgroup *memcg;
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
@@ -3243,24 +3268,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 	 * refill_obj_stock(), called from this function or
 	 * independently later.
 	 */
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
 	nr_pages = size >> PAGE_SHIFT;
 	nr_bytes = size & (PAGE_SIZE - 1);
 
 	if (nr_bytes)
 		nr_pages += 1;
 
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
 	if (!ret && nr_bytes)
 		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
 
-	css_put(&memcg->css);
 	return ret;
 }
 
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 06/11] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (4 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 05/11] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 07/11] mm: memcontrol: change ug->dummy_page only if memcg changed Chen Huang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Michal Hocko,
	Vladimir Davydov, Xiongchun Duan, Andrew Morton, Linus Torvalds

From: Muchun Song <songmuchun@bytedance.com>

page_memcg() is not suitable for use by page_expected_state() and
page_bad_reason(), because it can BUG_ON() for slab pages when
CONFIG_DEBUG_VM is enabled.  As neither an lru, nor a kmem, nor a slab
page should have anything left in there by the time the page is freed,
what we care about is whether the value of page->memcg_data is 0.  So
just access page->memcg_data directly here.

Link: https://lkml.kernel.org/r/20210319163821.20704-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ec194271b91..12deac86a7ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1094,7 +1094,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page_memcg(page) |
+			page->memcg_data |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1119,7 +1119,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page_memcg(page)))
+	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 07/11] mm: memcontrol: change ug->dummy_page only if memcg changed
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (5 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 06/11] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 08/11] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Chen Huang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Michal Hocko,
	Vladimir Davydov, Xiongchun Duan, Andrew Morton, Linus Torvalds

From: Muchun Song <songmuchun@bytedance.com>

Just like the assignment to ug->memcg, we only need to update
ug->dummy_page if the memcg changed, so move the assignment there.  This
is a very small optimization.

Link: https://lkml.kernel.org/r/20210319163821.20704-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d45c2c97d9b8..73418413958c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6907,6 +6907,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 			uncharge_gather_clear(ug);
 		}
 		ug->memcg = page_memcg(page);
+		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6920,7 +6921,6 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	else
 		ug->pgpgout++;
 
-	ug->dummy_page = page;
 	page->memcg_data = 0;
 	css_put(&ug->memcg->css);
 }
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 08/11] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (6 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 07/11] mm: memcontrol: change ug->dummy_page only if memcg changed Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 09/11] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Chen Huang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Michal Hocko,
	Vladimir Davydov, Xiongchun Duan, Christian Borntraeger,
	Andrew Morton, Linus Torvalds

From: Muchun Song <songmuchun@bytedance.com>

Since Roman's series "The new cgroup slab memory controller" was applied,
all slab objects are charged via the new obj_cgroup APIs.  The new APIs
introduce a struct obj_cgroup to charge slab objects, which prevents
long-living objects from pinning the original memory cgroup in memory.
But there are still some corner objects (e.g.  allocations larger than an
order-1 page on SLUB) which are not charged via the new APIs.  Those
objects (including the pages which are allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to the
memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages as well.
If we do that, we should store an object cgroup pointer in
page->memcg_data for the kmem pages.

Finally, page->memcg_data will have 3 different meanings.

  1) For the slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For the kmem pages (excluding slab pages), page->memcg_data
     points to an object cgroup.

  3) For the user pages (e.g. the LRU pages), page->memcg_data points
     to a memory cgroup.
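
The low flag bits in page->memcg_data are what tell the three cases
apart; a decoding sketch (it mirrors the page_memcg_check() hunk below
and introduces no new API; vec/objcg/memcg are just local names):

	unsigned long memcg_data = READ_ONCE(page->memcg_data);

	if (memcg_data & MEMCG_DATA_OBJCGS)		/* case 1: slab */
		vec = (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
	else if (memcg_data & MEMCG_DATA_KMEM)		/* case 2: kmem */
		objcg = (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
	else						/* case 3: user/LRU */
		memcg = (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);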

We do not change the behavior of page_memcg() and page_memcg_rcu(); they
remain suitable for both LRU pages and kmem pages.  Why?

Because memory allocations that pin memcgs for a long time already exist
at a larger scale and are causing recurring problems in the real world:
page cache doesn't get reclaimed for a long time, or is used by the
second, third, fourth, ...  instance of the same job that was restarted
into a new cgroup every time.  Unreclaimable dying cgroups pile up, waste
memory, and make page reclaim very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then page->memcg will always point to
an object cgroup pointer.  At that time, LRU pages and kmem pages will be
treated the same, and the implementation of page_memcg() will drop the
kmem page check.

This patch aims to charge the kmem pages by using the new obj_cgroup
APIs.  Finally, the page->memcg_data of a kmem page points to an object
cgroup.  We can use __page_objcg() to get the object cgroup associated
with a kmem page, or page_memcg() to get the memory cgroup associated
with a kmem page, but the caller must ensure that the returned memcg
won't be released (e.g.  by acquiring the rcu_read_lock or css_set_lock).
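
For example, a stat updater that may be handed a kmem page follows this
pattern (a sketch of the rule above; __mod_lruvec_page_state() in the
hunk below does exactly this, and memcg/lruvec are local names):

	rcu_read_lock();
	memcg = page_memcg(page);	/* may resolve objcg->memcg */
	if (memcg)
		lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
	rcu_read_unlock();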

  Link: https://lkml.kernel.org/r/20210401030141.37061-1-songmuchun@bytedance.com

Link: https://lkml.kernel.org/r/20210319163821.20704-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
[songmuchun@bytedance.com: fix forget to obtain the ref to objcg in split_page_memcg]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	include/linux/memcontrol.h
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 include/linux/memcontrol.h | 126 ++++++++++++++++++++++++++++++-------
 mm/memcontrol.c            | 117 ++++++++++++++++------------------
 2 files changed, 157 insertions(+), 86 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1a357edd2a1e..b8bb5d37d4ad 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -354,6 +354,62 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+static inline bool PageMemcgKmem(struct page *page);
+
+/*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	return READ_ONCE(objcg->memcg);
+}
+
+/*
+ * __page_memcg - get the memory cgroup associated with a non-kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * kmem pages.
+ */
+static inline struct mem_cgroup *__page_memcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * __page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper object cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * LRU pages.
+ */
+static inline struct obj_cgroup *__page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
 /*
  * page_memcg - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -363,20 +419,23 @@ enum page_memcg_data_flags {
  * proper memory cgroup pointer. It's not safe to call this function
  * against some type of pages, e.g. slab pages or ex-slab pages.
  *
- * Any of the following ensures page and memcg binding stability:
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
+ *
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	if (PageMemcgKmem(page))
+		return obj_cgroup_memcg(__page_objcg(page));
+	else
+		return __page_memcg(page);
 }
 
 /*
@@ -390,11 +449,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -402,15 +469,21 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any  page
+ * or NULL. This function unlike page_memcg() can take any page
  * as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
+ *
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
  *
- * Any of the following ensures page and memcg binding stability:
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
 {
@@ -423,6 +496,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -681,21 +761,15 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
 	percpu_ref_get(&objcg->refcnt);
 }
 
-static inline void obj_cgroup_put(struct obj_cgroup *objcg)
+static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
 {
-	percpu_ref_put(&objcg->refcnt);
+	percpu_ref_get_many(&objcg->refcnt, nr);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
-	return READ_ONCE(objcg->memcg);
+	percpu_ref_put(&objcg->refcnt);
 }
 
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
@@ -1007,18 +1081,22 @@ static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	rcu_read_lock();
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
+		rcu_read_unlock();
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
+	rcu_read_unlock();
 }
 
 static inline void mod_lruvec_page_state(struct page *page,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 73418413958c..738051c79cdd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1068,20 +1068,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 		return current->active_memcg;
 }
 
-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	memcg = active_memcg();
-	/* remote memcg must hold a ref. */
-	if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
-		memcg = root_mem_cgroup;
-	rcu_read_unlock();
-
-	return memcg;
-}
-
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -1095,20 +1081,6 @@ static __always_inline bool memcg_kmem_bypass(void)
 	return false;
 }
 
-/**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (memcg_kmem_bypass())
-		return NULL;
-
-	if (unlikely(active_memcg()))
-		return get_active_memcg();
-
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
@@ -3128,18 +3100,18 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3151,16 +3123,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!PageMemcgKmem(page))
 		return;
 
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	objcg = __page_objcg(page);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -3299,12 +3271,13 @@ void split_page_memcg(struct page *head, unsigned int nr)
 	if (mem_cgroup_disabled() || !memcg)
 		return;
 
-	for (i = 1; i < nr; i++) {
-		head[i].memcg_data = (unsigned long)memcg;
-		if (kmemcg)
-			__SetPageKmemcg(head + i);
-	}
-	css_get_many(&memcg->css, nr - 1);
+	for (i = 1; i < nr; i++)
+		head[i].memcg_data = head->memcg_data;
+
+	if (PageMemcgKmem(head))
+		obj_cgroup_get_many(__page_objcg(head), nr - 1);
+	else
+		css_get_many(&memcg->css, nr - 1);
 }
 
 #ifdef CONFIG_MEMCG_SWAP
@@ -6852,7 +6825,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 
 struct uncharge_gather {
 	struct mem_cgroup *memcg;
-	unsigned long nr_pages;
+	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
 	struct page *dummy_page;
@@ -6867,10 +6840,10 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
-	if (!mem_cgroup_is_root(ug->memcg)) {
-		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+	if (ug->nr_memory) {
+		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
 		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
+			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
@@ -6878,7 +6851,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	local_irq_save(flags);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
+	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
@@ -6889,40 +6862,60 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
-		return;
-
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
+	 * page memcg or objcg at this point, we have fully
 	 * exclusive access to the page.
 	 */
+	if (PageMemcgKmem(page)) {
+		objcg = __page_objcg(page);
+		/*
+		 * This get matches the put at the end of the function and
+		 * kmem pages do not hold memcg references anymore.
+		 */
+		memcg = get_mem_cgroup_from_objcg(objcg);
+	} else {
+		memcg = __page_memcg(page);
+	}
+
+	if (!memcg)
+		return;
 
-	if (ug->memcg != page_memcg(page)) {
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
-		css_get(&ug->memcg->css);
+		css_get(&memcg->css);
 	}
 
 	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
 
-	if (PageMemcgKmem(page))
+	if (PageMemcgKmem(page)) {
+		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
-	else
+
+		page->memcg_data = 0;
+		obj_cgroup_put(objcg);
+	} else {
+		/* LRU pages aren't accounted at the root level */
+		if (!mem_cgroup_is_root(memcg))
+			ug->nr_memory += nr_pages;
 		ug->pgpgout++;
 
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+		page->memcg_data = 0;
+	}
+
+	css_put(&memcg->css);
 }
 
 static void uncharge_list(struct list_head *page_list)
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 09/11] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (7 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 08/11] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 10/11] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 11/11] mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() Chen Huang
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Xiongchun Duan, Andrew Morton,
	Linus Torvalds

From: Muchun Song <songmuchun@bytedance.com>

There is only one user of __memcg_kmem_charge(), so manually inline
__memcg_kmem_charge() into obj_cgroup_charge_pages().  Similarly,
manually inline __memcg_kmem_uncharge() into obj_cgroup_uncharge_pages()
and call obj_cgroup_uncharge_pages() in obj_cgroup_release().

This is just code cleanup without any functional changes.

Link: https://lkml.kernel.org/r/20210319163821.20704-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 mm/memcontrol.c | 60 +++++++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 738051c79cdd..8932f986bf2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -252,6 +252,9 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 #ifdef CONFIG_MEMCG_KMEM
 extern spinlock_t css_set_lock;
 
+static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+				      unsigned int nr_pages);
+
 static void obj_cgroup_release(struct percpu_ref *ref)
 {
 	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
@@ -287,7 +290,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 	spin_lock_irqsave(&css_set_lock, flags);
 	memcg = obj_cgroup_memcg(objcg);
 	if (nr_pages)
-		__memcg_kmem_uncharge(memcg, nr_pages);
+		obj_cgroup_uncharge_pages(objcg, nr_pages);
 	list_del(&objcg->list);
 	mem_cgroup_put(memcg);
 	spin_unlock_irqrestore(&css_set_lock, flags);
@@ -3018,46 +3021,45 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+/*
+ * obj_cgroup_uncharge_pages: uncharge a number of kernel pages from a objcg
+ * @objcg: object cgroup to uncharge
+ * @nr_pages: number of pages to uncharge
+ */
 static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 				      unsigned int nr_pages)
 {
 	struct mem_cgroup *memcg;
 
 	memcg = get_mem_cgroup_from_objcg(objcg);
-	__memcg_kmem_uncharge(memcg, nr_pages);
-	css_put(&memcg->css);
-}
 
-static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
-				   unsigned int nr_pages)
-{
-	struct mem_cgroup *memcg;
-	int ret;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		page_counter_uncharge(&memcg->kmem, nr_pages);
+	refill_stock(memcg, nr_pages);
 
-	memcg = get_mem_cgroup_from_objcg(objcg);
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
 	css_put(&memcg->css);
-
-	return ret;
 }
 
-/**
- * __memcg_kmem_charge: charge a number of kernel pages to a memcg
- * @memcg: memory cgroup to charge
+/*
+ * obj_cgroup_charge_pages: charge a number of kernel pages to a objcg
+ * @objcg: object cgroup to charge
  * @gfp: reclaim mode
  * @nr_pages: number of pages to charge
  *
  * Returns 0 on success, an error code on failure.
  */
-int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
-			unsigned int nr_pages)
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+				   unsigned int nr_pages)
 {
 	struct page_counter *counter;
+	struct mem_cgroup *memcg;
 	int ret;
 
+	memcg = get_mem_cgroup_from_objcg(objcg);
+
 	ret = try_charge(memcg, gfp, nr_pages);
 	if (ret)
-		return ret;
+		goto out;
 
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
 	    !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
@@ -3069,25 +3071,15 @@ int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
 		 */
 		if (gfp & __GFP_NOFAIL) {
 			page_counter_charge(&memcg->kmem, nr_pages);
-			return 0;
+			goto out;
 		}
 		cancel_charge(memcg, nr_pages);
-		return -ENOMEM;
+		ret = -ENOMEM;
 	}
-	return 0;
-}
-
-/**
- * __memcg_kmem_uncharge: uncharge a number of kernel pages from a memcg
- * @memcg: memcg to uncharge
- * @nr_pages: number of pages to uncharge
- */
-void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		page_counter_uncharge(&memcg->kmem, nr_pages);
+out:
+	css_put(&memcg->css);
 
-	refill_stock(memcg, nr_pages);
+	return ret;
 }
 
 /**
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 10/11] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (8 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 09/11] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  2021-08-16  7:21 ` [PATCH 5.10.y 11/11] mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() Chen Huang
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Michal Hocko,
	Vladimir Davydov, Xiongchun Duan, Andrew Morton, Linus Torvalds

From: Muchun Song <songmuchun@bytedance.com>

A page can only be marked as kmem when CONFIG_MEMCG_KMEM is enabled,
so move PageMemcgKmem() into the scope of CONFIG_MEMCG_KMEM.

As a bonus, on !CONFIG_MEMCG_KMEM builds some code can be compiled out.
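
For instance, since the !CONFIG_MEMCG_KMEM stub returns a compile-time
false, call sites such as this one in free_pages_prepare() become dead
code the compiler can drop:

	if (memcg_kmem_enabled() && PageMemcgKmem(page))	/* always false */
		__memcg_kmem_uncharge_page(page, order);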

Link: https://lkml.kernel.org/r/20210319163821.20704-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 include/linux/memcontrol.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b8bb5d37d4ad..f07463cf7dac 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -506,6 +506,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
 /*
  * PageMemcgKmem - check if the page has MemcgKmem flag set
  * @page: a pointer to the page struct
@@ -520,7 +521,6 @@ static inline bool PageMemcgKmem(struct page *page)
 	return page->memcg_data & MEMCG_DATA_KMEM;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
 /*
  * page_objcgs - get the object cgroups vector associated with a page
  * @page: a pointer to the page struct
@@ -575,6 +575,11 @@ static inline bool set_page_objcgs(struct page *page,
 			MEMCG_DATA_OBJCGS);
 }
 #else
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
 	return NULL;
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5.10.y 11/11] mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
  2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
                   ` (9 preceding siblings ...)
  2021-08-16  7:21 ` [PATCH 5.10.y 10/11] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Chen Huang
@ 2021-08-16  7:21 ` Chen Huang
  10 siblings, 0 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16  7:21 UTC (permalink / raw)
  To: Roman Gushchin, Muchun Song, Wang Hai, Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, stable, Chen Huang, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Johannes Weiner, Alexei Starovoitov, Andrew Morton,
	Linus Torvalds

From: Wang Hai <wanghai38@huawei.com>

When I use kfree_rcu() to free a large memory area allocated by
kmalloc_node(), the following dump occurs.

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  [...]
  Oops: 0000 [#1] SMP
  [...]
  Workqueue: events kfree_rcu_work
  RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
  RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
  RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
  [...]
  Call Trace:
    kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
    kfree_bulk include/linux/slab.h:413 [inline]
    kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
    process_one_work+0x207/0x530 kernel/workqueue.c:2276
    worker_thread+0x320/0x610 kernel/workqueue.c:2422
    kthread+0x13d/0x160 kernel/kthread.c:313
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

When kmalloc_node() allocates a large memory area, a page is allocated
rather than a slab object, so when freeing the memory via kfree_rcu(),
this large memory area should not be handled by memcg_slab_free_hook(),
because memcg_slab_free_hook() is only for slab memory.

Use page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
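
To illustrate the difference on such a kmalloc-large page (whose
memcg_data, once charged as a kmem page, holds an objcg pointer tagged
MEMCG_DATA_KMEM rather than an objcg vector), a sketch per the helpers'
definitions earlier in this series:

	page_objcgs(page);	/* trips VM_BUG_ON_PAGE() with CONFIG_DEBUG_VM,
				 * or hands back a bogus non-NULL vector so the
				 * hook walks a non-slab page and crashes */
	page_objcgs_check(page);	/* MEMCG_DATA_OBJCGS is unset, so it
					 * returns NULL and the hook simply
					 * skips the page */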

Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
---
 mm/slab.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.h b/mm/slab.h
index 571757eb4a8f..9759992c720c 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -349,7 +349,7 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
 			continue;
 
 		page = virt_to_head_page(p[i]);
-		objcgs = page_objcgs(page);
+		objcgs = page_objcgs_check(page);
 		if (!objcgs)
 			continue;
 
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-16  7:21 ` [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data Chen Huang
@ 2021-08-16  8:34   ` Greg Kroah-Hartman
  2021-08-16 13:21     ` Chen Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-16  8:34 UTC (permalink / raw)
  To: Chen Huang
  Cc: Roman Gushchin, Muchun Song, Wang Hai, linux-kernel, linux-mm,
	stable, Andrew Morton, Alexei Starovoitov

On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
> From: Roman Gushchin <guro@fb.com>

What is the git commit id of this patch in Linus's tree?

> 
> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> 
> Currently a non-slab kernel page which has been charged to a memory cgroup
> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> flag is defined as a page type (like buddy, offline, etc), so it takes a
> bit from a page->mapped counter.  Pages with a type set can't be mapped to
> userspace.
> 
> But in general the kmemcg flag has nothing to do with mapping to
> userspace.  It only means that the page has been accounted by the page
> allocator, so it has to be properly uncharged on release.
> 
> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
> memory can't be accounted because of this implementation detail.
> 
> This patchset removes this limitation by moving the PageKmemcg flag into
> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
> adds several checks and removes a couple of obsolete functions.  As the
> result the code became more robust with fewer open-coded bit tricks.
> 
> This patch (of 4):
> 
> Currently there are many open-coded reads of the page->mem_cgroup pointer,
> as well as a couple of read helpers, which are barely used.
> 
> It creates an obstacle on a way to reuse some bits of the pointer for
> storing additional bits of information.  In fact, we already do this for
> slab pages, where the last bit indicates that a pointer has an attached
> vector of objcg pointers instead of a regular memcg pointer.
> 
> This commits uses 2 existing helpers and introduces a new helper to
> converts all read sides to calls of these helpers:
>   struct mem_cgroup *page_memcg(struct page *page);
>   struct mem_cgroup *page_memcg_rcu(struct page *page);
>   struct mem_cgroup *page_memcg_check(struct page *page);
> 
> page_memcg_check() is intended to be used in cases when the page can be a
> slab page and have a memcg pointer pointing at objcg vector.  It does
> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
> VM_BUG_ON_PAGE() check for the page not being a slab page.
> 
> To make sure nobody uses a direct access, struct page's
> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
> 
> Conflicts:
> 	mm/memcontrol.c

The "Conflicts:" lines should be removed.

Please fix up the patch series and resubmit.  But note, this seems
really intrusive, are you sure these are all needed?

What UIO driver are you using that is showing problems like this?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-16  8:34   ` Greg Kroah-Hartman
@ 2021-08-16 13:21     ` Chen Huang
  2021-08-16 13:35       ` Greg Kroah-Hartman
  2021-08-18  2:02       ` Roman Gushchin
  0 siblings, 2 replies; 20+ messages in thread
From: Chen Huang @ 2021-08-16 13:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Roman Gushchin, Muchun Song, Wang Hai, linux-kernel, linux-mm,
	stable, Andrew Morton, Alexei Starovoitov



On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
> On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
>> From: Roman Gushchin <guro@fb.com>
> 
> What is the git commit id of this patch in Linus's tree?
> 
>>
>> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
>>
>> Currently a non-slab kernel page which has been charged to a memory cgroup
>> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
>> flag is defined as a page type (like buddy, offline, etc), so it takes a
>> bit from a page->mapped counter.  Pages with a type set can't be mapped to
>> userspace.
>>
>> But in general the kmemcg flag has nothing to do with mapping to
>> userspace.  It only means that the page has been accounted by the page
>> allocator, so it has to be properly uncharged on release.
>>
>> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
>> memory can't be accounted because of this implementation detail.
>>
>> This patchset removes this limitation by moving the PageKmemcg flag into
>> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
>> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
>> adds several checks and removes a couple of obsolete functions.  As the
>> result the code became more robust with fewer open-coded bit tricks.
>>
>> This patch (of 4):
>>
>> Currently there are many open-coded reads of the page->mem_cgroup pointer,
>> as well as a couple of read helpers, which are barely used.
>>
>> It creates an obstacle on a way to reuse some bits of the pointer for
>> storing additional bits of information.  In fact, we already do this for
>> slab pages, where the last bit indicates that a pointer has an attached
>> vector of objcg pointers instead of a regular memcg pointer.
>>
>> This commits uses 2 existing helpers and introduces a new helper to
>> converts all read sides to calls of these helpers:
>>   struct mem_cgroup *page_memcg(struct page *page);
>>   struct mem_cgroup *page_memcg_rcu(struct page *page);
>>   struct mem_cgroup *page_memcg_check(struct page *page);
>>
>> page_memcg_check() is intended to be used in cases when the page can be a
>> slab page and have a memcg pointer pointing at objcg vector.  It does
>> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
>> VM_BUG_ON_PAGE() check for the page not being a slab page.
>>
>> To make sure nobody uses a direct access, struct page's
>> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
>>
>> Signed-off-by: Roman Gushchin <guro@fb.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> Reviewed-by: Shakeel Butt <shakeelb@google.com>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
>> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
>> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
>>
>> Conflicts:
>> 	mm/memcontrol.c
> 
> The "Conflicts:" lines should be removed.
> 
> Please fix up the patch series and resubmit.  But note, this seems
> really intrusive, are you sure these are all needed?
> 

OK, I will resend the patchset.
Roman Gushchin's patchset formalizes accesses to the page->mem_cgroup and
page->obj_cgroups fields. But LRU pages and most other raw memcg pins may
still hold a memcg pointer where an object cgroup pointer should always be
stored. That's the problem I met, and Muchun Song's patchset fixes this.
So I think these are all needed.

> What UIO driver are you using that is showing problems like this?
> 

The UIO driver is my own driver, and its creation looks like this:
First, we register a device
	pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
then we use uio_info to describe the UIO driver; the page is allocated and used
for uio_vma_fault
	info->mem[0].addr = (phys_addr_t) kzalloc(PAGE_SIZE, GFP_ATOMIC);
then we register the UIO driver.
	uio_register_device(&pdev->dev, info)
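
Putting it together, a condensed sketch of that setup (function name and
field values are illustrative, error handling trimmed):

	static struct uio_info info = {
		.name    = "uio_driver",
		.version = "0.1",
	};

	static int __init uio_example_init(void)
	{
		struct platform_device *pdev;

		pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
		if (IS_ERR(pdev))
			return PTR_ERR(pdev);

		/* kmalloc'ed kernel page; with UIO_MEM_LOGICAL the UIO
		 * core's uio_vma_fault() maps it into the user's VMA */
		info.mem[0].addr = (phys_addr_t)kzalloc(PAGE_SIZE, GFP_ATOMIC);
		info.mem[0].size = PAGE_SIZE;
		info.mem[0].memtype = UIO_MEM_LOGICAL;

		return uio_register_device(&pdev->dev, &info);
	}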

Thanks!

> thanks,
> 
> greg k-h
> 
> .
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-16 13:21     ` Chen Huang
@ 2021-08-16 13:35       ` Greg Kroah-Hartman
  2021-08-17  1:45         ` Chen Huang
  2021-08-18  2:02       ` Roman Gushchin
  1 sibling, 1 reply; 20+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-16 13:35 UTC (permalink / raw)
  To: Chen Huang
  Cc: Roman Gushchin, Muchun Song, Wang Hai, linux-kernel, linux-mm,
	stable, Andrew Morton, Alexei Starovoitov

On Mon, Aug 16, 2021 at 09:21:11PM +0800, Chen Huang wrote:
> 
> 
> On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
> > On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
> >> From: Roman Gushchin <guro@fb.com>
> > 
> > What is the git commit id of this patch in Linus's tree?
> > 
> >>
> >> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> >>
> >> Currently a non-slab kernel page which has been charged to a memory cgroup
> >> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> >> flag is defined as a page type (like buddy, offline, etc), so it takes a
> >> bit from a page->mapped counter.  Pages with a type set can't be mapped to
> >> userspace.
> >>
> >> But in general the kmemcg flag has nothing to do with mapping to
> >> userspace.  It only means that the page has been accounted by the page
> >> allocator, so it has to be properly uncharged on release.
> >>
> >> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
> >> memory can't be accounted because of this implementation detail.
> >>
> >> This patchset removes this limitation by moving the PageKmemcg flag into
> >> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
> >> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
> >> adds several checks and removes a couple of obsolete functions.  As the
> >> result the code became more robust with fewer open-coded bit tricks.
> >>
> >> This patch (of 4):
> >>
> >> Currently there are many open-coded reads of the page->mem_cgroup pointer,
> >> as well as a couple of read helpers, which are barely used.
> >>
> >> It creates an obstacle on a way to reuse some bits of the pointer for
> >> storing additional bits of information.  In fact, we already do this for
> >> slab pages, where the last bit indicates that a pointer has an attached
> >> vector of objcg pointers instead of a regular memcg pointer.
> >>
> >> This commits uses 2 existing helpers and introduces a new helper to
> >> converts all read sides to calls of these helpers:
> >>   struct mem_cgroup *page_memcg(struct page *page);
> >>   struct mem_cgroup *page_memcg_rcu(struct page *page);
> >>   struct mem_cgroup *page_memcg_check(struct page *page);
> >>
> >> page_memcg_check() is intended to be used in cases when the page can be a
> >> slab page and have a memcg pointer pointing at objcg vector.  It does
> >> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
> >> VM_BUG_ON_PAGE() check for the page not being a slab page.
> >>
> >> To make sure nobody uses a direct access, struct page's
> >> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> >>
> >> Signed-off-by: Roman Gushchin <guro@fb.com>
> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> >> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> >> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> >> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
> >>
> >> Conflicts:
> >> 	mm/memcontrol.c
> > 
> > The "Conflicts:" lines should be removed.
> > 
> > Please fix up the patch series and resubmit.  But note, this seems
> > really intrusive, are you sure these are all needed?
> > 
> 
> OK, I will resend the patchset.
> Roman Gushchin's patchset formalizes accesses to the page->mem_cgroup and
> page->obj_cgroups fields. But LRU pages and most other raw memcg pins may
> still hold a memcg pointer where an object cgroup pointer should always be
> stored. That's the problem I met, and Muchun Song's patchset fixes this.
> So I think these are all needed.

What in-tree driver causes this to happen and under what workload?

> > What UIO driver are you using that is showing problems like this?
> > 
> 
> The UIO driver is my own driver, and its creation looks like this:
> First, we register a device
> 	pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
> then we use uio_info to describe the UIO driver; the page is allocated and used
> for uio_vma_fault
> 	info->mem[0].addr = (phys_addr_t) kzalloc(PAGE_SIZE, GFP_ATOMIC);

That is not a physical address, and is not what the uio api is for at
all.  Please do not abuse it that way.

> then we register the UIO driver.
> 	uio_register_device(&pdev->dev, info)

So no in-tree drivers are having problems with the existing code, only
fake ones?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-16 13:35       ` Greg Kroah-Hartman
@ 2021-08-17  1:45         ` Chen Huang
  2021-08-17  6:14           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Huang @ 2021-08-17  1:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Roman Gushchin, Muchun Song, Wang Hai, linux-kernel, linux-mm,
	stable, Andrew Morton, Alexei Starovoitov



On 2021/8/16 21:35, Greg Kroah-Hartman wrote:
> On Mon, Aug 16, 2021 at 09:21:11PM +0800, Chen Huang wrote:
>>
>>
>> On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
>>> On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
>>>> From: Roman Gushchin <guro@fb.com>
>>>
>>> What is the git commit id of this patch in Linus's tree?
>>>
>>>>
>>>> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
>>>>
>>>> Currently a non-slab kernel page which has been charged to a memory cgroup
>>>> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
>>>> flag is defined as a page type (like buddy, offline, etc), so it takes a
>>>> bit from a page->mapped counter.  Pages with a type set can't be mapped to
>>>> userspace.
>>>>
>>>> But in general the kmemcg flag has nothing to do with mapping to
>>>> userspace.  It only means that the page has been accounted by the page
>>>> allocator, so it has to be properly uncharged on release.
>>>>
>>>> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
>>>> memory can't be accounted because of this implementation detail.
>>>>
>>>> This patchset removes this limitation by moving the PageKmemcg flag into
>>>> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
>>>> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
>>>> adds several checks and removes a couple of obsolete functions.  As the
>>>> result the code became more robust with fewer open-coded bit tricks.
>>>>
>>>> This patch (of 4):
>>>>
>>>> Currently there are many open-coded reads of the page->mem_cgroup pointer,
>>>> as well as a couple of read helpers, which are barely used.
>>>>
>>>> It creates an obstacle on the way to reusing some bits of the pointer for
>>>> storing additional bits of information.  In fact, we already do this for
>>>> slab pages, where the last bit indicates that a pointer has an attached
>>>> vector of objcg pointers instead of a regular memcg pointer.
>>>>
>>>> This commit uses 2 existing helpers and introduces a new helper to
>>>> convert all read sides to calls of these helpers:
>>>>   struct mem_cgroup *page_memcg(struct page *page);
>>>>   struct mem_cgroup *page_memcg_rcu(struct page *page);
>>>>   struct mem_cgroup *page_memcg_check(struct page *page);
>>>>
>>>> page_memcg_check() is intended to be used in cases when the page can be a
>>>> slab page and have a memcg pointer pointing at objcg vector.  It does
>>>> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
>>>> VM_BUG_ON_PAGE() check for the page not being a slab page.
>>>>
>>>> To make sure nobody uses a direct access, struct page's
>>>> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
>>>>
>>>> Signed-off-by: Roman Gushchin <guro@fb.com>
>>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>>>> Reviewed-by: Shakeel Butt <shakeelb@google.com>
>>>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
>>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
>>>> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
>>>>
>>>> Conflicts:
>>>> 	mm/memcontrol.c
>>>
>>> The "Conflicts:" lines should be removed.
>>>
>>> Please fix up the patch series and resubmit.  But note, this seems
>>> really intrusive, are you sure these are all needed?
>>>
>>
>> OK, I will resend the patchset.
>> Roman Gushchin's patchset formalizes accesses to page->mem_cgroup and
>> page->obj_cgroups. But LRU pages and most other users read the raw field
>> as a memcg pointer, while for slab pages the same field actually holds an
>> object cgroup vector pointer. That's the problem I met, and Muchun Song's
>> patchset fixes it. So I think these are all needed.
> 
> What in-tree driver causes this to happen and under what workload?
> 
>>> What UIO driver are you using that is showing problems like this?
>>>
>>
>> The UIO driver is my own driver, and its creation looks like this:
>> First, we register a device
>> 	pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
>> and use uio_info to describe the UIO driver; the page is allocated and used
>> for uio_vma_fault
>> 	info->mem[0].addr = (phys_addr_t) kzalloc(PAGE_SIZE, GFP_ATOMIC);
> 
> That is not a physical address, and is not what the uio api is for at
> all.  Please do not abuse it that way.
> 
>> then we register the UIO driver.
>> 	uio_register_device(&pdev->dev, info)
> 
> So no in-tree drivers are having problems with the existing code, only
> fake ones?

Yes, but the nullptr problem may not be only about the UIO driver. For now,
struct page has a union:
union {
	struct mem_cgroup *mem_cgroup;
	struct obj_cgroup **obj_cgroups;
};
For slab pages, the union field holds obj_cgroups, and for user pages it
holds mem_cgroup. When a slab page sets up its obj_cgroups, another user
page that belongs to the same compound page as that slab page gets the
wrong mem_cgroup in __mod_lruvec_page_state(), and will trigger a nullptr
dereference in mem_cgroup_lruvec(). Correct me if I'm wrong. Thanks!

static inline void __mod_lruvec_page_state(struct page *page,
                                           enum node_stat_item idx, int val)
{
        struct page *head = compound_head(page); /* rmap on tail pages */
        pg_data_t *pgdat = page_pgdat(page);
        struct lruvec *lruvec;

        /* Untracked pages have no memcg, no lruvec. Update only the node */
        if (!head->mem_cgroup) {
                __mod_node_page_state(pgdat, idx, val);
                return;
        }

        lruvec = mem_cgroup_lruvec(head->mem_cgroup, pgdat);
        __mod_lruvec_state(lruvec, idx, val);
}
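
For comparison, here is a minimal sketch of the helper-based read that the
series introduces, pieced together from the helper descriptions quoted
above; the flag name and exact definitions are assumptions rather than the
literal 5.10.y backport:

	/* Lowest bit set in memcg_data: the field holds an objcg vector. */
	#define MEMCG_DATA_OBJCGS	(1UL << 0)

	static inline struct mem_cgroup *page_memcg_check(struct page *page)
	{
		unsigned long memcg_data = READ_ONCE(page->memcg_data);

		/* Slab pages carry no direct memcg pointer; report NULL. */
		if (memcg_data & MEMCG_DATA_OBJCGS)
			return NULL;

		return (struct mem_cgroup *)memcg_data;
	}

With a check along these lines, the !head->mem_cgroup test above can no
longer mistake an obj_cgroups vector, installed by a slab page in the same
compound page, for a valid mem_cgroup pointer.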

> 
> thanks,
> 
> greg k-h
> .
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-17  1:45         ` Chen Huang
@ 2021-08-17  6:14           ` Greg Kroah-Hartman
  2021-08-19 11:43             ` Chen Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-17  6:14 UTC (permalink / raw)
  To: Chen Huang
  Cc: Roman Gushchin, Muchun Song, Wang Hai, linux-kernel, linux-mm,
	stable, Andrew Morton, Alexei Starovoitov

On Tue, Aug 17, 2021 at 09:45:00AM +0800, Chen Huang wrote:
> 
> 
> On 2021/8/16 21:35, Greg Kroah-Hartman wrote:
> > On Mon, Aug 16, 2021 at 09:21:11PM +0800, Chen Huang wrote:
> >>
> >>
> >> On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
> >>> On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
> >>>> From: Roman Gushchin <guro@fb.com>
> >>>
> >>> What is the git commit id of this patch in Linus's tree?
> >>>
> >>>>
> >>>> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> >>>>
> >>>> Currently a non-slab kernel page which has been charged to a memory cgroup
> >>>> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> >>>> flag is defined as a page type (like buddy, offline, etc), so it takes a
> >>>> bit from a page->mapped counter.  Pages with a type set can't be mapped to
> >>>> userspace.
> >>>>
> >>>> But in general the kmemcg flag has nothing to do with mapping to
> >>>> userspace.  It only means that the page has been accounted by the page
> >>>> allocator, so it has to be properly uncharged on release.
> >>>>
> >>>> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
> >>>> memory can't be accounted because of this implementation detail.
> >>>>
> >>>> This patchset removes this limitation by moving the PageKmemcg flag into
> >>>> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
> >>>> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
> >>>> adds several checks and removes a couple of obsolete functions.  As a
> >>>> result the code became more robust with fewer open-coded bit tricks.
> >>>>
> >>>> This patch (of 4):
> >>>>
> >>>> Currently there are many open-coded reads of the page->mem_cgroup pointer,
> >>>> as well as a couple of read helpers, which are barely used.
> >>>>
> >>>> It creates an obstacle on the way to reusing some bits of the pointer for
> >>>> storing additional bits of information.  In fact, we already do this for
> >>>> slab pages, where the last bit indicates that a pointer has an attached
> >>>> vector of objcg pointers instead of a regular memcg pointer.
> >>>>
> >>>> This commit uses 2 existing helpers and introduces a new helper to
> >>>> convert all read sides to calls of these helpers:
> >>>>   struct mem_cgroup *page_memcg(struct page *page);
> >>>>   struct mem_cgroup *page_memcg_rcu(struct page *page);
> >>>>   struct mem_cgroup *page_memcg_check(struct page *page);
> >>>>
> >>>> page_memcg_check() is intended to be used in cases when the page can be a
> >>>> slab page and have a memcg pointer pointing at objcg vector.  It does
> >>>> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
> >>>> VM_BUG_ON_PAGE() check for the page not being a slab page.
> >>>>
> >>>> To make sure nobody uses a direct access, struct page's
> >>>> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> >>>>
> >>>> Signed-off-by: Roman Gushchin <guro@fb.com>
> >>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >>>> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> >>>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >>>> Acked-by: Michal Hocko <mhocko@suse.com>
> >>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> >>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> >>>> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
> >>>>
> >>>> Conflicts:
> >>>> 	mm/memcontrol.c
> >>>
> >>> The "Conflicts:" lines should be removed.
> >>>
> >>> Please fix up the patch series and resubmit.  But note, this seems
> >>> really intrusive, are you sure these are all needed?
> >>>
> >>
> >> OK, I will resend the patchset.
> >> Roman Gushchin's patchset formalizes accesses to page->mem_cgroup and
> >> page->obj_cgroups. But LRU pages and most other users read the raw field
> >> as a memcg pointer, while for slab pages the same field actually holds an
> >> object cgroup vector pointer. That's the problem I met, and Muchun Song's
> >> patchset fixes it. So I think these are all needed.
> > 
> > What in-tree driver causes this to happen and under what workload?
> > 
> >>> What UIO driver are you using that is showing problems like this?
> >>>
> >>
> >> The UIO driver is my own driver, and its creation looks like this:
> >> First, we register a device
> >> 	pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
> >> and use uio_info to describe the UIO driver; the page is allocated and used
> >> for uio_vma_fault
> >> 	info->mem[0].addr = (phys_addr_t) kzalloc(PAGE_SIZE, GFP_ATOMIC);
> > 
> > That is not a physical address, and is not what the uio api is for at
> > all.  Please do not abuse it that way.
> > 
> >> then we register the UIO driver.
> >> 	uio_register_device(&pdev->dev, info)
> > 
> > So no in-tree drivers are having problems with the existing code, only
> > fake ones?
> 
> Yes, but the nullptr problem may not be only about the UIO driver. For now,
> struct page has a union:
> union {
> 	struct mem_cgroup *mem_cgroup;
> 	struct obj_cgroup **obj_cgroups;
> };
> For slab pages, the union field holds obj_cgroups, and for user pages it
> holds mem_cgroup. When a slab page sets up its obj_cgroups, another user
> page that belongs to the same compound page as that slab page gets the
> wrong mem_cgroup in __mod_lruvec_page_state(), and will trigger a nullptr
> dereference in mem_cgroup_lruvec(). Correct me if I'm wrong. Thanks!

And how can that be triggered by a user in the 5.10.y kernel tree at the
moment?

I'm all for fixing problems, but this one does not seem like it is an
actual issue for the 5.10 tree right now.  Am I missing something?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-16 13:21     ` Chen Huang
  2021-08-16 13:35       ` Greg Kroah-Hartman
@ 2021-08-18  2:02       ` Roman Gushchin
  1 sibling, 0 replies; 20+ messages in thread
From: Roman Gushchin @ 2021-08-18  2:02 UTC (permalink / raw)
  To: Chen Huang
  Cc: Greg Kroah-Hartman, Muchun Song, Wang Hai, linux-kernel,
	linux-mm, stable, Andrew Morton, Alexei Starovoitov

On Mon, Aug 16, 2021 at 09:21:11PM +0800, Chen Huang wrote:
> 
> 
> On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
> > On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
> >> From: Roman Gushchin <guro@fb.com>
> > 
> > What is the git commit id of this patch in Linus's tree?
> > 
> >>
> >> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> >>
> >> Currently a non-slab kernel page which has been charged to a memory cgroup
> >> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> >> flag is defined as a page type (like buddy, offline, etc), so it takes a
> >> bit from a page->mapped counter.  Pages with a type set can't be mapped to
> >> userspace.
> >>
> >> But in general the kmemcg flag has nothing to do with mapping to
> >> userspace.  It only means that the page has been accounted by the page
> >> allocator, so it has to be properly uncharged on release.
> >>
> >> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
> >> memory can't be accounted because of this implementation detail.
> >>
> >> This patchset removes this limitation by moving the PageKmemcg flag into
> >> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
> >> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
> >> adds several checks and removes a couple of obsolete functions.  As a
> >> result the code became more robust with fewer open-coded bit tricks.
> >>
> >> This patch (of 4):
> >>
> >> Currently there are many open-coded reads of the page->mem_cgroup pointer,
> >> as well as a couple of read helpers, which are barely used.
> >>
> >> It creates an obstacle on the way to reusing some bits of the pointer for
> >> storing additional bits of information.  In fact, we already do this for
> >> slab pages, where the last bit indicates that a pointer has an attached
> >> vector of objcg pointers instead of a regular memcg pointer.
> >>
> >> This commit uses 2 existing helpers and introduces a new helper to
> >> convert all read sides to calls of these helpers:
> >>   struct mem_cgroup *page_memcg(struct page *page);
> >>   struct mem_cgroup *page_memcg_rcu(struct page *page);
> >>   struct mem_cgroup *page_memcg_check(struct page *page);
> >>
> >> page_memcg_check() is intended to be used in cases when the page can be a
> >> slab page and have a memcg pointer pointing at objcg vector.  It does
> >> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
> >> VM_BUG_ON_PAGE() check for the page not being a slab page.
> >>
> >> To make sure nobody uses a direct access, struct page's
> >> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> >>
> >> Signed-off-by: Roman Gushchin <guro@fb.com>
> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> >> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> >> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> >> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
> >>
> >> Conflicts:
> >> 	mm/memcontrol.c
> > 
> > The "Conflicts:" lines should be removed.
> > 
> > Please fix up the patch series and resubmit.  But note, this seems
> > really intrusive, are you sure these are all needed?
> >

Sorry for jumping in late.

I agree that the patchset is quite intrusive and I really doubt we
need to backport it. The main goal of my patchset was to enable
memory accounting for bpf maps (which can be mmaped to userspace).
I don't see why we need it otherwise.
Muchun's patchset unifies the treatment of non-slab kernel objects
(e.g. large kmallocs) with slab objects and prevents them from pinning
dying memory cgroups. However, the problem has existed for years, and
I doubt we need it in 5.10 so badly.

> 
> OK, I will resend the patchset.
> Roman Gushchin's patchset formalizes accesses to page->mem_cgroup and
> page->obj_cgroups. But LRU pages and most other users read the raw field
> as a memcg pointer, while for slab pages the same field actually holds an
> object cgroup vector pointer. That's the problem I met, and Muchun Song's
> patchset fixes it. So I think these are all needed.

Can you please be more specific here?

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-17  6:14           ` Greg Kroah-Hartman
@ 2021-08-19 11:43             ` Chen Huang
  2021-08-19 14:55               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Huang @ 2021-08-19 11:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Roman Gushchin
  Cc: Muchun Song, Wang Hai, linux-kernel, linux-mm, stable,
	Andrew Morton, Alexei Starovoitov



On 2021/8/17 14:14, Greg Kroah-Hartman wrote:
> On Tue, Aug 17, 2021 at 09:45:00AM +0800, Chen Huang wrote:
>>
>>
>> On 2021/8/16 21:35, Greg Kroah-Hartman wrote:
>>> On Mon, Aug 16, 2021 at 09:21:11PM +0800, Chen Huang wrote:
>>>>
>>>>
>>>> On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
>>>>> On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
>>>>>> From: Roman Gushchin <guro@fb.com>
>>>>>
>>>>> What is the git commit id of this patch in Linus's tree?
>>>>>
>>>>>>
>>>>>> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
>>>>>>
>>>>>> Currently a non-slab kernel page which has been charged to a memory cgroup
>>>>>> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
>>>>>> flag is defined as a page type (like buddy, offline, etc), so it takes a
>>>>>> bit from a page->mapped counter.  Pages with a type set can't be mapped to
>>>>>> userspace.
>>>>>>
>>>>>> But in general the kmemcg flag has nothing to do with mapping to
>>>>>> userspace.  It only means that the page has been accounted by the page
>>>>>> allocator, so it has to be properly uncharged on release.
>>>>>>
>>>>>> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
>>>>>> memory can't be accounted because of this implementation detail.
>>>>>>
>>>>>> This patchset removes this limitation by moving the PageKmemcg flag into
>>>>>> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
>>>>>> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
>>>>>> adds several checks and removes a couple of obsolete functions.  As a
>>>>>> result the code became more robust with fewer open-coded bit tricks.
>>>>>>
>>>>>> This patch (of 4):
>>>>>>
>>>>>> Currently there are many open-coded reads of the page->mem_cgroup pointer,
>>>>>> as well as a couple of read helpers, which are barely used.
>>>>>>
>>>>>> It creates an obstacle on the way to reusing some bits of the pointer for
>>>>>> storing additional bits of information.  In fact, we already do this for
>>>>>> slab pages, where the last bit indicates that a pointer has an attached
>>>>>> vector of objcg pointers instead of a regular memcg pointer.
>>>>>>
>>>>>> This commit uses 2 existing helpers and introduces a new helper to
>>>>>> convert all read sides to calls of these helpers:
>>>>>>   struct mem_cgroup *page_memcg(struct page *page);
>>>>>>   struct mem_cgroup *page_memcg_rcu(struct page *page);
>>>>>>   struct mem_cgroup *page_memcg_check(struct page *page);
>>>>>>
>>>>>> page_memcg_check() is intended to be used in cases when the page can be a
>>>>>> slab page and have a memcg pointer pointing at objcg vector.  It does
>>>>>> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
>>>>>> VM_BUG_ON_PAGE() check for the page not being a slab page.
>>>>>>
>>>>>> To make sure nobody uses a direct access, struct page's
>>>>>> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
>>>>>>
>>>>>> Signed-off-by: Roman Gushchin <guro@fb.com>
>>>>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>>>>>> Reviewed-by: Shakeel Butt <shakeelb@google.com>
>>>>>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>>>>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
>>>>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
>>>>>> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
>>>>>>
>>>>>> Conflicts:
>>>>>> 	mm/memcontrol.c
>>>>>
>>>>> The "Conflicts:" lines should be removed.
>>>>>
>>>>> Please fix up the patch series and resubmit.  But note, this seems
>>>>> really intrusive, are you sure these are all needed?
>>>>>
>>>>
>>>> OK, I will resend the patchset.
>>>> Roman Gushchin's patchset formalizes accesses to page->mem_cgroup and
>>>> page->obj_cgroups. But LRU pages and most other users read the raw field
>>>> as a memcg pointer, while for slab pages the same field actually holds an
>>>> object cgroup vector pointer. That's the problem I met, and Muchun Song's
>>>> patchset fixes it. So I think these are all needed.
>>>
>>> What in-tree driver causes this to happen and under what workload?
>>>
>>>>> What UIO driver are you using that is showing problems like this?
>>>>>
>>>>
>>>> The UIO driver is my own driver, and its creation looks like this:
>>>> First, we register a device
>>>> 	pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
>>>> and use uio_info to describe the UIO driver; the page is allocated and used
>>>> for uio_vma_fault
>>>> 	info->mem[0].addr = (phys_addr_t) kzalloc(PAGE_SIZE, GFP_ATOMIC);
>>>
>>> That is not a physical address, and is not what the uio api is for at
>>> all.  Please do not abuse it that way.
>>>
>>>> then we register the UIO driver.
>>>> 	uio_register_device(&pdev->dev, info)
>>>
>>> So no in-tree drivers are having problems with the existing code, only
>>> fake ones?
>>
>> Yes, but the nullptr problem may not be only about the UIO driver. For now,
>> struct page has a union:
>> union {
>> 	struct mem_cgroup *mem_cgroup;
>> 	struct obj_cgroup **obj_cgroups;
>> };
>> For slab pages, the union field holds obj_cgroups, and for user pages it
>> holds mem_cgroup. When a slab page sets up its obj_cgroups, another user
>> page that belongs to the same compound page as that slab page gets the
>> wrong mem_cgroup in __mod_lruvec_page_state(), and will trigger a nullptr
>> dereference in mem_cgroup_lruvec(). Correct me if I'm wrong. Thanks!
> 
> And how can that be triggered by a user in the 5.10.y kernel tree at the
> moment?
> 
> I'm all for fixing problems, but this one does not seem like it is an
> actual issue for the 5.10 tree right now.  Am I missing something?
> 
> thanks,
> 
Sorry, it may be just a problem with my own driver.
Please ignore the patchset.

Thanks!
> greg k-h
> 
> .
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data
  2021-08-19 11:43             ` Chen Huang
@ 2021-08-19 14:55               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 20+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-19 14:55 UTC (permalink / raw)
  To: Chen Huang
  Cc: Roman Gushchin, Muchun Song, Wang Hai, linux-kernel, linux-mm,
	stable, Andrew Morton, Alexei Starovoitov

On Thu, Aug 19, 2021 at 07:43:37PM +0800, Chen Huang wrote:
> 
> 
> On 2021/8/17 14:14, Greg Kroah-Hartman wrote:
> > On Tue, Aug 17, 2021 at 09:45:00AM +0800, Chen Huang wrote:
> >>
> >>
> >> On 2021/8/16 21:35, Greg Kroah-Hartman wrote:
> >>> On Mon, Aug 16, 2021 at 09:21:11PM +0800, Chen Huang wrote:
> >>>>
> >>>>
> >>>> On 2021/8/16 16:34, Greg Kroah-Hartman wrote:
> >>>>> On Mon, Aug 16, 2021 at 07:21:37AM +0000, Chen Huang wrote:
> >>>>>> From: Roman Gushchin <guro@fb.com>
> >>>>>
> >>>>> What is the git commit id of this patch in Linus's tree?
> >>>>>
> >>>>>>
> >>>>>> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> >>>>>>
> >>>>>> Currently a non-slab kernel page which has been charged to a memory cgroup
> >>>>>> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> >>>>>> flag is defined as a page type (like buddy, offline, etc), so it takes a
> >>>>>> bit from a page->mapped counter.  Pages with a type set can't be mapped to
> >>>>>> userspace.
> >>>>>>
> >>>>>> But in general the kmemcg flag has nothing to do with mapping to
> >>>>>> userspace.  It only means that the page has been accounted by the page
> >>>>>> allocator, so it has to be properly uncharged on release.
> >>>>>>
> >>>>>> Some bpf maps are mapping the vmalloc-based memory to userspace, and their
> >>>>>> memory can't be accounted because of this implementation detail.
> >>>>>>
> >>>>>> This patchset removes this limitation by moving the PageKmemcg flag into
> >>>>>> one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
> >>>>>> accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
> >>>>>> adds several checks and removes a couple of obsolete functions.  As a
> >>>>>> result the code became more robust with fewer open-coded bit tricks.
> >>>>>>
> >>>>>> This patch (of 4):
> >>>>>>
> >>>>>> Currently there are many open-coded reads of the page->mem_cgroup pointer,
> >>>>>> as well as a couple of read helpers, which are barely used.
> >>>>>>
> >>>>>> It creates an obstacle on the way to reusing some bits of the pointer for
> >>>>>> storing additional bits of information.  In fact, we already do this for
> >>>>>> slab pages, where the last bit indicates that a pointer has an attached
> >>>>>> vector of objcg pointers instead of a regular memcg pointer.
> >>>>>>
> >>>>>> This commit uses 2 existing helpers and introduces a new helper to
> >>>>>> convert all read sides to calls of these helpers:
> >>>>>>   struct mem_cgroup *page_memcg(struct page *page);
> >>>>>>   struct mem_cgroup *page_memcg_rcu(struct page *page);
> >>>>>>   struct mem_cgroup *page_memcg_check(struct page *page);
> >>>>>>
> >>>>>> page_memcg_check() is intended to be used in cases when the page can be a
> >>>>>> slab page and have a memcg pointer pointing at objcg vector.  It does
> >>>>>> check the lowest bit, and if set, returns NULL.  page_memcg() contains a
> >>>>>> VM_BUG_ON_PAGE() check for the page not being a slab page.
> >>>>>>
> >>>>>> To make sure nobody uses a direct access, struct page's
> >>>>>> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> >>>>>>
> >>>>>> Signed-off-by: Roman Gushchin <guro@fb.com>
> >>>>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >>>>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >>>>>> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> >>>>>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >>>>>> Acked-by: Michal Hocko <mhocko@suse.com>
> >>>>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> >>>>>> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> >>>>>> Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
> >>>>>>
> >>>>>> Conflicts:
> >>>>>> 	mm/memcontrol.c
> >>>>>
> >>>>> The "Conflicts:" lines should be removed.
> >>>>>
> >>>>> Please fix up the patch series and resubmit.  But note, this seems
> >>>>> really intrusive, are you sure these are all needed?
> >>>>>
> >>>>
> >>>> OK, I will resend the patchset.
> >>>> Roman Gushchin's patchset formalizes accesses to page->mem_cgroup and
> >>>> page->obj_cgroups. But LRU pages and most other users read the raw field
> >>>> as a memcg pointer, while for slab pages the same field actually holds an
> >>>> object cgroup vector pointer. That's the problem I met, and Muchun Song's
> >>>> patchset fixes it. So I think these are all needed.
> >>>
> >>> What in-tree driver causes this to happen and under what workload?
> >>>
> >>>>> What UIO driver are you using that is showing problems like this?
> >>>>>
> >>>>
> >>>> The UIO driver is my own driver, and its creation looks like this:
> >>>> First, we register a device
> >>>> 	pdev = platform_device_register_simple("uio_driver", 0, NULL, 0);
> >>>> and use uio_info to describe the UIO driver; the page is allocated and used
> >>>> for uio_vma_fault
> >>>> 	info->mem[0].addr = (phys_addr_t) kzalloc(PAGE_SIZE, GFP_ATOMIC);
> >>>
> >>> That is not a physical address, and is not what the uio api is for at
> >>> all.  Please do not abuse it that way.
> >>>
> >>>> then we register the UIO driver.
> >>>> 	uio_register_device(&pdev->dev, info)
> >>>
> >>> So no in-tree drivers are having problems with the existing code, only
> >>> fake ones?
> >>
> >> Yes, but the nullptr problem may not be only about the UIO driver. For now,
> >> struct page has a union:
> >> union {
> >> 	struct mem_cgroup *mem_cgroup;
> >> 	struct obj_cgroup **obj_cgroups;
> >> };
> >> For slab pages, the union field holds obj_cgroups, and for user pages it
> >> holds mem_cgroup. When a slab page sets up its obj_cgroups, another user
> >> page that belongs to the same compound page as that slab page gets the
> >> wrong mem_cgroup in __mod_lruvec_page_state(), and will trigger a nullptr
> >> dereference in mem_cgroup_lruvec(). Correct me if I'm wrong. Thanks!
> > 
> > And how can that be triggered by a user in the 5.10.y kernel tree at the
> > moment?
> > 
> > I'm all for fixing problems, but this one does not seem like it is an
> > actual issue for the 5.10 tree right now.  Am I missing something?
> > 
> > thanks,
> > 
> Sorry, it may be just a problem with my own driver.

What driver is it?  Please submit it to be included in the tree so it
can be reviewed properly and bugs like this can be fixed :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-08-19 14:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-16  7:21 [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state() Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 01/11] mm: memcontrol: Use helpers to read page's memcg data Chen Huang
2021-08-16  8:34   ` Greg Kroah-Hartman
2021-08-16 13:21     ` Chen Huang
2021-08-16 13:35       ` Greg Kroah-Hartman
2021-08-17  1:45         ` Chen Huang
2021-08-17  6:14           ` Greg Kroah-Hartman
2021-08-19 11:43             ` Chen Huang
2021-08-19 14:55               ` Greg Kroah-Hartman
2021-08-18  2:02       ` Roman Gushchin
2021-08-16  7:21 ` [PATCH 5.10.y 02/11] mm: memcontrol/slab: Use helpers to access slab page's memcg_data Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 03/11] mm: Introduce page memcg flags Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 04/11] mm: Convert page kmemcg type to a page memcg flag Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 05/11] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 06/11] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 07/11] mm: memcontrol: change ug->dummy_page only if memcg changed Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 08/11] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 09/11] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 10/11] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Chen Huang
2021-08-16  7:21 ` [PATCH 5.10.y 11/11] mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() Chen Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).