linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/4] mm: allow mapping accounted kernel pages to userspace
@ 2020-10-01  0:27 Roman Gushchin
  2020-10-01  0:27 ` [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data Roman Gushchin
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01  0:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, linux-kernel,
	linux-mm, kernel-team, Roman Gushchin

Currently a non-slab kernel page which has been charged to a memory
cgroup can't be mapped to userspace. The underlying reason is simple:
the PageKmemcg flag is defined as a page type (like buddy, offline, etc),
so it takes a bit from the page->_mapcount counter. Pages with a type set
can't be mapped to userspace.
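
For reference, the mechanism that makes page types and userspace mappings
mutually exclusive looks roughly like this (simplified sketch of
include/linux/page-flags.h, details elided):

	/* page->page_type shares its storage with page->_mapcount */
	#define PAGE_TYPE_BASE		0xf0000000
	#define PAGE_MAPCOUNT_RESERVE	-128
	#define PG_kmemcg		0x00000200

	static inline int page_has_type(struct page *page)
	{
		/*
		 * A page with a type set looks like it has a large
		 * negative mapcount. Mapping such a page to userspace
		 * would modify _mapcount and corrupt the type bits,
		 * so typed pages must never be mapped.
		 */
		return (int)page->page_type < PAGE_MAPCOUNT_RESERVE;
	}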

But in general the kmemcg flag has nothing to do with mapping to
userspace. It only means that the page has been accounted by the page
allocator, so it has to be properly uncharged on release.

Some BPF maps map vmalloc-backed memory to userspace, and because of
this implementation detail their memory can't be accounted.
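
For illustration, a minimal userspace sketch of such a map (assuming a
kernel with BPF_F_MMAPABLE support; names and sizes are arbitrary, error
handling elided):

	#include <linux/bpf.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static void *map_array_to_userspace(void)
	{
		union bpf_attr attr;
		int fd;

		memset(&attr, 0, sizeof(attr));
		attr.map_type    = BPF_MAP_TYPE_ARRAY;
		attr.key_size    = sizeof(__u32);
		attr.value_size  = 64;
		attr.max_entries = 1024;
		attr.map_flags   = BPF_F_MMAPABLE;  /* vmalloc-backed storage */

		fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
		if (fd < 0)
			return NULL;

		/*
		 * Charging this memory with __GFP_ACCOUNT would set
		 * PageKmemcg and forbid the mapping below, which is
		 * why it's currently left unaccounted.
		 */
		return mmap(NULL, 64 * 1024, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0);
	}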

This patchset removes this limitation by moving the PageKmemcg flag
into one of the free low bits of the page->mem_cgroup pointer. It also
formalizes all accesses to page->mem_cgroup and page->obj_cgroups
using new helpers, adds several checks, and removes a couple of
obsolete functions. As a result, the code becomes more robust, with
fewer open-coded bit tricks.
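
For orientation, the resulting layout of the new field, as formalized by
patches 3 and 4 of the series:

	/*
	 * page->memcg_data layout after this series:
	 *
	 * bit 0: MEMCG_DATA_OBJCGS - the upper bits are a pointer to
	 *        an obj_cgroup vector (slab pages)
	 * bit 1: MEMCG_DATA_KMEM   - the page is an accounted non-slab
	 *        kernel page
	 * upper bits: a struct mem_cgroup pointer (or the objcg vector
	 *        pointer, if MEMCG_DATA_OBJCGS is set)
	 */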

v4:
  - more cosmetic changes, by Johannes

v3:
  - READ_ONCE() in page_memcg_rcu() and page_objcgs*(), by Johannes
  - many cosmetic changes and renamings, by Johannes

v2:
  - fixed a bug in page_obj_cgroups_check()
  - moved some definitions between patches, by Shakeel
  - dropped the memcg flags mutual exclusion requirement, by Shakeel

v1:
  - added and fixed comments, by Shakeel
  - added some VM_BUG_ON() checks
  - fixed the debug output format of page->memcg_data


Roman Gushchin (4):
  mm: memcontrol: use helpers to access page's memcg data
  mm: memcontrol/slab: use helpers to access slab page's memcg_data
  mm: introduce page memcg flags
  mm: convert page kmemcg type to a page memcg flag

 fs/buffer.c                      |   2 +-
 fs/iomap/buffered-io.c           |   2 +-
 include/linux/memcontrol.h       | 242 +++++++++++++++++++++++++++++--
 include/linux/mm.h               |  22 ---
 include/linux/mm_types.h         |   5 +-
 include/linux/page-flags.h       |  11 +-
 include/trace/events/writeback.h |   2 +-
 kernel/fork.c                    |   7 +-
 mm/debug.c                       |   4 +-
 mm/huge_memory.c                 |   4 +-
 mm/memcontrol.c                  | 139 ++++++++----------
 mm/page_alloc.c                  |   8 +-
 mm/page_io.c                     |   6 +-
 mm/slab.h                        |  38 ++---
 mm/workingset.c                  |   2 +-
 15 files changed, 327 insertions(+), 167 deletions(-)

-- 
2.26.2



* [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data
  2020-10-01  0:27 [PATCH v4 0/4] mm: allow mapping accounted kernel pages to userspace Roman Gushchin
@ 2020-10-01  0:27 ` Roman Gushchin
  2020-10-01 13:46   ` Johannes Weiner
  2020-10-01  0:27 ` [PATCH v4 2/4] mm: memcontrol/slab: use helpers to access slab page's memcg_data Roman Gushchin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01  0:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, linux-kernel,
	linux-mm, kernel-team, Roman Gushchin

Currently there are many open-coded reads and writes of the
page->mem_cgroup pointer, as well as a couple of read helpers
which are barely used.

This stands in the way of reusing some bits of the pointer for
storing additional information. In fact, we already do this for slab
pages: the lowest bit indicates that the pointer refers to an attached
vector of objcg pointers instead of a regular memcg pointer.

This commit uses 2 existing helpers and introduces 2 new helpers to
convert all raw accesses to page->mem_cgroup into calls of these helpers:
  struct mem_cgroup *page_memcg(struct page *page);
  struct mem_cgroup *page_memcg_rcu(struct page *page);
  struct mem_cgroup *page_memcg_check(struct page *page);
  void set_page_memcg(struct page *page, struct mem_cgroup *memcg);

page_memcg_check() is intended to be used in cases when the page can
be a slab page whose memcg pointer points at an objcg vector. It checks
the lowest bit, and if it is set, returns NULL. page_memcg() contains
a VM_BUG_ON_PAGE() check for the page not being a slab page. So does
set_page_memcg().

To make sure nobody uses direct access, struct page's
mem_cgroup/obj_cgroups union is converted to an unsigned long
memcg_data field. Only the new helpers and a couple of
slab-accounting related functions access this field directly.
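
A typical caller conversion, taken from the diff below:

	/* before */
	if (page->mem_cgroup)
		__mod_memcg_state(page->mem_cgroup, idx, val);

	/* after */
	struct mem_cgroup *memcg = page_memcg(page);

	if (memcg)
		__mod_memcg_state(memcg, idx, val);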

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/buffer.c                      |   2 +-
 fs/iomap/buffered-io.c           |   2 +-
 include/linux/memcontrol.h       | 138 +++++++++++++++++++++++++++++--
 include/linux/mm.h               |  22 -----
 include/linux/mm_types.h         |   5 +-
 include/trace/events/writeback.h |   2 +-
 kernel/fork.c                    |   7 +-
 mm/debug.c                       |   4 +-
 mm/huge_memory.c                 |   4 +-
 mm/memcontrol.c                  | 121 +++++++++++++--------------
 mm/page_alloc.c                  |   4 +-
 mm/page_io.c                     |   6 +-
 mm/slab.h                        |   9 +-
 mm/workingset.c                  |   2 +-
 14 files changed, 208 insertions(+), 120 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index bf4d8037f91b..64564ac7dcc5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -657,7 +657,7 @@ int __set_page_dirty_buffers(struct page *page)
 		} while (bh != head);
 	}
 	/*
-	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * Lock out page's memcg migration to keep PageDirty
 	 * synchronized with per-memcg dirty page counters.
 	 */
 	lock_page_memcg(page);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8180061b9e16..f96094a32851 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -650,7 +650,7 @@ iomap_set_page_dirty(struct page *page)
 		return !TestSetPageDirty(page);
 
 	/*
-	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * Lock out page's memcg migration to keep PageDirty
 	 * synchronized with per-memcg dirty page counters.
 	 */
 	lock_page_memcg(page);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e391e3c56de5..9609f919350e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,6 +343,98 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+/*
+ * page_memcg - get the memory cgroup associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some types of pages, e.g. slab pages or ex-slab pages.
+ *
+ * Any of the following ensures page and memcg binding stability:
+ * - the page lock
+ * - LRU isolation
+ * - lock_page_memcg()
+ * - exclusive reference
+ */
+static inline struct mem_cgroup *page_memcg(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	return (struct mem_cgroup *)page->memcg_data;
+}
+
+/*
+ * page_memcg_rcu - locklessly get the memory cgroup associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some types of pages, e.g. slab pages or ex-slab pages.
+ */
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	return (struct mem_cgroup *)READ_ONCE(page->memcg_data);
+}
+
+/*
+ * page_memcg_check - get the memory cgroup associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. Unlike page_memcg(), this function can take any page
+ * as an argument. It has to be used in cases when it's not known if a page
+ * has an associated memory cgroup pointer or an object cgroups vector.
+ *
+ * Any of the following ensures page and memcg binding stability:
+ * - the page lock
+ * - LRU isolation
+ * - lock_page_memcg()
+ * - exclusive reference
+ */
+static inline struct mem_cgroup *page_memcg_check(struct page *page)
+{
+	/*
+	 * Because page->memcg_data might be changed asynchronously
+	 * for slab pages, READ_ONCE() should be used here.
+	 */
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+	/*
+	 * The lowest bit set means that memcg isn't a valid
+	 * memcg pointer, but an obj_cgroups pointer.
+	 * In this case the page is shared and doesn't belong
+	 * to any specific memory cgroup.
+	 */
+	if (memcg_data & 0x1UL)
+		return NULL;
+
+	return (struct mem_cgroup *)memcg_data;
+}
+
+/*
+ * set_page_memcg - associate a page with a memory cgroup
+ * @page: a pointer to the page struct
+ * @memcg: a pointer to the memory cgroup
+ *
+ * Associates a page with a memory cgroup.
+ */
+static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
+{
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+
+	/*
+	 * Please, refer to page_memcg()'s description for the page and memcg
+	 * binding stability requirements.
+	 */
+	page->memcg_data = (unsigned long)memcg;
+}
+
+
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
 {
 	if (idx == MEMCG_PERCPU_B)
@@ -743,15 +835,19 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 static inline void __mod_memcg_page_state(struct page *page,
 					  int idx, int val)
 {
-	if (page->mem_cgroup)
-		__mod_memcg_state(page->mem_cgroup, idx, val);
+	struct mem_cgroup *memcg = page_memcg(page);
+
+	if (memcg)
+		__mod_memcg_state(memcg, idx, val);
 }
 
 static inline void mod_memcg_page_state(struct page *page,
 					int idx, int val)
 {
-	if (page->mem_cgroup)
-		mod_memcg_state(page->mem_cgroup, idx, val);
+	struct mem_cgroup *memcg = page_memcg(page);
+
+	if (memcg)
+		mod_memcg_state(memcg, idx, val);
 }
 
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
@@ -834,16 +930,17 @@ static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
+	struct mem_cgroup *memcg = page_memcg(head);
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!head->mem_cgroup) {
+	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
-	lruvec = mem_cgroup_lruvec(head->mem_cgroup, pgdat);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
 }
 
@@ -878,8 +975,10 @@ static inline void count_memcg_events(struct mem_cgroup *memcg,
 static inline void count_memcg_page_event(struct page *page,
 					  enum vm_event_item idx)
 {
-	if (page->mem_cgroup)
-		count_memcg_events(page->mem_cgroup, idx, 1);
+	struct mem_cgroup *memcg = page_memcg(page);
+
+	if (memcg)
+		count_memcg_events(memcg, idx, 1);
 }
 
 static inline void count_memcg_event_mm(struct mm_struct *mm,
@@ -941,6 +1040,27 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 
 struct mem_cgroup;
 
+static inline struct mem_cgroup *page_memcg(struct page *page)
+{
+	return NULL;
+}
+
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return NULL;
+}
+
+static inline struct mem_cgroup *page_memcg_check(struct page *page)
+{
+	return NULL;
+}
+
+static inline void set_page_memcg(struct page *page,
+				       struct mem_cgroup *memcg)
+{
+}
+
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return true;
@@ -1430,7 +1550,7 @@ static inline void mem_cgroup_track_foreign_dirty(struct page *page,
 	if (mem_cgroup_disabled())
 		return;
 
-	if (unlikely(&page->mem_cgroup->css != wb->memcg_css))
+	if (unlikely(&page_memcg(page)->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(page, wb);
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27c64d0d7520..6707801b014d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1484,28 +1484,6 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 #endif
 }
 
-#ifdef CONFIG_MEMCG
-static inline struct mem_cgroup *page_memcg(struct page *page)
-{
-	return page->mem_cgroup;
-}
-static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
-{
-	WARN_ON_ONCE(!rcu_read_lock_held());
-	return READ_ONCE(page->mem_cgroup);
-}
-#else
-static inline struct mem_cgroup *page_memcg(struct page *page)
-{
-	return NULL;
-}
-static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
-{
-	WARN_ON_ONCE(!rcu_read_lock_held());
-	return NULL;
-}
-#endif
-
 /*
  * Some inline functions in vmstat.h depend on page_zone()
  */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5a9238f6caad..80f5d755c037 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -199,10 +199,7 @@ struct page {
 	atomic_t _refcount;
 
 #ifdef CONFIG_MEMCG
-	union {
-		struct mem_cgroup *mem_cgroup;
-		struct obj_cgroup **obj_cgroups;
-	};
+	unsigned long memcg_data;
 #endif
 
 	/*
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index e7cbccc7c14c..39a40dfb578a 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -257,7 +257,7 @@ TRACE_EVENT(track_foreign_dirty,
 		__entry->ino		= inode ? inode->i_ino : 0;
 		__entry->memcg_id	= wb->memcg_css->id;
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
-		__entry->page_cgroup_ino = cgroup_ino(page->mem_cgroup->css.cgroup);
+		__entry->page_cgroup_ino = cgroup_ino(page_memcg(page)->css.cgroup);
 	),
 
 	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu",
diff --git a/kernel/fork.c b/kernel/fork.c
index 9be04e54de7f..74e5776f2cea 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -403,9 +403,10 @@ static int memcg_charge_kernel_stack(struct task_struct *tsk)
 
 		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
 			/*
-			 * If memcg_kmem_charge_page() fails, page->mem_cgroup
-			 * pointer is NULL, and memcg_kmem_uncharge_page() in
-			 * free_thread_stack() will ignore this page.
+			 * If memcg_kmem_charge_page() fails, page's
+			 * memory cgroup pointer is NULL, and
+			 * memcg_kmem_uncharge_page() in free_thread_stack()
+			 * will ignore this page.
 			 */
 			ret = memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL,
 						     0);
diff --git a/mm/debug.c b/mm/debug.c
index ccca576b2899..8a40b3fefbeb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -182,8 +182,8 @@ void __dump_page(struct page *page, const char *reason)
 		pr_warn("page dumped because: %s\n", reason);
 
 #ifdef CONFIG_MEMCG
-	if (!page_poisoned && page->mem_cgroup)
-		pr_warn("page->mem_cgroup:%px\n", page->mem_cgroup);
+	if (!page_poisoned && page->memcg_data)
+		pr_warn("pages's memcg:%lx\n", page->memcg_data);
 #endif
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cba3812a5c3e..999f8a4dcb56 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -470,7 +470,7 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 #ifdef CONFIG_MEMCG
 static inline struct deferred_split *get_deferred_split_queue(struct page *page)
 {
-	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(compound_head(page));
 	struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));
 
 	if (memcg)
@@ -2766,7 +2766,7 @@ void deferred_split_huge_page(struct page *page)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(page);
 #ifdef CONFIG_MEMCG
-	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(compound_head(page));
 #endif
 	unsigned long flags;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 888cbc48311f..e97454623c62 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -533,7 +533,7 @@ struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page)
 {
 	struct mem_cgroup *memcg;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 
 	if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		memcg = root_mem_cgroup;
@@ -560,16 +560,7 @@ ino_t page_cgroup_ino(struct page *page)
 	unsigned long ino = 0;
 
 	rcu_read_lock();
-	memcg = page->mem_cgroup;
-
-	/*
-	 * The lowest bit set means that memcg isn't a valid
-	 * memcg pointer, but a obj_cgroups pointer.
-	 * In this case the page is shared and doesn't belong
-	 * to any specific memory cgroup.
-	 */
-	if ((unsigned long) memcg & 0x1UL)
-		memcg = NULL;
+	memcg = page_memcg_check(page);
 
 	while (memcg && !(memcg->css.flags & CSS_ONLINE))
 		memcg = parent_mem_cgroup(memcg);
@@ -1050,7 +1041,7 @@ EXPORT_SYMBOL(get_mem_cgroup_from_mm);
  */
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 {
-	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(page);
 
 	if (mem_cgroup_disabled())
 		return NULL;
@@ -1349,7 +1340,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
 		goto out;
 	}
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 	/*
 	 * Swapcache readahead pages are added to the LRU - and
 	 * possibly migrated - before they are charged.
@@ -2109,7 +2100,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 }
 
 /**
- * lock_page_memcg - lock a page->mem_cgroup binding
+ * lock_page_memcg - lock a page and memcg binding
  * @page: the page
  *
  * This function protects unlocked LRU pages from being moved to
@@ -2141,7 +2132,7 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
 	if (mem_cgroup_disabled())
 		return NULL;
 again:
-	memcg = head->mem_cgroup;
+	memcg = page_memcg(head);
 	if (unlikely(!memcg))
 		return NULL;
 
@@ -2149,7 +2140,7 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
 		return memcg;
 
 	spin_lock_irqsave(&memcg->move_lock, flags);
-	if (memcg != head->mem_cgroup) {
+	if (memcg != page_memcg(head)) {
 		spin_unlock_irqrestore(&memcg->move_lock, flags);
 		goto again;
 	}
@@ -2187,14 +2178,14 @@ void __unlock_page_memcg(struct mem_cgroup *memcg)
 }
 
 /**
- * unlock_page_memcg - unlock a page->mem_cgroup binding
+ * unlock_page_memcg - unlock a page and memcg binding
  * @page: the page
  */
 void unlock_page_memcg(struct page *page)
 {
 	struct page *head = compound_head(page);
 
-	__unlock_page_memcg(head->mem_cgroup);
+	__unlock_page_memcg(page_memcg(head));
 }
 EXPORT_SYMBOL(unlock_page_memcg);
 
@@ -2884,7 +2875,7 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 {
-	VM_BUG_ON_PAGE(page->mem_cgroup, page);
+	VM_BUG_ON_PAGE(page_memcg(page), page);
 	/*
 	 * Any of the following ensures page->mem_cgroup stability:
 	 *
@@ -2893,7 +2884,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	 * - lock_page_memcg()
 	 * - exclusive reference
 	 */
-	page->mem_cgroup = memcg;
+	set_page_memcg(page, memcg);
 }
 
 #ifdef CONFIG_MEMCG_KMEM
@@ -2908,8 +2899,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 	if (!vec)
 		return -ENOMEM;
 
-	if (cmpxchg(&page->obj_cgroups, NULL,
-		    (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
+	if (cmpxchg(&page->memcg_data, 0, (unsigned long)vec | 0x1UL))
 		kfree(vec);
 	else
 		kmemleak_not_leak(vec);
@@ -2920,6 +2910,12 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 /*
  * Returns a pointer to the memory cgroup to which the kernel object is charged.
  *
+ * A passed kernel object can be a slab object or a generic kernel page, so
+ * different mechanisms for getting the memory cgroup pointer should be used.
+ * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
+ * cannot know for sure how the kernel object is implemented.
+ * mem_cgroup_from_obj() can be safely used in such cases.
+ *
  * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
  * cgroup_mutex, etc.
  */
@@ -2932,17 +2928,6 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 	page = virt_to_head_page(p);
 
-	/*
-	 * If page->mem_cgroup is set, it's either a simple mem_cgroup pointer
-	 * or a pointer to obj_cgroup vector. In the latter case the lowest
-	 * bit of the pointer is set.
-	 * The page->mem_cgroup pointer can be asynchronously changed
-	 * from NULL to (obj_cgroup_vec | 0x1UL), but can't be changed
-	 * from a valid memcg pointer to objcg vector or back.
-	 */
-	if (!page->mem_cgroup)
-		return NULL;
-
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in
@@ -2960,8 +2945,14 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 		return NULL;
 	}
 
-	/* All other pages use page->mem_cgroup */
-	return page->mem_cgroup;
+	/*
+	 * page_memcg_check() is used here, because the page_has_obj_cgroups()
+	 * check above could have failed: the object cgroups vector might not
+	 * have been set at that moment, but it can be set concurrently.
+	 * page_memcg_check(page) will guarantee that a proper memory
+	 * cgroup pointer or NULL will be returned.
+	 */
+	return page_memcg_check(page);
 }
 
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
@@ -3099,7 +3090,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	if (memcg && !mem_cgroup_is_root(memcg)) {
 		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 		if (!ret) {
-			page->mem_cgroup = memcg;
+			set_page_memcg(page, memcg);
 			__SetPageKmemcg(page);
 			return 0;
 		}
@@ -3115,7 +3106,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(page);
 	unsigned int nr_pages = 1 << order;
 
 	if (!memcg)
@@ -3123,7 +3114,7 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	__memcg_kmem_uncharge(memcg, nr_pages);
-	page->mem_cgroup = NULL;
+	page->memcg_data = 0;
 	css_put(&memcg->css);
 
 	/* slab pages do not have PageKmemcg flag set */
@@ -3274,7 +3265,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
  */
 void mem_cgroup_split_huge_fixup(struct page *head)
 {
-	struct mem_cgroup *memcg = head->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(head);
 	int i;
 
 	if (mem_cgroup_disabled())
@@ -3282,7 +3273,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		css_get(&memcg->css);
-		head[i].mem_cgroup = memcg;
+		set_page_memcg(&head[i], memcg);
 	}
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
@@ -4658,7 +4649,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 void mem_cgroup_track_foreign_dirty_slowpath(struct page *page,
 					     struct bdi_writeback *wb)
 {
-	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct mem_cgroup *memcg = page_memcg(page);
 	struct memcg_cgwb_frn *frn;
 	u64 now = get_jiffies_64();
 	u64 oldest_at = now;
@@ -5627,14 +5618,14 @@ static int mem_cgroup_move_account(struct page *page,
 
 	/*
 	 * Prevent mem_cgroup_migrate() from looking at
-	 * page->mem_cgroup of its source page while we change it.
+	 * page's memory cgroup of its source page while we change it.
 	 */
 	ret = -EBUSY;
 	if (!trylock_page(page))
 		goto out;
 
 	ret = -EINVAL;
-	if (page->mem_cgroup != from)
+	if (page_memcg(page) != from)
 		goto out_unlock;
 
 	pgdat = page_pgdat(page);
@@ -5689,13 +5680,13 @@ static int mem_cgroup_move_account(struct page *page,
 	/*
 	 * All state has been migrated, let's switch to the new memcg.
 	 *
-	 * It is safe to change page->mem_cgroup here because the page
+	 * It is safe to change page's memcg here because the page
 	 * is referenced, charged, isolated, and locked: we can't race
 	 * with (un)charging, migration, LRU putback, or anything else
-	 * that would rely on a stable page->mem_cgroup.
+	 * that would rely on a stable page's memory cgroup.
 	 *
 	 * Note that lock_page_memcg is a memcg lock, not a page lock,
-	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * to save space. As soon as we switch page's memory cgroup to a
 	 * new memcg that isn't locked, the above state can change
 	 * concurrently again. Make sure we're truly done with it.
 	 */
@@ -5704,7 +5695,7 @@ static int mem_cgroup_move_account(struct page *page,
 	css_get(&to->css);
 	css_put(&from->css);
 
-	page->mem_cgroup = to;
+	set_page_memcg(page, to);
 
 	__unlock_page_memcg(from);
 
@@ -5770,7 +5761,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 		 * mem_cgroup_move_account() checks the page is valid or
 		 * not under LRU exclusion.
 		 */
-		if (page->mem_cgroup == mc.from) {
+		if (page_memcg(page) == mc.from) {
 			ret = MC_TARGET_PAGE;
 			if (is_device_private_page(page))
 				ret = MC_TARGET_DEVICE;
@@ -5814,7 +5805,7 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(!page || !PageHead(page), page);
 	if (!(mc.flags & MOVE_ANON))
 		return ret;
-	if (page->mem_cgroup == mc.from) {
+	if (page_memcg(page) == mc.from) {
 		ret = MC_TARGET_PAGE;
 		if (target) {
 			get_page(page);
@@ -6760,12 +6751,12 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 		/*
 		 * Every swap fault against a single page tries to charge the
 		 * page, bail as early as possible.  shmem_unuse() encounters
-		 * already charged pages, too.  page->mem_cgroup is protected
-		 * by the page lock, which serializes swap cache removal, which
-		 * in turn serializes uncharging.
+		 * already charged pages, too.  page and memcg binding is
+		 * protected by the page lock, which serializes swap cache
+		 * removal, which in turn serializes uncharging.
 		 */
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound_head(page)->mem_cgroup)
+		if (page_memcg(compound_head(page)))
 			goto out;
 
 		id = lookup_swap_cgroup_id(ent);
@@ -6849,21 +6840,21 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page->mem_cgroup)
+	if (!page_memcg(page))
 		return;
 
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page->mem_cgroup at this point, we have fully
+	 * page_memcg(page) at this point, we have fully
 	 * exclusive access to the page.
 	 */
 
-	if (ug->memcg != page->mem_cgroup) {
+	if (ug->memcg != page_memcg(page)) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page->mem_cgroup;
+		ug->memcg = page_memcg(page);
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6880,7 +6871,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	}
 
 	ug->dummy_page = page;
-	page->mem_cgroup = NULL;
+	page->memcg_data = 0;
 	css_put(&ug->memcg->css);
 }
 
@@ -6923,7 +6914,7 @@ void mem_cgroup_uncharge(struct page *page)
 		return;
 
 	/* Don't touch page->lru of any random page, pre-check: */
-	if (!page->mem_cgroup)
+	if (!page_memcg(page))
 		return;
 
 	uncharge_gather_clear(&ug);
@@ -6973,11 +6964,11 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 		return;
 
 	/* Page cache replacement: new page already charged? */
-	if (newpage->mem_cgroup)
+	if (page_memcg(newpage))
 		return;
 
 	/* Swapcache readahead pages can get replaced before being charged */
-	memcg = oldpage->mem_cgroup;
+	memcg = page_memcg(oldpage);
 	if (!memcg)
 		return;
 
@@ -7172,7 +7163,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 
 	/* Readahead page, never charged */
 	if (!memcg)
@@ -7193,7 +7184,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(oldid, page);
 	mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries);
 
-	page->mem_cgroup = NULL;
+	page->memcg_data = 0;
 
 	if (!mem_cgroup_is_root(memcg))
 		page_counter_uncharge(&memcg->memory, nr_entries);
@@ -7236,7 +7227,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return 0;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 
 	/* Readahead page, never charged */
 	if (!memcg)
@@ -7317,7 +7308,7 @@ bool mem_cgroup_swap_full(struct page *page)
 	if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return false;
 
-	memcg = page->mem_cgroup;
+	memcg = page_memcg(page);
 	if (!memcg)
 		return false;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eb0962976a0e..e9f0fe4a143e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1057,7 +1057,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page->mem_cgroup |
+			(unsigned long)page_memcg(page) |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1082,7 +1082,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page->mem_cgroup))
+	if (unlikely(page_memcg(page)))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;
diff --git a/mm/page_io.c b/mm/page_io.c
index 433df1263349..9bca17ecc4df 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -291,12 +291,14 @@ static inline void count_swpout_vm_event(struct page *page)
 static void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 {
 	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
 
-	if (!page->mem_cgroup)
+	memcg = page_memcg(page);
+	if (!memcg)
 		return;
 
 	rcu_read_lock();
-	css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
+	css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys);
 	bio_associate_blkg_from_css(bio, css);
 	rcu_read_unlock();
 }
diff --git a/mm/slab.h b/mm/slab.h
index 4a24e1702923..5ac89260f329 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -242,18 +242,17 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
 {
 	/*
-	 * page->mem_cgroup and page->obj_cgroups are sharing the same
+	 * Page's memory cgroup and obj_cgroups vector are sharing the same
 	 * space. To distinguish between them in case we don't know for sure
 	 * that the page is a slab page (e.g. page_cgroup_ino()), let's
 	 * always set the lowest bit of obj_cgroups.
 	 */
-	return (struct obj_cgroup **)
-		((unsigned long)page->obj_cgroups & ~0x1UL);
+	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
 }
 
 static inline bool page_has_obj_cgroups(struct page *page)
 {
-	return ((unsigned long)page->obj_cgroups & 0x1UL);
+	return page->memcg_data & 0x1UL;
 }
 
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
@@ -262,7 +261,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
 	kfree(page_obj_cgroups(page));
-	page->obj_cgroups = NULL;
+	page->memcg_data = 0;
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
diff --git a/mm/workingset.c b/mm/workingset.c
index 8ed8e6296d8c..ef563f94a0e3 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -257,7 +257,7 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
 	struct lruvec *lruvec;
 	int memcgid;
 
-	/* Page is fully exclusive and pins page->mem_cgroup */
+	/* Page is fully exclusive and pins page's memory cgroup pointer */
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
-- 
2.26.2



* [PATCH v4 2/4] mm: memcontrol/slab: use helpers to access slab page's memcg_data
  2020-10-01  0:27 [PATCH v4 0/4] mm: allow mapping accounted kernel pages to userspace Roman Gushchin
  2020-10-01  0:27 ` [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data Roman Gushchin
@ 2020-10-01  0:27 ` Roman Gushchin
  2020-10-01  0:27 ` [PATCH v4 3/4] mm: introduce page memcg flags Roman Gushchin
  2020-10-01  0:27 ` [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag Roman Gushchin
  3 siblings, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01  0:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, linux-kernel,
	linux-mm, kernel-team, Roman Gushchin

To gather all direct accesses to struct page's memcg_data field
in one place, let's introduce 3 new helpers to use in the slab
accounting code:
  struct obj_cgroup **page_objcgs(struct page *page);
  struct obj_cgroup **page_objcgs_check(struct page *page);
  bool set_page_objcgs(struct page *page, struct obj_cgroup **objcgs);

They are similar to the corresponding API for generic pages, except
that the setter can return false, indicating that the value has
already been set from a different thread.
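
For example, memcg_alloc_page_obj_cgroups() (see the diff below) uses
the return value to resolve the allocation race:

	if (!set_page_objcgs(page, vec))
		kfree(vec);	/* lost the race, a vector is already set */
	else
		kmemleak_not_leak(vec);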

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 64 ++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c            |  6 ++--
 mm/slab.h                  | 35 +++++----------------
 3 files changed, 75 insertions(+), 30 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9609f919350e..7e3da7cdc1ba 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -435,6 +435,70 @@ static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
 }
 
 
+#ifdef CONFIG_MEMCG_KMEM
+/*
+ * page_objcgs - get the object cgroups vector associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroups vector associated with the page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroups vector. It's not safe to call this function
+ * against pages which might have an associated memory cgroup: e.g.
+ * kernel stack pages.
+ */
+static inline struct obj_cgroup **page_objcgs(struct page *page)
+{
+	return (struct obj_cgroup **)(READ_ONCE(page->memcg_data) & ~0x1UL);
+}
+
+/*
+ * page_objcgs_check - get the object cgroups vector associated with a page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroups vector associated with the page,
+ * or NULL. This function is safe to use if the page can be directly associated
+ * with a memory cgroup.
+ */
+static inline struct obj_cgroup **page_objcgs_check(struct page *page)
+{
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+	if (memcg_data && (memcg_data & 0x1UL))
+		return (struct obj_cgroup **)(memcg_data & ~0x1UL);
+
+	return NULL;
+}
+
+/*
+ * set_page_objcgs - associate a page with an object cgroups vector
+ * @page: a pointer to the page struct
+ * @objcgs: a pointer to the object cgroups vector
+ *
+ * Atomically associates a page with a vector of object cgroups.
+ */
+static inline bool set_page_objcgs(struct page *page,
+					struct obj_cgroup **objcgs)
+{
+	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
+}
+#else
+static inline struct obj_cgroup **page_objcgs(struct page *page)
+{
+	return NULL;
+}
+
+static inline struct obj_cgroup **page_objcgs_check(struct page *page)
+{
+	return NULL;
+}
+
+static inline bool set_page_objcgs(struct page *page,
+					struct obj_cgroup **objcgs)
+{
+	return true;
+}
+#endif
+
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
 {
 	if (idx == MEMCG_PERCPU_B)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e97454623c62..7e690424a917 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2899,7 +2899,7 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 	if (!vec)
 		return -ENOMEM;
 
-	if (cmpxchg(&page->memcg_data, 0, (unsigned long)vec | 0x1UL))
+	if (!set_page_objcgs(page, vec))
 		kfree(vec);
 	else
 		kmemleak_not_leak(vec);
@@ -2933,12 +2933,12 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	 * Memcg membership data for each individual object is saved in
 	 * the page->obj_cgroups.
 	 */
-	if (page_has_obj_cgroups(page)) {
+	if (page_objcgs_check(page)) {
 		struct obj_cgroup *objcg;
 		unsigned int off;
 
 		off = obj_to_index(page->slab_cache, page, p);
-		objcg = page_obj_cgroups(page)[off];
+		objcg = page_objcgs(page)[off];
 		if (objcg)
 			return obj_cgroup_memcg(objcg);
 
diff --git a/mm/slab.h b/mm/slab.h
index 5ac89260f329..9b70d30d21c7 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -239,28 +239,12 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
-{
-	/*
-	 * Page's memory cgroup and obj_cgroups vector are sharing the same
-	 * space. To distinguish between them in case we don't know for sure
-	 * that the page is a slab page (e.g. page_cgroup_ino()), let's
-	 * always set the lowest bit of obj_cgroups.
-	 */
-	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
-}
-
-static inline bool page_has_obj_cgroups(struct page *page)
-{
-	return page->memcg_data & 0x1UL;
-}
-
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp);
 
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
-	kfree(page_obj_cgroups(page));
+	kfree(page_objcgs(page));
 	page->memcg_data = 0;
 }
 
@@ -322,7 +306,7 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 		if (likely(p[i])) {
 			page = virt_to_head_page(p[i]);
 
-			if (!page_has_obj_cgroups(page) &&
+			if (!page_objcgs(page) &&
 			    memcg_alloc_page_obj_cgroups(page, s, flags)) {
 				obj_cgroup_uncharge(objcg, obj_full_size(s));
 				continue;
@@ -330,7 +314,7 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 
 			off = obj_to_index(s, page, p[i]);
 			obj_cgroup_get(objcg);
-			page_obj_cgroups(page)[off] = objcg;
+			page_objcgs(page)[off] = objcg;
 			mod_objcg_state(objcg, page_pgdat(page),
 					cache_vmstat_idx(s), obj_full_size(s));
 		} else {
@@ -343,18 +327,20 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 					void *p)
 {
+	struct obj_cgroup **objcgs;
 	struct obj_cgroup *objcg;
 	unsigned int off;
 
 	if (!memcg_kmem_enabled())
 		return;
 
-	if (!page_has_obj_cgroups(page))
+	objcgs = page_objcgs(page);
+	if (!objcgs)
 		return;
 
 	off = obj_to_index(s, page, p);
-	objcg = page_obj_cgroups(page)[off];
-	page_obj_cgroups(page)[off] = NULL;
+	objcg = objcgs[off];
+	objcgs[off] = NULL;
 
 	if (!objcg)
 		return;
@@ -367,11 +353,6 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 }
 
 #else /* CONFIG_MEMCG_KMEM */
-static inline bool page_has_obj_cgroups(struct page *page)
-{
-	return false;
-}
-
 static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
 {
 	return NULL;
-- 
2.26.2



* [PATCH v4 3/4] mm: introduce page memcg flags
  2020-10-01  0:27 [PATCH v4 0/4] mm: allow mapping accounted kernel pages to userspace Roman Gushchin
  2020-10-01  0:27 ` [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data Roman Gushchin
  2020-10-01  0:27 ` [PATCH v4 2/4] mm: memcontrol/slab: use helpers to access slab page's memcg_data Roman Gushchin
@ 2020-10-01  0:27 ` Roman Gushchin
  2020-10-01 13:44   ` Johannes Weiner
  2020-10-01  0:27 ` [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag Roman Gushchin
  3 siblings, 1 reply; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01  0:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, linux-kernel,
	linux-mm, kernel-team, Roman Gushchin

The lowest bit in page->memcg_data is used to distinguish between a
struct mem_cgroup pointer and a pointer to an objcgs array. All checks
and modifications of this bit are currently open-coded.

Let's formalize it using page memcg flags, defined in enum
page_memcg_data_flags.

Additional flags might be added later.
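
For example, patch 4 of this series adds such a flag by extending the enum:

	enum page_memcg_data_flags {
		/* page->memcg_data is a pointer to an objcgs vector */
		MEMCG_DATA_OBJCGS = (1UL << 0),
		/* page has been accounted as a non-slab kernel page */
		MEMCG_DATA_KMEM = (1UL << 1),
		/* the next bit after the last actual flag */
		__NR_MEMCG_DATA_FLAGS = (1UL << 2),
	};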

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 include/linux/memcontrol.h | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7e3da7cdc1ba..5e05599e1f74 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,6 +343,15 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+enum page_memcg_data_flags {
+	/* page->memcg_data is a pointer to an objcgs vector */
+	MEMCG_DATA_OBJCGS = (1UL << 0),
+	/* the next bit after the last actual flag */
+	__NR_MEMCG_DATA_FLAGS  = (1UL << 1),
+};
+
+#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+
 /*
  * page_memcg - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -404,13 +413,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	 */
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
-	/*
-	 * The lowest bit set means that memcg isn't a valid
-	 * memcg pointer, but an obj_cgroups pointer.
-	 * In this case the page is shared and doesn't belong
-	 * to any specific memory cgroup.
-	 */
-	if (memcg_data & 0x1UL)
+	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
 	return (struct mem_cgroup *)memcg_data;
@@ -448,7 +451,12 @@ static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
  */
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
-	return (struct obj_cgroup **)(READ_ONCE(page->memcg_data) & ~0x1UL);
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+	VM_BUG_ON_PAGE(memcg_data &&
+		       !(memcg_data & MEMCG_DATA_OBJCGS), page);
+
+	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -463,8 +471,9 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
-	if (memcg_data && (memcg_data & 0x1UL))
-		return (struct obj_cgroup **)(memcg_data & ~0x1UL);
+	if (memcg_data && (memcg_data & MEMCG_DATA_OBJCGS))
+		return (struct obj_cgroup **)
+			(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 
 	return NULL;
 }
@@ -479,7 +488,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 static inline bool set_page_objcgs(struct page *page,
 					struct obj_cgroup **objcgs)
 {
-	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
+	unsigned long memcg_data = (unsigned long)objcgs;
+
+	memcg_data |= MEMCG_DATA_OBJCGS;
+
+	return !cmpxchg(&page->memcg_data, 0, memcg_data);
 }
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
-- 
2.26.2



* [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-01  0:27 [PATCH v4 0/4] mm: allow mapping accounted kernel pages to userspace Roman Gushchin
                   ` (2 preceding siblings ...)
  2020-10-01  0:27 ` [PATCH v4 3/4] mm: introduce page memcg flags Roman Gushchin
@ 2020-10-01  0:27 ` Roman Gushchin
  2020-10-01 13:48   ` Johannes Weiner
  2020-10-01 17:00   ` Michal Koutný
  3 siblings, 2 replies; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01  0:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, linux-kernel,
	linux-mm, kernel-team, Roman Gushchin

The PageKmemcg flag is currently defined as a page type (like buddy,
offline, table and guard). Semantically it means that the page was
accounted as kernel memory by the page allocator and has to be
uncharged on release.

As a side effect of defining the flag as a page type, the accounted
page can't be mapped to userspace (look at page_has_type() and
comments above). In particular, this blocks the accounting of
vmalloc-backed memory used by some bpf maps, because these maps
do map the memory to userspace.

One option is to fix it by complicating the access to page->_mapcount,
which would provide some free bits for page->page_type.

But it's way better to move this flag into the page->memcg_data flags.
Indeed, the flag makes no sense without memory cgroups enabled, and
without a memory cgroup pointer set in particular.

This commit replaces PageKmemcg() and __SetPageKmemcg() with
PageMemcgKmem() and an open-coded OR operation setting the memcg
pointer with the MEMCG_DATA_KMEM bit. __ClearPageKmemcg() can be
simply deleted, as the whole memcg_data field is zeroed at once.

As a bonus, on a !CONFIG_MEMCG build the PageMemcgKmem() check will
be compiled out.
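
The latter works because the !CONFIG_MEMCG stub of PageMemcgKmem() below
unconditionally returns false, so checks in the page freeing path like

	if (memcg_kmem_enabled() && PageMemcgKmem(page))
		__memcg_kmem_uncharge_page(page, order);

can be eliminated by the compiler as dead code.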

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 39 ++++++++++++++++++++++++++++++++------
 include/linux/page-flags.h | 11 ++---------
 mm/memcontrol.c            | 16 +++++-----------
 mm/page_alloc.c            |  4 ++--
 4 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5e05599e1f74..8c15978294b2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -346,8 +346,10 @@ extern struct mem_cgroup *root_mem_cgroup;
 enum page_memcg_data_flags {
 	/* page->memcg_data is a pointer to an objcgs vector */
 	MEMCG_DATA_OBJCGS = (1UL << 0),
+	/* page has been accounted as a non-slab kernel page */
+	MEMCG_DATA_KMEM = (1UL << 1),
 	/* the next bit after the last actual flag */
-	__NR_MEMCG_DATA_FLAGS  = (1UL << 1),
+	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
 };
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
@@ -369,8 +371,12 @@ enum page_memcg_data_flags {
  */
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
+	unsigned long memcg_data = page->memcg_data;
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
-	return (struct mem_cgroup *)page->memcg_data;
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -416,7 +422,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
-	return (struct mem_cgroup *)memcg_data;
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -438,6 +444,20 @@ static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
 }
 
 
+/*
+ * PageMemcgKmem - check if the page has MemcgKmem flag set
+ * @page: a pointer to the page struct
+ *
+ * Checks if the page has MemcgKmem flag set. The caller must ensure that
+ * the page has an associated memory cgroup. It's not safe to call this function
+ * against some types of pages, e.g. slab pages.
+ */
+static inline bool PageMemcgKmem(struct page *page)
+{
+	VM_BUG_ON_PAGE(page->memcg_data & MEMCG_DATA_OBJCGS, page);
+	return page->memcg_data & MEMCG_DATA_KMEM;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * page_objcgs - get the object cgroups vector associated with a page
@@ -453,8 +473,8 @@ static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
-	VM_BUG_ON_PAGE(memcg_data &&
-		       !(memcg_data & MEMCG_DATA_OBJCGS), page);
+	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
 
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
@@ -471,9 +491,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	unsigned long memcg_data = READ_ONCE(page->memcg_data);
 
-	if (memcg_data && (memcg_data & MEMCG_DATA_OBJCGS))
+	if (memcg_data && (memcg_data & MEMCG_DATA_OBJCGS)) {
+		VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
 		return (struct obj_cgroup **)
 			(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	}
 
 	return NULL;
 }
@@ -1138,6 +1160,11 @@ static inline void set_page_memcg(struct page *page,
 {
 }
 
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return true;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4f6ba9379112..fc0e1bd48e73 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -715,9 +715,8 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PAGE_MAPCOUNT_RESERVE	-128
 #define PG_buddy	0x00000080
 #define PG_offline	0x00000100
-#define PG_kmemcg	0x00000200
-#define PG_table	0x00000400
-#define PG_guard	0x00000800
+#define PG_table	0x00000200
+#define PG_guard	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -768,12 +767,6 @@ PAGE_TYPE_OPS(Buddy, buddy)
  */
 PAGE_TYPE_OPS(Offline, offline)
 
-/*
- * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
- * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
- */
-PAGE_TYPE_OPS(Kmemcg, kmemcg)
-
 /*
  * Marks pages in use as page tables.
  */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7e690424a917..ff9bb3e8333a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3090,8 +3090,8 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	if (memcg && !mem_cgroup_is_root(memcg)) {
 		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 		if (!ret) {
-			set_page_memcg(page, memcg);
-			__SetPageKmemcg(page);
+			page->memcg_data = (unsigned long)memcg |
+				MEMCG_DATA_KMEM;
 			return 0;
 		}
 		css_put(&memcg->css);
@@ -3116,10 +3116,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	page->memcg_data = 0;
 	css_put(&memcg->css);
-
-	/* slab pages do not have PageKmemcg flag set */
-	if (PageKmemcg(page))
-		__ClearPageKmemcg(page);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6863,12 +6859,10 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	nr_pages = compound_nr(page);
 	ug->nr_pages += nr_pages;
 
-	if (!PageKmemcg(page)) {
-		ug->pgpgout++;
-	} else {
+	if (PageMemcgKmem(page))
 		ug->nr_kmem += nr_pages;
-		__ClearPageKmemcg(page);
-	}
+	else
+		ug->pgpgout++;
 
 	ug->dummy_page = page;
 	page->memcg_data = 0;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e9f0fe4a143e..5ebd50183d93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1179,7 +1179,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 		 * Do not let hwpoison pages hit pcplists/buddy
 		 * Untie memcg state and reset page's owner
 		 */
-		if (memcg_kmem_enabled() && PageKmemcg(page))
+		if (memcg_kmem_enabled() && PageMemcgKmem(page))
 			__memcg_kmem_uncharge_page(page, order);
 		reset_page_owner(page, order);
 		return false;
@@ -1209,7 +1209,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	if (PageMappingFlags(page))
 		page->mapping = NULL;
-	if (memcg_kmem_enabled() && PageKmemcg(page))
+	if (memcg_kmem_enabled() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
 	if (check_free)
 		bad += check_free_page(page);
-- 
2.26.2



* Re: [PATCH v4 3/4] mm: introduce page memcg flags
  2020-10-01  0:27 ` [PATCH v4 3/4] mm: introduce page memcg flags Roman Gushchin
@ 2020-10-01 13:44   ` Johannes Weiner
  0 siblings, 0 replies; 17+ messages in thread
From: Johannes Weiner @ 2020-10-01 13:44 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Wed, Sep 30, 2020 at 05:27:09PM -0700, Roman Gushchin wrote:
> @@ -448,7 +451,12 @@ static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
>   */
>  static inline struct obj_cgroup **page_objcgs(struct page *page)
>  {
> -	return (struct obj_cgroup **)(READ_ONCE(page->memcg_data) & ~0x1UL);
> +	unsigned long memcg_data = READ_ONCE(page->memcg_data);
> +
> +	VM_BUG_ON_PAGE(memcg_data &&
> +		       !(memcg_data & MEMCG_DATA_OBJCGS), page);

This fits on a single line.

> +	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
>  /*
> @@ -463,8 +471,9 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
>  {
>  	unsigned long memcg_data = READ_ONCE(page->memcg_data);
>  
> -	if (memcg_data && (memcg_data & 0x1UL))
> -		return (struct obj_cgroup **)(memcg_data & ~0x1UL);
> +	if (memcg_data && (memcg_data & MEMCG_DATA_OBJCGS))
> +		return (struct obj_cgroup **)
> +			(memcg_data & ~MEMCG_DATA_FLAGS_MASK);

	if (!memcg_data || !(memcg_data & MEMCG_DATA_OBJCGS))
		return NULL;

	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);

> @@ -479,7 +488,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
>  static inline bool set_page_objcgs(struct page *page,
>  					struct obj_cgroup **objcgs)
>  {
> -	return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
> +	unsigned long memcg_data = (unsigned long)objcgs;
> +
> +	memcg_data |= MEMCG_DATA_OBJCGS;
> +
> +	return !cmpxchg(&page->memcg_data, 0, memcg_data);

	return !cmpxchg(&page->memcg_data, 0, memcg_data | MEMCG_DATA_OBJCGS);


* Re: [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data
  2020-10-01  0:27 ` [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data Roman Gushchin
@ 2020-10-01 13:46   ` Johannes Weiner
  2020-10-01 18:27     ` Roman Gushchin
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2020-10-01 13:46 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Wed, Sep 30, 2020 at 05:27:07PM -0700, Roman Gushchin wrote:
> +/*
> + * set_page_memcg - associate a page with a memory cgroup
> + * @page: a pointer to the page struct
> + * @memcg: a pointer to the memory cgroup
> + *
> + * Associates a page with a memory cgroup.
> + */
> +static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
> +{
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +
> +	/*
> +	 * Please, refer to page_memcg()'s description for the page and memcg
> +	 * binding stability requirements.
> +	 */
> +	page->memcg_data = (unsigned long)memcg;
> +}

Please delete and inline this as per previous feedback, thanks.
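For reference, the inlined version at a call site would look roughly like
this sketch:

	/* e.g. in commit_charge(), keeping the stability comment there */
	page->memcg_data = (unsigned long)memcg;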


* Re: [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-01  0:27 ` [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag Roman Gushchin
@ 2020-10-01 13:48   ` Johannes Weiner
  2020-10-01 17:00   ` Michal Koutný
  1 sibling, 0 replies; 17+ messages in thread
From: Johannes Weiner @ 2020-10-01 13:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Wed, Sep 30, 2020 at 05:27:10PM -0700, Roman Gushchin wrote:
> PageKmemcg flag is currently defined as a page type (like buddy,
> offline, table and guard). Semantically it means that the page
> was accounted as a kernel memory by the page allocator and has
> to be uncharged on the release.
> 
> As a side effect of defining the flag as a page type, the accounted
> page can't be mapped to userspace (look at page_has_type() and
> comments above). In particular, this blocks the accounting of
> vmalloc-backed memory used by some bpf maps, because these maps
> do map the memory to userspace.
> 
> One option is to fix it by complicating the access to page->mapcount,
> which provides some free bits for page->page_type.
> 
> But it's way better to move this flag into the page->memcg_data flags.
> Indeed, the flag makes no sense without memory cgroups enabled, and in
> particular without the memory cgroup pointer set.
> 
> This commit replaces PageKmemcg() and __SetPageKmemcg() with
> PageMemcgKmem() and an open-coded OR operation setting the memcg
> pointer with the MEMCG_DATA_KMEM bit. __ClearPageKmemcg() can simply
> be deleted, as the whole memcg_data is zeroed at once.
> 
> As a bonus, on a !CONFIG_MEMCG build the PageMemcgKmem() check will
> be compiled out.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
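
For reference, the conversion described in the commit message boils
down to a pattern like this (a sketch, not the exact hunks):

	/* charge path: set the memcg pointer and the kmem flag in one store */
	page->memcg_data = (unsigned long)memcg | MEMCG_DATA_KMEM;

	/* replaces PageKmemcg(); a !CONFIG_MEMCG stub can return false */
	static inline bool PageMemcgKmem(struct page *page)
	{
		return page->memcg_data & MEMCG_DATA_KMEM;
	}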

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-01  0:27 ` [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag Roman Gushchin
  2020-10-01 13:48   ` Johannes Weiner
@ 2020-10-01 17:00   ` Michal Koutný
  2020-10-01 17:27     ` Johannes Weiner
  2020-10-01 18:30     ` Roman Gushchin
  1 sibling, 2 replies; 17+ messages in thread
From: Michal Koutný @ 2020-10-01 17:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Michal Hocko,
	linux-kernel, linux-mm, kernel-team

Hi.

On Wed, Sep 30, 2020 at 05:27:10PM -0700, Roman Gushchin <guro@fb.com> wrote:
> @@ -369,8 +371,12 @@ enum page_memcg_data_flags {
>   */
>  static inline struct mem_cgroup *page_memcg(struct page *page)
>  {
> +	unsigned long memcg_data = page->memcg_data;
> +
>  	VM_BUG_ON_PAGE(PageSlab(page), page);
> -	return (struct mem_cgroup *)page->memcg_data;
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> +
> +	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
Shouldn't this change also go into page_memcg_rcu()? (I don't think the
current single user, workingset_activation(), would pass a non-slab
kernel page, but for consistency's sake.)

Alternatively, I'm wondering why page_memcg_rcu() (in its single use) is
a separate function from page_memcg() (cross-memcg page migration?).
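
For concreteness, the first suggestion would make page_memcg_rcu() look
roughly like this (a sketch, mirroring the page_memcg() hunk above):

	static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
	{
		unsigned long memcg_data = READ_ONCE(page->memcg_data);

		VM_BUG_ON_PAGE(PageSlab(page), page);
		VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
		WARN_ON_ONCE(!rcu_read_lock_held());

		return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
	}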

Regards,
Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-01 17:00   ` Michal Koutný
@ 2020-10-01 17:27     ` Johannes Weiner
  2020-10-02 10:03       ` Michal Koutný
  2020-10-01 18:30     ` Roman Gushchin
  1 sibling, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2020-10-01 17:27 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Roman Gushchin, Andrew Morton, Shakeel Butt, Michal Hocko,
	linux-kernel, linux-mm, kernel-team

On Thu, Oct 01, 2020 at 07:00:36PM +0200, Michal Koutný wrote:
> Hi.
> 
> On Wed, Sep 30, 2020 at 05:27:10PM -0700, Roman Gushchin <guro@fb.com> wrote:
> > @@ -369,8 +371,12 @@ enum page_memcg_data_flags {
> >   */
> >  static inline struct mem_cgroup *page_memcg(struct page *page)
> >  {
> > +	unsigned long memcg_data = page->memcg_data;
> > +
> >  	VM_BUG_ON_PAGE(PageSlab(page), page);
> > -	return (struct mem_cgroup *)page->memcg_data;
> > +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +
> > +	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> Shouldn't this change also go into page_memcg_rcu()? (I don't think the
> current single user, workingset_activation(), would pass a non-slab
> kernel page, but for consistency's sake.)

+1

> Alternatively, I'm wondering why page_memcg_rcu() (in its single use) is
> a separate function from page_memcg() (cross-memcg page migration?).

It goes back to commit 55779ec759ccc3c12b917b3712a7716e1140c652.

The activation code is the only path where page migration is not
excluded, because unlike with page state statistics, we don't really
mind a race when counting an activation event.

I do think there is a bug, though: mem_cgroup_move_account() should
use WRITE_ONCE() on page->mem_cgroup.
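
Something like this, against the renamed field (a sketch; the rest of
mem_cgroup_move_account() is elided):

	/* pairs with the READ_ONCE() in page_memcg_rcu() */
	WRITE_ONCE(page->memcg_data, (unsigned long)to);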

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data
  2020-10-01 13:46   ` Johannes Weiner
@ 2020-10-01 18:27     ` Roman Gushchin
  2020-10-01 18:59       ` Johannes Weiner
  0 siblings, 1 reply; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01 18:27 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Thu, Oct 01, 2020 at 09:46:38AM -0400, Johannes Weiner wrote:
> On Wed, Sep 30, 2020 at 05:27:07PM -0700, Roman Gushchin wrote:
> > +/*
> > + * set_page_memcg - associate a page with a memory cgroup
> > + * @page: a pointer to the page struct
> > + * @memcg: a pointer to the memory cgroup
> > + *
> > + * Associates a page with a memory cgroup.
> > + */
> > +static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
> > +{
> > +	VM_BUG_ON_PAGE(PageSlab(page), page);
> > +
> > +	/*
> > +	 * Please, refer to page_memcg()'s description for the page and memcg
> > +	 * binding stability requirements.
> > +	 */
> > +	page->memcg_data = (unsigned long)memcg;
> > +}
> 
> Please delete and inline this as per previous feedback, thanks.

Why is it better?
It's ok for set_page_memcg(), but obviously worse for set_page_objcgs():
it was nice to have all the bit magic in one place, in a few helper
functions. And now it spills into several places. What's the win?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-01 17:00   ` Michal Koutný
  2020-10-01 17:27     ` Johannes Weiner
@ 2020-10-01 18:30     ` Roman Gushchin
  1 sibling, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01 18:30 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Michal Hocko,
	linux-kernel, linux-mm, kernel-team

On Thu, Oct 01, 2020 at 07:00:36PM +0200, Michal Koutný wrote:
> Hi.
> 
> On Wed, Sep 30, 2020 at 05:27:10PM -0700, Roman Gushchin <guro@fb.com> wrote:
> > @@ -369,8 +371,12 @@ enum page_memcg_data_flags {
> >   */
> >  static inline struct mem_cgroup *page_memcg(struct page *page)
> >  {
> > +	unsigned long memcg_data = page->memcg_data;
> > +
> >  	VM_BUG_ON_PAGE(PageSlab(page), page);
> > -	return (struct mem_cgroup *)page->memcg_data;
> > +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +
> > +	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> Shouldn't this change also go into page_memcg_rcu()? (I don't think the
> current single user, workingset_activation(), would pass a non-slab
> kernel page, but for consistency's sake.)

Yes, a good idea, I will add it.

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data
  2020-10-01 18:27     ` Roman Gushchin
@ 2020-10-01 18:59       ` Johannes Weiner
  2020-10-01 20:51         ` Roman Gushchin
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2020-10-01 18:59 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Thu, Oct 01, 2020 at 11:27:39AM -0700, Roman Gushchin wrote:
> On Thu, Oct 01, 2020 at 09:46:38AM -0400, Johannes Weiner wrote:
> > On Wed, Sep 30, 2020 at 05:27:07PM -0700, Roman Gushchin wrote:
> > > +/*
> > > + * set_page_memcg - associate a page with a memory cgroup
> > > + * @page: a pointer to the page struct
> > > + * @memcg: a pointer to the memory cgroup
> > > + *
> > > + * Associates a page with a memory cgroup.
> > > + */
> > > +static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
> > > +{
> > > +	VM_BUG_ON_PAGE(PageSlab(page), page);
> > > +
> > > +	/*
> > > +	 * Please, refer to page_memcg()'s description for the page and memcg
> > > +	 * binding stability requirements.
> > > +	 */
> > > +	page->memcg_data = (unsigned long)memcg;
> > > +}
> > 
> > Please delete and inline this as per previous feedback, thanks.
> 
> Why is it better?
> It's ok for set_page_memcg(), but obviously worse for set_page_objcgs():
> it was nice to have all the bit magic in one place, in a few helper
> functions. And now it spills into several places. What's the win?

set_page_objcgs() is a worthwhile abstraction because it includes the
synchronization primitives that make it safe to use wrt
page_objcgs(). They encapsulate the cmpxchg and the READ_ONCE().

set_page_memcg() doesn't do any synchronization and relies fully on
the contextual locking. The name implies that it includes things to
make it safe wrt page_memcg(), which isn't true at all. It's a long
and misleading name for '='.
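
Side by side, the difference is (a sketch):

	/* set_page_objcgs(): real synchronization against concurrent setters */
	return !cmpxchg(&page->memcg_data, 0, memcg_data | MEMCG_DATA_OBJCGS);

	/* set_page_memcg(): no synchronization, just a store */
	page->memcg_data = (unsigned long)memcg;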

Btw, I really don't mind having this discussion, but please don't send
revisions that silently ignore feedback you don't agree with.

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data
  2020-10-01 18:59       ` Johannes Weiner
@ 2020-10-01 20:51         ` Roman Gushchin
  2020-10-02 14:22           ` Johannes Weiner
  0 siblings, 1 reply; 17+ messages in thread
From: Roman Gushchin @ 2020-10-01 20:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Thu, Oct 01, 2020 at 02:59:50PM -0400, Johannes Weiner wrote:
> On Thu, Oct 01, 2020 at 11:27:39AM -0700, Roman Gushchin wrote:
> > On Thu, Oct 01, 2020 at 09:46:38AM -0400, Johannes Weiner wrote:
> > > On Wed, Sep 30, 2020 at 05:27:07PM -0700, Roman Gushchin wrote:
> > > > +/*
> > > > + * set_page_memcg - associate a page with a memory cgroup
> > > > + * @page: a pointer to the page struct
> > > > + * @memcg: a pointer to the memory cgroup
> > > > + *
> > > > + * Associates a page with a memory cgroup.
> > > > + */
> > > > +static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
> > > > +{
> > > > +	VM_BUG_ON_PAGE(PageSlab(page), page);
> > > > +
> > > > +	/*
> > > > +	 * Please, refer to page_memcg()'s description for the page and memcg
> > > > +	 * binding stability requirements.
> > > > +	 */
> > > > +	page->memcg_data = (unsigned long)memcg;
> > > > +}
> > > 
> > > Please delete and inline this as per previous feedback, thanks.
> > 
> > Why is it better?
> > It's ok for set_page_memcg(), but obviously worse for set_page_objcgs():
> > it was nice to have all the bit magic in one place, in a few helper
> > functions. And now it spills into several places. What's the win?
> 
> set_page_objcgs() is a worthwhile abstraction because it includes the
> synchronization primitives that make it safe to use wrt
> page_objcgs(). They encapsulate the cmpxchg and the READ_ONCE().
> 
> set_page_memcg() doesn't do any synchronization and relies fully on
> the contextual locking. The name implies that it includes things to
> make it safe wrt page_memcg(), which isn't true at all. It's a long
> and misleading name for '='.
> 
> Btw, I really don't mind having this discussion, but please don't send
> revisions that silently ignore feedback you don't agree with.

I'm not ignoring it: I thought you were looking to remove the clear_page_*
functions, but it wasn't clear you wanted to eliminate the set_page_memcg()
function as well. When you ask for "style" changes, please provide some
rationale; what exactly you don't like in the proposed version is way less
obvious than you think.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-01 17:27     ` Johannes Weiner
@ 2020-10-02 10:03       ` Michal Koutný
  2020-10-02 13:25         ` Johannes Weiner
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Koutný @ 2020-10-02 10:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Andrew Morton, Shakeel Butt, Michal Hocko,
	linux-kernel, linux-mm, kernel-team

On Thu, Oct 01, 2020 at 01:27:13PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> The activation code is the only path where page migration is not
> excluded. Because unlike with page state statistics, we don't really
> mind a race when counting an activation event.
Thanks for the explanation. I see why the accessor trio is justified.

> I do think there is a bug, though: mem_cgroup_move_account() should
> use WRITE_ONCE() on page->mem_cgroup.
If this were a bug, wouldn't the proper approach be an
rcu_assign_pointer()/rcu_dereference() pair?

Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag
  2020-10-02 10:03       ` Michal Koutný
@ 2020-10-02 13:25         ` Johannes Weiner
  0 siblings, 0 replies; 17+ messages in thread
From: Johannes Weiner @ 2020-10-02 13:25 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Roman Gushchin, Andrew Morton, Shakeel Butt, Michal Hocko,
	linux-kernel, linux-mm, kernel-team

On Fri, Oct 02, 2020 at 12:03:50PM +0200, Michal Koutný wrote:
> On Thu, Oct 01, 2020 at 01:27:13PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > I do think there is a bug, though: mem_cgroup_move_account() should
> > use WRITE_ONCE() on page->mem_cgroup.
> If this were a bug, wouldn't the proper approach be an
> rcu_assign_pointer()/rcu_dereference() pair?

Yes, you're right.

I'll double check these code paths and follow up with a patch if
necessary.
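
For the record, with the old pointer-typed field the annotated pair
would be something like this (a sketch; with the unsigned long
memcg_data it would need casts, or smp_store_release()/READ_ONCE()
instead):

	/* writer side, e.g. mem_cgroup_move_account() */
	rcu_assign_pointer(page->mem_cgroup, to);

	/* reader side, e.g. page_memcg_rcu(), under rcu_read_lock() */
	memcg = rcu_dereference(page->mem_cgroup);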

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data
  2020-10-01 20:51         ` Roman Gushchin
@ 2020-10-02 14:22           ` Johannes Weiner
  0 siblings, 0 replies; 17+ messages in thread
From: Johannes Weiner @ 2020-10-02 14:22 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Shakeel Butt, Michal Hocko, linux-kernel,
	linux-mm, kernel-team

On Thu, Oct 01, 2020 at 01:51:43PM -0700, Roman Gushchin wrote:
> On Thu, Oct 01, 2020 at 02:59:50PM -0400, Johannes Weiner wrote:
> > On Thu, Oct 01, 2020 at 11:27:39AM -0700, Roman Gushchin wrote:
> > > On Thu, Oct 01, 2020 at 09:46:38AM -0400, Johannes Weiner wrote:
> > > > On Wed, Sep 30, 2020 at 05:27:07PM -0700, Roman Gushchin wrote:
> > > > > +/*
> > > > > + * set_page_memcg - associate a page with a memory cgroup
> > > > > + * @page: a pointer to the page struct
> > > > > + * @memcg: a pointer to the memory cgroup
> > > > > + *
> > > > > + * Associates a page with a memory cgroup.
> > > > > + */
> > > > > +static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
> > > > > +{
> > > > > +	VM_BUG_ON_PAGE(PageSlab(page), page);
> > > > > +
> > > > > +	/*
> > > > > +	 * Please, refer to page_memcg()'s description for the page and memcg
> > > > > +	 * binding stability requirements.
> > > > > +	 */
> > > > > +	page->memcg_data = (unsigned long)memcg;
> > > > > +}
> > > > 
> > > > Please delete and inline this as per previous feedback, thanks.
> > > 
> > > Why is it better?
> > > It's ok for set_page_memcg(), but obviously worse for set_page_objcgs():
> > > it was nice to have all the bit magic in one place, in a few helper
> > > functions. And now it spills into several places. What's the win?
> > 
> > set_page_objcgs() is a worthwhile abstraction because it includes the
> > synchronization primitives that make it safe to use wrt
> > page_objcgs(). They encapsulate the cmpxchg and the READ_ONCE().
> > 
> > set_page_memcg() doesn't do any synchronization and relies fully on
> > the contextual locking. The name implies that it includes things to
> > make it safe wrt page_memcg(), which isn't true at all. It's a long
> > and misleading name for '='.
> > 
> > Btw, I really don't mind having this discussion, but please don't send
> > revisions that silently ignore feedback you don't agree with.
> 
> I'm not ignoring it: I thought you were looking to remove the clear_page_*
> functions, but it wasn't clear you wanted to eliminate the set_page_memcg()
> function as well. When you ask for "style" changes, please provide some
> rationale; what exactly you don't like in the proposed version is way less
> obvious than you think.

https://lore.kernel.org/lkml/20200930210124.GB469663@cmpxchg.org/
mentions and provides a rationale for inlining the setting and
clearing (unnecessary abstraction/encapsulation).

This is a code base shared by many different developers. It's routine
to be asked for style changes that make the new code blend in with its
surroundings, follow precedents and use idiomatic constructs, or avoid
anti-patterns that are known to cause maintenance burden or make it
harder to hack or debug the code.

If I'm being too terse and something isn't clear, please just ask for
clarification.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-10-02 14:24 UTC | newest]

Thread overview: 17+ messages
2020-10-01  0:27 [PATCH v4 0/4] mm: allow mapping accounted kernel pages to userspace Roman Gushchin
2020-10-01  0:27 ` [PATCH v4 1/4] mm: memcontrol: use helpers to access page's memcg data Roman Gushchin
2020-10-01 13:46   ` Johannes Weiner
2020-10-01 18:27     ` Roman Gushchin
2020-10-01 18:59       ` Johannes Weiner
2020-10-01 20:51         ` Roman Gushchin
2020-10-02 14:22           ` Johannes Weiner
2020-10-01  0:27 ` [PATCH v4 2/4] mm: memcontrol/slab: use helpers to access slab page's memcg_data Roman Gushchin
2020-10-01  0:27 ` [PATCH v4 3/4] mm: introduce page memcg flags Roman Gushchin
2020-10-01 13:44   ` Johannes Weiner
2020-10-01  0:27 ` [PATCH v4 4/4] mm: convert page kmemcg type to a page memcg flag Roman Gushchin
2020-10-01 13:48   ` Johannes Weiner
2020-10-01 17:00   ` Michal Koutný
2020-10-01 17:27     ` Johannes Weiner
2020-10-02 10:03       ` Michal Koutný
2020-10-02 13:25         ` Johannes Weiner
2020-10-01 18:30     ` Roman Gushchin
