linux-mm.kvack.org archive mirror
* [PATCH v1 0/4] mm: kmem: kernel memory accounting in an interrupt context
@ 2020-08-27 22:58 Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 1/4] mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() Roman Gushchin
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Roman Gushchin @ 2020-08-27 22:58 UTC
  To: Andrew Morton, linux-mm
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, kernel-team,
	linux-kernel, Roman Gushchin

This patchset implements memcg-based memory accounting of
allocations made from an interrupt context.

Historically, such allocations were left unaccounted, mostly
because charging the memory cgroup of the current process wasn't
an option. Performance considerations were likely a factor too.

The remote charging API allows the currently active memory cgroup
to be overridden temporarily, so that all memory allocations are
charged to a specified memory cgroup instead of the memory cgroup
of the current process.
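
As a rough illustration of the save/override/restore pattern described
above, here is a minimal userspace model. Only set_active_memcg() is a
name taken from this series; task_memcg, task_active_memcg,
charge_alloc() and charge_on_behalf_of() are hypothetical stand-ins, and
a cgroup is reduced to a byte counter.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model: a cgroup is just a name plus a byte counter. */
struct mem_cgroup {
	const char *name;
	long charged;
};

/* Model of the task's own memcg and of current->active_memcg. */
static struct mem_cgroup task_memcg = { "task", 0 };
static struct mem_cgroup *task_active_memcg;

/* Same contract as set_active_memcg(): install a target, return the old one. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = task_active_memcg;

	task_active_memcg = memcg;
	return old;
}

/* Hypothetical accounted allocation: charge the override if set, else the task. */
static void charge_alloc(long bytes)
{
	struct mem_cgroup *target =
		task_active_memcg ? task_active_memcg : &task_memcg;

	assert(bytes >= 0);
	target->charged += bytes;
}

/* Scoped override: save the old target, charge, then restore it. */
static void charge_on_behalf_of(struct mem_cgroup *memcg, long bytes)
{
	struct mem_cgroup *old = set_active_memcg(memcg);

	charge_alloc(bytes);
	set_active_memcg(old);
}
```

Restoring the value returned by set_active_memcg(), rather than
resetting to NULL, is what lets such scopes nest.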

This patchset extends the remote charging API so that it can be
used from an interrupt context. Then it removes the fence that
prevented the accounting of allocations made from an interrupt
context. It also contains a couple of optimizations/code
refactorings.

This patchset doesn't directly enable accounting for any specific
allocations, but prepares the code base for it. The bpf memory
accounting will likely be the first user: a typical example is
a bpf program that parses an incoming network packet and allocates
an entry in a hash map to store some information.

v1:
  - fixed a typo, by Shakeel

rfc:
  https://lkml.org/lkml/2020/8/27/1155


Roman Gushchin (4):
  mm: kmem: move memcg_kmem_bypass() calls to
    get_mem/obj_cgroup_from_current()
  mm: kmem: remove redundant checks from get_obj_cgroup_from_current()
  mm: kmem: prepare remote memcg charging infra for interrupt contexts
  mm: kmem: enable kernel memcg accounting from interrupt contexts

 include/linux/memcontrol.h | 12 -------
 include/linux/sched/mm.h   | 13 +++++--
 mm/memcontrol.c            | 69 ++++++++++++++++++++++++++++----------
 mm/percpu.c                |  3 +-
 mm/slab.h                  |  3 --
 5 files changed, 63 insertions(+), 37 deletions(-)

-- 
2.26.2




* [PATCH v1 1/4] mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()
  2020-08-27 22:58 [PATCH v1 0/4] mm: kmem: kernel memory accounting in an interrupt context Roman Gushchin
@ 2020-08-27 22:58 ` Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 2/4] mm: kmem: remove redundant checks from get_obj_cgroup_from_current() Roman Gushchin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Roman Gushchin @ 2020-08-27 22:58 UTC
  To: Andrew Morton, linux-mm
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, kernel-team,
	linux-kernel, Roman Gushchin

Currently memcg_kmem_bypass() is called before obtaining the current
memory/obj cgroup using get_mem/obj_cgroup_from_current(). Moving
memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces
the number of call sites and allows further code simplifications.
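
The effect on call sites can be modelled in userspace roughly as
follows. This is only a sketch: model_bypass, model_objcg and the two
*_is_accounted() call sites are hypothetical, and the real getter does
considerably more than return a static object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj_cgroup { int id; };

/* Model state: whether accounting must be bypassed in the current context. */
static bool model_bypass;
static struct obj_cgroup model_objcg = { 1 };

static bool memcg_kmem_bypass(void)
{
	return model_bypass;
}

/* After the move, the getter is the one place that consults the bypass check. */
static struct obj_cgroup *get_obj_cgroup_from_current(void)
{
	if (memcg_kmem_bypass())
		return NULL;
	return &model_objcg;
}

/* Hypothetical call sites: a NULL result now doubles as "do not account". */
static bool slab_alloc_is_accounted(void)
{
	return get_obj_cgroup_from_current() != NULL;
}

static bool percpu_alloc_is_accounted(void)
{
	return get_obj_cgroup_from_current() != NULL;
}

/* Both call sites must agree, since they share the check inside the getter. */
static void check_call_sites_agree(void)
{
	assert(slab_alloc_is_accounted() == percpu_alloc_is_accounted());
}
```

The cost of this shape is that every caller has to handle a NULL return.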

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 13 ++++++++-----
 mm/percpu.c     |  3 +--
 mm/slab.h       |  3 ---
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dc892a3c4b17..9c08d8d14bc0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1074,6 +1074,9 @@ EXPORT_SYMBOL(get_mem_cgroup_from_page);
  */
 static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
 {
+	if (memcg_kmem_bypass())
+		return NULL;
+
 	if (unlikely(current->active_memcg)) {
 		struct mem_cgroup *memcg;
 
@@ -2913,6 +2916,9 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg;
 
+	if (memcg_kmem_bypass())
+		return NULL;
+
 	if (unlikely(!current->mm && !current->active_memcg))
 		return NULL;
 
@@ -3039,19 +3045,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	struct mem_cgroup *memcg;
 	int ret = 0;
 
-	if (memcg_kmem_bypass())
-		return 0;
-
 	memcg = get_mem_cgroup_from_current();
-	if (!mem_cgroup_is_root(memcg)) {
+	if (memcg && !mem_cgroup_is_root(memcg)) {
 		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 		if (!ret) {
 			page->mem_cgroup = memcg;
 			__SetPageKmemcg(page);
 			return 0;
 		}
+		css_put(&memcg->css);
 	}
-	css_put(&memcg->css);
 	return ret;
 }
 
diff --git a/mm/percpu.c b/mm/percpu.c
index f4709629e6de..9b07bd5bc45f 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1584,8 +1584,7 @@ static enum pcpu_chunk_type pcpu_memcg_pre_alloc_hook(size_t size, gfp_t gfp,
 {
 	struct obj_cgroup *objcg;
 
-	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT) ||
-	    memcg_kmem_bypass())
+	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
 		return PCPU_CHUNK_ROOT;
 
 	objcg = get_obj_cgroup_from_current();
diff --git a/mm/slab.h b/mm/slab.h
index 95e5cc1bb2a3..4a24e1702923 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -280,9 +280,6 @@ static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 {
 	struct obj_cgroup *objcg;
 
-	if (memcg_kmem_bypass())
-		return NULL;
-
 	objcg = get_obj_cgroup_from_current();
 	if (!objcg)
 		return NULL;
-- 
2.26.2




* [PATCH v1 2/4] mm: kmem: remove redundant checks from get_obj_cgroup_from_current()
  2020-08-27 22:58 [PATCH v1 0/4] mm: kmem: kernel memory accounting in an interrupt context Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 1/4] mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() Roman Gushchin
@ 2020-08-27 22:58 ` Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 3/4] mm: kmem: prepare remote memcg charging infra for interrupt contexts Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 4/4] mm: kmem: enable kernel memcg accounting from " Roman Gushchin
  3 siblings, 0 replies; 5+ messages in thread
From: Roman Gushchin @ 2020-08-27 22:58 UTC
  To: Andrew Morton, linux-mm
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, kernel-team,
	linux-kernel, Roman Gushchin

There are checks for current->mm and current->active_memcg
in get_obj_cgroup_from_current(), but these checks are redundant:
memcg_kmem_bypass(), called just above, performs the same checks.
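
The redundancy can be checked by brute force over the four booleans
involved. The sketch below is a userspace model: bypass_model() mirrors
the logic of memcg_kmem_bypass() at this point in the series, and the
function names and boolean parameters are hypothetical stand-ins for
in_interrupt(), current->mm, PF_KTHREAD and current->active_memcg.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of memcg_kmem_bypass() as it stands at this point in the series. */
static bool bypass_model(bool in_irq, bool has_mm, bool is_kthread,
			 bool has_active)
{
	if (in_irq)
		return true;
	if ((!has_mm || is_kthread) && !has_active)
		return true;
	return false;
}

/* The check removed from get_obj_cgroup_from_current(). */
static bool removed_check(bool has_mm, bool has_active)
{
	return !has_mm && !has_active;
}

/* Count states that pass the bypass check but would still hit the removed one. */
static int states_needing_removed_check(void)
{
	int count = 0;

	for (int bits = 0; bits < 16; bits++) {
		bool in_irq = bits & 1, has_mm = bits & 2;
		bool is_kthread = bits & 4, has_active = bits & 8;

		if (!bypass_model(in_irq, has_mm, is_kthread, has_active) &&
		    removed_check(has_mm, has_active))
			count++;
	}
	return count;
}

/* The removed check is redundant exactly when no such state exists. */
static void verify_redundancy(void)
{
	assert(states_needing_removed_check() == 0);
}
```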

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c08d8d14bc0..5d847257a639 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2919,9 +2919,6 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 	if (memcg_kmem_bypass())
 		return NULL;
 
-	if (unlikely(!current->mm && !current->active_memcg))
-		return NULL;
-
 	rcu_read_lock();
 	if (unlikely(current->active_memcg))
 		memcg = rcu_dereference(current->active_memcg);
-- 
2.26.2




* [PATCH v1 3/4] mm: kmem: prepare remote memcg charging infra for interrupt contexts
  2020-08-27 22:58 [PATCH v1 0/4] mm: kmem: kernel memory accounting in an interrupt context Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 1/4] mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 2/4] mm: kmem: remove redundant checks from get_obj_cgroup_from_current() Roman Gushchin
@ 2020-08-27 22:58 ` Roman Gushchin
  2020-08-27 22:58 ` [PATCH v1 4/4] mm: kmem: enable kernel memcg accounting from " Roman Gushchin
  3 siblings, 0 replies; 5+ messages in thread
From: Roman Gushchin @ 2020-08-27 22:58 UTC
  To: Andrew Morton, linux-mm
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, kernel-team,
	linux-kernel, Roman Gushchin

The remote memcg charging API uses current->active_memcg to store the
currently active memory cgroup, which overrides the memory cgroup
of the current process. This works well for normal contexts, but not
for interrupt contexts: if an interrupt occurs during the execution
of a section with an active memcg set, all allocations inside the
interrupt will be charged to that active memcg (once accounting is
enabled for allocations from an interrupt context). Because the
interrupt might have no relation to the active memcg set outside,
this is wrong from the accounting perspective.

To resolve this problem, let's add a global percpu int_active_memcg
variable, which stores the active memory cgroup to be used from
interrupt contexts. set_active_memcg() will transparently use
current->active_memcg or int_active_memcg depending on the context.
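
A userspace model of the slot selection might look like the sketch
below. The model_* variables are hypothetical stand-ins for
in_interrupt(), current->active_memcg and this CPU's int_active_memcg,
and run_handler_charging_to() is an invented helper that simulates an
interrupt arriving while a task-level override is in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { const char *name; };

/*
 * Model state. In the kernel these are in_interrupt(),
 * current->active_memcg and this CPU's int_active_memcg.
 */
static bool model_in_interrupt;
static struct mem_cgroup *model_task_active_memcg;
static struct mem_cgroup *model_int_active_memcg;

/* Pick the slot for the current context, swap in the target, return the old. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup **slot = model_in_interrupt ?
		&model_int_active_memcg : &model_task_active_memcg;
	struct mem_cgroup *old = *slot;

	*slot = memcg;
	return old;
}

/* Simulate an interrupt handler that charges to @memcg while it runs. */
static void run_handler_charging_to(struct mem_cgroup *memcg)
{
	struct mem_cgroup *task_target = model_task_active_memcg;
	struct mem_cgroup *old;

	model_in_interrupt = true;
	old = set_active_memcg(memcg);
	/* Accounted allocations made by the handler would happen here. */
	assert(model_task_active_memcg == task_target); /* task slot untouched */
	set_active_memcg(old);
	model_in_interrupt = false;
}
```

Because the handler only ever touches the interrupt slot, the
interrupted task's override is neither visible to it nor clobbered by it.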

To make the read part simple and transparent for the caller, let's
introduce two new functions:
  - struct mem_cgroup *active_memcg(void),
  - struct mem_cgroup *get_active_memcg(void).

They return the active memcg if it's set, hiding the implementation
detail of where it is stored for the current context.
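
The difference between the two helpers can be sketched in userspace as
below. This is a model under stated assumptions: model_tryget() and
model_put() are hypothetical stand-ins for css_tryget() and css_put(),
a reference count of zero stands for a cgroup that can no longer be
pinned, and the root fallback is not reference counted. Note that the
sketch returns the pointer it actually pinned instead of re-reading
current->active_memcg, because in interrupt context the two can differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model cgroup: refs == 0 stands for a cgroup that can no longer be pinned. */
struct mem_cgroup { int refs; };

static bool model_in_interrupt;
static struct mem_cgroup *model_task_active_memcg;
static struct mem_cgroup *model_int_active_memcg;
static struct mem_cgroup model_root_memcg = { 1 };

/* Hypothetical stand-in for css_tryget(): fails once the count is zero. */
static bool model_tryget(struct mem_cgroup *memcg)
{
	if (memcg->refs == 0)
		return false;
	memcg->refs++;
	return true;
}

/* Peek at the active memcg for the current context; takes no reference. */
static struct mem_cgroup *active_memcg(void)
{
	return model_in_interrupt ?
		model_int_active_memcg : model_task_active_memcg;
}

/* Like active_memcg(), but a non-NULL, non-root result has been pinned. */
static struct mem_cgroup *get_active_memcg(void)
{
	struct mem_cgroup *memcg = active_memcg();

	if (memcg && !model_tryget(memcg))
		memcg = &model_root_memcg; /* fallback; the kernel also warns */
	return memcg;
}

/* Drop a reference taken by get_active_memcg(). */
static void model_put(struct mem_cgroup *memcg)
{
	assert(memcg->refs > 0);
	memcg->refs--;
}
```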

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 include/linux/sched/mm.h | 13 +++++++++--
 mm/memcontrol.c          | 48 ++++++++++++++++++++++++++++------------
 2 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4c69a4349ac1..030a1cf77b8a 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -304,6 +304,7 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 #endif
 
 #ifdef CONFIG_MEMCG
+DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 /**
  * set_active_memcg - Starts the remote memcg charging scope.
  * @memcg: memcg to charge.
@@ -318,8 +319,16 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 static inline struct mem_cgroup *
 set_active_memcg(struct mem_cgroup *memcg)
 {
-	struct mem_cgroup *old = current->active_memcg;
-	current->active_memcg = memcg;
+	struct mem_cgroup *old;
+
+	if (in_interrupt()) {
+		old = this_cpu_read(int_active_memcg);
+		this_cpu_write(int_active_memcg, memcg);
+	} else {
+		old = current->active_memcg;
+		current->active_memcg = memcg;
+	}
+
 	return old;
 }
 #else
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5d847257a639..a51a6066079e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -73,6 +73,9 @@ EXPORT_SYMBOL(memory_cgrp_subsys);
 
 struct mem_cgroup *root_mem_cgroup __read_mostly;
 
+/* Active memory cgroup to use from an interrupt context */
+DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
+
 /* Socket memory accounting disabled? */
 static bool cgroup_memory_nosocket;
 
@@ -1069,26 +1072,43 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_page);
 
-/**
- * If current->active_memcg is non-NULL, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
+static __always_inline struct mem_cgroup *active_memcg(void)
 {
-	if (memcg_kmem_bypass())
-		return NULL;
+	if (in_interrupt())
+		return this_cpu_read(int_active_memcg);
+	else
+		return current->active_memcg;
+}
 
-	if (unlikely(current->active_memcg)) {
-		struct mem_cgroup *memcg;
+static __always_inline struct mem_cgroup *get_active_memcg(void)
+{
+	struct mem_cgroup *memcg;
 
-		rcu_read_lock();
+	rcu_read_lock();
+	memcg = active_memcg();
+	if (memcg) {
 		/* current->active_memcg must hold a ref. */
-		if (WARN_ON_ONCE(!css_tryget(&current->active_memcg->css)))
+		if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
 			memcg = root_mem_cgroup;
 		else
 			memcg = current->active_memcg;
-		rcu_read_unlock();
-		return memcg;
 	}
+	rcu_read_unlock();
+
+	return memcg;
+}
+
+/**
+ * If active memcg is set, do not fallback to current->mm->memcg.
+ */
+static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	if (memcg_kmem_bypass())
+		return NULL;
+
+	if (unlikely(active_memcg()))
+		return get_active_memcg();
+
 	return get_mem_cgroup_from_mm(current->mm);
 }
 
@@ -2920,8 +2940,8 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 		return NULL;
 
 	rcu_read_lock();
-	if (unlikely(current->active_memcg))
-		memcg = rcu_dereference(current->active_memcg);
+	if (unlikely(active_memcg()))
+		memcg = active_memcg();
 	else
 		memcg = mem_cgroup_from_task(current);
 
-- 
2.26.2




* [PATCH v1 4/4] mm: kmem: enable kernel memcg accounting from interrupt contexts
  2020-08-27 22:58 [PATCH v1 0/4] mm: kmem: kernel memory accounting in an interrupt context Roman Gushchin
                   ` (2 preceding siblings ...)
  2020-08-27 22:58 ` [PATCH v1 3/4] mm: kmem: prepare remote memcg charging infra for interrupt contexts Roman Gushchin
@ 2020-08-27 22:58 ` Roman Gushchin
  3 siblings, 0 replies; 5+ messages in thread
From: Roman Gushchin @ 2020-08-27 22:58 UTC
  To: Andrew Morton, linux-mm
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, kernel-team,
	linux-kernel, Roman Gushchin

If a memcg to charge can be determined (using the remote charging API),
there is no reason to exclude allocations made from an interrupt
context from the accounting.

Such allocations will succeed even if the resulting memcg size
exceeds the hard limit. They will still contribute to the memory
pressure applied to the memcg, and an inability to bring the workload
back under the limit will eventually trigger the OOM killer.

To use the active_memcg() helper, memcg_kmem_bypass() is moved back
to memcontrol.c.
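
To see exactly which cases change behaviour, the old and new versions
of memcg_kmem_bypass() can be compared over all input combinations.
The following is a userspace sketch with hypothetical function names;
has_active models current->active_memcg being set in the old version
and active_memcg() returning non-NULL in the new one.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of memcg_kmem_bypass() before this patch. */
static bool old_bypass(bool in_irq, bool has_mm, bool is_kthread,
		       bool has_active)
{
	if (in_irq)
		return true;
	return (!has_mm || is_kthread) && !has_active;
}

/* Model of memcg_kmem_bypass() after this patch. */
static bool new_bypass(bool in_irq, bool has_mm, bool is_kthread,
		       bool has_active)
{
	/* Allow remote memcg charging from any context. */
	if (has_active)
		return false;
	/* Otherwise the memcg to charge can't be determined. */
	return in_irq || !has_mm || is_kthread;
}

/*
 * Count the states where the two versions disagree, checking that each
 * of them is an interrupt context with an active memcg set.
 */
static int count_behaviour_changes(void)
{
	int changed = 0;

	for (int bits = 0; bits < 16; bits++) {
		bool in_irq = bits & 1, has_mm = bits & 2;
		bool is_kthread = bits & 4, has_active = bits & 8;

		if (old_bypass(in_irq, has_mm, is_kthread, has_active) ==
		    new_bypass(in_irq, has_mm, is_kthread, has_active))
			continue;
		assert(in_irq && has_active);
		changed++;
	}
	return changed;
}
```

In this model the only states that flip from "bypass" to "account" are
interrupt contexts with an active memcg set, so accounting outside
interrupts is unchanged.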

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 include/linux/memcontrol.h | 12 ------------
 mm/memcontrol.c            | 13 +++++++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d0b036123c6a..924177502479 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1528,18 +1528,6 @@ static inline bool memcg_kmem_enabled(void)
 	return static_branch_likely(&memcg_kmem_enabled_key);
 }
 
-static inline bool memcg_kmem_bypass(void)
-{
-	if (in_interrupt())
-		return true;
-
-	/* Allow remote memcg charging in kthread contexts. */
-	if ((!current->mm || (current->flags & PF_KTHREAD)) &&
-	     !current->active_memcg)
-		return true;
-	return false;
-}
-
 static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp,
 					 int order)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a51a6066079e..75cd1a1e66c8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1098,6 +1098,19 @@ static __always_inline struct mem_cgroup *get_active_memcg(void)
 	return memcg;
 }
 
+static __always_inline bool memcg_kmem_bypass(void)
+{
+	/* Allow remote memcg charging from any context. */
+	if (unlikely(active_memcg()))
+		return false;
+
+	/* Memcg to charge can't be determined. */
+	if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
+		return true;
+
+	return false;
+}
+
 /**
  * If active memcg is set, do not fallback to current->mm->memcg.
  */
-- 
2.26.2



