linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
@ 2023-02-05  6:58 Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Yafang Shao
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05  6:58 UTC (permalink / raw)
  To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm
  Cc: bpf, cgroups, linux-mm, Yafang Shao

The bpf memory accouting has some known problems in contianer
environment,

- The container memory usage is not consistent if there's pinned bpf
  program
  After the container restart, the leftover bpf programs won't account
  to the new generation, so the memory usage of the container is not
  consistent. This issue can be resolved by introducing selectable
  memcg, but we don't have an agreement on the solution yet. See also
  the discussions at https://lwn.net/Articles/905150/ .

- The leftover non-preallocated bpf map can't be limited
  The leftover bpf map will be reparented, and thus it will be limited by 
  the parent, rather than the container itself. Furthermore, if the
  parent is destroyed, it be will limited by its parent's parent, and so
  on. It can also be resolved by introducing selectable memcg.

- The memory dynamically allocated in bpf prog is charged into root memcg
  only
  Nowdays the bpf prog can dynamically allocate memory, for example via
  bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
  pool, so it will charge into root memcg only. That needs to be
  addressed by a new proposal.

So let's give the user an option to disable bpf memory accouting.

The idea of "cgroup.memory=nobpf" is originally by Tejun[1].

[1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/

Yafang Shao (5):
  mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
  bpf: use bpf_map_kvcalloc in bpf_local_storage
  bpf: introduce bpf_memcg_flags()
  bpf: allow to disable bpf map memory accounting
  bpf: allow to disable bpf prog memory accounting

 Documentation/admin-guide/kernel-parameters.txt |  1 +
 include/linux/bpf.h                             | 16 ++++++++++++++++
 include/linux/memcontrol.h                      | 11 +++++++++++
 kernel/bpf/bpf_local_storage.c                  |  4 ++--
 kernel/bpf/core.c                               | 13 +++++++------
 kernel/bpf/memalloc.c                           |  3 ++-
 kernel/bpf/syscall.c                            | 20 ++++++++++++++++++--
 mm/memcontrol.c                                 | 18 ++++++++++++++++++
 8 files changed, 75 insertions(+), 11 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
@ 2023-02-05  6:58 ` Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05  6:58 UTC (permalink / raw)
  To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm
  Cc: bpf, cgroups, linux-mm, Yafang Shao

Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf
memory accounting. This is a preparation for the followup patch.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  1 +
 include/linux/memcontrol.h                      | 11 +++++++++++
 mm/memcontrol.c                                 | 18 ++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3..29fb41e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -557,6 +557,7 @@
 			Format: <string>
 			nosocket -- Disable socket memory accounting.
 			nokmem -- Disable kernel memory accounting.
+			nobpf -- Disable BPF memory accounting.
 
 	checkreqprot=	[SELINUX] Set initial checkreqprot flag value.
 			Format: { "0" | "1" }
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203..26d4bf2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1751,6 +1751,12 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
 
+extern struct static_key_false memcg_bpf_enabled_key;
+static inline bool memcg_bpf_enabled(void)
+{
+	return static_branch_likely(&memcg_bpf_enabled_key);
+}
+
 extern struct static_key_false memcg_kmem_enabled_key;
 
 static inline bool memcg_kmem_enabled(void)
@@ -1829,6 +1835,11 @@ static inline struct obj_cgroup *get_obj_cgroup_from_page(struct page *page)
 	return NULL;
 }
 
+static inline bool memcg_bpf_enabled(void)
+{
+	return false;
+}
+
 static inline bool memcg_kmem_enabled(void)
 {
 	return false;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0..590526f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -89,6 +89,9 @@
 /* Kernel memory accounting disabled? */
 static bool cgroup_memory_nokmem __ro_after_init;
 
+/* BPF memory accounting disabled? */
+static bool cgroup_memory_nobpf __ro_after_init;
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
 #endif
@@ -348,6 +351,9 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
  */
 DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
 EXPORT_SYMBOL(memcg_kmem_enabled_key);
+
+DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
+EXPORT_SYMBOL(memcg_bpf_enabled_key);
 #endif
 
 /**
@@ -5362,6 +5368,11 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
 		static_branch_inc(&memcg_sockets_enabled_key);
 
+#if defined(CONFIG_MEMCG_KMEM)
+	if (!cgroup_memory_nobpf)
+		static_branch_inc(&memcg_bpf_enabled_key);
+#endif
+
 	return &memcg->css;
 }
 
@@ -5446,6 +5457,11 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_active)
 		static_branch_dec(&memcg_sockets_enabled_key);
 
+#if defined(CONFIG_MEMCG_KMEM)
+	if (!cgroup_memory_nobpf)
+		static_branch_dec(&memcg_bpf_enabled_key);
+#endif
+
 	vmpressure_cleanup(&memcg->vmpressure);
 	cancel_work_sync(&memcg->high_work);
 	mem_cgroup_remove_from_trees(memcg);
@@ -7310,6 +7326,8 @@ static int __init cgroup_memory(char *s)
 			cgroup_memory_nosocket = true;
 		if (!strcmp(token, "nokmem"))
 			cgroup_memory_nokmem = true;
+		if (!strcmp(token, "nobpf"))
+			cgroup_memory_nobpf = true;
 	}
 	return 1;
 }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Yafang Shao
@ 2023-02-05  6:58 ` Yafang Shao
  2023-02-08 19:25   ` Johannes Weiner
  2023-02-05  6:58 ` [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags() Yafang Shao
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2023-02-05  6:58 UTC (permalink / raw)
  To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm
  Cc: bpf, cgroups, linux-mm, Yafang Shao

Introduce new helper bpf_map_kvcalloc() for this memory allocation. Then
bpf_local_storage will be the same with other map's creation.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  8 ++++++++
 kernel/bpf/bpf_local_storage.c |  4 ++--
 kernel/bpf/syscall.c           | 15 +++++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 35c18a9..fe0bf48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1886,6 +1886,8 @@ int  generic_map_delete_batch(struct bpf_map *map,
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node);
 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags);
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+		       gfp_t flags);
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags);
 #else
@@ -1902,6 +1904,12 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 	return kzalloc(size, flags);
 }
 
+static inline void *
+bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size, gfp_t flags)
+{
+	return kvcalloc(n, size, flags);
+}
+
 static inline void __percpu *
 bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
 		     gfp_t flags)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 373c3c2..35f4138 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -568,8 +568,8 @@ static struct bpf_local_storage_map *__bpf_local_storage_map_alloc(union bpf_att
 	nbuckets = max_t(u32, 2, nbuckets);
 	smap->bucket_log = ilog2(nbuckets);
 
-	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
+					 nbuckets, GFP_USER | __GFP_NOWARN);
 	if (!smap->buckets) {
 		bpf_map_area_free(smap);
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bcc9761..9d94a35 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -464,6 +464,21 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 	return ptr;
 }
 
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+		       gfp_t flags)
+{
+	struct mem_cgroup *memcg, *old_memcg;
+	void *ptr;
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
+	set_active_memcg(old_memcg);
+	mem_cgroup_put(memcg);
+
+	return ptr;
+}
+
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags)
 {
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags()
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
@ 2023-02-05  6:58 ` Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting Yafang Shao
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05  6:58 UTC (permalink / raw)
  To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm
  Cc: bpf, cgroups, linux-mm, Yafang Shao

This new helper will be used in both bpf prog and bpf map.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fe0bf48..4385418 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -28,6 +28,7 @@
 #include <linux/btf.h>
 #include <linux/rcupdate_trace.h>
 #include <linux/static_call.h>
+#include <linux/memcontrol.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -2933,4 +2934,11 @@ static inline bool type_is_alloc(u32 type)
 	return type & MEM_ALLOC;
 }
 
+static inline gfp_t bpf_memcg_flags(gfp_t flags)
+{
+	if (memcg_bpf_enabled())
+		return flags | __GFP_ACCOUNT;
+	return flags;
+}
+
 #endif /* _LINUX_BPF_H */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
                   ` (2 preceding siblings ...)
  2023-02-05  6:58 ` [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags() Yafang Shao
@ 2023-02-05  6:58 ` Yafang Shao
  2023-02-05  6:58 ` [PATCH bpf-next 5/5] bpf: allow to disable bpf prog " Yafang Shao
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05  6:58 UTC (permalink / raw)
  To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm
  Cc: bpf, cgroups, linux-mm, Yafang Shao

We can simply set root memcg as the map's memcg to disable bpf memory
accounting. bpf_map_area_alloc is a little special as it gets the memcg
from current rather than from the map, so we need to disable GFP_ACCOUNT
specifically for it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/memalloc.c | 3 ++-
 kernel/bpf/syscall.c  | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index ebcc3dd..6da9051 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -395,7 +395,8 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 		unit_size = size;
 
 #ifdef CONFIG_MEMCG_KMEM
-		objcg = get_obj_cgroup_from_current();
+		if (memcg_bpf_enabled())
+			objcg = get_obj_cgroup_from_current();
 #endif
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9d94a35..cda8d00 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -309,7 +309,7 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 	 * __GFP_RETRY_MAYFAIL to avoid such situations.
 	 */
 
-	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
+	gfp_t gfp = bpf_memcg_flags(__GFP_NOWARN | __GFP_ZERO);
 	unsigned int flags = 0;
 	unsigned long align = 1;
 	void *area;
@@ -418,7 +418,8 @@ static void bpf_map_save_memcg(struct bpf_map *map)
 	 * So we have to check map->objcg for being NULL each time it's
 	 * being used.
 	 */
-	map->objcg = get_obj_cgroup_from_current();
+	if (memcg_bpf_enabled())
+		map->objcg = get_obj_cgroup_from_current();
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next 5/5] bpf: allow to disable bpf prog memory accounting
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
                   ` (3 preceding siblings ...)
  2023-02-05  6:58 ` [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting Yafang Shao
@ 2023-02-05  6:58 ` Yafang Shao
  2023-02-08 19:29 ` [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Johannes Weiner
  2023-02-08 20:54 ` Roman Gushchin
  6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05  6:58 UTC (permalink / raw)
  To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm
  Cc: bpf, cgroups, linux-mm, Yafang Shao

We can simply disable the bpf prog memory accouting by not setting the
GFP_ACCOUNT.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/core.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 16da510..3390961 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -35,6 +35,7 @@
 #include <linux/bpf_verifier.h>
 #include <linux/nodemask.h>
 #include <linux/bpf_mem_alloc.h>
+#include <linux/memcontrol.h>
 
 #include <asm/barrier.h>
 #include <asm/unaligned.h>
@@ -87,7 +88,7 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
 
 struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
+	gfp_t gfp_flags = bpf_memcg_flags(GFP_KERNEL | __GFP_ZERO | gfp_extra_flags);
 	struct bpf_prog_aux *aux;
 	struct bpf_prog *fp;
 
@@ -96,12 +97,12 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 	if (fp == NULL)
 		return NULL;
 
-	aux = kzalloc(sizeof(*aux), GFP_KERNEL_ACCOUNT | gfp_extra_flags);
+	aux = kzalloc(sizeof(*aux), bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
 	if (aux == NULL) {
 		vfree(fp);
 		return NULL;
 	}
-	fp->active = alloc_percpu_gfp(int, GFP_KERNEL_ACCOUNT | gfp_extra_flags);
+	fp->active = alloc_percpu_gfp(int, bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
 	if (!fp->active) {
 		vfree(fp);
 		kfree(aux);
@@ -126,7 +127,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 
 struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
+	gfp_t gfp_flags = bpf_memcg_flags(GFP_KERNEL | __GFP_ZERO | gfp_extra_flags);
 	struct bpf_prog *prog;
 	int cpu;
 
@@ -159,7 +160,7 @@ int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog)
 
 	prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
 					  sizeof(*prog->aux->jited_linfo),
-					  GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
+					  bpf_memcg_flags(GFP_KERNEL | __GFP_NOWARN));
 	if (!prog->aux->jited_linfo)
 		return -ENOMEM;
 
@@ -234,7 +235,7 @@ void bpf_prog_fill_jited_linfo(struct bpf_prog *prog,
 struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
 				  gfp_t gfp_extra_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
+	gfp_t gfp_flags = bpf_memcg_flags(GFP_KERNEL | __GFP_ZERO | gfp_extra_flags);
 	struct bpf_prog *fp;
 	u32 pages;
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage
  2023-02-05  6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
@ 2023-02-08 19:25   ` Johannes Weiner
  2023-02-09 11:27     ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2023-02-08 19:25 UTC (permalink / raw)
  To: Yafang Shao
  Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm, bpf, cgroups,
	linux-mm

On Sun, Feb 05, 2023 at 06:58:02AM +0000, Yafang Shao wrote:
> Introduce new helper bpf_map_kvcalloc() for this memory allocation. Then
> bpf_local_storage will be the same with other map's creation.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

This looks good to me, but it could be helpful to explain the
user-visible part of the bug somewhat, i.e. who is being charged right
now for the allocation if it's not the map owner.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
                   ` (4 preceding siblings ...)
  2023-02-05  6:58 ` [PATCH bpf-next 5/5] bpf: allow to disable bpf prog " Yafang Shao
@ 2023-02-08 19:29 ` Johannes Weiner
  2023-02-08 20:54 ` Roman Gushchin
  6 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2023-02-08 19:29 UTC (permalink / raw)
  To: Yafang Shao
  Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm, bpf, cgroups,
	linux-mm

On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accouting has some known problems in contianer
> environment,
> 
> - The container memory usage is not consistent if there's pinned bpf
>   program
>   After the container restart, the leftover bpf programs won't account
>   to the new generation, so the memory usage of the container is not
>   consistent. This issue can be resolved by introducing selectable
>   memcg, but we don't have an agreement on the solution yet. See also
>   the discussions at https://lwn.net/Articles/905150/ .
> 
> - The leftover non-preallocated bpf map can't be limited
>   The leftover bpf map will be reparented, and thus it will be limited by 
>   the parent, rather than the container itself. Furthermore, if the
>   parent is destroyed, it be will limited by its parent's parent, and so
>   on. It can also be resolved by introducing selectable memcg.
> 
> - The memory dynamically allocated in bpf prog is charged into root memcg
>   only
>   Nowdays the bpf prog can dynamically allocate memory, for example via
>   bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
>   pool, so it will charge into root memcg only. That needs to be
>   addressed by a new proposal.
> 
> So let's give the user an option to disable bpf memory accouting.
> 
> The idea of "cgroup.memory=nobpf" is originally by Tejun[1].

I'm not the most familiar with bpf internals, but the memcg bits and
adding the boot time flag look good to me:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
  2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
                   ` (5 preceding siblings ...)
  2023-02-08 19:29 ` [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Johannes Weiner
@ 2023-02-08 20:54 ` Roman Gushchin
  2023-02-09 11:28   ` Yafang Shao
  6 siblings, 1 reply; 11+ messages in thread
From: Roman Gushchin @ 2023-02-08 20:54 UTC (permalink / raw)
  To: Yafang Shao
  Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	shakeelb, muchun.song, akpm, bpf, cgroups, linux-mm

On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accouting has some known problems in contianer
> environment,
> 
> - The container memory usage is not consistent if there's pinned bpf
>   program
>   After the container restart, the leftover bpf programs won't account
>   to the new generation, so the memory usage of the container is not
>   consistent. This issue can be resolved by introducing selectable
>   memcg, but we don't have an agreement on the solution yet. See also
>   the discussions at https://lwn.net/Articles/905150/ .
> 
> - The leftover non-preallocated bpf map can't be limited
>   The leftover bpf map will be reparented, and thus it will be limited by 
>   the parent, rather than the container itself. Furthermore, if the
>   parent is destroyed, it be will limited by its parent's parent, and so
>   on. It can also be resolved by introducing selectable memcg.
> 
> - The memory dynamically allocated in bpf prog is charged into root memcg
>   only
>   Nowdays the bpf prog can dynamically allocate memory, for example via
>   bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
>   pool, so it will charge into root memcg only. That needs to be
>   addressed by a new proposal.
> 
> So let's give the user an option to disable bpf memory accouting.
> 
> The idea of "cgroup.memory=nobpf" is originally by Tejun[1].
> 
> [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/
> 
> Yafang Shao (5):
>   mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
>   bpf: use bpf_map_kvcalloc in bpf_local_storage
>   bpf: introduce bpf_memcg_flags()
>   bpf: allow to disable bpf map memory accounting
>   bpf: allow to disable bpf prog memory accounting

Hello Yafang!

Overall the patch looks good to me, please, feel free to add
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

I'd squash patch (3) into (4), but up to you.

Thanks!


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage
  2023-02-08 19:25   ` Johannes Weiner
@ 2023-02-09 11:27     ` Yafang Shao
  0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-09 11:27 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, mhocko,
	roman.gushchin, shakeelb, muchun.song, akpm, bpf, cgroups,
	linux-mm

On Thu, Feb 9, 2023 at 3:25 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Sun, Feb 05, 2023 at 06:58:02AM +0000, Yafang Shao wrote:
> > Introduce new helper bpf_map_kvcalloc() for this memory allocation. Then
> > bpf_local_storage will be the same with other map's creation.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> This looks good to me, but it could be helpful to explain the
> user-visible part of the bug somewhat, i.e. who is being charged right
> now for the allocation if it's not the map owner.

Sure. Will improve the commit log.

-- 
Regards
Yafang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
  2023-02-08 20:54 ` Roman Gushchin
@ 2023-02-09 11:28   ` Yafang Shao
  0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-09 11:28 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
	shakeelb, muchun.song, akpm, bpf, cgroups, linux-mm

On Thu, Feb 9, 2023 at 4:54 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> > The bpf memory accouting has some known problems in contianer
> > environment,
> >
> > - The container memory usage is not consistent if there's pinned bpf
> >   program
> >   After the container restart, the leftover bpf programs won't account
> >   to the new generation, so the memory usage of the container is not
> >   consistent. This issue can be resolved by introducing selectable
> >   memcg, but we don't have an agreement on the solution yet. See also
> >   the discussions at https://lwn.net/Articles/905150/ .
> >
> > - The leftover non-preallocated bpf map can't be limited
> >   The leftover bpf map will be reparented, and thus it will be limited by
> >   the parent, rather than the container itself. Furthermore, if the
> >   parent is destroyed, it be will limited by its parent's parent, and so
> >   on. It can also be resolved by introducing selectable memcg.
> >
> > - The memory dynamically allocated in bpf prog is charged into root memcg
> >   only
> >   Nowdays the bpf prog can dynamically allocate memory, for example via
> >   bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
> >   pool, so it will charge into root memcg only. That needs to be
> >   addressed by a new proposal.
> >
> > So let's give the user an option to disable bpf memory accouting.
> >
> > The idea of "cgroup.memory=nobpf" is originally by Tejun[1].
> >
> > [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/
> >
> > Yafang Shao (5):
> >   mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
> >   bpf: use bpf_map_kvcalloc in bpf_local_storage
> >   bpf: introduce bpf_memcg_flags()
> >   bpf: allow to disable bpf map memory accounting
> >   bpf: allow to disable bpf prog memory accounting
>
> Hello Yafang!
>
> Overall the patch looks good to me, please, feel free to add
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> I'd squash patch (3) into (4), but up to you.
>

Sure. Will do it.

-- 
Regards
Yafang


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-02-09 11:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-05  6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
2023-02-05  6:58 ` [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Yafang Shao
2023-02-05  6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
2023-02-08 19:25   ` Johannes Weiner
2023-02-09 11:27     ` Yafang Shao
2023-02-05  6:58 ` [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags() Yafang Shao
2023-02-05  6:58 ` [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting Yafang Shao
2023-02-05  6:58 ` [PATCH bpf-next 5/5] bpf: allow to disable bpf prog " Yafang Shao
2023-02-08 19:29 ` [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Johannes Weiner
2023-02-08 20:54 ` Roman Gushchin
2023-02-09 11:28   ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).