* [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map
@ 2022-08-10 15:18 Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
                   ` (15 more replies)
  0 siblings, 16 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

In our production environment, we load, run, and pin bpf programs and
maps in containers. For example, some of our networking bpf programs
and maps are loaded and pinned by a process running in a container in
our k8s environment. The same container also runs other user
applications that watch networking configurations from remote servers
and update them on the local host, log error events, monitor traffic,
and perform other tasks. Sometimes we need to update these user
applications to a new release, and in this update process we destroy
the old container and start a new generation. In order not to
interrupt the bpf programs across the update, we pin the bpf programs
and maps in bpffs. That is the background and use case in our
production environment.

After switching to memcg-based bpf memory accounting to limit the bpf
memory, some unexpected issues jumped out at us:
1. The memory usage is not consistent between the first generation and
new generations.
2. After the first generation is destroyed, the bpf memory can't be
limited if the bpf maps are not preallocated, because once the first
generation's memcg is offlined the charged objects are reparented to
the parent memcg.

This patchset tries to resolve these issues by introducing an
independent memcg to limit the bpf memory.

When a bpf map is created, we can assign a specific memcg to it
instead of using the current task's memcg. That makes it flexible in
containerized environments. For example, if we want to limit the
pinned bpf maps, we can use the hierarchy below:

    Shared resources              Private resources 
                                    
     bpf-memcg                      k8s-memcg
     /        \                     /             
bpf-bar-memcg bpf-foo-memcg   srv-foo-memcg        
                  |               /        \
               (charged)     (not charged) (charged)                 
                  |           /              \
                  |          /                \
          bpf-foo-{progs, maps}              srv-foo

srv-foo loads and pins bpf-foo-{progs, maps}, but they are charged to an
independent memcg (bpf-foo-memcg) instead of srv-foo's memcg
(srv-foo-memcg).

Please note that there may be no process in bpf-foo-memcg, which means
it can currently be rmdir'ed by the root user. Meanwhile, we don't
forcefully destroy a memcg that has no residents, so this hierarchy is
acceptable.
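
As an illustration of the intended userspace flow, below is a minimal
sketch of how a loader could select the memcg at map creation time.
Note that the 'memcg_fd' attribute name is only a hypothetical
placeholder to show the idea; the actual uapi field is added in the
last patch of this series.

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* Sketch only: "memcg_fd" is a hypothetical name standing in for
     * the new bpf_attr field introduced by this series.
     */
    static int create_map_in_bpf_memcg(void)
    {
            union bpf_attr attr = {};
            int cgfd;

            /* The dedicated memcg (cgroup2) that should be charged. */
            cgfd = open("/sys/fs/cgroup/bpf-memcg/bpf-foo-memcg", O_RDONLY);
            if (cgfd < 0)
                    return -1;

            attr.map_type    = BPF_MAP_TYPE_HASH;
            attr.key_size    = 4;
            attr.value_size  = 8;
            attr.max_entries = 1024;
            attr.memcg_fd    = cgfd;        /* hypothetical field */

            return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    }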

To make the memcg of bpf maps selectable, this patchset introduces
memory allocation wrappers for map-related memory. These wrappers get
the memcg from the map and charge the allocated pages or objects to
it.
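
The pattern is the one visible in the bpf_map_kmalloc_node() hunk of
patch 5: resolve the map's memcg, make it the active memcg around the
allocation, then drop the reference. Condensed:

    void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size,
                               gfp_t flags, int node)
    {
            struct mem_cgroup *memcg, *old_memcg;
            void *ptr;

            memcg = bpf_map_get_memcg(map);    /* map->objcg, or root */
            old_memcg = set_active_memcg(memcg);
            ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
            set_active_memcg(old_memcg);
            bpf_map_put_memcg(memcg);          /* no-op for the root memcg */

            return ptr;
    }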

Currently it only supports bpf maps; it can be extended to bpf progs
as well. It also only supports cgroup2 for now, but an additional
change in cgroup_get_from_fd() could add cgroup1 support.

Observability can also be supported in a next step, for example
showing the bpf map's memcg in 'bpftool map show', or even showing
which maps are charged to a specific memcg in 'bpftool cgroup show'.
Furthermore, 'bpftool map show' may eventually report an accurate
memory size for a bpf map instead of an estimated one.

RFC->v1:
- get rid of bpf_map container wrapper (Alexei)
- add the new field into the end of struct (Alexei)
- get rid of BPF_F_SELECTABLE_MEMCG (Alexei)
- save memcg in bpf_map_init_from_attr
- introduce bpf_ringbuf_pages_{alloc,free} and keep them inside
  kernel/bpf/ringbuf.c  (Andrii)

Yafang Shao (15):
  bpf: Remove unneeded memset in queue_stack_map creation
  bpf: Use bpf_map_area_free instead of kvfree
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Fix incorrect mem_cgroup_put
  bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM
  bpf: Call bpf_map_init_from_attr() immediately after map creation
  bpf: Save memcg in bpf_map_init_from_attr()
  bpf: Use scope-based charge in bpf_map_area_alloc
  bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
  bpf: Use bpf_map_kzalloc in arraymap
  bpf: Use bpf_map_kvcalloc in bpf_local_storage
  mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  bpf: Add return value for bpf_map_init_from_attr
  bpf: Introduce selectable memcg for bpf map

 include/linux/bpf.h            |  43 ++++++++++++-
 include/linux/memcontrol.h     |  11 ++++
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/arraymap.c          |  34 ++++++-----
 kernel/bpf/bloom_filter.c      |  11 +++-
 kernel/bpf/bpf_local_storage.c |  17 ++++--
 kernel/bpf/bpf_struct_ops.c    |  19 +++---
 kernel/bpf/cpumap.c            |  17 ++++--
 kernel/bpf/devmap.c            |  30 ++++++----
 kernel/bpf/hashtab.c           |  26 ++++----
 kernel/bpf/local_storage.c     |  12 ++--
 kernel/bpf/lpm_trie.c          |  12 +++-
 kernel/bpf/offload.c           |  12 ++--
 kernel/bpf/queue_stack_maps.c  |  13 ++--
 kernel/bpf/reuseport_array.c   |  11 +++-
 kernel/bpf/ringbuf.c           | 104 ++++++++++++++++++++++----------
 kernel/bpf/stackmap.c          |  13 ++--
 kernel/bpf/syscall.c           | 133 ++++++++++++++++++++++++++++-------------
 mm/memcontrol.c                |  41 +++++++++++++
 net/core/sock_map.c            |  30 ++++++----
 net/xdp/xskmap.c               |  12 +++-
 tools/include/uapi/linux/bpf.h |   1 +
 tools/lib/bpf/bpf.c            |   3 +-
 tools/lib/bpf/bpf.h            |   3 +-
 tools/lib/bpf/gen_loader.c     |   2 +-
 tools/lib/bpf/libbpf.c         |   2 +
 tools/lib/bpf/skel_internal.h  |   2 +-
 27 files changed, 436 insertions(+), 179 deletions(-)

-- 
1.8.3.1



* [PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instead of kvfree Yafang Shao
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

__GFP_ZERO will clear the memory, so we don't need to memset it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/queue_stack_maps.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index a1c0794..8a5e060 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -78,8 +78,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
-	memset(qs, 0, sizeof(*qs));
-
 	bpf_map_init_from_attr(&qs->map, attr);
 
 	qs->size = size;
-- 
1.8.3.1



* [PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instead of kvfree
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation Yafang Shao
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

bpf_map_area_alloc() should be paired with bpf_map_area_free().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/ringbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index ded4fae..3fb54fe 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -116,7 +116,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 err_free_pages:
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 	return NULL;
 }
 
@@ -190,7 +190,7 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	vunmap(rb);
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 }
 
 static void ringbuf_map_free(struct bpf_map *map)
-- 
1.8.3.1



* [PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instead of kvfree Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on " Yafang Shao
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Some bpf maps are created with __GFP_NOWARN, i.e. arraymap,
bloom_filter, bpf_local_storage, bpf_struct_ops, lpm_trie,
queue_stack_maps, reuseport_array, stackmap and xskmap, while others
are created without __GFP_NOWARN, i.e. cpumap, devmap, hashtab,
local_storage, offload, ringbuf and sock_map. But there are no key
differences between the creation of these maps, so let's make this
allocation flag consistent across all bpf map creation. Then we can
use a generic helper to allocate all bpf maps.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/cpumap.c        | 2 +-
 kernel/bpf/devmap.c        | 2 +-
 kernel/bpf/hashtab.c       | 2 +-
 kernel/bpf/local_storage.c | 4 ++--
 kernel/bpf/offload.c       | 2 +-
 kernel/bpf/ringbuf.c       | 2 +-
 net/core/sock_map.c        | 4 ++--
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index f4860ac..b25ca9d 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,7 +97,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_ACCOUNT);
+	cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index a0e02b0..88feaa0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -163,7 +163,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_ACCOUNT);
+	dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index da75784..f1e5303 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -495,7 +495,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_ACCOUNT);
+	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 49ef0ce..a64255e 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,8 +313,8 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = kmalloc_node(sizeof(struct bpf_cgroup_storage_map),
-			   __GFP_ZERO | GFP_USER | __GFP_ACCOUNT, numa_node);
+	map = kzalloc_node(sizeof(struct bpf_cgroup_storage_map),
+			   GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT, numa_node);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index bd09290..5a629a1 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -372,7 +372,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	    attr->map_type != BPF_MAP_TYPE_HASH)
 		return ERR_PTR(-EINVAL);
 
-	offmap = kzalloc(sizeof(*offmap), GFP_USER);
+	offmap = kzalloc(sizeof(*offmap), GFP_USER | __GFP_NOWARN);
 	if (!offmap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 3fb54fe..df8062c 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -164,7 +164,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = kzalloc(sizeof(*rb_map), GFP_USER | __GFP_ACCOUNT);
+	rb_map = kzalloc(sizeof(*rb_map), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 028813d..763d771 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,7 +41,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_ACCOUNT);
+	stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
@@ -1076,7 +1076,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_ACCOUNT);
+	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
-- 
1.8.3.1



* [PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on bpf map creation
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (2 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put Yafang Shao
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Let's use the generic helper bpf_map_area_alloc() instead of the
open-coded kzalloc helpers in the bpf map creation path.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/bpf_local_storage.c |  6 +++---
 kernel/bpf/cpumap.c            |  6 +++---
 kernel/bpf/devmap.c            |  6 +++---
 kernel/bpf/hashtab.c           |  6 +++---
 kernel/bpf/local_storage.c     |  5 ++---
 kernel/bpf/lpm_trie.c          |  4 ++--
 kernel/bpf/offload.c           |  6 +++---
 kernel/bpf/ringbuf.c           |  6 +++---
 net/core/sock_map.c            | 12 ++++++------
 9 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 8ce40fd..4ee2e72 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -582,7 +582,7 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 	synchronize_rcu();
 
 	kvfree(smap->buckets);
-	kfree(smap);
+	bpf_map_area_free(smap);
 }
 
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
@@ -610,7 +610,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	unsigned int i;
 	u32 nbuckets;
 
-	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 	bpf_map_init_from_attr(&smap->map, attr);
@@ -623,7 +623,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap->buckets) {
-		kfree(smap);
+		bpf_map_area_free(smap);
 		return ERR_PTR(-ENOMEM);
 	}
 
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b25ca9d..b5ba34d 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,7 +97,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	cmap = bpf_map_area_alloc(sizeof(*cmap), NUMA_NO_NODE);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
@@ -118,7 +118,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 
 	return &cmap->map;
 free_cmap:
-	kfree(cmap);
+	bpf_map_area_free(cmap);
 	return ERR_PTR(err);
 }
 
@@ -623,7 +623,7 @@ static void cpu_map_free(struct bpf_map *map)
 		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
 	}
 	bpf_map_area_free(cmap->cpu_map);
-	kfree(cmap);
+	bpf_map_area_free(cmap);
 }
 
 /* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 88feaa0..f9a87dc 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -163,13 +163,13 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
-		kfree(dtab);
+		bpf_map_area_free(dtab);
 		return ERR_PTR(err);
 	}
 
@@ -240,7 +240,7 @@ static void dev_map_free(struct bpf_map *map)
 		bpf_map_area_free(dtab->netdev_map);
 	}
 
-	kfree(dtab);
+	bpf_map_area_free(dtab);
 }
 
 static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index f1e5303..8392f7f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -495,7 +495,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -579,7 +579,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	bpf_map_area_free(htab->buckets);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
-	kfree(htab);
+	bpf_map_area_free(htab);
 	return ERR_PTR(err);
 }
 
@@ -1496,7 +1496,7 @@ static void htab_map_free(struct bpf_map *map)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);
-	kfree(htab);
+	bpf_map_area_free(htab);
 }
 
 static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index a64255e..098cf33 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,8 +313,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = kzalloc_node(sizeof(struct bpf_cgroup_storage_map),
-			   GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT, numa_node);
+	map = bpf_map_area_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
@@ -346,7 +345,7 @@ static void cgroup_storage_map_free(struct bpf_map *_map)
 	WARN_ON(!RB_EMPTY_ROOT(&map->root));
 	WARN_ON(!list_empty(&map->list));
 
-	kfree(map);
+	bpf_map_area_free(map);
 }
 
 static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index d789e3b..d833496 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -558,7 +558,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	    attr->value_size > LPM_VAL_SIZE_MAX)
 		return ERR_PTR(-EINVAL);
 
-	trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	trie = bpf_map_area_alloc(sizeof(*trie), NUMA_NO_NODE);
 	if (!trie)
 		return ERR_PTR(-ENOMEM);
 
@@ -609,7 +609,7 @@ static void trie_free(struct bpf_map *map)
 	}
 
 out:
-	kfree(trie);
+	bpf_map_area_free(trie);
 }
 
 static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 5a629a1..13e4efc 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -372,7 +372,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	    attr->map_type != BPF_MAP_TYPE_HASH)
 		return ERR_PTR(-EINVAL);
 
-	offmap = kzalloc(sizeof(*offmap), GFP_USER | __GFP_NOWARN);
+	offmap = bpf_map_area_alloc(sizeof(*offmap), NUMA_NO_NODE);
 	if (!offmap)
 		return ERR_PTR(-ENOMEM);
 
@@ -404,7 +404,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 err_unlock:
 	up_write(&bpf_devs_lock);
 	rtnl_unlock();
-	kfree(offmap);
+	bpf_map_area_free(offmap);
 	return ERR_PTR(err);
 }
 
@@ -428,7 +428,7 @@ void bpf_map_offload_map_free(struct bpf_map *map)
 	up_write(&bpf_devs_lock);
 	rtnl_unlock();
 
-	kfree(offmap);
+	bpf_map_area_free(offmap);
 }
 
 int bpf_map_offload_lookup_elem(struct bpf_map *map, void *key, void *value)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index df8062c..b483aea 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -164,7 +164,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = kzalloc(sizeof(*rb_map), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	rb_map = bpf_map_area_alloc(sizeof(*rb_map), NUMA_NO_NODE);
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
@@ -172,7 +172,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
 	if (!rb_map->rb) {
-		kfree(rb_map);
+		bpf_map_area_free(rb_map);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -199,7 +199,7 @@ static void ringbuf_map_free(struct bpf_map *map)
 
 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
 	bpf_ringbuf_free(rb_map->rb);
-	kfree(rb_map);
+	bpf_map_area_free(rb_map);
 }
 
 static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 763d771..d0c4338 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,7 +41,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	stab = bpf_map_area_alloc(sizeof(*stab), NUMA_NO_NODE);
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
@@ -52,7 +52,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 				       sizeof(struct sock *),
 				       stab->map.numa_node);
 	if (!stab->sks) {
-		kfree(stab);
+		bpf_map_area_free(stab);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -361,7 +361,7 @@ static void sock_map_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	bpf_map_area_free(stab->sks);
-	kfree(stab);
+	bpf_map_area_free(stab);
 }
 
 static void sock_map_release_progs(struct bpf_map *map)
@@ -1076,7 +1076,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -1106,7 +1106,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 
 	return &htab->map;
 free_htab:
-	kfree(htab);
+	bpf_map_area_free(htab);
 	return ERR_PTR(err);
 }
 
@@ -1159,7 +1159,7 @@ static void sock_hash_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	bpf_map_area_free(htab->buckets);
-	kfree(htab);
+	bpf_map_area_free(htab);
 }
 
 static void *sock_hash_lookup_sys(struct bpf_map *map, void *key)
-- 
1.8.3.1



* [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (3 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on " Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 17:07   ` Shakeel Butt
  2022-08-11 16:48   ` Roman Gushchin
  2022-08-10 15:18 ` [PATCH bpf-next 06/15] bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
So a new helper bpf_map_put_memcg() is introduced to pair with
bpf_map_get_memcg().

Fixes: 4201d9ab3e42 ("bpf: reparent bpf maps on memcg offlining")
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/syscall.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136..51ab8b1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -441,6 +441,14 @@ static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
 	return root_mem_cgroup;
 }
 
+static void bpf_map_put_memcg(struct mem_cgroup *memcg)
+{
+	if (mem_cgroup_is_root(memcg))
+		return;
+
+	mem_cgroup_put(memcg);
+}
+
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node)
 {
@@ -451,7 +459,7 @@ void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 	old_memcg = set_active_memcg(memcg);
 	ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
 	set_active_memcg(old_memcg);
-	mem_cgroup_put(memcg);
+	bpf_map_put_memcg(memcg);
 
 	return ptr;
 }
@@ -465,7 +473,7 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 	old_memcg = set_active_memcg(memcg);
 	ptr = kzalloc(size, flags | __GFP_ACCOUNT);
 	set_active_memcg(old_memcg);
-	mem_cgroup_put(memcg);
+	bpf_map_put_memcg(memcg);
 
 	return ptr;
 }
@@ -480,7 +488,7 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 	old_memcg = set_active_memcg(memcg);
 	ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT);
 	set_active_memcg(old_memcg);
-	mem_cgroup_put(memcg);
+	bpf_map_put_memcg(memcg);
 
 	return ptr;
 }
-- 
1.8.3.1



* [PATCH bpf-next 06/15] bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (4 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 07/15] bpf: Call bpf_map_init_from_attr() immediately after map creation Yafang Shao
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Define stub versions of bpf_map_{get,put}_memcg so they can be used
when CONFIG_MEMCG_KMEM or CONFIG_MEMCG is not set. Also move
bpf_map_{get,put}_memcg into include/linux/bpf.h, so the two helpers
can be used in other source files.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h        | 29 +++++++++++++++++++++++++++++
 include/linux/memcontrol.h | 10 ++++++++++
 kernel/bpf/syscall.c       | 16 ----------------
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 20c26ae..fe3b565 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -27,6 +27,7 @@
 #include <linux/bpfptr.h>
 #include <linux/btf.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/memcontrol.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -2571,4 +2572,32 @@ static inline void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype) {}
 static inline void bpf_cgroup_atype_put(int cgroup_atype) {}
 #endif /* CONFIG_BPF_LSM */
 
+#ifdef CONFIG_MEMCG_KMEM
+static inline struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
+{
+	if (map->objcg)
+		return get_mem_cgroup_from_objcg(map->objcg);
+
+	return root_mem_cgroup;
+}
+
+static inline void bpf_map_put_memcg(struct mem_cgroup *memcg)
+{
+	if (mem_cgroup_is_root(memcg))
+		return;
+
+	mem_cgroup_put(memcg);
+}
+
+#else
+static inline struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
+{
+	return root_memcg();
+}
+
+static inline void bpf_map_put_memcg(struct mem_cgroup *memcg)
+{
+}
+#endif
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9ecead1..2f0a611 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -361,6 +361,11 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+static inline struct mem_cgroup *root_memcg(void)
+{
+	return root_mem_cgroup;
+}
+
 enum page_memcg_data_flags {
 	/* page->memcg_data is a pointer to an objcgs vector */
 	MEMCG_DATA_OBJCGS = (1UL << 0),
@@ -1138,6 +1143,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 #define MEM_CGROUP_ID_SHIFT	0
 #define MEM_CGROUP_ID_MAX	0
 
+static inline struct mem_cgroup *root_memcg(void)
+{
+	return NULL;
+}
+
 static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 {
 	return NULL;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 51ab8b1..19c3a81 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -433,22 +433,6 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 		obj_cgroup_put(map->objcg);
 }
 
-static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
-{
-	if (map->objcg)
-		return get_mem_cgroup_from_objcg(map->objcg);
-
-	return root_mem_cgroup;
-}
-
-static void bpf_map_put_memcg(struct mem_cgroup *memcg)
-{
-	if (mem_cgroup_is_root(memcg))
-		return;
-
-	mem_cgroup_put(memcg);
-}
-
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node)
 {
-- 
1.8.3.1



* [PATCH bpf-next 07/15] bpf: Call bpf_map_init_from_attr() immediately after map creation
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (5 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 06/15] bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 08/15] bpf: Save memcg in bpf_map_init_from_attr() Yafang Shao
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

To ensure that all other map-related memory allocations happen after
the memcg is saved in the map, we should save the memcg immediately
after map creation. But the map is allocated in bpf_map_area_alloc(),
within which we can't get at the enclosing bpf_map (except via a
pointer cast, which may be error prone), so we do it in
bpf_map_init_from_attr(), which is used by all bpf maps.

bpf_map_init_from_attr() is executed immediately after
bpf_map_area_alloc() for almost all bpf maps except bpf_struct_ops,
devmap and hashtab, so this patch changes these three maps.

In the future we will change the return type of
bpf_map_init_from_attr() from void to int for the error case, so
putting it immediately after bpf_map_area_alloc() will make the error
handling easy.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/bpf_struct_ops.c | 2 +-
 kernel/bpf/devmap.c         | 5 ++---
 kernel/bpf/hashtab.c        | 4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 84b2d9d..36f24f8 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -624,6 +624,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 
 	st_map->st_ops = st_ops;
 	map = &st_map->map;
+	bpf_map_init_from_attr(map, attr);
 
 	st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
 	st_map->links =
@@ -637,7 +638,6 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 
 	mutex_init(&st_map->lock);
 	set_vm_flush_reset_perms(st_map->image);
-	bpf_map_init_from_attr(map, attr);
 
 	return map;
 }
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f9a87dc..20decc7 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -127,9 +127,6 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 	 */
 	attr->map_flags |= BPF_F_RDONLY_PROG;
 
-
-	bpf_map_init_from_attr(&dtab->map, attr);
-
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
 		dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
 
@@ -167,6 +164,8 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
+	bpf_map_init_from_attr(&dtab->map, attr);
+
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
 		bpf_map_area_free(dtab);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 8392f7f..48dc04c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -499,10 +499,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
-	lockdep_register_key(&htab->lockdep_key);
-
 	bpf_map_init_from_attr(&htab->map, attr);
 
+	lockdep_register_key(&htab->lockdep_key);
+
 	if (percpu_lru) {
 		/* ensure each CPU's lru list has >=1 elements.
 		 * since we are at it, make each lru list has the same
-- 
1.8.3.1



* [PATCH bpf-next 08/15] bpf: Save memcg in bpf_map_init_from_attr()
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (6 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 07/15] bpf: Call bpf_map_init_from_attr() immediately after map creation Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 09/15] bpf: Use scope-based charge in bpf_map_area_alloc Yafang Shao
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Move bpf_map_save_memcg() into bpf_map_init_from_attr(); then all
other map-related memory allocations happen after the memcg is saved,
and the followup memory allocations can get the memcg from the map.

To pair with this change, bpf_map_release_memcg() is moved into
bpf_map_area_free(), and a new struct bpf_map parameter is introduced
into bpf_map_area_free() for this purpose.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  2 +-
 kernel/bpf/arraymap.c          |  8 +++---
 kernel/bpf/bloom_filter.c      |  2 +-
 kernel/bpf/bpf_local_storage.c |  4 +--
 kernel/bpf/bpf_struct_ops.c    |  6 ++---
 kernel/bpf/cpumap.c            |  6 ++---
 kernel/bpf/devmap.c            |  8 +++---
 kernel/bpf/hashtab.c           | 10 +++----
 kernel/bpf/local_storage.c     |  2 +-
 kernel/bpf/lpm_trie.c          |  2 +-
 kernel/bpf/offload.c           |  4 +--
 kernel/bpf/queue_stack_maps.c  |  2 +-
 kernel/bpf/reuseport_array.c   |  2 +-
 kernel/bpf/ringbuf.c           |  8 +++---
 kernel/bpf/stackmap.c          |  8 +++---
 kernel/bpf/syscall.c           | 60 ++++++++++++++++++++++--------------------
 net/core/sock_map.c            | 12 ++++-----
 net/xdp/xskmap.c               |  2 +-
 18 files changed, 76 insertions(+), 72 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fe3b565..414f8b7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1637,7 +1637,7 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
 void bpf_map_put(struct bpf_map *map);
 void *bpf_map_area_alloc(u64 size, int numa_node);
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
-void bpf_map_area_free(void *base);
+void bpf_map_area_free(void *base, struct bpf_map *map);
 bool bpf_map_write_active(const struct bpf_map *map);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 int  generic_map_lookup_batch(struct bpf_map *map,
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index d3e734b..9ce4d1b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -147,7 +147,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->elem_size = elem_size;
 
 	if (percpu && bpf_array_alloc_percpu(array)) {
-		bpf_map_area_free(array);
+		bpf_map_area_free(array, &array->map);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -430,9 +430,9 @@ static void array_map_free(struct bpf_map *map)
 		bpf_array_free_percpu(array);
 
 	if (array->map.map_flags & BPF_F_MMAPABLE)
-		bpf_map_area_free(array_map_vmalloc_addr(array));
+		bpf_map_area_free(array_map_vmalloc_addr(array), map);
 	else
-		bpf_map_area_free(array);
+		bpf_map_area_free(array, map);
 }
 
 static void array_map_seq_show_elem(struct bpf_map *map, void *key,
@@ -774,7 +774,7 @@ static void fd_array_map_free(struct bpf_map *map)
 	for (i = 0; i < array->map.max_entries; i++)
 		BUG_ON(array->ptrs[i] != NULL);
 
-	bpf_map_area_free(array);
+	bpf_map_area_free(array, map);
 }
 
 static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index b9ea539..e59064d 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -168,7 +168,7 @@ static void bloom_map_free(struct bpf_map *map)
 	struct bpf_bloom_filter *bloom =
 		container_of(map, struct bpf_bloom_filter, map);
 
-	bpf_map_area_free(bloom);
+	bpf_map_area_free(bloom, map);
 }
 
 static void *bloom_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 4ee2e72..77e075b 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -582,7 +582,7 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 	synchronize_rcu();
 
 	kvfree(smap->buckets);
-	bpf_map_area_free(smap);
+	bpf_map_area_free(smap, &smap->map);
 }
 
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
@@ -623,7 +623,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap->buckets) {
-		bpf_map_area_free(smap);
+		bpf_map_area_free(smap, &smap->map);
 		return ERR_PTR(-ENOMEM);
 	}
 
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 36f24f8..9fb8ad1 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -577,10 +577,10 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
 
 	if (st_map->links)
 		bpf_struct_ops_map_put_progs(st_map);
-	bpf_map_area_free(st_map->links);
+	bpf_map_area_free(st_map->links, NULL);
 	bpf_jit_free_exec(st_map->image);
-	bpf_map_area_free(st_map->uvalue);
-	bpf_map_area_free(st_map);
+	bpf_map_area_free(st_map->uvalue, NULL);
+	bpf_map_area_free(st_map, map);
 }
 
 static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b5ba34d..7de2ae6 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -118,7 +118,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 
 	return &cmap->map;
 free_cmap:
-	bpf_map_area_free(cmap);
+	bpf_map_area_free(cmap, &cmap->map);
 	return ERR_PTR(err);
 }
 
@@ -622,8 +622,8 @@ static void cpu_map_free(struct bpf_map *map)
 		/* bq flush and cleanup happens after RCU grace-period */
 		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
 	}
-	bpf_map_area_free(cmap->cpu_map);
-	bpf_map_area_free(cmap);
+	bpf_map_area_free(cmap->cpu_map, NULL);
+	bpf_map_area_free(cmap, map);
 }
 
 /* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 20decc7..3268ce7 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -168,7 +168,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
-		bpf_map_area_free(dtab);
+		bpf_map_area_free(dtab, &dtab->map);
 		return ERR_PTR(err);
 	}
 
@@ -221,7 +221,7 @@ static void dev_map_free(struct bpf_map *map)
 			}
 		}
 
-		bpf_map_area_free(dtab->dev_index_head);
+		bpf_map_area_free(dtab->dev_index_head, NULL);
 	} else {
 		for (i = 0; i < dtab->map.max_entries; i++) {
 			struct bpf_dtab_netdev *dev;
@@ -236,10 +236,10 @@ static void dev_map_free(struct bpf_map *map)
 			kfree(dev);
 		}
 
-		bpf_map_area_free(dtab->netdev_map);
+		bpf_map_area_free(dtab->netdev_map, NULL);
 	}
 
-	bpf_map_area_free(dtab);
+	bpf_map_area_free(dtab, &dtab->map);
 }
 
 static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 48dc04c..1b3653d 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -290,7 +290,7 @@ static void htab_free_elems(struct bpf_htab *htab)
 		cond_resched();
 	}
 free_elems:
-	bpf_map_area_free(htab->elems);
+	bpf_map_area_free(htab->elems, NULL);
 }
 
 /* The LRU list has a lock (lru_lock). Each htab bucket has a lock
@@ -576,10 +576,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 free_map_locked:
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
-	bpf_map_area_free(htab->buckets);
+	bpf_map_area_free(htab->buckets, NULL);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
-	bpf_map_area_free(htab);
+	bpf_map_area_free(htab, &htab->map);
 	return ERR_PTR(err);
 }
 
@@ -1492,11 +1492,11 @@ static void htab_map_free(struct bpf_map *map)
 
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
-	bpf_map_area_free(htab->buckets);
+	bpf_map_area_free(htab->buckets, NULL);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);
-	bpf_map_area_free(htab);
+	bpf_map_area_free(htab, map);
 }
 
 static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 098cf33..c705d66 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -345,7 +345,7 @@ static void cgroup_storage_map_free(struct bpf_map *_map)
 	WARN_ON(!RB_EMPTY_ROOT(&map->root));
 	WARN_ON(!list_empty(&map->list));
 
-	bpf_map_area_free(map);
+	bpf_map_area_free(map, _map);
 }
 
 static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index d833496..fd99360 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -609,7 +609,7 @@ static void trie_free(struct bpf_map *map)
 	}
 
 out:
-	bpf_map_area_free(trie);
+	bpf_map_area_free(trie, map);
 }
 
 static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 13e4efc..c9941a9 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -404,7 +404,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 err_unlock:
 	up_write(&bpf_devs_lock);
 	rtnl_unlock();
-	bpf_map_area_free(offmap);
+	bpf_map_area_free(offmap, &offmap->map);
 	return ERR_PTR(err);
 }
 
@@ -428,7 +428,7 @@ void bpf_map_offload_map_free(struct bpf_map *map)
 	up_write(&bpf_devs_lock);
 	rtnl_unlock();
 
-	bpf_map_area_free(offmap);
+	bpf_map_area_free(offmap, map);
 }
 
 int bpf_map_offload_lookup_elem(struct bpf_map *map, void *key, void *value)
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 8a5e060..f2ec0c4 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -92,7 +92,7 @@ static void queue_stack_map_free(struct bpf_map *map)
 {
 	struct bpf_queue_stack *qs = bpf_queue_stack(map);
 
-	bpf_map_area_free(qs);
+	bpf_map_area_free(qs, map);
 }
 
 static int __queue_map_get(struct bpf_map *map, void *value, bool delete)
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index e2618fb..594cdb0 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -146,7 +146,7 @@ static void reuseport_array_free(struct bpf_map *map)
 	 * Once reaching here, all sk->sk_user_data is not
 	 * referencing this "array". "array" can be freed now.
 	 */
-	bpf_map_area_free(array);
+	bpf_map_area_free(array, map);
 }
 
 static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index b483aea..74dd8dc 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -116,7 +116,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 err_free_pages:
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	bpf_map_area_free(pages);
+	bpf_map_area_free(pages, NULL);
 	return NULL;
 }
 
@@ -172,7 +172,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
 	if (!rb_map->rb) {
-		bpf_map_area_free(rb_map);
+		bpf_map_area_free(rb_map, &rb_map->map);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -190,7 +190,7 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	vunmap(rb);
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	bpf_map_area_free(pages);
+	bpf_map_area_free(pages, NULL);
 }
 
 static void ringbuf_map_free(struct bpf_map *map)
@@ -199,7 +199,7 @@ static void ringbuf_map_free(struct bpf_map *map)
 
 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
 	bpf_ringbuf_free(rb_map->rb);
-	bpf_map_area_free(rb_map);
+	bpf_map_area_free(rb_map, map);
 }
 
 static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 1adbe67..042b7d2 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -62,7 +62,7 @@ static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
 	return 0;
 
 free_elems:
-	bpf_map_area_free(smap->elems);
+	bpf_map_area_free(smap->elems, NULL);
 	return err;
 }
 
@@ -120,7 +120,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 put_buffers:
 	put_callchain_buffers();
 free_smap:
-	bpf_map_area_free(smap);
+	bpf_map_area_free(smap, &smap->map);
 	return ERR_PTR(err);
 }
 
@@ -648,9 +648,9 @@ static void stack_map_free(struct bpf_map *map)
 {
 	struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
 
-	bpf_map_area_free(smap->elems);
+	bpf_map_area_free(smap->elems, NULL);
 	pcpu_freelist_destroy(&smap->freelist);
-	bpf_map_area_free(smap);
+	bpf_map_area_free(smap, map);
 	put_callchain_buffers();
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19c3a81..01d8d4a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -293,6 +293,34 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	return err;
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static void bpf_map_save_memcg(struct bpf_map *map)
+{
+	/* Currently if a map is created by a process belonging to the root
+	 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
+	 * So we have to check map->objcg for being NULL each time it's
+	 * being used.
+	 */
+	map->objcg = get_obj_cgroup_from_current();
+}
+
+static void bpf_map_release_memcg(struct bpf_map *map)
+{
+	if (map->objcg)
+		obj_cgroup_put(map->objcg);
+}
+
+#else
+static void bpf_map_save_memcg(struct bpf_map *map)
+{
+}
+
+static void bpf_map_release_memcg(struct bpf_map *map)
+{
+}
+
+#endif
+
 /* Please, do not use this function outside from the map creation path
  * (e.g. in map update path) without taking care of setting the active
  * memory cgroup (see at bpf_map_kmalloc_node() for example).
@@ -344,8 +372,10 @@ void *bpf_map_area_mmapable_alloc(u64 size, int numa_node)
 	return __bpf_map_area_alloc(size, numa_node, true);
 }
 
-void bpf_map_area_free(void *area)
+void bpf_map_area_free(void *area, struct bpf_map *map)
 {
+	if (map)
+		bpf_map_release_memcg(map);
 	kvfree(area);
 }
 
@@ -363,6 +393,7 @@ static u32 bpf_map_flags_retain_permanent(u32 flags)
 
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 {
+	bpf_map_save_memcg(map);
 	map->map_type = attr->map_type;
 	map->key_size = attr->key_size;
 	map->value_size = attr->value_size;
@@ -417,22 +448,6 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
-{
-	/* Currently if a map is created by a process belonging to the root
-	 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
-	 * So we have to check map->objcg for being NULL each time it's
-	 * being used.
-	 */
-	map->objcg = get_obj_cgroup_from_current();
-}
-
-static void bpf_map_release_memcg(struct bpf_map *map)
-{
-	if (map->objcg)
-		obj_cgroup_put(map->objcg);
-}
-
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node)
 {
@@ -477,14 +492,6 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 	return ptr;
 }
 
-#else
-static void bpf_map_save_memcg(struct bpf_map *map)
-{
-}
-
-static void bpf_map_release_memcg(struct bpf_map *map)
-{
-}
 #endif
 
 static int bpf_map_kptr_off_cmp(const void *a, const void *b)
@@ -605,7 +612,6 @@ static void bpf_map_free_deferred(struct work_struct *work)
 
 	security_bpf_map_free(map);
 	kfree(map->off_arr);
-	bpf_map_release_memcg(map);
 	/* implementation dependent freeing, map_free callback also does
 	 * bpf_map_free_kptr_off_tab, if needed.
 	 */
@@ -1154,8 +1160,6 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_sec;
 
-	bpf_map_save_memcg(map);
-
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
 		/* failed to allocate fd.
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d0c4338..e8f414a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -52,7 +52,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 				       sizeof(struct sock *),
 				       stab->map.numa_node);
 	if (!stab->sks) {
-		bpf_map_area_free(stab);
+		bpf_map_area_free(stab, &stab->map);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -360,8 +360,8 @@ static void sock_map_free(struct bpf_map *map)
 	/* wait for psock readers accessing its map link */
 	synchronize_rcu();
 
-	bpf_map_area_free(stab->sks);
-	bpf_map_area_free(stab);
+	bpf_map_area_free(stab->sks, NULL);
+	bpf_map_area_free(stab, map);
 }
 
 static void sock_map_release_progs(struct bpf_map *map)
@@ -1106,7 +1106,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 
 	return &htab->map;
 free_htab:
-	bpf_map_area_free(htab);
+	bpf_map_area_free(htab, &htab->map);
 	return ERR_PTR(err);
 }
 
@@ -1158,8 +1158,8 @@ static void sock_hash_free(struct bpf_map *map)
 	/* wait for psock readers accessing its map link */
 	synchronize_rcu();
 
-	bpf_map_area_free(htab->buckets);
-	bpf_map_area_free(htab);
+	bpf_map_area_free(htab->buckets, NULL);
+	bpf_map_area_free(htab, map);
 }
 
 static void *sock_hash_lookup_sys(struct bpf_map *map, void *key)
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index acc8e52..5abb87e 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -90,7 +90,7 @@ static void xsk_map_free(struct bpf_map *map)
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
 
 	synchronize_net();
-	bpf_map_area_free(m);
+	bpf_map_area_free(m, map);
 }
 
 static int xsk_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
-- 
1.8.3.1



* [PATCH bpf-next 09/15] bpf: Use scope-based charge in bpf_map_area_alloc
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (7 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 08/15] bpf: Save memcg in bpf_map_init_from_attr() Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 10/15] bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free} Yafang Shao
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Currently bpf_map_area_alloc() is used both to allocate the container
struct bpf_map itself and to allocate members of that container. To
distinguish map creation from the other cases, a new struct bpf_map
parameter is added to bpf_map_area_alloc(). For the non-map-creation
cases we can then get the memcg from the map instead of using the
current task's memcg.
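
A rough sketch of the resulting allocation path (assumed shape,
mirroring the scoped set_active_memcg() pattern of the existing
kmalloc wrappers; the actual syscall.c hunk may differ in detail):

    void *bpf_map_area_alloc(u64 size, int numa_node, struct bpf_map *map)
    {
            struct mem_cgroup *memcg, *old_memcg;
            void *ptr;

            /* Map creation: there is no map to take the memcg from yet. */
            if (!map)
                    return __bpf_map_area_alloc(size, numa_node, false);

            memcg = bpf_map_get_memcg(map);
            old_memcg = set_active_memcg(memcg);
            ptr = __bpf_map_area_alloc(size, numa_node, false);
            set_active_memcg(old_memcg);
            bpf_map_put_memcg(memcg);

            return ptr;
    }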

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  2 +-
 kernel/bpf/arraymap.c          |  2 +-
 kernel/bpf/bloom_filter.c      |  2 +-
 kernel/bpf/bpf_local_storage.c |  2 +-
 kernel/bpf/bpf_struct_ops.c    |  6 +++---
 kernel/bpf/cpumap.c            |  5 +++--
 kernel/bpf/devmap.c            | 13 ++++++++-----
 kernel/bpf/hashtab.c           |  8 +++++---
 kernel/bpf/local_storage.c     |  2 +-
 kernel/bpf/lpm_trie.c          |  2 +-
 kernel/bpf/offload.c           |  2 +-
 kernel/bpf/queue_stack_maps.c  |  2 +-
 kernel/bpf/reuseport_array.c   |  2 +-
 kernel/bpf/ringbuf.c           | 15 +++++++++------
 kernel/bpf/stackmap.c          |  5 +++--
 kernel/bpf/syscall.c           | 16 ++++++++++++++--
 net/core/sock_map.c            | 10 ++++++----
 net/xdp/xskmap.c               |  2 +-
 18 files changed, 61 insertions(+), 37 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 414f8b7..d7485b7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1635,7 +1635,7 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
 struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
-void *bpf_map_area_alloc(u64 size, int numa_node);
+void *bpf_map_area_alloc(u64 size, int numa_node, struct bpf_map *map);
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base, struct bpf_map *map);
 bool bpf_map_write_active(const struct bpf_map *map);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 9ce4d1b..80974c5 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -135,7 +135,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 		array = data + PAGE_ALIGN(sizeof(struct bpf_array))
 			- offsetof(struct bpf_array, value);
 	} else {
-		array = bpf_map_area_alloc(array_size, numa_node);
+		array = bpf_map_area_alloc(array_size, numa_node, NULL);
 	}
 	if (!array)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index e59064d..6691f79 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -142,7 +142,7 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	}
 
 	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
-	bloom = bpf_map_area_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
+	bloom = bpf_map_area_alloc(sizeof(*bloom) + bitset_bytes, numa_node, NULL);
 
 	if (!bloom)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 77e075b..67ab249 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -610,7 +610,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	unsigned int i;
 	u32 nbuckets;
 
-	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
+	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE, NULL);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 	bpf_map_init_from_attr(&smap->map, attr);
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 9fb8ad1..37ba5c0 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -618,7 +618,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 		 */
 		(vt->size - sizeof(struct bpf_struct_ops_value));
 
-	st_map = bpf_map_area_alloc(st_map_size, NUMA_NO_NODE);
+	st_map = bpf_map_area_alloc(st_map_size, NUMA_NO_NODE, NULL);
 	if (!st_map)
 		return ERR_PTR(-ENOMEM);
 
@@ -626,10 +626,10 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	map = &st_map->map;
 	bpf_map_init_from_attr(map, attr);
 
-	st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
+	st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE, map);
 	st_map->links =
 		bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *),
-				   NUMA_NO_NODE);
+				   NUMA_NO_NODE, map);
 	st_map->image = bpf_jit_alloc_exec(PAGE_SIZE);
 	if (!st_map->uvalue || !st_map->links || !st_map->image) {
 		bpf_struct_ops_map_free(map);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 7de2ae6..b593157 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,7 +97,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = bpf_map_area_alloc(sizeof(*cmap), NUMA_NO_NODE);
+	cmap = bpf_map_area_alloc(sizeof(*cmap), NUMA_NO_NODE, NULL);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
@@ -112,7 +112,8 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	/* Alloc array for possible remote "destination" CPUs */
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
-					   cmap->map.numa_node);
+					   cmap->map.numa_node,
+					   &cmap->map);
 	if (!cmap->cpu_map)
 		goto free_cmap;
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 3268ce7..807a4cd 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -89,12 +89,13 @@ struct bpf_dtab {
 static LIST_HEAD(dev_map_list);
 
 static struct hlist_head *dev_map_create_hash(unsigned int entries,
-					      int numa_node)
+					      int numa_node,
+					      struct bpf_map *map)
 {
 	int i;
 	struct hlist_head *hash;
 
-	hash = bpf_map_area_alloc((u64) entries * sizeof(*hash), numa_node);
+	hash = bpf_map_area_alloc((u64) entries * sizeof(*hash), numa_node, map);
 	if (hash != NULL)
 		for (i = 0; i < entries; i++)
 			INIT_HLIST_HEAD(&hash[i]);
@@ -136,7 +137,8 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
 		dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets,
-							   dtab->map.numa_node);
+							   dtab->map.numa_node,
+							   &dtab->map);
 		if (!dtab->dev_index_head)
 			return -ENOMEM;
 
@@ -144,7 +146,8 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 	} else {
 		dtab->netdev_map = bpf_map_area_alloc((u64) dtab->map.max_entries *
 						      sizeof(struct bpf_dtab_netdev *),
-						      dtab->map.numa_node);
+						      dtab->map.numa_node,
+						      &dtab->map);
 		if (!dtab->netdev_map)
 			return -ENOMEM;
 	}
@@ -160,7 +163,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE);
+	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE, NULL);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 1b3653d..e9a6d2c4 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -332,7 +332,8 @@ static int prealloc_init(struct bpf_htab *htab)
 		num_entries += num_possible_cpus();
 
 	htab->elems = bpf_map_area_alloc((u64)htab->elem_size * num_entries,
-					 htab->map.numa_node);
+					 htab->map.numa_node,
+					 &htab->map);
 	if (!htab->elems)
 		return -ENOMEM;
 
@@ -495,7 +496,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
+	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE, NULL);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -534,7 +535,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	err = -ENOMEM;
 	htab->buckets = bpf_map_area_alloc(htab->n_buckets *
 					   sizeof(struct bucket),
-					   htab->map.numa_node);
+					   htab->map.numa_node,
+					   &htab->map);
 	if (!htab->buckets)
 		goto free_htab;
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index c705d66..fcc7ece 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,7 +313,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = bpf_map_area_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node);
+	map = bpf_map_area_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node, NULL);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index fd99360..3d329ae 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -558,7 +558,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	    attr->value_size > LPM_VAL_SIZE_MAX)
 		return ERR_PTR(-EINVAL);
 
-	trie = bpf_map_area_alloc(sizeof(*trie), NUMA_NO_NODE);
+	trie = bpf_map_area_alloc(sizeof(*trie), NUMA_NO_NODE, NULL);
 	if (!trie)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index c9941a9..87c59da 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -372,7 +372,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	    attr->map_type != BPF_MAP_TYPE_HASH)
 		return ERR_PTR(-EINVAL);
 
-	offmap = bpf_map_area_alloc(sizeof(*offmap), NUMA_NO_NODE);
+	offmap = bpf_map_area_alloc(sizeof(*offmap), NUMA_NO_NODE, NULL);
 	if (!offmap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index f2ec0c4..bf57e45 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -74,7 +74,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	size = (u64) attr->max_entries + 1;
 	queue_size = sizeof(*qs) + size * attr->value_size;
 
-	qs = bpf_map_area_alloc(queue_size, numa_node);
+	qs = bpf_map_area_alloc(queue_size, numa_node, NULL);
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 594cdb0..52c7e77 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -158,7 +158,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 		return ERR_PTR(-EPERM);
 
 	/* allocate all map elements and zero-initialize them */
-	array = bpf_map_area_alloc(struct_size(array, ptrs, attr->max_entries), numa_node);
+	array = bpf_map_area_alloc(struct_size(array, ptrs, attr->max_entries), numa_node, NULL);
 	if (!array)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 74dd8dc..5eb7820 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -59,7 +59,8 @@ struct bpf_ringbuf_hdr {
 	u32 pg_off;
 };
 
-static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
+static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node,
+						  struct bpf_map *map)
 {
 	const gfp_t flags = GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL |
 			    __GFP_NOWARN | __GFP_ZERO;
@@ -89,7 +90,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 	 * user-space implementations significantly.
 	 */
 	array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
-	pages = bpf_map_area_alloc(array_size, numa_node);
+	pages = bpf_map_area_alloc(array_size, numa_node, map);
 	if (!pages)
 		return NULL;
 
@@ -127,11 +128,12 @@ static void bpf_ringbuf_notify(struct irq_work *work)
 	wake_up_all(&rb->waitq);
 }
 
-static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
+static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node,
+					     struct bpf_map *map)
 {
 	struct bpf_ringbuf *rb;
 
-	rb = bpf_ringbuf_area_alloc(data_sz, numa_node);
+	rb = bpf_ringbuf_area_alloc(data_sz, numa_node, map);
 	if (!rb)
 		return NULL;
 
@@ -164,13 +166,14 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = bpf_map_area_alloc(sizeof(*rb_map), NUMA_NO_NODE);
+	rb_map = bpf_map_area_alloc(sizeof(*rb_map), NUMA_NO_NODE, NULL);
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
 	bpf_map_init_from_attr(&rb_map->map, attr);
 
-	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
+	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node,
+				       &rb_map->map);
 	if (!rb_map->rb) {
 		bpf_map_area_free(rb_map, &rb_map->map);
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 042b7d2..9440fab 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -49,7 +49,8 @@ static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
 	int err;
 
 	smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries,
-					 smap->map.numa_node);
+					 smap->map.numa_node,
+					 &smap->map);
 	if (!smap->elems)
 		return -ENOMEM;
 
@@ -100,7 +101,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 
 	cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
-	smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));
+	smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr), NULL);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 01d8d4a..02ce7e9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -362,9 +362,21 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 			flags, numa_node, __builtin_return_address(0));
 }
 
-void *bpf_map_area_alloc(u64 size, int numa_node)
+void *bpf_map_area_alloc(u64 size, int numa_node, struct bpf_map *map)
 {
-	return __bpf_map_area_alloc(size, numa_node, false);
+	struct mem_cgroup *memcg, *old_memcg;
+	void *ptr;
+
+	if (!map)
+		return __bpf_map_area_alloc(size, numa_node, false);
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	ptr = __bpf_map_area_alloc(size, numa_node, false);
+	set_active_memcg(old_memcg);
+	bpf_map_put_memcg(memcg);
+
+	return ptr;
 }
 
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index e8f414a..2b3b24e 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,7 +41,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = bpf_map_area_alloc(sizeof(*stab), NUMA_NO_NODE);
+	stab = bpf_map_area_alloc(sizeof(*stab), NUMA_NO_NODE, NULL);
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
@@ -50,7 +50,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 
 	stab->sks = bpf_map_area_alloc((u64) stab->map.max_entries *
 				       sizeof(struct sock *),
-				       stab->map.numa_node);
+				       stab->map.numa_node,
+				       &stab->map);
 	if (!stab->sks) {
 		bpf_map_area_free(stab, &stab->map);
 		return ERR_PTR(-ENOMEM);
@@ -1076,7 +1077,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
+	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE, NULL);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -1093,7 +1094,8 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 
 	htab->buckets = bpf_map_area_alloc(htab->buckets_num *
 					   sizeof(struct bpf_shtab_bucket),
-					   htab->map.numa_node);
+					   htab->map.numa_node,
+					   &htab->map);
 	if (!htab->buckets) {
 		err = -ENOMEM;
 		goto free_htab;
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 5abb87e..beb11fd 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -75,7 +75,7 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	numa_node = bpf_map_attr_numa_node(attr);
 	size = struct_size(m, xsk_map, attr->max_entries);
 
-	m = bpf_map_area_alloc(size, numa_node);
+	m = bpf_map_area_alloc(size, numa_node, NULL);
 	if (!m)
 		return ERR_PTR(-ENOMEM);
 
-- 
1.8.3.1



* [PATCH bpf-next 10/15] bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (8 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 09/15] bpf: Use scope-based charge in bpf_map_area_alloc Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 11/15] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao, Andrii Nakryiko

Move the page-related memory allocation into a new helper,
bpf_ringbuf_pages_alloc(), so that it can be handled as a single unit.

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
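A sketch of the page array that the new helper fills (with
M = nr_meta_pages and D = nr_data_pages); the data pages are stored
twice so that vmap() can map them back to back, which yields the
virtually contiguous wrap-around view of the ring buffer:

	/*
	 * pages[0]   .. pages[M-1]      meta pages
	 * pages[M]   .. pages[M+D-1]    data pages, first mapping
	 * pages[M+D] .. pages[M+2D-1]   same data pages, mapped again
	 */
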
 kernel/bpf/ringbuf.c | 80 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 24 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 5eb7820..1e7284c 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -59,6 +59,57 @@ struct bpf_ringbuf_hdr {
 	u32 pg_off;
 };
 
+static void bpf_ringbuf_pages_free(struct page **pages, int nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; i++)
+		__free_page(pages[i]);
+	bpf_map_area_free(pages, NULL);
+}
+
+static struct page **bpf_ringbuf_pages_alloc(struct bpf_map *map,
+					     int nr_meta_pages,
+					     int nr_data_pages,
+					     int numa_node,
+					     const gfp_t flags)
+{
+	int nr_pages = nr_meta_pages + nr_data_pages;
+	struct mem_cgroup *memcg, *old_memcg;
+	struct page **pages, *page;
+	int array_size;
+	int i;
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
+	pages = bpf_map_area_alloc(array_size, numa_node, NULL);
+	if (!pages)
+		goto err;
+
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_pages_node(numa_node, flags, 0);
+		if (!page) {
+			nr_pages = i;
+			goto err_free_pages;
+		}
+		pages[i] = page;
+		if (i >= nr_meta_pages)
+			pages[nr_data_pages + i] = page;
+	}
+	set_active_memcg(old_memcg);
+	bpf_map_put_memcg(memcg);
+
+	return pages;
+
+err_free_pages:
+	bpf_ringbuf_pages_free(pages, nr_pages);
+err:
+	set_active_memcg(old_memcg);
+	bpf_map_put_memcg(memcg);
+	return NULL;
+}
+
 static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node,
 						  struct bpf_map *map)
 {
@@ -67,10 +118,8 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node,
 	int nr_meta_pages = RINGBUF_PGOFF + RINGBUF_POS_PAGES;
 	int nr_data_pages = data_sz >> PAGE_SHIFT;
 	int nr_pages = nr_meta_pages + nr_data_pages;
-	struct page **pages, *page;
 	struct bpf_ringbuf *rb;
-	size_t array_size;
-	int i;
+	struct page **pages;
 
 	/* Each data page is mapped twice to allow "virtual"
 	 * continuous read of samples wrapping around the end of ring
@@ -89,22 +138,11 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node,
 	 * when mmap()'ed in user-space, simplifying both kernel and
 	 * user-space implementations significantly.
 	 */
-	array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
-	pages = bpf_map_area_alloc(array_size, numa_node, map);
+	pages = bpf_ringbuf_pages_alloc(map, nr_meta_pages, nr_data_pages,
+					numa_node, flags);
 	if (!pages)
 		return NULL;
 
-	for (i = 0; i < nr_pages; i++) {
-		page = alloc_pages_node(numa_node, flags, 0);
-		if (!page) {
-			nr_pages = i;
-			goto err_free_pages;
-		}
-		pages[i] = page;
-		if (i >= nr_meta_pages)
-			pages[nr_data_pages + i] = page;
-	}
-
 	rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
 		  VM_MAP | VM_USERMAP, PAGE_KERNEL);
 	if (rb) {
@@ -114,10 +152,6 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node,
 		return rb;
 	}
 
-err_free_pages:
-	for (i = 0; i < nr_pages; i++)
-		__free_page(pages[i]);
-	bpf_map_area_free(pages, NULL);
 	return NULL;
 }
 
@@ -188,12 +222,10 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	 * to unmap rb itself with vunmap() below
 	 */
 	struct page **pages = rb->pages;
-	int i, nr_pages = rb->nr_pages;
+	int nr_pages = rb->nr_pages;
 
 	vunmap(rb);
-	for (i = 0; i < nr_pages; i++)
-		__free_page(pages[i]);
-	bpf_map_area_free(pages, NULL);
+	bpf_ringbuf_pages_free(pages, nr_pages);
 }
 
 static void ringbuf_map_free(struct bpf_map *map)
-- 
1.8.3.1



* [PATCH bpf-next 11/15] bpf: Use bpf_map_kzalloc in arraymap
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (9 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 10/15] bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free} Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 12/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Allocate this memory after map creation, so that we can use the generic
helper bpf_map_kzalloc() instead of the open-coded kzalloc().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
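The reordering matters because bpf_map_kzalloc() charges the allocation
to the map's memcg, so the map must exist first. A sketch of the
resulting flow (error handling simplified):

	map = array_map_alloc(attr);
	if (IS_ERR(map))
		return map;

	/* charged to the map's memcg */
	aux = bpf_map_kzalloc(map, sizeof(*aux), GFP_KERNEL);
	if (!aux) {
		array_map_free(map);
		return ERR_PTR(-ENOMEM);
	}
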
 kernel/bpf/arraymap.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 80974c5..3039832 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1090,20 +1090,20 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
 	struct bpf_array_aux *aux;
 	struct bpf_map *map;
 
-	aux = kzalloc(sizeof(*aux), GFP_KERNEL_ACCOUNT);
-	if (!aux)
+	map = array_map_alloc(attr);
+	if (IS_ERR(map))
 		return ERR_PTR(-ENOMEM);
 
+	aux = bpf_map_kzalloc(map, sizeof(*aux), GFP_KERNEL);
+	if (!aux) {
+		array_map_free(map);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	INIT_WORK(&aux->work, prog_array_map_clear_deferred);
 	INIT_LIST_HEAD(&aux->poke_progs);
 	mutex_init(&aux->poke_mutex);
 
-	map = array_map_alloc(attr);
-	if (IS_ERR(map)) {
-		kfree(aux);
-		return map;
-	}
-
 	container_of(map, struct bpf_array, map)->aux = aux;
 	aux->map = map;
 
-- 
1.8.3.1



* [PATCH bpf-next 12/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (10 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 11/15] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Introduce a new helper, bpf_map_kvcalloc(), for this memory allocation.
It charges the allocation to the map's memcg, so the caller no longer
needs to pass __GFP_ACCOUNT explicitly.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  8 ++++++++
 kernel/bpf/bpf_local_storage.c |  4 ++--
 kernel/bpf/syscall.c           | 15 +++++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d7485b7..ebb6ed4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1656,6 +1656,8 @@ int  generic_map_delete_batch(struct bpf_map *map,
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node);
 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags);
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+		       gfp_t flags);
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags);
 #else
@@ -1672,6 +1674,12 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 	return kzalloc(size, flags);
 }
 
+static inline void *
+bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size, gfp_t flags)
+{
+	return kvcalloc(n, size, flags);
+}
+
 static inline void __percpu *
 bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
 		     gfp_t flags)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 67ab249..71d5bf1 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -620,8 +620,8 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	nbuckets = max_t(u32, 2, nbuckets);
 	smap->bucket_log = ilog2(nbuckets);
 
-	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
+					 nbuckets, GFP_USER | __GFP_NOWARN);
 	if (!smap->buckets) {
 		bpf_map_area_free(smap, &smap->map);
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 02ce7e9..2bd3bcf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -489,6 +489,21 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 	return ptr;
 }
 
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+		       gfp_t flags)
+{
+	struct mem_cgroup *memcg, *old_memcg;
+	void *ptr;
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
+	set_active_memcg(old_memcg);
+	bpf_map_put_memcg(memcg);
+
+	return ptr;
+}
+
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags)
 {
-- 
1.8.3.1



* [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (11 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 12/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-11 16:16   ` Roman Gushchin
  2022-08-12 16:57   ` Shakeel Butt
  2022-08-10 15:18 ` [PATCH bpf-next 14/15] bpf: Add return value for bpf_map_init_from_attr Yafang Shao
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Introduce a new helper, get_obj_cgroup_from_cgroup(), to get the
obj_cgroup of a specific cgroup.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
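A sketch of the intended usage, as wired up later in this series
(error handling abbreviated):

	struct obj_cgroup *objcg;
	struct cgroup *cgrp;

	cgrp = cgroup_get_from_fd(memcg_fd);
	if (IS_ERR(cgrp))
		return -EINVAL;

	/* the helper puts the cgrp reference itself */
	objcg = get_obj_cgroup_from_cgroup(cgrp);
	if (IS_ERR(objcg))
		return PTR_ERR(objcg);
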
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2f0a611..901a921 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
 void __memcg_kmem_uncharge_page(struct page *page, int order);
 
+struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
 struct obj_cgroup *get_obj_cgroup_from_current(void);
 struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366..762cffa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 	return objcg;
 }
 
+static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
+{
+	struct obj_cgroup *objcg;
+
+	if (memcg_kmem_bypass())
+		return NULL;
+
+	rcu_read_lock();
+	objcg = __get_obj_cgroup_from_memcg(memcg);
+	rcu_read_unlock();
+	return objcg;
+}
+
+struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp)
+{
+	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
+
+	rcu_read_lock();
+	css = rcu_dereference(cgrp->subsys[memory_cgrp_id]);
+	if (!css || !css_tryget_online(css)) {
+		rcu_read_unlock();
+		cgroup_put(cgrp);
+		return ERR_PTR(-EINVAL);
+	}
+	rcu_read_unlock();
+	cgroup_put(cgrp);
+
+	memcg = mem_cgroup_from_css(css);
+	if (!memcg) {
+		css_put(css);
+		return ERR_PTR(-EINVAL);
+	}
+
+	objcg = get_obj_cgroup_from_memcg(memcg);
+	css_put(css);
+
+	return objcg;
+}
+
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
 	struct obj_cgroup *objcg = NULL;
-- 
1.8.3.1



* [PATCH bpf-next 14/15] bpf: Add return value for bpf_map_init_from_attr
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (12 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 15:18 ` [PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-08-10 19:00 ` [PATCH bpf-next 00/15] " patchwork-bot+netdevbpf
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Add a return value to bpf_map_init_from_attr() to indicate whether it
initialized successfully. This is a preparation for the followup patch.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            | 2 +-
 kernel/bpf/arraymap.c          | 8 +++++++-
 kernel/bpf/bloom_filter.c      | 7 ++++++-
 kernel/bpf/bpf_local_storage.c | 7 ++++++-
 kernel/bpf/bpf_struct_ops.c    | 7 ++++++-
 kernel/bpf/cpumap.c            | 6 +++++-
 kernel/bpf/devmap.c            | 6 +++++-
 kernel/bpf/hashtab.c           | 6 +++++-
 kernel/bpf/local_storage.c     | 7 ++++++-
 kernel/bpf/lpm_trie.c          | 8 +++++++-
 kernel/bpf/offload.c           | 6 +++++-
 kernel/bpf/queue_stack_maps.c  | 7 ++++++-
 kernel/bpf/reuseport_array.c   | 7 ++++++-
 kernel/bpf/ringbuf.c           | 7 ++++++-
 kernel/bpf/syscall.c           | 4 +++-
 net/core/sock_map.c            | 8 +++++++-
 net/xdp/xskmap.c               | 8 +++++++-
 17 files changed, 94 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ebb6ed4..a60a68d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1639,7 +1639,7 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base, struct bpf_map *map);
 bool bpf_map_write_active(const struct bpf_map *map);
-void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
+int bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 int  generic_map_lookup_batch(struct bpf_map *map,
 			      const union bpf_attr *attr,
 			      union bpf_attr __user *uattr);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 3039832..28d0310 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -85,6 +85,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool bypass_spec_v1 = bpf_bypass_spec_v1();
 	u64 array_size, mask64;
 	struct bpf_array *array;
+	int err;
 
 	elem_size = round_up(attr->value_size, 8);
 
@@ -143,7 +144,12 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->map.bypass_spec_v1 = bypass_spec_v1;
 
 	/* copy mandatory map attributes */
-	bpf_map_init_from_attr(&array->map, attr);
+	err = bpf_map_init_from_attr(&array->map, attr);
+	if (err) {
+		bpf_map_area_free(array, NULL);
+		return ERR_PTR(err);
+	}
+
 	array->elem_size = elem_size;
 
 	if (percpu && bpf_array_alloc_percpu(array)) {
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index 6691f79..be64227 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -93,6 +93,7 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	u32 bitset_bytes, bitset_mask, nr_hash_funcs, nr_bits;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_bloom_filter *bloom;
+	int err;
 
 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
@@ -147,7 +148,11 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	if (!bloom)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&bloom->map, attr);
+	err = bpf_map_init_from_attr(&bloom->map, attr);
+	if (err) {
+		bpf_map_area_free(bloom, NULL);
+		return ERR_PTR(err);
+	}
 
 	bloom->nr_hash_funcs = nr_hash_funcs;
 	bloom->bitset_mask = bitset_mask;
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 71d5bf1..b82d124 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -609,11 +609,16 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	struct bpf_local_storage_map *smap;
 	unsigned int i;
 	u32 nbuckets;
+	int err;
 
 	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE, NULL);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
-	bpf_map_init_from_attr(&smap->map, attr);
+	err = bpf_map_init_from_attr(&smap->map, attr);
+	if (err) {
+		bpf_map_area_free(&smap->map, NULL);
+		return ERR_PTR(err);
+	}
 
 	nbuckets = roundup_pow_of_two(num_possible_cpus());
 	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 37ba5c0..7cfbabc 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -598,6 +598,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	struct bpf_struct_ops_map *st_map;
 	const struct btf_type *t, *vt;
 	struct bpf_map *map;
+	int err;
 
 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
@@ -624,7 +625,11 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 
 	st_map->st_ops = st_ops;
 	map = &st_map->map;
-	bpf_map_init_from_attr(map, attr);
+	err = bpf_map_init_from_attr(map, attr);
+	if (err) {
+		bpf_map_area_free(st_map, NULL);
+		return ERR_PTR(err);
+	}
 
 	st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE, map);
 	st_map->links =
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b593157..e672e62 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -101,7 +101,11 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&cmap->map, attr);
+	err = bpf_map_init_from_attr(&cmap->map, attr);
+	if (err) {
+		bpf_map_area_free(cmap, NULL);
+		return ERR_PTR(err);
+	}
 
 	/* Pre-limit array size based on NR_CPUS, not final CPU check */
 	if (cmap->map.max_entries > NR_CPUS) {
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 807a4cd..10f038d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -167,7 +167,11 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&dtab->map, attr);
+	err = bpf_map_init_from_attr(&dtab->map, attr);
+	if (err) {
+		bpf_map_area_free(dtab, NULL);
+		return ERR_PTR(err);
+	}
 
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index e9a6d2c4..f059ae0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -500,7 +500,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&htab->map, attr);
+	err = bpf_map_init_from_attr(&htab->map, attr);
+	if (err) {
+		bpf_map_area_free(htab, NULL);
+		return ERR_PTR(err);
+	}
 
 	lockdep_register_key(&htab->lockdep_key);
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index fcc7ece..1901195 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -287,6 +287,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 	__u32 max_value_size = BPF_LOCAL_STORAGE_MAX_VALUE_SIZE;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_cgroup_storage_map *map;
+	int err;
 
 	/* percpu is bound by PCPU_MIN_UNIT_SIZE, non-percu
 	 * is the same as other local storages.
@@ -318,7 +319,11 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 
 	/* copy mandatory map attributes */
-	bpf_map_init_from_attr(&map->map, attr);
+	err = bpf_map_init_from_attr(&map->map, attr);
+	if (err) {
+		bpf_map_area_free(map, NULL);
+		return ERR_PTR(err);
+	}
 
 	spin_lock_init(&map->lock);
 	map->root = RB_ROOT;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 3d329ae..38d7b00 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -543,6 +543,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 static struct bpf_map *trie_alloc(union bpf_attr *attr)
 {
 	struct lpm_trie *trie;
+	int err;
 
 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
@@ -563,7 +564,12 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 
 	/* copy mandatory map attributes */
-	bpf_map_init_from_attr(&trie->map, attr);
+	err = bpf_map_init_from_attr(&trie->map, attr);
+	if (err) {
+		bpf_map_area_free(trie, NULL);
+		return ERR_PTR(err);
+	}
+
 	trie->data_size = attr->key_size -
 			  offsetof(struct bpf_lpm_trie_key, data);
 	trie->max_prefixlen = trie->data_size * 8;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 87c59da..dba7ed2 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -376,7 +376,11 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	if (!offmap)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&offmap->map, attr);
+	err = bpf_map_init_from_attr(&offmap->map, attr);
+	if (err) {
+		bpf_map_area_free(offmap, NULL);
+		return ERR_PTR(err);
+	}
 
 	rtnl_lock();
 	down_write(&bpf_devs_lock);
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index bf57e45..f231897 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -70,6 +70,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_queue_stack *qs;
 	u64 size, queue_size;
+	int err;
 
 	size = (u64) attr->max_entries + 1;
 	queue_size = sizeof(*qs) + size * attr->value_size;
@@ -78,7 +79,11 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&qs->map, attr);
+	err = bpf_map_init_from_attr(&qs->map, attr);
+	if (err) {
+		bpf_map_area_free(qs, NULL);
+		return ERR_PTR(err);
+	}
 
 	qs->size = size;
 
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 52c7e77..4da969b 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -153,6 +153,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
+	int err;
 
 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
@@ -163,7 +164,11 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 
 	/* copy mandatory map attributes */
-	bpf_map_init_from_attr(&array->map, attr);
+	err = bpf_map_init_from_attr(&array->map, attr);
+	if (err) {
+		bpf_map_area_free(array, NULL);
+		return ERR_PTR(err);
+	}
 
 	return &array->map;
 }
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 1e7284c..3edefd3 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -185,6 +185,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node,
 static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_ringbuf_map *rb_map;
+	int err;
 
 	if (attr->map_flags & ~RINGBUF_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
@@ -204,7 +205,11 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&rb_map->map, attr);
+	err = bpf_map_init_from_attr(&rb_map->map, attr);
+	if (err) {
+		bpf_map_area_free(rb_map, NULL);
+		return ERR_PTR(err);
+	}
 
 	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node,
 				       &rb_map->map);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2bd3bcf..5f5cade4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -403,7 +403,7 @@ static u32 bpf_map_flags_retain_permanent(u32 flags)
 	return flags & ~(BPF_F_RDONLY | BPF_F_WRONLY);
 }
 
-void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
+int bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 {
 	bpf_map_save_memcg(map);
 	map->map_type = attr->map_type;
@@ -413,6 +413,8 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 	map->map_flags = bpf_map_flags_retain_permanent(attr->map_flags);
 	map->numa_node = bpf_map_attr_numa_node(attr);
 	map->map_extra = attr->map_extra;
+
+	return 0;
 }
 
 static int bpf_map_alloc_id(struct bpf_map *map)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 2b3b24e..766a260 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -31,6 +31,7 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_stab *stab;
+	int err;
 
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -45,7 +46,12 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&stab->map, attr);
+	err = bpf_map_init_from_attr(&stab->map, attr);
+	if (err) {
+		bpf_map_area_free(stab, NULL);
+		return ERR_PTR(err);
+	}
+
 	raw_spin_lock_init(&stab->lock);
 
 	stab->sks = bpf_map_area_alloc((u64) stab->map.max_entries *
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index beb11fd..8e7a5a6 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -63,6 +63,7 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	struct xsk_map *m;
 	int numa_node;
 	u64 size;
+	int err;
 
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -79,7 +80,12 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	if (!m)
 		return ERR_PTR(-ENOMEM);
 
-	bpf_map_init_from_attr(&m->map, attr);
+	err = bpf_map_init_from_attr(&m->map, attr);
+	if (err) {
+		bpf_map_area_free(m, NULL);
+		return ERR_PTR(err);
+	}
+
 	spin_lock_init(&m->lock);
 
 	return &m->map;
-- 
1.8.3.1



* [PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (13 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 14/15] bpf: Add return value for bpf_map_init_from_attr Yafang Shao
@ 2022-08-10 15:18 ` Yafang Shao
  2022-08-10 19:00 ` [PATCH bpf-next 00/15] " patchwork-bot+netdevbpf
  15 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

A new member, memcg_fd, is introduced into the bpf attr of the
BPF_MAP_CREATE command. It is the fd of an opened cgroup directory in
which the memory subsystem must be enabled. A valid memcg_fd must be a
positive number, which means it can't be zero (even though zero is a
valid return value of open(2)). Once the kernel gets the memory cgroup
from this fd, it saves this memcg in the bpf map, and all subsequent
memory allocations for this map will be charged to that memcg. The map
creation paths in libbpf are changed accordingly to set this attr.

Currently only a cgroup2 directory is supported.

The usage of this new member is as follows:
	struct bpf_map_create_opts map_opts = {
		.sz = sizeof(map_opts),
	};
	int memcg_fd, map_fd, old_fd;
	int key, value;

	memcg_fd = open("/cgroup2", O_DIRECTORY);
	if (memcg_fd < 0) {
		perror("memcg dir open");
		return -1;
	}

	/* 0 is not a valid memcg_fd */
	if (memcg_fd == 0) {
		old_fd = memcg_fd;
		memcg_fd = fcntl(memcg_fd, F_DUPFD_CLOEXEC, 3);
		close(old_fd);
		if (memcg_fd < 0) {
			perror("fcntl");
			return -1;
		}
	}

	map_opts.memcg_fd = memcg_fd;
	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
				sizeof(key), sizeof(value),
				1024, &map_opts);
	if (map_fd <= 0) {
		close(memcg_fd);
		perror("map create");
		return -1;
	}

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/syscall.c           | 42 ++++++++++++++++++++++++++++++++----------
 tools/include/uapi/linux/bpf.h |  1 +
 tools/lib/bpf/bpf.c            |  3 ++-
 tools/lib/bpf/bpf.h            |  3 ++-
 tools/lib/bpf/gen_loader.c     |  2 +-
 tools/lib/bpf/libbpf.c         |  2 ++
 tools/lib/bpf/skel_internal.h  |  2 +-
 8 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 534e33f..d46acbb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1300,6 +1300,7 @@ struct bpf_stack_build_id {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	memcg_fd;	/* selectable memcg */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5f5cade4..91a6e15 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -294,14 +294,30 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, u32 memcg_fd)
 {
-	/* Currently if a map is created by a process belonging to the root
-	 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
-	 * So we have to check map->objcg for being NULL each time it's
-	 * being used.
-	 */
-	map->objcg = get_obj_cgroup_from_current();
+	struct obj_cgroup *objcg;
+	struct cgroup *cgrp;
+
+	if (memcg_fd) {
+		cgrp = cgroup_get_from_fd(memcg_fd);
+		if (IS_ERR(cgrp))
+			return -EINVAL;
+
+		objcg = get_obj_cgroup_from_cgroup(cgrp);
+		if (IS_ERR(objcg))
+			return PTR_ERR(objcg);
+	} else {
+		/* Currently if a map is created by a process belonging to the root
+		 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
+		 * So we have to check map->objcg for being NULL each time it's
+		 * being used.
+		 */
+		objcg = get_obj_cgroup_from_current();
+	}
+
+	map->objcg = objcg;
+	return 0;
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -311,8 +327,9 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 
 #else
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, u32 memcg_fd)
 {
+	return 0;
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -405,7 +422,12 @@ static u32 bpf_map_flags_retain_permanent(u32 flags)
 
 int bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 {
-	bpf_map_save_memcg(map);
+	int err;
+
+	err = bpf_map_save_memcg(map, attr->memcg_fd);
+	if (err)
+		return err;
+
 	map->map_type = attr->map_type;
 	map->key_size = attr->key_size;
 	map->value_size = attr->value_size;
@@ -1091,7 +1113,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD map_extra
+#define BPF_MAP_CREATE_LAST_FIELD memcg_fd
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f58d58e..12203406 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1300,6 +1300,7 @@ struct bpf_stack_build_id {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	memcg_fd;	/* selectable memcg */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index efcc06d..9c613b2 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -171,7 +171,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 		   __u32 max_entries,
 		   const struct bpf_map_create_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
+	const size_t attr_sz = offsetofend(union bpf_attr, memcg_fd);
 	union bpf_attr attr;
 	int fd;
 
@@ -199,6 +199,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.map_extra = OPTS_GET(opts, map_extra, 0);
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
+	attr.memcg_fd = OPTS_GET(opts, memcg_fd, 0);
 
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 9c50bea..dd0d929 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,9 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+	__u32 memcg_fd;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field memcg_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
diff --git a/tools/lib/bpf/gen_loader.c b/tools/lib/bpf/gen_loader.c
index 23f5c46..f35b014 100644
--- a/tools/lib/bpf/gen_loader.c
+++ b/tools/lib/bpf/gen_loader.c
@@ -451,7 +451,7 @@ void bpf_gen__map_create(struct bpf_gen *gen,
 			 __u32 key_size, __u32 value_size, __u32 max_entries,
 			 struct bpf_map_create_opts *map_attr, int map_idx)
 {
-	int attr_size = offsetofend(union bpf_attr, map_extra);
+	int attr_size = offsetofend(union bpf_attr, memcg_fd);
 	bool close_inner_map_fd = false;
 	int map_create_attr, idx;
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f7364ea..88c65e5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -506,6 +506,7 @@ struct bpf_map {
 	bool reused;
 	bool autocreate;
 	__u64 map_extra;
+	__u32 memcg_fd;
 };
 
 enum extern_type {
@@ -4931,6 +4932,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 	create_attr.map_flags = def->map_flags;
 	create_attr.numa_node = map->numa_node;
 	create_attr.map_extra = map->map_extra;
+	create_attr.memcg_fd = map->memcg_fd;
 
 	if (bpf_map__is_struct_ops(map))
 		create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
diff --git a/tools/lib/bpf/skel_internal.h b/tools/lib/bpf/skel_internal.h
index bd6f450..83403cc 100644
--- a/tools/lib/bpf/skel_internal.h
+++ b/tools/lib/bpf/skel_internal.h
@@ -222,7 +222,7 @@ static inline int skel_map_create(enum bpf_map_type map_type,
 				  __u32 value_size,
 				  __u32 max_entries)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
+	const size_t attr_sz = offsetofend(union bpf_attr, memcg_fd);
 	union bpf_attr attr;
 
 	memset(&attr, 0, attr_sz);
-- 
1.8.3.1



* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-10 15:18 ` [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put Yafang Shao
@ 2022-08-10 17:07   ` Shakeel Butt
  2022-08-11  2:49     ` Yafang Shao
  2022-08-11 16:48   ` Roman Gushchin
  1 sibling, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2022-08-10 17:07 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	songmuchun, akpm, netdev, bpf, linux-mm

On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> The memcg may be the root_mem_cgroup, in which case we shouldn't put it.

No, it is ok to put root_mem_cgroup. css_put already handles the root
cgroups.
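
For context, mem_cgroup_put() is a NULL-tolerant wrapper around
css_put(), roughly:

	static inline void mem_cgroup_put(struct mem_cgroup *memcg)
	{
		if (memcg)
			css_put(&memcg->css);
	}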



* Re: [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map
  2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (14 preceding siblings ...)
  2022-08-10 15:18 ` [PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
@ 2022-08-10 19:00 ` patchwork-bot+netdevbpf
  15 siblings, 0 replies; 33+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-08-10 19:00 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm, netdev, bpf, linux-mm

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed, 10 Aug 2022 15:18:25 +0000 you wrote:
> On our production environment, we may load, run and pin bpf programs and
> maps in containers. For example, some of our networking bpf programs and
> maps are loaded and pinned by a process running in a container on our
> k8s environment. In this container, there're also running some other
> user applications which watch the networking configurations from remote
> servers and update them on this local host, log the error events, monitor
> the traffic, and do some other stuffs. Sometimes we may need to update
> these user applications to a new release, and in this update process we
> will destroy the old container and then start a new genration. In order not
> to interrupt the bpf programs in the update process, we will pin the bpf
> programs and maps in bpffs. That is the background and use case on our
> production environment.
> 
> [...]

Here is the summary with links:
  - [bpf-next,01/15] bpf: Remove unneeded memset in queue_stack_map creation
    https://git.kernel.org/bpf/bpf-next/c/083818156d1e
  - [bpf-next,02/15] bpf: Use bpf_map_area_free instead of kvfree
    https://git.kernel.org/bpf/bpf-next/c/8f58ee54c2ea
  - [bpf-next,03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation
    https://git.kernel.org/bpf/bpf-next/c/992c9e13f593
  - [bpf-next,04/15] bpf: Use bpf_map_area_alloc consistently on bpf map creation
    https://git.kernel.org/bpf/bpf-next/c/73cf09a36bf7
  - [bpf-next,05/15] bpf: Fix incorrect mem_cgroup_put
    (no matching commit)
  - [bpf-next,06/15] bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM
    (no matching commit)
  - [bpf-next,07/15] bpf: Call bpf_map_init_from_attr() immediately after map creation
    (no matching commit)
  - [bpf-next,08/15] bpf: Save memcg in bpf_map_init_from_attr()
    (no matching commit)
  - [bpf-next,09/15] bpf: Use scope-based charge in bpf_map_area_alloc
    (no matching commit)
  - [bpf-next,10/15] bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
    (no matching commit)
  - [bpf-next,11/15] bpf: Use bpf_map_kzalloc in arraymap
    (no matching commit)
  - [bpf-next,12/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage
    (no matching commit)
  - [bpf-next,13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
    (no matching commit)
  - [bpf-next,14/15] bpf: Add return value for bpf_map_init_from_attr
    (no matching commit)
  - [bpf-next,15/15] bpf: Introduce selectable memcg for bpf map
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-10 17:07   ` Shakeel Butt
@ 2022-08-11  2:49     ` Yafang Shao
  2022-08-11 15:47       ` Shakeel Butt
  0 siblings, 1 reply; 33+ messages in thread
From: Yafang Shao @ 2022-08-11  2:49 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> > The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
>
> No, it is ok to put root_mem_cgroup. css_put already handles the root
> cgroups.
>

Ah, this commit log doesn't describe the issue clearly. I should improve it.
The issue is that bpf_map_get_memcg() doesn't take a reference if
map->objcg is NULL (which can happen if the map belongs to the root
memcg), so we shouldn't put the memcg in bpf_map_put_memcg() either
when map->objcg is NULL.
Maybe the change below would be more reasonable?

+static void bpf_map_put_memcg(const struct bpf_map *map,
+			      struct mem_cgroup *memcg)
+{
+	if (map->objcg)
+		mem_cgroup_put(memcg);
+}

-- 
Regards
Yafang


* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-11  2:49     ` Yafang Shao
@ 2022-08-11 15:47       ` Shakeel Butt
  2022-08-12  0:27         ` Yafang Shao
  0 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2022-08-11 15:47 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Thu, Aug 11, 2022 at 10:49:13AM +0800, Yafang Shao wrote:
> On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> > > The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> >
> > No, it is ok to put root_mem_cgroup. css_put already handles the root
> > cgroups.
> >
> 
> Ah, this commit log doesn't describe the issue clearly. I should improve it.
> The issue is that in bpf_map_get_memcg() it doesn't get the objcg if
> map->objcg is NULL (that can happen if the map belongs to the root
> memcg), so we shouldn't put the objcg if map->objcg is NULL neither in
> bpf_map_put_memcg().

Sorry, I am still not following. We are not 'getting' an objcg in
bpf_map_get_memcg(). We are 'getting' a memcg from map->objcg, and if
that is NULL the function returns the root memcg; putting the root memcg
is totally fine. Or are you saying that root_mem_cgroup is NULL?

> Maybe the change below would be more reasonable ?
> 
> +static void bpf_map_put_memcg(const struct bpf_map *map, struct
> mem_cgroup *memcg)
> +{
> +       if (map->objcg)
> +           mem_cgroup_put(memcg);
> +}
> 
> -- 
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-10 15:18 ` [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
@ 2022-08-11 16:16   ` Roman Gushchin
  2022-08-12  0:35     ` Yafang Shao
  2022-08-12 16:57   ` Shakeel Butt
  1 sibling, 1 reply; 33+ messages in thread
From: Roman Gushchin @ 2022-08-11 16:16 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, shakeelb,
	songmuchun, akpm, netdev, bpf, linux-mm

On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> a specific cgroup.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/memcontrol.h |  1 +
>  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 2f0a611..901a921 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
>  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
>  void __memcg_kmem_uncharge_page(struct page *page, int order);
>  
> +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
>  struct obj_cgroup *get_obj_cgroup_from_current(void);
>  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366..762cffa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  	return objcg;
>  }
>  
> +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> +{
> +	struct obj_cgroup *objcg;
> +
> +	if (memcg_kmem_bypass())
> +		return NULL;
> +
> +	rcu_read_lock();
> +	objcg = __get_obj_cgroup_from_memcg(memcg);
> +	rcu_read_unlock();
> +	return objcg;

This code doesn't make sense to me. What does the RCU read lock protect here?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-10 15:18 ` [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put Yafang Shao
  2022-08-10 17:07   ` Shakeel Butt
@ 2022-08-11 16:48   ` Roman Gushchin
  2022-08-12  0:31     ` Yafang Shao
  1 sibling, 1 reply; 33+ messages in thread
From: Roman Gushchin @ 2022-08-11 16:48 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, shakeelb,
	songmuchun, akpm, netdev, bpf, linux-mm

On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> So a new helper bpf_map_put_memcg() is introduced to pair with
> bpf_map_get_memcg().
> 
> Fixes: 4201d9ab3e42 ("bpf: reparent bpf maps on memcg offlining")
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/bpf/syscall.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 83c7136..51ab8b1 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -441,6 +441,14 @@ static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
>  	return root_mem_cgroup;
>  }
>  
> +static void bpf_map_put_memcg(struct mem_cgroup *memcg)
> +{
> +	if (mem_cgroup_is_root(memcg))
> +		return;
> +
> +	mem_cgroup_put(memcg);
> +}

+1 to what Shakeel said. mem_cgroup_put(root_mem_cgroup) is totally valid.
So this change does absolutely nothing.

The Fixes tag asserts that there is a bug in the existing code. If so, please
describe the problem and how to reproduce it.

Also, if it's not related to the rest of the patchset, please send it
separately.

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-11 15:47       ` Shakeel Butt
@ 2022-08-12  0:27         ` Yafang Shao
  2022-08-12  5:33           ` Muchun Song
  0 siblings, 1 reply; 33+ messages in thread
From: Yafang Shao @ 2022-08-12  0:27 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Thu, Aug 11, 2022 at 11:47 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, Aug 11, 2022 at 10:49:13AM +0800, Yafang Shao wrote:
> > On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> > > > The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> > >
> > > No, it is ok to put root_mem_cgroup. css_put already handles the root
> > > cgroups.
> > >
> >
> > Ah, this commit log doesn't describe the issue clearly. I should improve it.
> > The issue is that in bpf_map_get_memcg() it doesn't get the objcg if
> > map->objcg is NULL (that can happen if the map belongs to the root
> > memcg), so we shouldn't put the objcg if map->objcg is NULL neither in
> > bpf_map_put_memcg().
>
> Sorry I am still not understanding. We are not 'getting' objcg in
> bpf_map_get_memcg(). We are 'getting' memcg from map->objcg and if that
> is NULL the function is returning root memcg and putting root memcg is
> totally fine.

When the map belongs to root_mem_cgroup, map->objcg is NULL, right?
See bpf_map_save_memcg(); its comment describes this clearly:

static void bpf_map_save_memcg(struct bpf_map *map)
{
        /* Currently if a map is created by a process belonging to the root
         * memory cgroup, get_obj_cgroup_from_current() will return NULL.
         * So we have to check map->objcg for being NULL each time it's
         * being used.
         */
        map->objcg = get_obj_cgroup_from_current();
}

So for the root_mem_cgroup case, bpf_map_get_memcg() will return
root_mem_cgroup directly without getting its css, right? See below:

static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
{

        if (map->objcg)
                return get_mem_cgroup_from_objcg(map->objcg);

        return root_mem_cgroup;   // without css_get(&memcg->css);
}

But the caller will put the css unconditionally. See below:

memcg = bpf_map_get_memcg(map);
...
mem_cgroup_put(memcg);

So we should put it *conditionally* as well:

  memcg = bpf_map_get_memcg(map);
  ...
+ if (map->objcg)
+         mem_cgroup_put(memcg);

Is it clear to you?

> Or are you saying that root_mem_cgroup is NULL?
>

No

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-11 16:48   ` Roman Gushchin
@ 2022-08-12  0:31     ` Yafang Shao
  0 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-12  0:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Fri, Aug 12, 2022 at 12:48 AM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
>
> On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> > The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> > So a new helper bpf_map_put_memcg() is introduced to pair with
> > bpf_map_get_memcg().
> >
> > Fixes: 4201d9ab3e42 ("bpf: reparent bpf maps on memcg offlining")
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  kernel/bpf/syscall.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 83c7136..51ab8b1 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -441,6 +441,14 @@ static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
> >       return root_mem_cgroup;
> >  }
> >
> > +static void bpf_map_put_memcg(struct mem_cgroup *memcg)
> > +{
> > +     if (mem_cgroup_is_root(memcg))
> > +             return;
> > +
> > +     mem_cgroup_put(memcg);
> > +}
>
> +1 to what Shakeel said. mem_cgroup_put(root_mem_cgroup) is totally valid.
> So this change does absolutely nothing.
>

Do you mean that we can call mem_cgroup_put(root_mem_cgroup) without a
matching mem_cgroup_get(root_mem_cgroup)?
Am I missing something?

> The fixes tag assumes there is a bug in the existing code. If so, please,
> describe the problem and how to reproduce it.
>

It was found by code review. The root_mem_cgroup's css refcount would be
unbalanced, but I don't know what effect that may have on the user.
If you think the Fixes tag is improper, I will remove it.

> Also, if it's not related to the rest of the patchset, please, send it
> separately.
>

I want to introduce a bpf_map_put_memcg() helper to pair with
bpf_map_get_memcg().
This new helper will be used by other patches.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-11 16:16   ` Roman Gushchin
@ 2022-08-12  0:35     ` Yafang Shao
  2022-08-12 17:40       ` Roman Gushchin
  0 siblings, 1 reply; 33+ messages in thread
From: Yafang Shao @ 2022-08-12  0:35 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Fri, Aug 12, 2022 at 12:16 AM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
>
> On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > a specific cgroup.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/linux/memcontrol.h |  1 +
> >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 2f0a611..901a921 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> >
> > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 618c366..762cffa 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> >       return objcg;
> >  }
> >
> > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > +{
> > +     struct obj_cgroup *objcg;
> > +
> > +     if (memcg_kmem_bypass())
> > +             return NULL;
> > +
> > +     rcu_read_lock();
> > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > +     rcu_read_unlock();
> > +     return objcg;
>
> This code doesn't make sense to me. What does rcu read lock protect here?

To protect rcu_dereference(memcg->objcg).
Doesn't it need the RCU read lock?

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-12  0:27         ` Yafang Shao
@ 2022-08-12  5:33           ` Muchun Song
  2022-08-12 11:25             ` Yafang Shao
  0 siblings, 1 reply; 33+ messages in thread
From: Muchun Song @ 2022-08-12  5:33 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Shakeel Butt, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song,
	john fastabend, KP Singh, Stanislav Fomichev, Hao Luo, jolsa,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM



> On Aug 12, 2022, at 08:27, Yafang Shao <laoar.shao@gmail.com> wrote:
> 
> On Thu, Aug 11, 2022 at 11:47 PM Shakeel Butt <shakeelb@google.com> wrote:
>> 
>> On Thu, Aug 11, 2022 at 10:49:13AM +0800, Yafang Shao wrote:
>>> On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
>>>> 
>>>> On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
>>>>> The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
>>>> 
>>>> No, it is ok to put root_mem_cgroup. css_put already handles the root
>>>> cgroups.
>>>> 
>>> 
>>> Ah, this commit log doesn't describe the issue clearly. I should improve it.
>>> The issue is that in bpf_map_get_memcg() it doesn't get the objcg if
>>> map->objcg is NULL (that can happen if the map belongs to the root
>>> memcg), so we shouldn't put the objcg if map->objcg is NULL neither in
>>> bpf_map_put_memcg().
>> 
>> Sorry I am still not understanding. We are not 'getting' objcg in
>> bpf_map_get_memcg(). We are 'getting' memcg from map->objcg and if that
>> is NULL the function is returning root memcg and putting root memcg is
>> totally fine.
> 
> When the map belongs to root_mem_cgroup, the map->objcg is NULL, right ?
> See also bpf_map_save_memcg() and it describes clearly in the comment -
> 
> static void bpf_map_save_memcg(struct bpf_map *map)
> {
>        /* Currently if a map is created by a process belonging to the root
>         * memory cgroup, get_obj_cgroup_from_current() will return NULL.
>         * So we have to check map->objcg for being NULL each time it's
>         * being used.
>         */
>        map->objcg = get_obj_cgroup_from_current();
> }
> 
> So for the root_mem_cgroup case, bpf_map_get_memcg() will return
> root_mem_cgroup directly without getting its css, right ? See below,
> 
> static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
> {
> 
>        if (map->objcg)
>                return get_mem_cgroup_from_objcg(map->objcg);
> 
>        return root_mem_cgroup;   // without css_get(&memcg->css);
> }
> 
> But it will put the css unconditionally.  See below,
> 
> memcg = bpf_map_get_memcg(map);
> ...
> mem_cgroup_put(memcg);
> 
> So we should put it *conditionally* as well.

Hi,

No. We can put root_mem_cgroup unconditionally, since the root css is
treated as a no-reference css. See css_put():

static inline void css_put(struct cgroup_subsys_state *css)
{
	/* The root memcg's css has been set up with CSS_NO_REF. */
	if (!(css->flags & CSS_NO_REF))
		percpu_ref_put(&css->refcnt);
}
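
For completeness, mem_cgroup_put() is only a thin wrapper around
css_put(), so the CSS_NO_REF check above applies directly (a sketch of
the helper as it appears in include/linux/memcontrol.h around this time;
check your tree):

static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
	if (memcg)
		css_put(&memcg->css);
}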

Muchun,
Thanks.

> 
>  memcg = bpf_map_get_memcg(map);
>   ...
> + if (map->objcg)
>       mem_cgroup_put(memcg);
> 
> Is it clear to you ?
> 
>> Or are you saying that root_mem_cgroup is NULL?
>> 
> 
> No
> 
> -- 
> Regards
> Yafang


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
  2022-08-12  5:33           ` Muchun Song
@ 2022-08-12 11:25             ` Yafang Shao
  0 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-12 11:25 UTC (permalink / raw)
  To: Muchun Song
  Cc: Shakeel Butt, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song,
	john fastabend, KP Singh, Stanislav Fomichev, Hao Luo, jolsa,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM

On Fri, Aug 12, 2022 at 1:34 PM Muchun Song <muchun.song@linux.dev> wrote:
>
>
>
> > On Aug 12, 2022, at 08:27, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Aug 11, 2022 at 11:47 PM Shakeel Butt <shakeelb@google.com> wrote:
> >>
> >> On Thu, Aug 11, 2022 at 10:49:13AM +0800, Yafang Shao wrote:
> >>> On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >>>>
> >>>> On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> >>>>> The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> >>>>
> >>>> No, it is ok to put root_mem_cgroup. css_put already handles the root
> >>>> cgroups.
> >>>>
> >>>
> >>> Ah, this commit log doesn't describe the issue clearly. I should improve it.
> >>> The issue is that in bpf_map_get_memcg() it doesn't get the objcg if
> >>> map->objcg is NULL (that can happen if the map belongs to the root
> >>> memcg), so we shouldn't put the objcg if map->objcg is NULL neither in
> >>> bpf_map_put_memcg().
> >>
> >> Sorry I am still not understanding. We are not 'getting' objcg in
> >> bpf_map_get_memcg(). We are 'getting' memcg from map->objcg and if that
> >> is NULL the function is returning root memcg and putting root memcg is
> >> totally fine.
> >
> > When the map belongs to root_mem_cgroup, the map->objcg is NULL, right ?
> > See also bpf_map_save_memcg() and it describes clearly in the comment -
> >
> > static void bpf_map_save_memcg(struct bpf_map *map)
> > {
> >        /* Currently if a map is created by a process belonging to the root
> >         * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> >         * So we have to check map->objcg for being NULL each time it's
> >         * being used.
> >         */
> >        map->objcg = get_obj_cgroup_from_current();
> > }
> >
> > So for the root_mem_cgroup case, bpf_map_get_memcg() will return
> > root_mem_cgroup directly without getting its css, right ? See below,
> >
> > static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
> > {
> >
> >        if (map->objcg)
> >                return get_mem_cgroup_from_objcg(map->objcg);
> >
> >        return root_mem_cgroup;   // without css_get(&memcg->css);
> > }
> >
> > But it will put the css unconditionally.  See below,
> >
> > memcg = bpf_map_get_memcg(map);
> > ...
> > mem_cgroup_put(memcg);
> >
> > So we should put it *conditionally* as well.
>
> Hi,
>
> No. We could put root_mem_cgroup unconditionally since the root css
> is treated as no reference css. See css_put().
>
> static inline void css_put(struct cgroup_subsys_state *css)
> {
>         // The root memcg’s css has been set with CSS_NO_REF.
>         if (!(css->flags & CSS_NO_REF))
>                 percpu_ref_put(&css->refcnt);
> }

Indeed. I missed that.
The call stack is so deep that I didn't look into it :(
Thanks for the information.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-10 15:18 ` [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
  2022-08-11 16:16   ` Roman Gushchin
@ 2022-08-12 16:57   ` Shakeel Butt
  2022-08-13  0:07     ` Yafang Shao
  1 sibling, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2022-08-12 16:57 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	songmuchun, akpm, netdev, bpf, linux-mm

On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> a specific cgroup.

Can you please add a couple of lines on why you need the objcg?

> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/memcontrol.h |  1 +
>  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 2f0a611..901a921 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
>  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
>  void __memcg_kmem_uncharge_page(struct page *page, int order);
>  
> +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
>  struct obj_cgroup *get_obj_cgroup_from_current(void);
>  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366..762cffa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  	return objcg;
>  }
>  
> +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> +{
> +	struct obj_cgroup *objcg;
> +
> +	if (memcg_kmem_bypass())
> +		return NULL;
> +
> +	rcu_read_lock();
> +	objcg = __get_obj_cgroup_from_memcg(memcg);
> +	rcu_read_unlock();
> +	return objcg;
> +}
> +
> +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg;
> +
> +	rcu_read_lock();
> +	css = rcu_dereference(cgrp->subsys[memory_cgrp_id]);
> +	if (!css || !css_tryget_online(css)) {
> +		rcu_read_unlock();
> +		cgroup_put(cgrp);
> +		return ERR_PTR(-EINVAL);
> +	}
> +	rcu_read_unlock();
> +	cgroup_put(cgrp);

The above put seems out of place and buggy.

> +
> +	memcg = mem_cgroup_from_css(css);
> +	if (!memcg) {
> +		css_put(css);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	objcg = get_obj_cgroup_from_memcg(memcg);
> +	css_put(css);
> +
> +	return objcg;
> +}
> +
>  __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  {
>  	struct obj_cgroup *objcg = NULL;
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-12  0:35     ` Yafang Shao
@ 2022-08-12 17:40       ` Roman Gushchin
  2022-08-12 23:56         ` Yafang Shao
  0 siblings, 1 reply; 33+ messages in thread
From: Roman Gushchin @ 2022-08-12 17:40 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Fri, Aug 12, 2022 at 08:35:19AM +0800, Yafang Shao wrote:
> On Fri, Aug 12, 2022 at 12:16 AM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
> >
> > On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > > a specific cgroup.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  include/linux/memcontrol.h |  1 +
> > >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 42 insertions(+)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 2f0a611..901a921 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> > >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> > >
> > > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> > >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> > >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 618c366..762cffa 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > >       return objcg;
> > >  }
> > >
> > > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > +{
> > > +     struct obj_cgroup *objcg;
> > > +
> > > +     if (memcg_kmem_bypass())
> > > +             return NULL;
> > > +
> > > +     rcu_read_lock();
> > > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > > +     rcu_read_unlock();
> > > +     return objcg;
> >
> > This code doesn't make sense to me. What does rcu read lock protect here?
> 
> To protect rcu_dereference(memcg->objcg);.
> Doesn't it need the read rcu lock ?

No, that's not how RCU works. Please take a look at the docs here:
https://docs.kernel.org/RCU/whatisRCU.html#whatisrcu .
In particular, they describe this specific case very well.

In short, you don't protect the rcu_dereference() call itself; you protect the
pointer you get, because it's valid only inside the RCU read-side section. After
rcu_read_unlock() it might point at random data, because the protected object
may already have been freed.
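
A minimal sketch of that canonical pattern (the names gp, p and the
refcount field are illustrative, not from the patchset):

	rcu_read_lock();
	p = rcu_dereference(gp);	/* p is valid from here ... */
	if (p && !refcount_inc_not_zero(&p->ref))
		p = NULL;		/* ... so pin it before unlock */
	rcu_read_unlock();
	/*
	 * Without the refcount taken inside the read section, using p
	 * here would be a potential use-after-free: the object may be
	 * freed as soon as the read section ends.
	 */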

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-12 17:40       ` Roman Gushchin
@ 2022-08-12 23:56         ` Yafang Shao
  2022-08-13 18:30           ` Roman Gushchin
  0 siblings, 1 reply; 33+ messages in thread
From: Yafang Shao @ 2022-08-12 23:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Sat, Aug 13, 2022 at 1:40 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Fri, Aug 12, 2022 at 08:35:19AM +0800, Yafang Shao wrote:
> > On Fri, Aug 12, 2022 at 12:16 AM Roman Gushchin
> > <roman.gushchin@linux.dev> wrote:
> > >
> > > On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > > > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > > > a specific cgroup.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  include/linux/memcontrol.h |  1 +
> > > >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 42 insertions(+)
> > > >
> > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > index 2f0a611..901a921 100644
> > > > --- a/include/linux/memcontrol.h
> > > > +++ b/include/linux/memcontrol.h
> > > > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> > > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> > > >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> > > >
> > > > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> > > >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> > > >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 618c366..762cffa 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > >       return objcg;
> > > >  }
> > > >
> > > > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > > +{
> > > > +     struct obj_cgroup *objcg;
> > > > +
> > > > +     if (memcg_kmem_bypass())
> > > > +             return NULL;
> > > > +
> > > > +     rcu_read_lock();
> > > > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > > > +     rcu_read_unlock();
> > > > +     return objcg;
> > >
> > > This code doesn't make sense to me. What does rcu read lock protect here?
> >
> > To protect rcu_dereference(memcg->objcg);.
> > Doesn't it need the read rcu lock ?
>
> No, it's not how rcu works. Please, take a look at the docs here:
> https://docs.kernel.org/RCU/whatisRCU.html#whatisrcu .
> In particular, it describes this specific case very well.
>
> In 2 words, you don't protect the rcu_dereference() call, you protect the pointer

I just copied and pasted rcu_dereference(memcg->objcg) there to make it clear.
Actually, it protects memcg->objcg, doesn't it?

> you get, cause it's valid only inside the rcu read section. After rcu_read_unlock()
> it might point at a random data, because the protected object can be already freed.
>

Are you sure?
Can't obj_cgroup_tryget(objcg) prevent it from being freed?

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-12 16:57   ` Shakeel Butt
@ 2022-08-13  0:07     ` Yafang Shao
  0 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-13  0:07 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Sat, Aug 13, 2022 at 12:57 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > a specific cgroup.
>
> Can you please add couple of lines on why you need objcg?
>

Sure, I will update it in the next version.

> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/linux/memcontrol.h |  1 +
> >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 2f0a611..901a921 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> >
> > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 618c366..762cffa 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> >       return objcg;
> >  }
> >
> > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > +{
> > +     struct obj_cgroup *objcg;
> > +
> > +     if (memcg_kmem_bypass())
> > +             return NULL;
> > +
> > +     rcu_read_lock();
> > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > +     rcu_read_unlock();
> > +     return objcg;
> > +}
> > +
> > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp)
> > +{
> > +     struct cgroup_subsys_state *css;
> > +     struct mem_cgroup *memcg;
> > +     struct obj_cgroup *objcg;
> > +
> > +     rcu_read_lock();
> > +     css = rcu_dereference(cgrp->subsys[memory_cgrp_id]);
> > +     if (!css || !css_tryget_online(css)) {
> > +             rcu_read_unlock();
> > +             cgroup_put(cgrp);
> > +             return ERR_PTR(-EINVAL);
> > +     }
> > +     rcu_read_unlock();
> > +     cgroup_put(cgrp);
>
> The above put seems out of place and buggy.
>

Thanks for pointing it out.
The cgroup_put() should be done in bpf_map_save_memcg() instead.
I will update it.
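
Concretely, the helper would then leave the cgroup reference to the
caller (a hedged sketch of how the revised helper might look, not the
actually posted follow-up):

struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp)
{
	struct cgroup_subsys_state *css;
	struct obj_cgroup *objcg;

	rcu_read_lock();
	css = rcu_dereference(cgrp->subsys[memory_cgrp_id]);
	if (!css || !css_tryget_online(css)) {
		rcu_read_unlock();
		return ERR_PTR(-EINVAL);
	}
	rcu_read_unlock();

	/* The caller (e.g. bpf_map_save_memcg()) now owns the cgrp
	 * reference and drops it when it is done with the cgroup.
	 */
	objcg = get_obj_cgroup_from_memcg(mem_cgroup_from_css(css));
	css_put(css);

	return objcg;
}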

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-12 23:56         ` Yafang Shao
@ 2022-08-13 18:30           ` Roman Gushchin
  2022-08-14  2:35             ` Yafang Shao
  0 siblings, 1 reply; 33+ messages in thread
From: Roman Gushchin @ 2022-08-13 18:30 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Sat, Aug 13, 2022 at 07:56:54AM +0800, Yafang Shao wrote:
> On Sat, Aug 13, 2022 at 1:40 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Fri, Aug 12, 2022 at 08:35:19AM +0800, Yafang Shao wrote:
> > > On Fri, Aug 12, 2022 at 12:16 AM Roman Gushchin
> > > <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > > > > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > > > > a specific cgroup.
> > > > >
> > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > ---
> > > > >  include/linux/memcontrol.h |  1 +
> > > > >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 42 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > > index 2f0a611..901a921 100644
> > > > > --- a/include/linux/memcontrol.h
> > > > > +++ b/include/linux/memcontrol.h
> > > > > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> > > > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> > > > >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> > > > >
> > > > > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> > > > >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> > > > >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> > > > >
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index 618c366..762cffa 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > > >       return objcg;
> > > > >  }
> > > > >
> > > > > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > > > +{
> > > > > +     struct obj_cgroup *objcg;
> > > > > +
> > > > > +     if (memcg_kmem_bypass())
> > > > > +             return NULL;
> > > > > +
> > > > > +     rcu_read_lock();
> > > > > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > > > > +     rcu_read_unlock();
> > > > > +     return objcg;
> > > >
> > > > This code doesn't make sense to me. What does rcu read lock protect here?
> > >
> > > To protect rcu_dereference(memcg->objcg);.
> > > Doesn't it need the read rcu lock ?
> >
> > No, it's not how rcu works. Please, take a look at the docs here:
> > https://docs.kernel.org/RCU/whatisRCU.html#whatisrcu .
> > In particular, it describes this specific case very well.
> >
> > In 2 words, you don't protect the rcu_dereference() call, you protect the pointer
> 
> I just copied and pasted rcu_dereference(memcg->objcg) there to make it clear.
> Actually it protects memcg->objcg, doesn't it ?
> 
> > you get, cause it's valid only inside the rcu read section. After rcu_read_unlock()
> > it might point at a random data, because the protected object can be already freed.
> >
> 
> Are you sure?
> Can't the obj_cgroup_tryget(objcg) prevent it from being freed ?

Ok, now I see where it comes from. You copy-pasted it from get_obj_cgroup_from_current()?
There the RCU read-side section protects the memcg, not the objcg.
In your case you don't need it, because the memcg is passed as a parameter to the
function, so it's the caller's duty to ensure the memcg's lifetime.
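
For contrast, a simplified sketch of what get_obj_cgroup_from_current()
does (condensed from the mainline helper of this era; the active_memcg()
handling is omitted):

	rcu_read_lock();
	/* The task's memcg is only stable inside the read section. */
	memcg = mem_cgroup_from_task(current);
	objcg = rcu_dereference(memcg->objcg);
	if (objcg && !obj_cgroup_tryget(objcg))
		objcg = NULL;
	rcu_read_unlock();

Here the RCU read section is what keeps the memcg itself alive across the
lookup; when the caller already guarantees the memcg's lifetime, that
particular protection is unnecessary.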

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-08-13 18:30           ` Roman Gushchin
@ 2022-08-14  2:35             ` Yafang Shao
  0 siblings, 0 replies; 33+ messages in thread
From: Yafang Shao @ 2022-08-14  2:35 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, netdev,
	bpf, Linux MM

On Sun, Aug 14, 2022 at 2:30 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Sat, Aug 13, 2022 at 07:56:54AM +0800, Yafang Shao wrote:
> > On Sat, Aug 13, 2022 at 1:40 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Fri, Aug 12, 2022 at 08:35:19AM +0800, Yafang Shao wrote:
> > > > On Fri, Aug 12, 2022 at 12:16 AM Roman Gushchin
> > > > <roman.gushchin@linux.dev> wrote:
> > > > >
> > > > > On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > > > > > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > > > > > a specific cgroup.
> > > > > >
> > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > > ---
> > > > > >  include/linux/memcontrol.h |  1 +
> > > > > >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 42 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > > > index 2f0a611..901a921 100644
> > > > > > --- a/include/linux/memcontrol.h
> > > > > > +++ b/include/linux/memcontrol.h
> > > > > > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> > > > > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> > > > > >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> > > > > >
> > > > > > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> > > > > >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> > > > > >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> > > > > >
> > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > > index 618c366..762cffa 100644
> > > > > > --- a/mm/memcontrol.c
> > > > > > +++ b/mm/memcontrol.c
> > > > > > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > > > >       return objcg;
> > > > > >  }
> > > > > >
> > > > > > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > > > > +{
> > > > > > +     struct obj_cgroup *objcg;
> > > > > > +
> > > > > > +     if (memcg_kmem_bypass())
> > > > > > +             return NULL;
> > > > > > +
> > > > > > +     rcu_read_lock();
> > > > > > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > > > > > +     rcu_read_unlock();
> > > > > > +     return objcg;
> > > > >
> > > > > This code doesn't make sense to me. What does rcu read lock protect here?
> > > >
> > > > To protect rcu_dereference(memcg->objcg);.
> > > > Doesn't it need the read rcu lock ?
> > >
> > > No, it's not how rcu works. Please, take a look at the docs here:
> > > https://docs.kernel.org/RCU/whatisRCU.html#whatisrcu .
> > > In particular, it describes this specific case very well.
> > >
> > > In 2 words, you don't protect the rcu_dereference() call, you protect the pointer
> >
> > I just copied and pasted rcu_dereference(memcg->objcg) there to make it clear.
> > Actually it protects memcg->objcg, doesn't it ?
> >
> > > you get, cause it's valid only inside the rcu read section. After rcu_read_unlock()
> > > it might point at a random data, because the protected object can be already freed.
> > >
> >
> > Are you sure?
> > Can't the obj_cgroup_tryget(objcg) prevent it from being freed ?
>
> Ok, now I see where it comes from. You copy-pasted it from get_obj_cgroup_from_current()?
> There rcu read lock section protects memcg, not objcg.

Could you please explain in detail why we should protect the memcg instead of the objcg?
Why does the memcg need the RCU read lock?

> In your case you don't need it, because memcg is passed as a parameter to the function,
> so it's the duty of the caller to ensure the lifetime of memcg.
>

I'm still a bit confused. See below:

objcg = rcu_dereference(memcg->objcg);
percpu_ref_tryget(&objcg->refcnt);    <<<< what if the objcg is freed before this operation?


-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-08-14  2:36 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-10 15:18 [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instread of kvfree Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on " Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put Yafang Shao
2022-08-10 17:07   ` Shakeel Butt
2022-08-11  2:49     ` Yafang Shao
2022-08-11 15:47       ` Shakeel Butt
2022-08-12  0:27         ` Yafang Shao
2022-08-12  5:33           ` Muchun Song
2022-08-12 11:25             ` Yafang Shao
2022-08-11 16:48   ` Roman Gushchin
2022-08-12  0:31     ` Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 06/15] bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 07/15] bpf: Call bpf_map_init_from_attr() immediately after map creation Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 08/15] bpf: Save memcg in bpf_map_init_from_attr() Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 09/15] bpf: Use scoped-based charge in bpf_map_area_alloc Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 10/15] bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free} Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 11/15] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 12/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
2022-08-11 16:16   ` Roman Gushchin
2022-08-12  0:35     ` Yafang Shao
2022-08-12 17:40       ` Roman Gushchin
2022-08-12 23:56         ` Yafang Shao
2022-08-13 18:30           ` Roman Gushchin
2022-08-14  2:35             ` Yafang Shao
2022-08-12 16:57   ` Shakeel Butt
2022-08-13  0:07     ` Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 14/15] bpf: Add return value for bpf_map_init_from_attr Yafang Shao
2022-08-10 15:18 ` [PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
2022-08-10 19:00 ` [PATCH bpf-next 00/15] " patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).