* [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map
@ 2022-07-29 15:23 Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

In our production environment, we load, run and pin bpf programs and
maps in containers. For example, some of our networking bpf programs and
maps are loaded and pinned by a process running in a container in our
k8s environment. This container also runs some other user applications,
which watch the networking configurations from remote servers and update
them on the local host, log error events, monitor the traffic, and do
some other things. Sometimes we need to update these user applications
to a new release, and in this update process we destroy the old
container and then start a new generation. In order not to interrupt the
bpf programs during the update, we pin the bpf programs and maps in
bpffs. That is the background and use case in our production
environment.

After switching to memcg-based bpf memory accounting to limit the bpf
memory, some unexpected issues jumped out at us.
1. The memory usage is not consistent between the first generation and
new generations.
2. After the first generation is destroyed, the bpf memory can't be
limited if the bpf maps are not preallocated, because their charges will
be reparented to an ancestor memcg once the original memcg is destroyed.

This patchset tries to resolve these issues by introducing an
independent memcg to limit the bpf memory.

At bpf map creation, we can assign a specific memcg instead of using
the current task's memcg. That makes it flexible in containerized
environments. For example, if we want to limit the pinned bpf maps, we
can use the hierarchy below,

    Shared resources             Private resources 
                                    
     bpf-memcg                      k8s-memcg
     /        \                     /             
bpf-bar-memcg bpf-foo-memcg   srv-foo-memcg        
                  |               /        \
               (charged)     (not charged) (charged)                 
                  |           /              \
                  |          /                \
          bpf-foo-{progs, maps}              srv-foo

srv-foo loads and pins bpf-foo-{progs, maps}, but they are charged to an
independent memcg (bpf-foo-memcg) instead of srv-foo's memcg
(srv-foo-memcg).
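
To make the intended flow concrete, below is a minimal userspace sketch
of how srv-foo could select the target memcg at map creation. Only the
BPF_F_SELECTABLE_MEMCG flag name comes from this series; the memcg_fd
attribute field is a hypothetical placeholder for the uapi this series
adds, so treat it as an illustration rather than the final interface,

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    int create_map_in_bpf_foo_memcg(void)
    {
            union bpf_attr attr;
            int memcg_fd;

            /* the memcg to be charged, per the hierarchy above */
            memcg_fd = open("/sys/fs/cgroup/bpf-memcg/bpf-foo-memcg",
                            O_RDONLY);
            if (memcg_fd < 0)
                    return -1;

            memset(&attr, 0, sizeof(attr));
            attr.map_type    = BPF_MAP_TYPE_HASH;
            attr.key_size    = 4;
            attr.value_size  = 8;
            attr.max_entries = 1024;
            attr.map_flags   = BPF_F_SELECTABLE_MEMCG; /* patch 14 */
            /* attr.memcg_fd = memcg_fd;  hypothetical uapi field */

            return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    }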

Please note that there may be no process in bpf-foo-memcg, which means
it can currently be rmdir-ed by the root user. Meanwhile, we don't
forcefully destroy a memcg if it doesn't have any residents, so this
hierarchy is acceptable.

In order to make the memcg of bpf maps selectable, this patchset
introduces some memory allocation wrappers for map related memory.
These wrappers get the memcg from the map and charge the allocated
pages or objects to it.
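
The wrapper pattern boils down to a scoped switch of the active memcg
around the underlying allocation. A minimal sketch of the idea (the
real helpers are introduced in patches 5 and 8),

    /* Sketch: charge an allocation to the map's memcg, not current's */
    void *map_charged_alloc(struct bpf_map *map, size_t size, int node)
    {
            struct mem_cgroup *memcg, *old_memcg;
            void *ptr;

            memcg = bpf_map_get_memcg(map);      /* map->objcg or root */
            old_memcg = set_active_memcg(memcg); /* redirect the charge */
            ptr = kvmalloc_node(size, GFP_USER | __GFP_NOWARN |
                                __GFP_ACCOUNT, node);
            set_active_memcg(old_memcg);         /* restore the scope */
            mem_cgroup_put(memcg);

            return ptr;
    }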

Currently it only supports bpf maps; we can extend it to bpf progs as
well. It only supports cgroup2 for now, but an additional change in
cgroup_get_from_fd() could add cgroup1 support.

Observability can also be supported as a next step, for example by
showing the bpf map's memcg in 'bpftool map show', or even showing which
maps are charged to a specific memcg in 'bpftool cgroup show'.
Furthermore, we may also show an accurate memory size of a bpf map
instead of an estimated one in 'bpftool map show' in the future.

Yafang Shao (15):
  bpf: Remove unneeded memset in queue_stack_map creation
  bpf: Use bpf_map_area_free instead of kvfree
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Introduce helpers for container of struct bpf_map
  bpf: Use bpf_map_container_alloc helpers in various bpf maps
  bpf: Define bpf_map_get_memcg for !CONFIG_MEMCG_KMEM
  bpf: Use scope-based charge for bpf_map_area_alloc
  bpf: Use bpf_map_kzalloc in arraymap
  bpf: Use bpf_map_pages_alloc in ringbuf
  bpf: Use bpf_map_kvcalloc in bpf_local_storage
  mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  bpf: Add new parameter into bpf_map_container_alloc
  bpf: Add new map flag BPF_F_SELECTABLE_MEMCG
  bpf: Introduce selectable memcg for bpf map

 include/linux/bpf.h            |  19 +++-
 include/linux/memcontrol.h     |  11 ++
 include/uapi/linux/bpf.h       |   5 +
 kernel/bpf/arraymap.c          |  46 ++++----
 kernel/bpf/bloom_filter.c      |  13 ++-
 kernel/bpf/bpf_local_storage.c |  17 +--
 kernel/bpf/bpf_struct_ops.c    |  17 +--
 kernel/bpf/cpumap.c            |  12 +-
 kernel/bpf/devmap.c            |  26 +++--
 kernel/bpf/hashtab.c           |  17 +--
 kernel/bpf/local_storage.c     |  11 +-
 kernel/bpf/lpm_trie.c          |   8 +-
 kernel/bpf/offload.c           |   6 +-
 kernel/bpf/queue_stack_maps.c  |  12 +-
 kernel/bpf/reuseport_array.c   |   9 +-
 kernel/bpf/ringbuf.c           |  57 +++++-----
 kernel/bpf/stackmap.c          |  15 +--
 kernel/bpf/syscall.c           | 197 ++++++++++++++++++++++++++++-----
 mm/memcontrol.c                |  41 +++++++
 net/core/sock_map.c            |  31 +++---
 net/xdp/xskmap.c               |   8 +-
 tools/include/uapi/linux/bpf.h |   5 +
 tools/lib/bpf/bpf.c            |   1 +
 tools/lib/bpf/bpf.h            |   3 +-
 tools/lib/bpf/libbpf.c         |   2 +
 25 files changed, 411 insertions(+), 178 deletions(-)

-- 
2.17.1



* [RFC PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instead of kvfree Yafang Shao
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

__GFP_ZERO will clear the memory, so we don't need to memset it.
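
For reference, bpf_map_area_alloc() already passes __GFP_ZERO down to
the underlying allocator, so the returned memory is zero-filled;
roughly,

    /* inside __bpf_map_area_alloc(): __GFP_ZERO is always set, so a
     * following memset(ptr, 0, size) in the caller is redundant.
     */
    const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;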

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/queue_stack_maps.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index a1c0794ae49d..8a5e060de63b 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -78,8 +78,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
-	memset(qs, 0, sizeof(*qs));
-
 	bpf_map_init_from_attr(&qs->map, attr);
 
 	qs->size = size;
-- 
2.17.1



* [RFC PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instead of kvfree
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation Yafang Shao
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

bpf_map_area_alloc() should be paired with bpf_map_area_free().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/ringbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index ded4faeca192..3fb54feb39d4 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -116,7 +116,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 err_free_pages:
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 	return NULL;
 }
 
@@ -190,7 +190,7 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	vunmap(rb);
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 }
 
 static void ringbuf_map_free(struct bpf_map *map)
-- 
2.17.1



* [RFC PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instead of kvfree Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on " Yafang Shao
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Some of the bpf maps are created with __GFP_NOWARN, i.e. arraymap,
bloom_filter, bpf_local_storage, bpf_struct_ops, lpm_trie,
queue_stack_maps, reuseport_array, stackmap and xskmap, while others
are created without it, i.e. cpumap, devmap, hashtab, local_storage,
offload, ringbuf and sock_map. But there are no key differences between
the creation of these maps. So let's make this allocation flag
consistent across all bpf map creation. Then we can use a generic
helper to allocate all bpf maps.
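
After this patch, every map container is allocated with the same
GFP_USER | __GFP_NOWARN base (plus __GFP_ACCOUNT where the map is
memcg-accounted), e.g.,

    /* the now-uniform allocation across map types */
    map = kzalloc(sizeof(*map), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);

which is what allows folding these call sites into one generic helper
in the next patch.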

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/cpumap.c        | 2 +-
 kernel/bpf/devmap.c        | 2 +-
 kernel/bpf/hashtab.c       | 2 +-
 kernel/bpf/local_storage.c | 4 ++--
 kernel/bpf/offload.c       | 2 +-
 kernel/bpf/ringbuf.c       | 2 +-
 net/core/sock_map.c        | 4 ++--
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index f4860ac756cd..b25ca9d603a6 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,7 +97,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_ACCOUNT);
+	cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_NOWARN |  __GFP_ACCOUNT);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index a0e02b009487..88feaa094de8 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -163,7 +163,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_ACCOUNT);
+	dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index da7578426a46..f1e5303fe26e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -495,7 +495,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_ACCOUNT);
+	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 49ef0ce040c7..a64255e20f87 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,8 +313,8 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = kmalloc_node(sizeof(struct bpf_cgroup_storage_map),
-			   __GFP_ZERO | GFP_USER | __GFP_ACCOUNT, numa_node);
+	map = kzalloc_node(sizeof(struct bpf_cgroup_storage_map),
+			   GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT, numa_node);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index bd09290e3648..5a629a1b971c 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -372,7 +372,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	    attr->map_type != BPF_MAP_TYPE_HASH)
 		return ERR_PTR(-EINVAL);
 
-	offmap = kzalloc(sizeof(*offmap), GFP_USER);
+	offmap = kzalloc(sizeof(*offmap), GFP_USER | __GFP_NOWARN);
 	if (!offmap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 3fb54feb39d4..df8062cb258c 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -164,7 +164,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = kzalloc(sizeof(*rb_map), GFP_USER | __GFP_ACCOUNT);
+	rb_map = kzalloc(sizeof(*rb_map), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 028813dfecb0..763d77162d0c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,7 +41,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_ACCOUNT);
+	stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
@@ -1076,7 +1076,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_ACCOUNT);
+	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.17.1



* [RFC PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on bpf map creation
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (2 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map Yafang Shao
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Let's use the generic helper bpf_map_area_alloc() instead of the
open-coded kzalloc helpers in the bpf map creation path.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/bpf_local_storage.c |  6 +++---
 kernel/bpf/cpumap.c            |  6 +++---
 kernel/bpf/devmap.c            |  6 +++---
 kernel/bpf/hashtab.c           |  6 +++---
 kernel/bpf/local_storage.c     |  5 ++---
 kernel/bpf/lpm_trie.c          |  4 ++--
 kernel/bpf/ringbuf.c           |  6 +++---
 net/core/sock_map.c            | 12 ++++++------
 8 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 8ce40fd869f6..4ee2e7286c23 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -582,7 +582,7 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 	synchronize_rcu();
 
 	kvfree(smap->buckets);
-	kfree(smap);
+	bpf_map_area_free(smap);
 }
 
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
@@ -610,7 +610,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	unsigned int i;
 	u32 nbuckets;
 
-	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 	bpf_map_init_from_attr(&smap->map, attr);
@@ -623,7 +623,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap->buckets) {
-		kfree(smap);
+		bpf_map_area_free(smap);
 		return ERR_PTR(-ENOMEM);
 	}
 
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b25ca9d603a6..b5ba34ddd4b6 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,7 +97,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_NOWARN |  __GFP_ACCOUNT);
+	cmap = bpf_map_area_alloc(sizeof(*cmap), NUMA_NO_NODE);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
@@ -118,7 +118,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 
 	return &cmap->map;
 free_cmap:
-	kfree(cmap);
+	bpf_map_area_free(cmap);
 	return ERR_PTR(err);
 }
 
@@ -623,7 +623,7 @@ static void cpu_map_free(struct bpf_map *map)
 		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
 	}
 	bpf_map_area_free(cmap->cpu_map);
-	kfree(cmap);
+	bpf_map_area_free(cmap);
 }
 
 /* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 88feaa094de8..f9a87dcc5535 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -163,13 +163,13 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
-		kfree(dtab);
+		bpf_map_area_free(dtab);
 		return ERR_PTR(err);
 	}
 
@@ -240,7 +240,7 @@ static void dev_map_free(struct bpf_map *map)
 		bpf_map_area_free(dtab->netdev_map);
 	}
 
-	kfree(dtab);
+	bpf_map_area_free(dtab);
 }
 
 static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index f1e5303fe26e..8392f7f8a8ac 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -495,7 +495,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -579,7 +579,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	bpf_map_area_free(htab->buckets);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
-	kfree(htab);
+	bpf_map_area_free(htab);
 	return ERR_PTR(err);
 }
 
@@ -1496,7 +1496,7 @@ static void htab_map_free(struct bpf_map *map)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);
-	kfree(htab);
+	bpf_map_area_free(htab);
 }
 
 static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index a64255e20f87..098cf336fae6 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,8 +313,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = kzalloc_node(sizeof(struct bpf_cgroup_storage_map),
-			   GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT, numa_node);
+	map = bpf_map_area_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
@@ -346,7 +345,7 @@ static void cgroup_storage_map_free(struct bpf_map *_map)
 	WARN_ON(!RB_EMPTY_ROOT(&map->root));
 	WARN_ON(!list_empty(&map->list));
 
-	kfree(map);
+	bpf_map_area_free(map);
 }
 
 static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index d789e3b831ad..d833496e9e42 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -558,7 +558,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	    attr->value_size > LPM_VAL_SIZE_MAX)
 		return ERR_PTR(-EINVAL);
 
-	trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	trie = bpf_map_area_alloc(sizeof(*trie), NUMA_NO_NODE);
 	if (!trie)
 		return ERR_PTR(-ENOMEM);
 
@@ -609,7 +609,7 @@ static void trie_free(struct bpf_map *map)
 	}
 
 out:
-	kfree(trie);
+	bpf_map_area_free(trie);
 }
 
 static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index df8062cb258c..b483aea35f41 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -164,7 +164,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = kzalloc(sizeof(*rb_map), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	rb_map = bpf_map_area_alloc(sizeof(*rb_map), NUMA_NO_NODE);
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
@@ -172,7 +172,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
 	if (!rb_map->rb) {
-		kfree(rb_map);
+		bpf_map_area_free(rb_map);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -199,7 +199,7 @@ static void ringbuf_map_free(struct bpf_map *map)
 
 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
 	bpf_ringbuf_free(rb_map->rb);
-	kfree(rb_map);
+	bpf_map_area_free(rb_map);
 }
 
 static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 763d77162d0c..d0c43384d8bf 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,7 +41,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	stab = bpf_map_area_alloc(sizeof(*stab), NUMA_NO_NODE);
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
@@ -52,7 +52,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 				       sizeof(struct sock *),
 				       stab->map.numa_node);
 	if (!stab->sks) {
-		kfree(stab);
+		bpf_map_area_free(stab);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -361,7 +361,7 @@ static void sock_map_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	bpf_map_area_free(stab->sks);
-	kfree(stab);
+	bpf_map_area_free(stab);
 }
 
 static void sock_map_release_progs(struct bpf_map *map)
@@ -1076,7 +1076,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -1106,7 +1106,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 
 	return &htab->map;
 free_htab:
-	kfree(htab);
+	bpf_map_area_free(htab);
 	return ERR_PTR(err);
 }
 
@@ -1159,7 +1159,7 @@ static void sock_hash_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	bpf_map_area_free(htab->buckets);
-	kfree(htab);
+	bpf_map_area_free(htab);
 }
 
 static void *sock_hash_lookup_sys(struct bpf_map *map, void *key)
-- 
2.17.1



* [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (3 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on " Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-08-02  4:58   ` Alexei Starovoitov
  2022-07-29 15:23 ` [RFC PATCH bpf-next 06/15] bpf: Use bpf_map_container_alloc helpers in various bpf maps Yafang Shao
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Currently bpf_map_area_alloc() is used to allocate both the container
of a struct bpf_map and the other members in this container. To
distinguish map creation from the allocation of these other members,
let's split it into two different helpers,
  - bpf_map_container_alloc()
    Used to allocate a container of struct bpf_map; the container is as
    follows,
      struct bpf_map_container {
        struct bpf_map map;  // the map must be the first member
        ....
      };
    Please note that struct bpf_map_container is an abstract one, which
    can be struct bpf_array, struct bpf_bloom_filter, etc.

    This helper calls bpf_map_save_memcg() to init the memcg-relevant
    data in the bpf map. These data will be cleared in
    bpf_map_container_free().

  - bpf_map_area_alloc()
    Now it is used to allocate only the other members in a container.
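
As an illustration of the first-member requirement (a sketch, not part
of this patch): because the embedded map is the first member, the
container pointer and the map pointer alias each other, which is what
lets the generic helper cast the allocation to struct bpf_map while map
type code recovers the container via container_of(),

      struct bpf_array {
              struct bpf_map map;     /* must be the first member */
              u32 elem_size;
      };

      static struct bpf_array *to_array(void *container)
      {
              /* generic side: the embedded map sits at offset 0 */
              struct bpf_map *map = (struct bpf_map *)container;

              /* map-type side: go back from the map to the container */
              return container_of(map, struct bpf_array, map);
      }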

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h  |  4 ++++
 kernel/bpf/syscall.c | 56 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 20c26aed7896..2d971b0eb24b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1634,9 +1634,13 @@ void bpf_map_inc_with_uref(struct bpf_map *map);
 struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
+void *bpf_map_container_alloc(u64 size, int numa_node);
+void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
+				       u32 align, u32 offset);
 void *bpf_map_area_alloc(u64 size, int numa_node);
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base);
+void bpf_map_container_free(void *base);
 bool bpf_map_write_active(const struct bpf_map *map);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 int  generic_map_lookup_batch(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136c5788..1a1a81a11b37 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -495,6 +495,62 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif
 
+/*
+ * The return pointer is a bpf_map container, as follows,
+ *   struct bpf_map_container {
+ *       struct bpf_map map;
+ *       ...
+ *   };
+ *
+ * It is used in the map creation path.
+ */
+void *bpf_map_container_alloc(u64 size, int numa_node)
+{
+	struct bpf_map *map;
+	void *container;
+
+	container = __bpf_map_area_alloc(size, numa_node, false);
+	if (!container)
+		return NULL;
+
+	map = (struct bpf_map *)container;
+	bpf_map_save_memcg(map);
+
+	return container;
+}
+
+void *bpf_map_container_mmapable_alloc(u64 size, int numa_node, u32 align,
+				       u32 offset)
+{
+	struct bpf_map *map;
+	void *container;
+	void *ptr;
+
+	/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
+	ptr = __bpf_map_area_alloc(size, numa_node, true);
+	if (!ptr)
+		return NULL;
+
+	container = ptr + align - offset;
+	map = (struct bpf_map *)container;
+	bpf_map_save_memcg(map);
+
+	return ptr;
+}
+
+void bpf_map_container_free(void *container)
+{
+	struct bpf_map *map;
+
+	if (!container)
+		return;
+
+	map = (struct bpf_map *)container;
+	bpf_map_release_memcg(map);
+
+	kvfree(container);
+}
+
 static int bpf_map_kptr_off_cmp(const void *a, const void *b)
 {
 	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
-- 
2.17.1



* [RFC PATCH bpf-next 06/15] bpf: Use bpf_map_container_alloc helpers in various bpf maps
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (4 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 07/15] bpf: Define bpf_map_get_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

With the preparation in the previous patch, we can now use the helper
bpf_map_container_alloc() to allocate all bpf maps.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  1 -
 kernel/bpf/arraymap.c          | 19 +++++++++++--------
 kernel/bpf/bloom_filter.c      |  4 ++--
 kernel/bpf/bpf_local_storage.c |  6 +++---
 kernel/bpf/bpf_struct_ops.c    |  4 ++--
 kernel/bpf/cpumap.c            |  6 +++---
 kernel/bpf/devmap.c            |  6 +++---
 kernel/bpf/hashtab.c           |  6 +++---
 kernel/bpf/local_storage.c     |  4 ++--
 kernel/bpf/lpm_trie.c          |  4 ++--
 kernel/bpf/offload.c           |  2 +-
 kernel/bpf/queue_stack_maps.c  |  4 ++--
 kernel/bpf/reuseport_array.c   |  4 ++--
 kernel/bpf/ringbuf.c           |  6 +++---
 kernel/bpf/stackmap.c          |  6 +++---
 kernel/bpf/syscall.c           |  8 --------
 net/core/sock_map.c            | 12 ++++++------
 net/xdp/xskmap.c               |  4 ++--
 18 files changed, 50 insertions(+), 56 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2d971b0eb24b..3f9893e14124 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1638,7 +1638,6 @@ void *bpf_map_container_alloc(u64 size, int numa_node);
 void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
 				       u32 align, u32 offset);
 void *bpf_map_area_alloc(u64 size, int numa_node);
-void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base);
 void bpf_map_container_free(void *base);
 bool bpf_map_write_active(const struct bpf_map *map);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index d3e734bf8056..9517619dbe8d 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -126,16 +126,18 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 
 	/* allocate all map elements and zero-initialize them */
 	if (attr->map_flags & BPF_F_MMAPABLE) {
+		u32 align = PAGE_ALIGN(sizeof(struct bpf_array));
+		u32 offset = offsetof(struct bpf_array, value);
 		void *data;
 
 		/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
-		data = bpf_map_area_mmapable_alloc(array_size, numa_node);
+		data = bpf_map_container_mmapable_alloc(array_size, numa_node,
+							align, offset);
 		if (!data)
 			return ERR_PTR(-ENOMEM);
-		array = data + PAGE_ALIGN(sizeof(struct bpf_array))
-			- offsetof(struct bpf_array, value);
+		array = data + align - offset;
 	} else {
-		array = bpf_map_area_alloc(array_size, numa_node);
+		array = bpf_map_container_alloc(array_size, numa_node);
 	}
 	if (!array)
 		return ERR_PTR(-ENOMEM);
@@ -147,7 +149,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->elem_size = elem_size;
 
 	if (percpu && bpf_array_alloc_percpu(array)) {
-		bpf_map_area_free(array);
+		bpf_map_container_free(array);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -420,6 +422,7 @@ static void array_map_free(struct bpf_map *map)
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	int i;
 
+
 	if (map_value_has_kptrs(map)) {
 		for (i = 0; i < array->map.max_entries; i++)
 			bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
@@ -430,9 +433,9 @@ static void array_map_free(struct bpf_map *map)
 		bpf_array_free_percpu(array);
 
 	if (array->map.map_flags & BPF_F_MMAPABLE)
-		bpf_map_area_free(array_map_vmalloc_addr(array));
+		bpf_map_container_free(array_map_vmalloc_addr(array));
 	else
-		bpf_map_area_free(array);
+		bpf_map_container_free(array);
 }
 
 static void array_map_seq_show_elem(struct bpf_map *map, void *key,
@@ -774,7 +777,7 @@ static void fd_array_map_free(struct bpf_map *map)
 	for (i = 0; i < array->map.max_entries; i++)
 		BUG_ON(array->ptrs[i] != NULL);
 
-	bpf_map_area_free(array);
+	bpf_map_container_free(array);
 }
 
 static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index b9ea539a5561..8b3194903b52 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -142,7 +142,7 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	}
 
 	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
-	bloom = bpf_map_area_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
+	bloom = bpf_map_container_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
 
 	if (!bloom)
 		return ERR_PTR(-ENOMEM);
@@ -168,7 +168,7 @@ static void bloom_map_free(struct bpf_map *map)
 	struct bpf_bloom_filter *bloom =
 		container_of(map, struct bpf_bloom_filter, map);
 
-	bpf_map_area_free(bloom);
+	bpf_map_container_free(bloom);
 }
 
 static void *bloom_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 4ee2e7286c23..0b50cb2e1d5b 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -582,7 +582,7 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 	synchronize_rcu();
 
 	kvfree(smap->buckets);
-	bpf_map_area_free(smap);
+	bpf_map_container_free(smap);
 }
 
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
@@ -610,7 +610,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	unsigned int i;
 	u32 nbuckets;
 
-	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
+	smap = bpf_map_container_alloc(sizeof(*smap), NUMA_NO_NODE);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 	bpf_map_init_from_attr(&smap->map, attr);
@@ -623,7 +623,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap->buckets) {
-		bpf_map_area_free(smap);
+		bpf_map_container_free(smap);
 		return ERR_PTR(-ENOMEM);
 	}
 
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 84b2d9dba79a..66df3059a3fe 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -580,7 +580,7 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
 	bpf_map_area_free(st_map->links);
 	bpf_jit_free_exec(st_map->image);
 	bpf_map_area_free(st_map->uvalue);
-	bpf_map_area_free(st_map);
+	bpf_map_container_free(st_map);
 }
 
 static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
@@ -618,7 +618,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 		 */
 		(vt->size - sizeof(struct bpf_struct_ops_value));
 
-	st_map = bpf_map_area_alloc(st_map_size, NUMA_NO_NODE);
+	st_map = bpf_map_container_alloc(st_map_size, NUMA_NO_NODE);
 	if (!st_map)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b5ba34ddd4b6..23e941826eec 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,7 +97,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = bpf_map_area_alloc(sizeof(*cmap), NUMA_NO_NODE);
+	cmap = bpf_map_container_alloc(sizeof(*cmap), NUMA_NO_NODE);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);
 
@@ -118,7 +118,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 
 	return &cmap->map;
 free_cmap:
-	bpf_map_area_free(cmap);
+	bpf_map_container_free(cmap);
 	return ERR_PTR(err);
 }
 
@@ -623,7 +623,7 @@ static void cpu_map_free(struct bpf_map *map)
 		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
 	}
 	bpf_map_area_free(cmap->cpu_map);
-	bpf_map_area_free(cmap);
+	bpf_map_container_free(cmap);
 }
 
 /* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f9a87dcc5535..3e99c10f1729 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -163,13 +163,13 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = bpf_map_area_alloc(sizeof(*dtab), NUMA_NO_NODE);
+	dtab = bpf_map_container_alloc(sizeof(*dtab), NUMA_NO_NODE);
 	if (!dtab)
 		return ERR_PTR(-ENOMEM);
 
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
-		bpf_map_area_free(dtab);
+		bpf_map_container_free(dtab);
 		return ERR_PTR(err);
 	}
 
@@ -240,7 +240,7 @@ static void dev_map_free(struct bpf_map *map)
 		bpf_map_area_free(dtab->netdev_map);
 	}
 
-	bpf_map_area_free(dtab);
+	bpf_map_container_free(dtab);
 }
 
 static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 8392f7f8a8ac..d2fb144276ab 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -495,7 +495,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
+	htab = bpf_map_container_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -579,7 +579,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	bpf_map_area_free(htab->buckets);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
-	bpf_map_area_free(htab);
+	bpf_map_container_free(htab);
 	return ERR_PTR(err);
 }
 
@@ -1496,7 +1496,7 @@ static void htab_map_free(struct bpf_map *map)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);
-	bpf_map_area_free(htab);
+	bpf_map_container_free(htab);
 }
 
 static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 098cf336fae6..963dc82e3c50 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,7 +313,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = bpf_map_area_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node);
+	map = bpf_map_container_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
@@ -345,7 +345,7 @@ static void cgroup_storage_map_free(struct bpf_map *_map)
 	WARN_ON(!RB_EMPTY_ROOT(&map->root));
 	WARN_ON(!list_empty(&map->list));
 
-	bpf_map_area_free(map);
+	bpf_map_container_free(map);
 }
 
 static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index d833496e9e42..15ccef2f10a3 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -558,7 +558,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	    attr->value_size > LPM_VAL_SIZE_MAX)
 		return ERR_PTR(-EINVAL);
 
-	trie = bpf_map_area_alloc(sizeof(*trie), NUMA_NO_NODE);
+	trie = bpf_map_container_alloc(sizeof(*trie), NUMA_NO_NODE);
 	if (!trie)
 		return ERR_PTR(-ENOMEM);
 
@@ -609,7 +609,7 @@ static void trie_free(struct bpf_map *map)
 	}
 
 out:
-	bpf_map_area_free(trie);
+	bpf_map_container_free(trie);
 }
 
 static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 5a629a1b971c..02e3f4ef19f5 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -372,7 +372,7 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	    attr->map_type != BPF_MAP_TYPE_HASH)
 		return ERR_PTR(-EINVAL);
 
-	offmap = kzalloc(sizeof(*offmap), GFP_USER | __GFP_NOWARN);
+	offmap = bpf_map_container_alloc(sizeof(*offmap), NUMA_NO_NODE);
 	if (!offmap)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 8a5e060de63b..d4c0284c9089 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -74,7 +74,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	size = (u64) attr->max_entries + 1;
 	queue_size = sizeof(*qs) + size * attr->value_size;
 
-	qs = bpf_map_area_alloc(queue_size, numa_node);
+	qs = bpf_map_container_alloc(queue_size, numa_node);
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
@@ -92,7 +92,7 @@ static void queue_stack_map_free(struct bpf_map *map)
 {
 	struct bpf_queue_stack *qs = bpf_queue_stack(map);
 
-	bpf_map_area_free(qs);
+	bpf_map_container_free(qs);
 }
 
 static int __queue_map_get(struct bpf_map *map, void *value, bool delete)
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index e2618fb5870e..d36a60ca68f4 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -146,7 +146,7 @@ static void reuseport_array_free(struct bpf_map *map)
 	 * Once reaching here, all sk->sk_user_data is not
 	 * referencing this "array". "array" can be freed now.
 	 */
-	bpf_map_area_free(array);
+	bpf_map_container_free(array);
 }
 
 static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
@@ -158,7 +158,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 		return ERR_PTR(-EPERM);
 
 	/* allocate all map elements and zero-initialize them */
-	array = bpf_map_area_alloc(struct_size(array, ptrs, attr->max_entries), numa_node);
+	array = bpf_map_container_alloc(struct_size(array, ptrs, attr->max_entries), numa_node);
 	if (!array)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index b483aea35f41..35258aa45236 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -164,7 +164,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = bpf_map_area_alloc(sizeof(*rb_map), NUMA_NO_NODE);
+	rb_map = bpf_map_container_alloc(sizeof(*rb_map), NUMA_NO_NODE);
 	if (!rb_map)
 		return ERR_PTR(-ENOMEM);
 
@@ -172,7 +172,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
 	if (!rb_map->rb) {
-		bpf_map_area_free(rb_map);
+		bpf_map_container_free(rb_map);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -199,7 +199,7 @@ static void ringbuf_map_free(struct bpf_map *map)
 
 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
 	bpf_ringbuf_free(rb_map->rb);
-	bpf_map_area_free(rb_map);
+	bpf_map_container_free(rb_map);
 }
 
 static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 1adbe67cdb95..0c3185406f56 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -100,7 +100,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 
 	cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
-	smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));
+	smap = bpf_map_container_alloc(cost, bpf_map_attr_numa_node(attr));
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 
@@ -120,7 +120,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 put_buffers:
 	put_callchain_buffers();
 free_smap:
-	bpf_map_area_free(smap);
+	bpf_map_container_free(smap);
 	return ERR_PTR(err);
 }
 
@@ -650,7 +650,7 @@ static void stack_map_free(struct bpf_map *map)
 
 	bpf_map_area_free(smap->elems);
 	pcpu_freelist_destroy(&smap->freelist);
-	bpf_map_area_free(smap);
+	bpf_map_container_free(smap);
 	put_callchain_buffers();
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1a1a81a11b37..a6f68ade200f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -339,11 +339,6 @@ void *bpf_map_area_alloc(u64 size, int numa_node)
 	return __bpf_map_area_alloc(size, numa_node, false);
 }
 
-void *bpf_map_area_mmapable_alloc(u64 size, int numa_node)
-{
-	return __bpf_map_area_alloc(size, numa_node, true);
-}
-
 void bpf_map_area_free(void *area)
 {
 	kvfree(area);
@@ -669,7 +664,6 @@ static void bpf_map_free_deferred(struct work_struct *work)
 
 	security_bpf_map_free(map);
 	kfree(map->off_arr);
-	bpf_map_release_memcg(map);
 	/* implementation dependent freeing, map_free callback also does
 	 * bpf_map_free_kptr_off_tab, if needed.
 	 */
@@ -1218,8 +1212,6 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_sec;
 
-	bpf_map_save_memcg(map);
-
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
 		/* failed to allocate fd.
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d0c43384d8bf..4b5876c4a47d 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,7 +41,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = bpf_map_area_alloc(sizeof(*stab), NUMA_NO_NODE);
+	stab = bpf_map_container_alloc(sizeof(*stab), NUMA_NO_NODE);
 	if (!stab)
 		return ERR_PTR(-ENOMEM);
 
@@ -52,7 +52,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 				       sizeof(struct sock *),
 				       stab->map.numa_node);
 	if (!stab->sks) {
-		bpf_map_area_free(stab);
+		bpf_map_container_free(stab);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -361,7 +361,7 @@ static void sock_map_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	bpf_map_area_free(stab->sks);
-	bpf_map_area_free(stab);
+	bpf_map_container_free(stab);
 }
 
 static void sock_map_release_progs(struct bpf_map *map)
@@ -1076,7 +1076,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
+	htab = bpf_map_container_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
@@ -1106,7 +1106,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 
 	return &htab->map;
 free_htab:
-	bpf_map_area_free(htab);
+	bpf_map_container_free(htab);
 	return ERR_PTR(err);
 }
 
@@ -1159,7 +1159,7 @@ static void sock_hash_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	bpf_map_area_free(htab->buckets);
-	bpf_map_area_free(htab);
+	bpf_map_container_free(htab);
 }
 
 static void *sock_hash_lookup_sys(struct bpf_map *map, void *key)
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index acc8e52a4f5f..6cf98f1c621a 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -75,7 +75,7 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	numa_node = bpf_map_attr_numa_node(attr);
 	size = struct_size(m, xsk_map, attr->max_entries);
 
-	m = bpf_map_area_alloc(size, numa_node);
+	m = bpf_map_container_alloc(size, numa_node);
 	if (!m)
 		return ERR_PTR(-ENOMEM);
 
@@ -90,7 +90,7 @@ static void xsk_map_free(struct bpf_map *map)
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
 
 	synchronize_net();
-	bpf_map_area_free(m);
+	bpf_map_container_free(m);
 }
 
 static int xsk_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
-- 
2.17.1



* [RFC PATCH bpf-next 07/15] bpf: Define bpf_map_get_memcg for !CONFIG_MEMCG_KMEM
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (5 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 06/15] bpf: Use bpf_map_container_alloc helpers in various bpf maps Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 08/15] bpf: Use scope-based charge for bpf_map_area_alloc Yafang Shao
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Define bpf_map_get_memcg() for !CONFIG_MEMCG_KMEM as well, so that
callers can use this helper regardless of whether CONFIG_MEMCG_KMEM is
set.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/memcontrol.h | 10 ++++++++++
 kernel/bpf/syscall.c       |  7 ++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9ecead1042b9..2f0a611f12e5 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -361,6 +361,11 @@ struct mem_cgroup {
 
 extern struct mem_cgroup *root_mem_cgroup;
 
+static inline struct mem_cgroup *root_memcg(void)
+{
+	return root_mem_cgroup;
+}
+
 enum page_memcg_data_flags {
 	/* page->memcg_data is a pointer to an objcgs vector */
 	MEMCG_DATA_OBJCGS = (1UL << 0),
@@ -1138,6 +1143,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 #define MEM_CGROUP_ID_SHIFT	0
 #define MEM_CGROUP_ID_MAX	0
 
+static inline struct mem_cgroup *root_memcg(void)
+{
+	return NULL;
+}
+
 static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 {
 	return NULL;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a6f68ade200f..7289ee1a300a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -433,7 +433,7 @@ static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
 	if (map->objcg)
 		return get_mem_cgroup_from_objcg(map->objcg);
 
-	return root_mem_cgroup;
+	return root_memcg();
 }
 
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
@@ -488,6 +488,11 @@ static void bpf_map_save_memcg(struct bpf_map *map)
 static void bpf_map_release_memcg(struct bpf_map *map)
 {
 }
+
+static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
+{
+	return root_memcg();
+}
 #endif
 
 /*
-- 
2.17.1



* [RFC PATCH bpf-next 08/15] bpf: Use scope-based charge for bpf_map_area_alloc
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (6 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 07/15] bpf: Define bpf_map_get_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 09/15] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Let's also get the memcg from the bpf map in bpf_map_area_alloc()
instead of using the current task's memcg.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h         |  2 +-
 kernel/bpf/bpf_struct_ops.c |  4 ++--
 kernel/bpf/cpumap.c         |  2 +-
 kernel/bpf/devmap.c         | 12 ++++++++----
 kernel/bpf/hashtab.c        |  5 +++--
 kernel/bpf/ringbuf.c        | 14 +++++++++-----
 kernel/bpf/stackmap.c       |  3 ++-
 kernel/bpf/syscall.c        | 28 ++++++++++++++++++----------
 net/core/sock_map.c         |  7 ++++---
 9 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3f9893e14124..711d9b1829d4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1637,7 +1637,7 @@ void bpf_map_put(struct bpf_map *map);
 void *bpf_map_container_alloc(u64 size, int numa_node);
 void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
 				       u32 align, u32 offset);
-void *bpf_map_area_alloc(u64 size, int numa_node);
+void *bpf_map_area_alloc(struct bpf_map *map, u64 size, int numa_node);
 void bpf_map_area_free(void *base);
 void bpf_map_container_free(void *base);
 bool bpf_map_write_active(const struct bpf_map *map);
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 66df3059a3fe..874fda7e2b8b 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -625,9 +625,9 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	st_map->st_ops = st_ops;
 	map = &st_map->map;
 
-	st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
+	st_map->uvalue = bpf_map_area_alloc(map, vt->size, NUMA_NO_NODE);
 	st_map->links =
-		bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *),
+		bpf_map_area_alloc(map, btf_type_vlen(t) * sizeof(struct bpf_links *),
 				   NUMA_NO_NODE);
 	st_map->image = bpf_jit_alloc_exec(PAGE_SIZE);
 	if (!st_map->uvalue || !st_map->links || !st_map->image) {
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 23e941826eec..95c1642deaf6 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -110,7 +110,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	}
 
 	/* Alloc array for possible remote "destination" CPUs */
-	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
+	cmap->cpu_map = bpf_map_area_alloc(&cmap->map, cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
 					   cmap->map.numa_node);
 	if (!cmap->cpu_map)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 3e99c10f1729..b625d578bc93 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -88,13 +88,15 @@ static DEFINE_PER_CPU(struct list_head, dev_flush_list);
 static DEFINE_SPINLOCK(dev_map_lock);
 static LIST_HEAD(dev_map_list);
 
-static struct hlist_head *dev_map_create_hash(unsigned int entries,
+static struct hlist_head *dev_map_create_hash(struct bpf_map *map,
+					      unsigned int entries,
 					      int numa_node)
 {
 	int i;
 	struct hlist_head *hash;
 
-	hash = bpf_map_area_alloc((u64) entries * sizeof(*hash), numa_node);
+	hash = bpf_map_area_alloc(map, (u64) entries * sizeof(*hash),
+				  numa_node);
 	if (hash != NULL)
 		for (i = 0; i < entries; i++)
 			INIT_HLIST_HEAD(&hash[i]);
@@ -138,14 +140,16 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 	}
 
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
-		dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets,
+		dtab->dev_index_head = dev_map_create_hash(&dtab->map,
+							   dtab->n_buckets,
 							   dtab->map.numa_node);
 		if (!dtab->dev_index_head)
 			return -ENOMEM;
 
 		spin_lock_init(&dtab->index_lock);
 	} else {
-		dtab->netdev_map = bpf_map_area_alloc((u64) dtab->map.max_entries *
+		dtab->netdev_map = bpf_map_area_alloc(&dtab->map,
+						      (u64) dtab->map.max_entries *
 						      sizeof(struct bpf_dtab_netdev *),
 						      dtab->map.numa_node);
 		if (!dtab->netdev_map)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d2fb144276ab..2a34a115e14f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -331,7 +331,8 @@ static int prealloc_init(struct bpf_htab *htab)
 	if (htab_has_extra_elems(htab))
 		num_entries += num_possible_cpus();
 
-	htab->elems = bpf_map_area_alloc((u64)htab->elem_size * num_entries,
+	htab->elems = bpf_map_area_alloc(&htab->map,
+					 (u64)htab->elem_size * num_entries,
 					 htab->map.numa_node);
 	if (!htab->elems)
 		return -ENOMEM;
@@ -532,7 +533,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = bpf_map_area_alloc(htab->n_buckets *
+	htab->buckets = bpf_map_area_alloc(&htab->map, htab->n_buckets *
 					   sizeof(struct bucket),
 					   htab->map.numa_node);
 	if (!htab->buckets)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 35258aa45236..7c875d4d5b2f 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -59,7 +59,9 @@ struct bpf_ringbuf_hdr {
 	u32 pg_off;
 };
 
-static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
+static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
+						  size_t data_sz,
+						  int numa_node)
 {
 	const gfp_t flags = GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL |
 			    __GFP_NOWARN | __GFP_ZERO;
@@ -89,7 +91,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 	 * user-space implementations significantly.
 	 */
 	array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
-	pages = bpf_map_area_alloc(array_size, numa_node);
+	pages = bpf_map_area_alloc(map, array_size, numa_node);
 	if (!pages)
 		return NULL;
 
@@ -127,11 +129,12 @@ static void bpf_ringbuf_notify(struct irq_work *work)
 	wake_up_all(&rb->waitq);
 }
 
-static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
+static struct bpf_ringbuf *bpf_ringbuf_alloc(struct bpf_map *map,
+					     size_t data_sz, int numa_node)
 {
 	struct bpf_ringbuf *rb;
 
-	rb = bpf_ringbuf_area_alloc(data_sz, numa_node);
+	rb = bpf_ringbuf_area_alloc(map, data_sz, numa_node);
 	if (!rb)
 		return NULL;
 
@@ -170,7 +173,8 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&rb_map->map, attr);
 
-	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
+	rb_map->rb = bpf_ringbuf_alloc(&rb_map->map, attr->max_entries,
+				       rb_map->map.numa_node);
 	if (!rb_map->rb) {
 		bpf_map_container_free(rb_map);
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 0c3185406f56..c9a91ca05a03 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -48,7 +48,8 @@ static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
 			(u64)smap->map.value_size;
 	int err;
 
-	smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries,
+	smap->elems = bpf_map_area_alloc(&smap->map,
+					 elem_size * smap->map.max_entries,
 					 smap->map.numa_node);
 	if (!smap->elems)
 		return -ENOMEM;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7289ee1a300a..4f893d2ac4fd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -334,16 +334,6 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 			flags, numa_node, __builtin_return_address(0));
 }
 
-void *bpf_map_area_alloc(u64 size, int numa_node)
-{
-	return __bpf_map_area_alloc(size, numa_node, false);
-}
-
-void bpf_map_area_free(void *area)
-{
-	kvfree(area);
-}
-
 static u32 bpf_map_flags_retain_permanent(u32 flags)
 {
 	/* Some map creation flags are not tied to the map object but
@@ -495,6 +485,24 @@ static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
 }
 #endif
 
+void *bpf_map_area_alloc(struct bpf_map *map, u64 size, int numa_node)
+{
+	struct mem_cgroup *memcg, *old_memcg;
+	void *ptr;
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	ptr = __bpf_map_area_alloc(size, numa_node, false);
+	set_active_memcg(old_memcg);
+
+	return ptr;
+}
+
+void bpf_map_area_free(void *area)
+{
+	kvfree(area);
+}
+
 /*
  * The return pointer is a bpf_map container, as follow,
  *   struct bpf_map_container {
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 4b5876c4a47d..1f49dc89822c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -48,8 +48,9 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	bpf_map_init_from_attr(&stab->map, attr);
 	raw_spin_lock_init(&stab->lock);
 
-	stab->sks = bpf_map_area_alloc((u64) stab->map.max_entries *
-				       sizeof(struct sock *),
+	stab->sks = bpf_map_area_alloc(&stab->map,
+				       (u64)stab->map.max_entries *
+					sizeof(struct sock *),
 				       stab->map.numa_node);
 	if (!stab->sks) {
 		bpf_map_container_free(stab);
@@ -1091,7 +1092,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 		goto free_htab;
 	}
 
-	htab->buckets = bpf_map_area_alloc(htab->buckets_num *
+	htab->buckets = bpf_map_area_alloc(&htab->map, htab->buckets_num *
 					   sizeof(struct bpf_shtab_bucket),
 					   htab->map.numa_node);
 	if (!htab->buckets) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 09/15] bpf: Use bpf_map_kzalloc in arraymap
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (7 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 08/15] bpf: Use scope-based charge for bpf_map_area_alloc Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf Yafang Shao
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Allocate the memory after map creation; then we can use the generic
helper bpf_map_kzalloc() instead of the open-coded kzalloc().
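
bpf_map_kzalloc() charges the allocation to the map's memcg; as a rough
sketch (it mirrors the bpf_map_kvcalloc() helper added later in this
series), it does:

	void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
	{
		struct mem_cgroup *memcg, *old_memcg;
		void *ptr;

		memcg = bpf_map_get_memcg(map);
		old_memcg = set_active_memcg(memcg);
		ptr = kzalloc(size, flags | __GFP_ACCOUNT);
		set_active_memcg(old_memcg);

		return ptr;
	}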

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/arraymap.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 9517619dbe8d..bfcfc5df9983 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1093,20 +1093,20 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
 	struct bpf_array_aux *aux;
 	struct bpf_map *map;
 
-	aux = kzalloc(sizeof(*aux), GFP_KERNEL_ACCOUNT);
-	if (!aux)
+	map = array_map_alloc(attr);
+	if (IS_ERR(map))
 		return ERR_PTR(-ENOMEM);
 
+	aux = bpf_map_kzalloc(map, sizeof(*aux), GFP_KERNEL);
+	if (!aux) {
+		array_map_free(map);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	INIT_WORK(&aux->work, prog_array_map_clear_deferred);
 	INIT_LIST_HEAD(&aux->poke_progs);
 	mutex_init(&aux->poke_mutex);
 
-	map = array_map_alloc(attr);
-	if (IS_ERR(map)) {
-		kfree(aux);
-		return map;
-	}
-
 	container_of(map, struct bpf_array, map)->aux = aux;
 	aux->map = map;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (8 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 09/15] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-08-01 23:16   ` Andrii Nakryiko
  2022-07-29 15:23 ` [RFC PATCH bpf-next 11/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Introduce a new helper, bpf_map_pages_alloc(), for this memory
allocation, so that the ringbuf pages are charged to the map's memcg.
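
The helper allocates nr_meta_pages + nr_data_pages pages and fills the
caller's pages array with each data page entered twice, matching the
ringbuf's double mapping of its data pages. The call site then becomes
(sketch, as in the diff below):

	ptr = bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
				  numa_node, flags, 0);
	if (!ptr)
		goto err_free_pages;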

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h  |  4 ++++
 kernel/bpf/ringbuf.c | 27 +++++++++------------------
 kernel/bpf/syscall.c | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 711d9b1829d4..4af72d2b6d73 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1638,8 +1638,12 @@ void *bpf_map_container_alloc(u64 size, int numa_node);
 void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
 				       u32 align, u32 offset);
 void *bpf_map_area_alloc(struct bpf_map *map, u64 size, int numa_node);
+void *bpf_map_pages_alloc(struct bpf_map *map, struct page **pages,
+			  int nr_meta_pages, int nr_data_pages, int nid,
+			  gfp_t flags, unsigned int order);
 void bpf_map_area_free(void *base);
 void bpf_map_container_free(void *base);
+void bpf_map_pages_free(struct page **pages, int nr_pages);
 bool bpf_map_write_active(const struct bpf_map *map);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 int  generic_map_lookup_batch(struct bpf_map *map,
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 7c875d4d5b2f..25973cab251d 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -63,15 +63,15 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
 						  size_t data_sz,
 						  int numa_node)
 {
-	const gfp_t flags = GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL |
+	const gfp_t flags = GFP_KERNEL | __GFP_RETRY_MAYFAIL |
 			    __GFP_NOWARN | __GFP_ZERO;
 	int nr_meta_pages = RINGBUF_PGOFF + RINGBUF_POS_PAGES;
 	int nr_data_pages = data_sz >> PAGE_SHIFT;
 	int nr_pages = nr_meta_pages + nr_data_pages;
-	struct page **pages, *page;
 	struct bpf_ringbuf *rb;
+	struct page **pages;
 	size_t array_size;
-	int i;
+	void *ptr;
 
 	/* Each data page is mapped twice to allow "virtual"
 	 * continuous read of samples wrapping around the end of ring
@@ -95,16 +95,10 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
 	if (!pages)
 		return NULL;
 
-	for (i = 0; i < nr_pages; i++) {
-		page = alloc_pages_node(numa_node, flags, 0);
-		if (!page) {
-			nr_pages = i;
-			goto err_free_pages;
-		}
-		pages[i] = page;
-		if (i >= nr_meta_pages)
-			pages[nr_data_pages + i] = page;
-	}
+	ptr = bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
+				  numa_node, flags, 0);
+	if (!ptr)
+		goto err_free_pages;
 
 	rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
 		  VM_MAP | VM_USERMAP, PAGE_KERNEL);
@@ -116,8 +110,6 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
 	}
 
 err_free_pages:
-	for (i = 0; i < nr_pages; i++)
-		__free_page(pages[i]);
 	bpf_map_area_free(pages);
 	return NULL;
 }
@@ -189,11 +181,10 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	 * to unmap rb itself with vunmap() below
 	 */
 	struct page **pages = rb->pages;
-	int i, nr_pages = rb->nr_pages;
+	int nr_pages = rb->nr_pages;
 
 	vunmap(rb);
-	for (i = 0; i < nr_pages; i++)
-		__free_page(pages[i]);
+	bpf_map_pages_free(pages, nr_pages);
 	bpf_map_area_free(pages);
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4f893d2ac4fd..5c13782839f3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -559,6 +559,47 @@ void bpf_map_container_free(void *container)
 	kvfree(container);
 }
 
+void *bpf_map_pages_alloc(struct bpf_map *map, struct page **pages,
+			  int nr_meta_pages, int nr_data_pages, int nid,
+			  gfp_t flags, unsigned int order)
+{
+	int nr_pages = nr_meta_pages + nr_data_pages;
+	struct mem_cgroup *memcg, *old_memcg;
+	struct page *page;
+	int i;
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_pages_node(nid, flags | __GFP_ACCOUNT, order);
+		if (!page) {
+			nr_pages = i;
+			set_active_memcg(old_memcg);
+			goto err_free_pages;
+		}
+		pages[i] = page;
+		if (i >= nr_meta_pages)
+			pages[nr_data_pages + i] = page;
+	}
+	set_active_memcg(old_memcg);
+
+	return pages;
+
+err_free_pages:
+	for (i = 0; i < nr_pages; i++)
+		__free_page(pages[i]);
+
+	return NULL;
+}
+
+void bpf_map_pages_free(struct page **pages, int nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; i++)
+		__free_page(pages[i]);
+}
+
 static int bpf_map_kptr_off_cmp(const void *a, const void *b)
 {
 	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 11/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (9 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 12/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Introduce a new helper, bpf_map_kvcalloc(), for this memory allocation,
so that the buckets are charged to the map's memcg (the helper sets the
active memcg around kvcalloc()) instead of implicitly to the current
task's memcg.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  8 ++++++++
 kernel/bpf/bpf_local_storage.c |  4 ++--
 kernel/bpf/syscall.c           | 14 ++++++++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4af72d2b6d73..605214fedd6d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1662,6 +1662,8 @@ struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node);
 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags);
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+		       gfp_t flags);
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags);
 #else
@@ -1678,6 +1680,12 @@ bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 	return kzalloc(size, flags);
 }
 
+static inline void *
+bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size, gfp_t flags)
+{
+	return kvcalloc(n, size, flags);
+}
+
 static inline void __percpu *
 bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
 		     gfp_t flags)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 0b50cb2e1d5b..3440135c3612 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -620,8 +620,8 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	nbuckets = max_t(u32, 2, nbuckets);
 	smap->bucket_log = ilog2(nbuckets);
 
-	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
+					 nbuckets, GFP_USER | __GFP_NOWARN);
 	if (!smap->buckets) {
 		bpf_map_container_free(smap);
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5c13782839f3..f7210fa8c3be 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -455,6 +455,20 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 	return ptr;
 }
 
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+		       gfp_t flags)
+{
+	struct mem_cgroup *memcg, *old_memcg;
+	void *ptr;
+
+	memcg = bpf_map_get_memcg(map);
+	old_memcg = set_active_memcg(memcg);
+	ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
+	set_active_memcg(old_memcg);
+
+	return ptr;
+}
+
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 12/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (10 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 11/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 13/15] bpf: Add new parameter into bpf_map_container_alloc Yafang Shao
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Introduce a new helper, get_obj_cgroup_from_cgroup(), to get the
obj_cgroup of a specific cgroup.
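
A sketch of the intended caller pattern (resolving a cgroup fd for
charging; this is what the map creation path does in a later patch of
this series):

	struct obj_cgroup *objcg;
	struct cgroup *cgrp;

	cgrp = cgroup_get_from_fd(fd);
	if (IS_ERR(cgrp))
		return PTR_ERR(cgrp);

	/* get_obj_cgroup_from_cgroup() releases the cgrp reference itself */
	objcg = get_obj_cgroup_from_cgroup(cgrp);
	if (IS_ERR(objcg))
		return PTR_ERR(objcg);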

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2f0a611f12e5..901a9219fe9a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1713,6 +1713,7 @@ bool mem_cgroup_kmem_disabled(void);
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
 void __memcg_kmem_uncharge_page(struct page *page, int order);
 
+struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
 struct obj_cgroup *get_obj_cgroup_from_current(void);
 struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..762cffae0320 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 	return objcg;
 }
 
+static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
+{
+	struct obj_cgroup *objcg;
+
+	if (memcg_kmem_bypass())
+		return NULL;
+
+	rcu_read_lock();
+	objcg = __get_obj_cgroup_from_memcg(memcg);
+	rcu_read_unlock();
+	return objcg;
+}
+
+struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp)
+{
+	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
+
+	rcu_read_lock();
+	css = rcu_dereference(cgrp->subsys[memory_cgrp_id]);
+	if (!css || !css_tryget_online(css)) {
+		rcu_read_unlock();
+		cgroup_put(cgrp);
+		return ERR_PTR(-EINVAL);
+	}
+	rcu_read_unlock();
+	cgroup_put(cgrp);
+
+	memcg = mem_cgroup_from_css(css);
+	if (!memcg) {
+		css_put(css);
+		return ERR_PTR(-EINVAL);
+	}
+
+	objcg = get_obj_cgroup_from_memcg(memcg);
+	css_put(css);
+
+	return objcg;
+}
+
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
 	struct obj_cgroup *objcg = NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 13/15] bpf: Add new parameter into bpf_map_container_alloc
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (11 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 12/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 14/15] bpf: Add new map flag BPF_F_SELECTABLE_MEMCG Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Pass the bpf attr into bpf_map_container_alloc() and
bpf_map_container_mmapable_alloc(), so that map-creation-related
attributes are available in these functions. The two helpers now return
ERR_PTR() on failure instead of NULL, so all call sites are converted
from NULL checks to IS_ERR().
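
The converted call-site pattern, taking hashtab as an example (verbatim
from the diff below):

	htab = bpf_map_container_alloc(attr, sizeof(*htab), NUMA_NO_NODE);
	if (IS_ERR(htab))
		return ERR_CAST(htab);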

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h            |  6 +++---
 kernel/bpf/arraymap.c          | 15 ++++++++-------
 kernel/bpf/bloom_filter.c      |  7 ++++---
 kernel/bpf/bpf_local_storage.c |  6 +++---
 kernel/bpf/bpf_struct_ops.c    |  6 +++---
 kernel/bpf/cpumap.c            |  6 +++---
 kernel/bpf/devmap.c            |  6 +++---
 kernel/bpf/hashtab.c           |  6 +++---
 kernel/bpf/local_storage.c     |  6 +++---
 kernel/bpf/lpm_trie.c          |  6 +++---
 kernel/bpf/offload.c           |  6 +++---
 kernel/bpf/queue_stack_maps.c  |  6 +++---
 kernel/bpf/reuseport_array.c   |  7 ++++---
 kernel/bpf/ringbuf.c           |  6 +++---
 kernel/bpf/stackmap.c          |  6 +++---
 kernel/bpf/syscall.c           | 10 +++++-----
 net/core/sock_map.c            | 12 ++++++------
 net/xdp/xskmap.c               |  6 +++---
 18 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 605214fedd6d..2bd941279e6c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1634,9 +1634,9 @@ void bpf_map_inc_with_uref(struct bpf_map *map);
 struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
-void *bpf_map_container_alloc(u64 size, int numa_node);
-void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
-				       u32 align, u32 offset);
+void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node);
+void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
+				       int numa_node, u32 align, u32 offset);
 void *bpf_map_area_alloc(struct bpf_map *map, u64 size, int numa_node);
 void *bpf_map_pages_alloc(struct bpf_map *map, struct page **pages,
 			  int nr_meta_pages, int nr_data_pages, int nid,
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index bfcfc5df9983..883905c6c845 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -131,16 +131,17 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 		void *data;
 
 		/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
-		data = bpf_map_container_mmapable_alloc(array_size, numa_node,
-							align, offset);
-		if (!data)
-			return ERR_PTR(-ENOMEM);
+		data = bpf_map_container_mmapable_alloc(attr, array_size,
+							numa_node, align,
+							offset);
+		if (IS_ERR(data))
+			return data;
 		array = data + align - offset;
 	} else {
-		array = bpf_map_container_alloc(array_size, numa_node);
+		array = bpf_map_container_alloc(attr, array_size, numa_node);
 	}
-	if (!array)
-		return ERR_PTR(-ENOMEM);
+	if (IS_ERR(array))
+		return ERR_CAST(array);
 	array->index_mask = index_mask;
 	array->map.bypass_spec_v1 = bypass_spec_v1;
 
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index 8b3194903b52..9fe3c6774c40 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -142,10 +142,11 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
 	}
 
 	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
-	bloom = bpf_map_container_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
+	bloom = bpf_map_container_alloc(attr, sizeof(*bloom) + bitset_bytes,
+					numa_node);
 
-	if (!bloom)
-		return ERR_PTR(-ENOMEM);
+	if (IS_ERR(bloom))
+		return ERR_CAST(bloom);
 
 	bpf_map_init_from_attr(&bloom->map, attr);
 
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 3440135c3612..e12891dcf2a9 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -610,9 +610,9 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	unsigned int i;
 	u32 nbuckets;
 
-	smap = bpf_map_container_alloc(sizeof(*smap), NUMA_NO_NODE);
-	if (!smap)
-		return ERR_PTR(-ENOMEM);
+	smap = bpf_map_container_alloc(attr, sizeof(*smap), NUMA_NO_NODE);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
 	bpf_map_init_from_attr(&smap->map, attr);
 
 	nbuckets = roundup_pow_of_two(num_possible_cpus());
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 874fda7e2b8b..51b7ce9902a8 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -618,9 +618,9 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 		 */
 		(vt->size - sizeof(struct bpf_struct_ops_value));
 
-	st_map = bpf_map_container_alloc(st_map_size, NUMA_NO_NODE);
-	if (!st_map)
-		return ERR_PTR(-ENOMEM);
+	st_map = bpf_map_container_alloc(attr, st_map_size, NUMA_NO_NODE);
+	if (IS_ERR(st_map))
+		return ERR_CAST(st_map);
 
 	st_map->st_ops = st_ops;
 	map = &st_map->map;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 95c1642deaf6..1e915a3bd1a2 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -97,9 +97,9 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
-	cmap = bpf_map_container_alloc(sizeof(*cmap), NUMA_NO_NODE);
-	if (!cmap)
-		return ERR_PTR(-ENOMEM);
+	cmap = bpf_map_container_alloc(attr, sizeof(*cmap), NUMA_NO_NODE);
+	if (IS_ERR(cmap))
+		return ERR_CAST(cmap);
 
 	bpf_map_init_from_attr(&cmap->map, attr);
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index b625d578bc93..11c7b8411b03 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -167,9 +167,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	dtab = bpf_map_container_alloc(sizeof(*dtab), NUMA_NO_NODE);
-	if (!dtab)
-		return ERR_PTR(-ENOMEM);
+	dtab = bpf_map_container_alloc(attr, sizeof(*dtab), NUMA_NO_NODE);
+	if (IS_ERR(dtab))
+		return ERR_CAST(dtab);
 
 	err = dev_map_init_map(dtab, attr);
 	if (err) {
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2a34a115e14f..3cb9486eb313 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -496,9 +496,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int err, i;
 
-	htab = bpf_map_container_alloc(sizeof(*htab), NUMA_NO_NODE);
-	if (!htab)
-		return ERR_PTR(-ENOMEM);
+	htab = bpf_map_container_alloc(attr, sizeof(*htab), NUMA_NO_NODE);
+	if (IS_ERR(htab))
+		return ERR_CAST(htab);
 
 	lockdep_register_key(&htab->lockdep_key);
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 963dc82e3c50..b2bd031aba79 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -313,9 +313,9 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	map = bpf_map_container_alloc(sizeof(struct bpf_cgroup_storage_map), numa_node);
-	if (!map)
-		return ERR_PTR(-ENOMEM);
+	map = bpf_map_container_alloc(attr, sizeof(*map), numa_node);
+	if (IS_ERR(map))
+		return ERR_CAST(map);
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&map->map, attr);
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 15ccef2f10a3..e55327ad9c3a 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -558,9 +558,9 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	    attr->value_size > LPM_VAL_SIZE_MAX)
 		return ERR_PTR(-EINVAL);
 
-	trie = bpf_map_container_alloc(sizeof(*trie), NUMA_NO_NODE);
-	if (!trie)
-		return ERR_PTR(-ENOMEM);
+	trie = bpf_map_container_alloc(attr, sizeof(*trie), NUMA_NO_NODE);
+	if (IS_ERR(trie))
+		return ERR_CAST(trie);
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&trie->map, attr);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 02e3f4ef19f5..34fb5261ff95 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -372,9 +372,9 @@ struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
 	    attr->map_type != BPF_MAP_TYPE_HASH)
 		return ERR_PTR(-EINVAL);
 
-	offmap = bpf_map_container_alloc(sizeof(*offmap), NUMA_NO_NODE);
-	if (!offmap)
-		return ERR_PTR(-ENOMEM);
+	offmap = bpf_map_container_alloc(attr, sizeof(*offmap), NUMA_NO_NODE);
+	if (IS_ERR(offmap))
+		return ERR_CAST(offmap);
 
 	bpf_map_init_from_attr(&offmap->map, attr);
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index d4c0284c9089..9425df0695ac 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -74,9 +74,9 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	size = (u64) attr->max_entries + 1;
 	queue_size = sizeof(*qs) + size * attr->value_size;
 
-	qs = bpf_map_container_alloc(queue_size, numa_node);
-	if (!qs)
-		return ERR_PTR(-ENOMEM);
+	qs = bpf_map_container_alloc(attr, queue_size, numa_node);
+	if (IS_ERR(qs))
+		return ERR_CAST(qs);
 
 	bpf_map_init_from_attr(&qs->map, attr);
 
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index d36a60ca68f4..0d3091866e1d 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -153,14 +153,15 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
+	size_t size = struct_size(array, ptrs, attr->max_entries);
 
 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
 
 	/* allocate all map elements and zero-initialize them */
-	array = bpf_map_container_alloc(struct_size(array, ptrs, attr->max_entries), numa_node);
-	if (!array)
-		return ERR_PTR(-ENOMEM);
+	array = bpf_map_container_alloc(attr, size, numa_node);
+	if (IS_ERR(array))
+		return ERR_CAST(array);
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&array->map, attr);
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 25973cab251d..3be472fd55da 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -159,9 +159,9 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 #endif
 
-	rb_map = bpf_map_container_alloc(sizeof(*rb_map), NUMA_NO_NODE);
-	if (!rb_map)
-		return ERR_PTR(-ENOMEM);
+	rb_map = bpf_map_container_alloc(attr, sizeof(*rb_map), NUMA_NO_NODE);
+	if (IS_ERR(rb_map))
+		return ERR_CAST(rb_map);
 
 	bpf_map_init_from_attr(&rb_map->map, attr);
 
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index c9a91ca05a03..c952c7547279 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -101,9 +101,9 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-E2BIG);
 
 	cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
-	smap = bpf_map_container_alloc(cost, bpf_map_attr_numa_node(attr));
-	if (!smap)
-		return ERR_PTR(-ENOMEM);
+	smap = bpf_map_container_alloc(attr, cost, bpf_map_attr_numa_node(attr));
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
 
 	bpf_map_init_from_attr(&smap->map, attr);
 	smap->n_buckets = n_buckets;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f7210fa8c3be..6401cc417fa9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -526,14 +526,14 @@ void bpf_map_area_free(void *area)
  *
  * It is used in map creation path.
  */
-void *bpf_map_container_alloc(u64 size, int numa_node)
+void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)
 {
 	struct bpf_map *map;
 	void *container;
 
 	container = __bpf_map_area_alloc(size, numa_node, false);
 	if (!container)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	map = (struct bpf_map *)container;
 	bpf_map_save_memcg(map);
@@ -541,8 +541,8 @@ void *bpf_map_container_alloc(u64 size, int numa_node)
 	return container;
 }
 
-void *bpf_map_container_mmapable_alloc(u64 size, int numa_node, u32 align,
-				       u32 offset)
+void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
+				       int numa_node, u32 align, u32 offset)
 {
 	struct bpf_map *map;
 	void *container;
@@ -551,7 +551,7 @@ void *bpf_map_container_mmapable_alloc(u64 size, int numa_node, u32 align,
 	/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
 	ptr = __bpf_map_area_alloc(size, numa_node, true);
 	if (!ptr)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	container = ptr + align - offset;
 	map = (struct bpf_map *)container;
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 1f49dc89822c..4d9b730aa27f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -41,9 +41,9 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	stab = bpf_map_container_alloc(sizeof(*stab), NUMA_NO_NODE);
-	if (!stab)
-		return ERR_PTR(-ENOMEM);
+	stab = bpf_map_container_alloc(attr, sizeof(*stab), NUMA_NO_NODE);
+	if (IS_ERR(stab))
+		return ERR_CAST(stab);
 
 	bpf_map_init_from_attr(&stab->map, attr);
 	raw_spin_lock_init(&stab->lock);
@@ -1077,9 +1077,9 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	if (attr->key_size > MAX_BPF_STACK)
 		return ERR_PTR(-E2BIG);
 
-	htab = bpf_map_container_alloc(sizeof(*htab), NUMA_NO_NODE);
-	if (!htab)
-		return ERR_PTR(-ENOMEM);
+	htab = bpf_map_container_alloc(attr, sizeof(*htab), NUMA_NO_NODE);
+	if (IS_ERR(htab))
+		return ERR_CAST(htab);
 
 	bpf_map_init_from_attr(&htab->map, attr);
 
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 6cf98f1c621a..161fc569ed88 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -75,9 +75,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	numa_node = bpf_map_attr_numa_node(attr);
 	size = struct_size(m, xsk_map, attr->max_entries);
 
-	m = bpf_map_container_alloc(size, numa_node);
-	if (!m)
-		return ERR_PTR(-ENOMEM);
+	m = bpf_map_container_alloc(attr, size, numa_node);
+	if (IS_ERR(m))
+		return ERR_CAST(m);
 
 	bpf_map_init_from_attr(&m->map, attr);
 	spin_lock_init(&m->lock);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 14/15] bpf: Add new map flag BPF_F_SELECTABLE_MEMCG
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (12 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 13/15] bpf: Add new parameter into bpf_map_container_alloc Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-07-29 15:23 ` [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  14 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

Introduce a new map flag, BPF_F_SELECTABLE_MEMCG, for map creation; it
is added to the create-flag masks of the map types below. This is a
preparation for the followup patch.
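
Each converted map type permits the flag through its create-flag mask;
the existing validation at map creation is roughly of the form:

	/* flags outside the map type's mask are rejected */
	if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK)
		return -EINVAL;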

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/uapi/linux/bpf.h       | 3 +++
 kernel/bpf/arraymap.c          | 2 +-
 kernel/bpf/bloom_filter.c      | 4 ++--
 kernel/bpf/bpf_local_storage.c | 3 ++-
 kernel/bpf/bpf_struct_ops.c    | 5 ++++-
 kernel/bpf/devmap.c            | 4 ++--
 kernel/bpf/hashtab.c           | 2 +-
 kernel/bpf/local_storage.c     | 2 +-
 kernel/bpf/queue_stack_maps.c  | 2 +-
 kernel/bpf/ringbuf.c           | 2 +-
 kernel/bpf/stackmap.c          | 2 +-
 net/core/sock_map.c            | 4 ++--
 tools/include/uapi/linux/bpf.h | 3 +++
 13 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59a217ca2dfd..d5fc1ea70b59 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1227,6 +1227,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* Selectable memcg */
+	BPF_F_SELECTABLE_MEMCG	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 883905c6c845..eb8deac529ac 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -17,7 +17,7 @@
 
 #define ARRAY_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_MMAPABLE | BPF_F_ACCESS_MASK | \
-	 BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP)
+	 BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP | BPF_F_SELECTABLE_MEMCG)
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index 9fe3c6774c40..3714aebc9ed6 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -9,8 +9,8 @@
 #include <linux/random.h>
 #include <linux/btf_ids.h>
 
-#define BLOOM_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
+#define BLOOM_CREATE_FLAG_MASK	(BPF_F_NUMA_NODE |	\
+	BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK | BPF_F_SELECTABLE_MEMCG)
 
 struct bpf_bloom_filter {
 	struct bpf_map map;
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index e12891dcf2a9..7798235f311e 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -15,7 +15,8 @@
 #include <linux/rcupdate_trace.h>
 #include <linux/rcupdate_wait.h>
 
-#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
+#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK	\
+	(BPF_F_NO_PREALLOC | BPF_F_CLONE | BPF_F_SELECTABLE_MEMCG)
 
 static struct bpf_local_storage_map_bucket *
 select_bucket(struct bpf_local_storage_map *smap,
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 51b7ce9902a8..208d593e6a44 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -12,6 +12,8 @@
 #include <linux/mutex.h>
 #include <linux/btf_ids.h>
 
+#define STRUCT_OPS_CREATE_FLAG_MASK (BPF_F_SELECTABLE_MEMCG)
+
 enum bpf_struct_ops_state {
 	BPF_STRUCT_OPS_STATE_INIT,
 	BPF_STRUCT_OPS_STATE_INUSE,
@@ -586,7 +588,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
 static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
 {
 	if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 ||
-	    attr->map_flags || !attr->btf_vmlinux_value_type_id)
+	    (attr->map_flags & ~STRUCT_OPS_CREATE_FLAG_MASK) ||
+	    !attr->btf_vmlinux_value_type_id)
 		return -EINVAL;
 	return 0;
 }
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 11c7b8411b03..52858963373c 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,8 +50,8 @@
 #include <trace/events/xdp.h>
 #include <linux/btf_ids.h>
 
-#define DEV_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+#define DEV_CREATE_FLAG_MASK (BPF_F_NUMA_NODE |		\
+	BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_SELECTABLE_MEMCG)
 
 struct xdp_dev_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3cb9486eb313..532c8ee89d58 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,7 +17,7 @@
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
-	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
+	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED | BPF_F_SELECTABLE_MEMCG)
 
 #define BATCH_OPS(_name)			\
 	.map_lookup_batch =			\
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index b2bd031aba79..009d6f43a099 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -16,7 +16,7 @@
 #include "../cgroup/cgroup-internal.h"
 
 #define LOCAL_STORAGE_CREATE_FLAG_MASK					\
-	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK | BPF_F_SELECTABLE_MEMCG)
 
 struct bpf_cgroup_storage_map {
 	struct bpf_map map;
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 9425df0695ac..c4aab43198ad 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -12,7 +12,7 @@
 #include "percpu_freelist.h"
 
 #define QUEUE_STACK_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK | BPF_F_SELECTABLE_MEMCG)
 
 struct bpf_queue_stack {
 	struct bpf_map map;
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 3be472fd55da..53a7eb8db257 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -12,7 +12,7 @@
 #include <uapi/linux/btf.h>
 #include <linux/btf_ids.h>
 
-#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)
+#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE | BPF_F_SELECTABLE_MEMCG)
 
 /* non-mmap()'able part of bpf_ringbuf (everything up to consumer page) */
 #define RINGBUF_PGOFF \
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index c952c7547279..007b10d2da7d 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -14,7 +14,7 @@
 
 #define STACK_CREATE_FLAG_MASK					\
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY |	\
-	 BPF_F_STACK_BUILD_ID)
+	 BPF_F_STACK_BUILD_ID | BPF_F_SELECTABLE_MEMCG)
 
 struct stack_map_bucket {
 	struct pcpu_freelist_node fnode;
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 4d9b730aa27f..0310b00e19b8 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -21,8 +21,8 @@ struct bpf_stab {
 	raw_spinlock_t lock;
 };
 
-#define SOCK_CREATE_FLAG_MASK				\
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+#define SOCK_CREATE_FLAG_MASK (BPF_F_NUMA_NODE |	\
+	BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_SELECTABLE_MEMCG)
 
 static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 				struct bpf_prog *old, u32 which);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 59a217ca2dfd..d5fc1ea70b59 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1227,6 +1227,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* Selectable memcg */
+	BPF_F_SELECTABLE_MEMCG	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map
  2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
                   ` (13 preceding siblings ...)
  2022-07-29 15:23 ` [RFC PATCH bpf-next 14/15] bpf: Add new map flag BPF_F_SELECTABLE_MEMCG Yafang Shao
@ 2022-07-29 15:23 ` Yafang Shao
  2022-08-02  4:55   ` Alexei Starovoitov
  14 siblings, 1 reply; 24+ messages in thread
From: Yafang Shao @ 2022-07-29 15:23 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

A new member, memcg_fd, is introduced into the bpf attr of the
BPF_MAP_CREATE command; it is the fd of an opened cgroup directory, in
which the memory controller must be enabled. This value is valid only
when BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel gets
the memory cgroup from this fd, it sets this memcg on the bpf map, and
all subsequent memory allocations for this map will be charged to that
memcg.

The map creation paths in libbpf are changed accordingly.

Currently only a cgroup2 directory is supported.

The usage of this new member is as follows:
	struct bpf_map_create_opts map_opts = {
		.sz = sizeof(map_opts),
		.map_flags = BPF_F_SELECTABLE_MEMCG,
	};
	int memcg_fd, map_fd;
	int key, value;

	memcg_fd = open("/cgroup2", O_DIRECTORY);
	if (memcg_fd < 0) {
		perror("memcg dir open");
		return -1;
	}

	map_opts.memcg_fd = memcg_fd;
	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
				sizeof(key), sizeof(value),
				1024, &map_opts);
	if (map_fd < 0) {
		perror("map create");
		return -1;
	}

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/uapi/linux/bpf.h       |  2 ++
 kernel/bpf/syscall.c           | 47 ++++++++++++++++++++++++++--------
 tools/include/uapi/linux/bpf.h |  2 ++
 tools/lib/bpf/bpf.c            |  1 +
 tools/lib/bpf/bpf.h            |  3 ++-
 tools/lib/bpf/libbpf.c         |  2 ++
 6 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d5fc1ea70b59..a6e02c8be924 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1296,6 +1296,8 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__s32	memcg_fd;	/* selectable memcg */
+		__s32	:32;		/* hole */
 		/* Any per-map-type extra fields
 		 *
 		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6401cc417fa9..9900e2b87315 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
 {
-	/* Currently if a map is created by a process belonging to the root
-	 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
-	 * So we have to check map->objcg for being NULL each time it's
-	 * being used.
-	 */
-	map->objcg = get_obj_cgroup_from_current();
+	struct obj_cgroup *objcg;
+	struct cgroup *cgrp;
+
+	if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) {
+		cgrp = cgroup_get_from_fd(attr->memcg_fd);
+		if (IS_ERR(cgrp))
+			return -EINVAL;
+
+		objcg = get_obj_cgroup_from_cgroup(cgrp);
+		if (IS_ERR(objcg))
+			return PTR_ERR(objcg);
+	} else {
+		/* Currently if a map is created by a process belonging to the root
+		 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
+		 * So we have to check map->objcg for being NULL each time it's
+		 * being used.
+		 */
+		objcg = get_obj_cgroup_from_current();
+	}
+
+	map->objcg = objcg;
+	return 0;
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 }
 
 #else
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
 {
+	return 0;
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)
 {
 	struct bpf_map *map;
 	void *container;
+	int ret;
 
 	container = __bpf_map_area_alloc(size, numa_node, false);
 	if (!container)
 		return ERR_PTR(-ENOMEM);
 
 	map = (struct bpf_map *)container;
-	bpf_map_save_memcg(map);
+	ret = bpf_map_save_memcg(map, attr);
+	if (ret) {
+		bpf_map_area_free(container);
+		return ERR_PTR(ret);
+	}
 
 	return container;
 }
@@ -547,6 +569,7 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
 	struct bpf_map *map;
 	void *container;
 	void *ptr;
+	int ret;
 
 	/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
 	ptr = __bpf_map_area_alloc(size, numa_node, true);
@@ -555,7 +578,11 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
 
 	container = ptr + align - offset;
 	map = (struct bpf_map *)container;
-	bpf_map_save_memcg(map);
+	ret = bpf_map_save_memcg(map, attr);
+	if (ret) {
+		bpf_map_area_free(ptr);
+		return ERR_PTR(ret);
+	}
 
 	return ptr;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d5fc1ea70b59..a6e02c8be924 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1296,6 +1296,8 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__s32	memcg_fd;	/* selectable memcg */
+		__s32	:32;		/* hole */
 		/* Any per-map-type extra fields
 		 *
 		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5eb0df90eb2b..662ce5808386 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -199,6 +199,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.map_extra = OPTS_GET(opts, map_extra, 0);
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
+	attr.memcg_fd = OPTS_GET(opts, memcg_fd, 0);
 
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 88a7cc4bd76f..481aad49422b 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,9 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+	__u32 memcg_fd;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field memcg_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 50d41815f431..86916d550031 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -505,6 +505,7 @@ struct bpf_map {
 	bool pinned;
 	bool reused;
 	bool autocreate;
+	__s32 memcg_fd;
 	__u64 map_extra;
 };
 
@@ -4928,6 +4929,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 	create_attr.map_ifindex = map->map_ifindex;
 	create_attr.map_flags = def->map_flags;
 	create_attr.numa_node = map->numa_node;
+	create_attr.memcg_fd = map->memcg_fd;
 	create_attr.map_extra = map->map_extra;
 
 	if (bpf_map__is_struct_ops(map))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf
  2022-07-29 15:23 ` [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf Yafang Shao
@ 2022-08-01 23:16   ` Andrii Nakryiko
  2022-08-02 13:31     ` Yafang Shao
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2022-08-01 23:16 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm, netdev, bpf, linux-mm

On Fri, Jul 29, 2022 at 8:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Introduce new helper bpf_map_pages_alloc() for this memory allocation.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/bpf.h  |  4 ++++
>  kernel/bpf/ringbuf.c | 27 +++++++++------------------
>  kernel/bpf/syscall.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+), 18 deletions(-)
>

[...]

>         /* Each data page is mapped twice to allow "virtual"
>          * continuous read of samples wrapping around the end of ring
> @@ -95,16 +95,10 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
>         if (!pages)
>                 return NULL;
>
> -       for (i = 0; i < nr_pages; i++) {
> -               page = alloc_pages_node(numa_node, flags, 0);
> -               if (!page) {
> -                       nr_pages = i;
> -                       goto err_free_pages;
> -               }
> -               pages[i] = page;
> -               if (i >= nr_meta_pages)
> -                       pages[nr_data_pages + i] = page;
> -       }
> +       ptr = bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
> +                                 numa_node, flags, 0);
> +       if (!ptr)

bpf_map_pages_alloc() has a weird and confusing interface. It fills
out pages (the second argument) and also returns pages as a void *. Why
not just return an int error (0 or -ENOMEM)? You are discarding this
ptr anyway.
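
A sketch of the suggested shape (hypothetical, just to illustrate):

	/* return 0 / -ENOMEM; fill pages purely as an out parameter */
	int bpf_map_pages_alloc(struct bpf_map *map, struct page **pages,
				int nr_meta_pages, int nr_data_pages,
				int nid, gfp_t flags, unsigned int order);

	/* the caller in bpf_ringbuf_area_alloc() would then do: */
	if (bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
				numa_node, flags, 0))
		goto err_free_pages;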


But also, thinking some more, bpf_map_pages_alloc() is very
ringbuf-specific (which other map will have exactly the same meaning
for nr_meta_pages and nr_data_pages, where we also allocate 2 *
nr_data_pages, etc.).

I don't think it makes sense to expose it as a generic internal API.
Why not keep all that inside kernel/bpf/ringbuf.c instead?

> +               goto err_free_pages;
>
>         rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
>                   VM_MAP | VM_USERMAP, PAGE_KERNEL);

[...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map
  2022-07-29 15:23 ` [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
@ 2022-08-02  4:55   ` Alexei Starovoitov
  2022-08-02 13:47     ` Yafang Shao
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2022-08-02  4:55 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm, netdev, bpf, linux-mm

On Fri, Jul 29, 2022 at 03:23:16PM +0000, Yafang Shao wrote:
> A new member, memcg_fd, is introduced into the bpf attr of the
> BPF_MAP_CREATE command; it is the fd of an opened cgroup directory, in
> which the memory controller must be enabled. This value is valid only
> when BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel gets
> the memory cgroup from this fd, it sets this memcg on the bpf map, and
> all subsequent memory allocations for this map will be charged to that
> memcg.
> 
> The map creation paths in libbpf are changed accordingly.
> 
> Currently only a cgroup2 directory is supported.
> 
> The usage of this new member is as follows:
> 	struct bpf_map_create_opts map_opts = {
> 		.sz = sizeof(map_opts),
> 		.map_flags = BPF_F_SELECTABLE_MEMCG,
> 	};
> 	int memcg_fd, map_fd;
> 	int key, value;
> 
> 	memcg_fd = open("/cgroup2", O_DIRECTORY);
> 	if (memcg_fd < 0) {
> 		perror("memcg dir open");
> 		return -1;
> 	}
> 
> 	map_opts.memcg_fd = memcg_fd;
> 	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
> 				sizeof(key), sizeof(value),
> 				1024, &map_opts);
> 	if (map_fd < 0) {
> 		perror("map create");
> 		return -1;
> 	}

Overall the API extension makes sense.
The flexibility of selecting a memcg is useful.

> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/uapi/linux/bpf.h       |  2 ++
>  kernel/bpf/syscall.c           | 47 ++++++++++++++++++++++++++--------
>  tools/include/uapi/linux/bpf.h |  2 ++
>  tools/lib/bpf/bpf.c            |  1 +
>  tools/lib/bpf/bpf.h            |  3 ++-
>  tools/lib/bpf/libbpf.c         |  2 ++
>  6 files changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d5fc1ea70b59..a6e02c8be924 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1296,6 +1296,8 @@ union bpf_attr {
>  						   * struct stored as the
>  						   * map value
>  						   */
> +		__s32	memcg_fd;	/* selectable memcg */
> +		__s32	:32;		/* hole */

New fields cannot be inserted in the middle of a uapi struct.

>  		/* Any per-map-type extra fields
>  		 *
>  		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 6401cc417fa9..9900e2b87315 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
>  }
>  
>  #ifdef CONFIG_MEMCG_KMEM
> -static void bpf_map_save_memcg(struct bpf_map *map)
> +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
>  {
> -	/* Currently if a map is created by a process belonging to the root
> -	 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> -	 * So we have to check map->objcg for being NULL each time it's
> -	 * being used.
> -	 */
> -	map->objcg = get_obj_cgroup_from_current();
> +	struct obj_cgroup *objcg;
> +	struct cgroup *cgrp;
> +
> +	if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) {

The flag is unnecessary. Just add memcg_fd to the end of attr and use
memcg_fd != 0 as the condition for using it instead of
get_obj_cgroup_from_current(). There are other parts of the bpf uapi
that have similar fd handling logic.
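
A sketch of the suggested logic (hypothetical):

	/* no flag: a non-zero memcg_fd selects the memcg */
	if (attr->memcg_fd) {
		cgrp = cgroup_get_from_fd(attr->memcg_fd);
		if (IS_ERR(cgrp))
			return PTR_ERR(cgrp);
		objcg = get_obj_cgroup_from_cgroup(cgrp);
		if (IS_ERR(objcg))
			return PTR_ERR(objcg);
	} else {
		objcg = get_obj_cgroup_from_current();
	}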

> +		cgrp = cgroup_get_from_fd(attr->memcg_fd);
> +		if (IS_ERR(cgrp))
> +			return -EINVAL;
> +
> +		objcg = get_obj_cgroup_from_cgroup(cgrp);
> +		if (IS_ERR(objcg))
> +			return PTR_ERR(objcg);
> +	} else {
> +		/* Currently if a map is created by a process belonging to the root
> +		 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> +		 * So we have to check map->objcg for being NULL each time it's
> +		 * being used.
> +		 */
> +		objcg = get_obj_cgroup_from_current();
> +	}
> +
> +	map->objcg = objcg;
> +	return 0;
>  }
>  
>  static void bpf_map_release_memcg(struct bpf_map *map)
> @@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
>  }
>  
>  #else
> -static void bpf_map_save_memcg(struct bpf_map *map)
> +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
>  {
> +	return 0;
>  }
>  
>  static void bpf_map_release_memcg(struct bpf_map *map)
> @@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)

A high-level uapi struct should not be passed into a low-level helper
like this. Pls pass memcg_fd instead.
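
i.e. a signature along the lines of (hypothetical):

	void *bpf_map_container_alloc(u64 size, int numa_node, int memcg_fd);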

>  {
>  	struct bpf_map *map;
>  	void *container;
> +	int ret;
>  
>  	container = __bpf_map_area_alloc(size, numa_node, false);
>  	if (!container)
>  		return ERR_PTR(-ENOMEM);
>  
>  	map = (struct bpf_map *)container;
> -	bpf_map_save_memcg(map);
> +	ret = bpf_map_save_memcg(map, attr);
> +	if (ret) {
> +		bpf_map_area_free(container);
> +		return ERR_PTR(ret);
> +	}
>  
>  	return container;
>  }
> @@ -547,6 +569,7 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
>  	struct bpf_map *map;
>  	void *container;
>  	void *ptr;
> +	int ret;
>  
>  	/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
>  	ptr = __bpf_map_area_alloc(size, numa_node, true);
> @@ -555,7 +578,11 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
>  
>  	container = ptr + align - offset;
>  	map = (struct bpf_map *)container;
> -	bpf_map_save_memcg(map);
> +	ret = bpf_map_save_memcg(map, attr);
> +	if (ret) {
> +		bpf_map_area_free(ptr);
> +		return ERR_PTR(ret);
> +	}
>  
>  	return ptr;
>  }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d5fc1ea70b59..a6e02c8be924 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1296,6 +1296,8 @@ union bpf_attr {
>  						   * struct stored as the
>  						   * map value
>  						   */
> +		__s32	memcg_fd;	/* selectable memcg */
> +		__s32	:32;		/* hole */
>  		/* Any per-map-type extra fields
>  		 *
>  		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 5eb0df90eb2b..662ce5808386 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -199,6 +199,7 @@ int bpf_map_create(enum bpf_map_type map_type,
>  	attr.map_extra = OPTS_GET(opts, map_extra, 0);
>  	attr.numa_node = OPTS_GET(opts, numa_node, 0);
>  	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
> +	attr.memcg_fd = OPTS_GET(opts, memcg_fd, 0);
>  
>  	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
>  	return libbpf_err_errno(fd);
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 88a7cc4bd76f..481aad49422b 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -51,8 +51,9 @@ struct bpf_map_create_opts {
>  
>  	__u32 numa_node;
>  	__u32 map_ifindex;
> +	__u32 memcg_fd;
>  };
> -#define bpf_map_create_opts__last_field map_ifindex
> +#define bpf_map_create_opts__last_field memcg_fd
>  
>  LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
>  			      const char *map_name,
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 50d41815f431..86916d550031 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -505,6 +505,7 @@ struct bpf_map {
>  	bool pinned;
>  	bool reused;
>  	bool autocreate;
> +	__s32 memcg_fd;
>  	__u64 map_extra;
>  };
>  
> @@ -4928,6 +4929,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
>  	create_attr.map_ifindex = map->map_ifindex;
>  	create_attr.map_flags = def->map_flags;
>  	create_attr.numa_node = map->numa_node;
> +	create_attr.memcg_fd = map->memcg_fd;
>  	create_attr.map_extra = map->map_extra;
>  
>  	if (bpf_map__is_struct_ops(map))
> -- 
> 2.17.1
> 


* Re: [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map
  2022-07-29 15:23 ` [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map Yafang Shao
@ 2022-08-02  4:58   ` Alexei Starovoitov
  2022-08-02 13:47     ` Yafang Shao
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2022-08-02  4:58 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm, netdev, bpf, linux-mm

On Fri, Jul 29, 2022 at 03:23:06PM +0000, Yafang Shao wrote:
> Currently bpf_map_area_alloc() is used to allocate a container of struct
> bpf_map or members in this container. To distinguish map creation
> from the other members, let's split it into two different helpers,
>   - bpf_map_container_alloc()
>     Used to allocate a container of struct bpf_map, the container is as
>     follows,
>       struct bpf_map_container {
>         struct bpf_map map;  // the map must be the first member
>         ....
>       };
>     Pls. note that struct bpf_map_container is an abstract one, which
>     can be struct bpf_array, struct bpf_bloom_filter, etc.
> 
>     In this helper, bpf_map_save_memcg() is called to initialize the
>     memcg-related data in the bpf map. These data will be cleared in
>     bpf_map_container_free().
> 
>   - bpf_map_area_alloc()
>     Now it is used to allocate only the members in a container.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/bpf.h  |  4 ++++
>  kernel/bpf/syscall.c | 56 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 20c26aed7896..2d971b0eb24b 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1634,9 +1634,13 @@ void bpf_map_inc_with_uref(struct bpf_map *map);
>  struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
>  void bpf_map_put_with_uref(struct bpf_map *map);
>  void bpf_map_put(struct bpf_map *map);
> +void *bpf_map_container_alloc(u64 size, int numa_node);
> +void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
> +				       u32 align, u32 offset);
>  void *bpf_map_area_alloc(u64 size, int numa_node);
>  void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>  void bpf_map_area_free(void *base);
> +void bpf_map_container_free(void *base);
>  bool bpf_map_write_active(const struct bpf_map *map);
>  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
>  int  generic_map_lookup_batch(struct bpf_map *map,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 83c7136c5788..1a1a81a11b37 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -495,6 +495,62 @@ static void bpf_map_release_memcg(struct bpf_map *map)
>  }
>  #endif
>  
> +/*
> + * The returned pointer is a bpf_map container, as follows:
> + *   struct bpf_map_container {
> + *       struct bpf_map map;
> + *       ...
> + *   };
> + *
> + * It is used in the map creation path.
> + */
> +void *bpf_map_container_alloc(u64 size, int numa_node)
> +{
> +	struct bpf_map *map;
> +	void *container;
> +
> +	container = __bpf_map_area_alloc(size, numa_node, false);
> +	if (!container)
> +		return NULL;
> +
> +	map = (struct bpf_map *)container;
> +	bpf_map_save_memcg(map);
> +
> +	return container;
> +}
> +
> +void *bpf_map_container_mmapable_alloc(u64 size, int numa_node, u32 align,
> +				       u32 offset)
> +{
> +	struct bpf_map *map;
> +	void *container;
> +	void *ptr;
> +
> +	/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
> +	ptr = __bpf_map_area_alloc(size, numa_node, true);
> +	if (!ptr)
> +		return NULL;
> +
> +	container = ptr + align - offset;
> +	map = (struct bpf_map *)container;
> +	bpf_map_save_memcg(map);

This is very error prone.
I don't think the container concept is necessary.
bpf_map_area_alloc() can just take an extra memcg_fd argument.
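e.g. (sketch):

	void *bpf_map_area_alloc(u64 size, int numa_node, int memcg_fd);

with the map creation path passing attr->memcg_fd.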

> +
> +	return ptr;
> +}
> +
> +void bpf_map_container_free(void *container)
> +{
> +	struct bpf_map *map;
> +
> +	if (!container)
> +		return;
> +
> +	map = (struct bpf_map *)container;
> +	bpf_map_release_memcg(map);
> +
> +	kvfree(container);
> +}
> +
>  static int bpf_map_kptr_off_cmp(const void *a, const void *b)
>  {
>  	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
> -- 
> 2.17.1
> 


* Re: [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf
  2022-08-01 23:16   ` Andrii Nakryiko
@ 2022-08-02 13:31     ` Yafang Shao
  2022-08-02 18:00       ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Yafang Shao @ 2022-08-02 13:31 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM

On Tue, Aug 2, 2022 at 7:17 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jul 29, 2022 at 8:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Introduce new helper bpf_map_pages_alloc() for this memory allocation.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/linux/bpf.h  |  4 ++++
> >  kernel/bpf/ringbuf.c | 27 +++++++++------------------
> >  kernel/bpf/syscall.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 54 insertions(+), 18 deletions(-)
> >
>
> [...]
>
> >         /* Each data page is mapped twice to allow "virtual"
> >          * continuous read of samples wrapping around the end of ring
> > @@ -95,16 +95,10 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
> >         if (!pages)
> >                 return NULL;
> >
> > -       for (i = 0; i < nr_pages; i++) {
> > -               page = alloc_pages_node(numa_node, flags, 0);
> > -               if (!page) {
> > -                       nr_pages = i;
> > -                       goto err_free_pages;
> > -               }
> > -               pages[i] = page;
> > -               if (i >= nr_meta_pages)
> > -                       pages[nr_data_pages + i] = page;
> > -       }
> > +       ptr = bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
> > +                                 numa_node, flags, 0);
> > +       if (!ptr)
>
> bpf_map_pages_alloc() has a weird and confusing interface. It fills
> out pages (second argument) and also returns pages as void *. Why not
> just return int error (0 or -ENOMEM)? You are discarding this ptr
> anyways.
>

I will change it.
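E.g. (sketch; parameter names guessed from the call site):

	int bpf_map_pages_alloc(struct bpf_map *map, struct page **pages,
				int nr_meta_pages, int nr_data_pages,
				int numa_node, gfp_t flags, int order);

returning 0 or -ENOMEM, so the caller checks the error instead of
discarding the pointer.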

>
> But also thinking some more, bpf_map_pages_alloc() is very ringbuf
> specific (which other map will have exactly the same meaning for
> nr_meta_pages and nr_data_pages, where we also allocate 2 *
> nr_data_pages, etc).
>
> I don't think it makes sense to expose it as a generic internal API.
> Why not keep all that inside kernel/bpf/ringbuf.c instead?
>

Right, it is currently used only in ringbuf.c. I will keep it inside ringbuf.c.


-- 
Regards
Yafang


* Re: [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map
  2022-08-02  4:55   ` Alexei Starovoitov
@ 2022-08-02 13:47     ` Yafang Shao
  0 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-08-02 13:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM

On Tue, Aug 2, 2022 at 12:55 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jul 29, 2022 at 03:23:16PM +0000, Yafang Shao wrote:
> > A new member memcg_fd is introduced into bpf attr of BPF_MAP_CREATE
> > command, which is the fd of an opened cgroup directory. In this cgroup,
> > the memory subsystem must be enabled. This value is valid only when
> > BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel gets the
> > memory cgroup from this fd, it will set this memcg in the bpf map, and
> > all subsequent memory allocations of this map will be charged to that
> > memcg.
> >
> > The map creation paths in libbpf are also changed accordingly.
> >
> > Currently only a cgroup2 directory is supported.
> >
> > The usage of this new member is as follows,
> >       struct bpf_map_create_opts map_opts = {
> >               .sz = sizeof(map_opts),
> >               .map_flags = BPF_F_SELECTABLE_MEMCG,
> >       };
> >       int memcg_fd, map_fd;
> >       int key, value;
> >
> >       memcg_fd = open("/cgroup2", O_DIRECTORY);
> >       if (memcg_fd < 0) {
> >               perror("memcg dir open");
> >               return -1;
> >       }
> >
> >       map_opts.memcg_fd = memcg_fd;
> >       map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
> >                               sizeof(key), sizeof(value),
> >                               1024, &map_opts);
> >       if (map_fd < 0) {
> >               perror("map create");
> >               return -1;
> >       }
>
> Overall the api extension makes sense.
> The flexibility of selecting memcg is useful.
>

Thanks!

> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/uapi/linux/bpf.h       |  2 ++
> >  kernel/bpf/syscall.c           | 47 ++++++++++++++++++++++++++--------
> >  tools/include/uapi/linux/bpf.h |  2 ++
> >  tools/lib/bpf/bpf.c            |  1 +
> >  tools/lib/bpf/bpf.h            |  3 ++-
> >  tools/lib/bpf/libbpf.c         |  2 ++
> >  6 files changed, 46 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index d5fc1ea70b59..a6e02c8be924 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1296,6 +1296,8 @@ union bpf_attr {
> >                                                  * struct stored as the
> >                                                  * map value
> >                                                  */
> > +             __s32   memcg_fd;       /* selectable memcg */
> > +             __s32   :32;            /* hole */
>
> New fields cannot be inserted in the middle of a uapi struct.
>

There's a "#define BPF_MAP_CREATE_LAST_FIELD map_extra" in
kernel/bpf/syscall.c, and thus I thought it may have some special
meaning, so I put the new field above it.
Now that it doesn't have any special meaning, I will change it as you suggested.
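If memcg_fd is appended after map_extra, I suppose that define simply
gets bumped to the new last field:

	#define BPF_MAP_CREATE_LAST_FIELD memcg_fd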

> >               /* Any per-map-type extra fields
> >                *
> >                * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 6401cc417fa9..9900e2b87315 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
> >  }
> >
> >  #ifdef CONFIG_MEMCG_KMEM
> > -static void bpf_map_save_memcg(struct bpf_map *map)
> > +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
> >  {
> > -     /* Currently if a map is created by a process belonging to the root
> > -      * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > -      * So we have to check map->objcg for being NULL each time it's
> > -      * being used.
> > -      */
> > -     map->objcg = get_obj_cgroup_from_current();
> > +     struct obj_cgroup *objcg;
> > +     struct cgroup *cgrp;
> > +
> > +     if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) {
>
> The flag is unnecessary. Just add memcg_fd to the end of attr and use != 0
> as a condition that it should be used instead of get_obj_cgroup_from_current().
> There are other parts of bpf uapi that have similar fd handling logic.
>

Right. There's an ensure_good_fd() in libbpf that makes sure the fd is
a positive number.
I will change it.
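So the usage example above would reduce to just (sketch):

	struct bpf_map_create_opts map_opts = {
		.sz = sizeof(map_opts),
		.memcg_fd = memcg_fd,	/* no BPF_F_SELECTABLE_MEMCG needed */
	};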

> > +             cgrp = cgroup_get_from_fd(attr->memcg_fd);
> > +             if (IS_ERR(cgrp))
> > +                     return -EINVAL;
> > +
> > +             objcg = get_obj_cgroup_from_cgroup(cgrp);
> > +             if (IS_ERR(objcg))
> > +                     return PTR_ERR(objcg);
> > +     } else {
> > +             /* Currently if a map is created by a process belonging to the root
> > +              * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > +              * So we have to check map->objcg for being NULL each time it's
> > +              * being used.
> > +              */
> > +             objcg = get_obj_cgroup_from_current();
> > +     }
> > +
> > +     map->objcg = objcg;
> > +     return 0;
> >  }
> >
> >  static void bpf_map_release_memcg(struct bpf_map *map)
> > @@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
> >  }
> >
> >  #else
> > -static void bpf_map_save_memcg(struct bpf_map *map)
> > +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
> >  {
> > +     return 0;
> >  }
> >
> >  static void bpf_map_release_memcg(struct bpf_map *map)
> > @@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)
>
> A high-level uapi struct should not be passed into a low-level helper like this.
> Pls pass memcg_fd instead.
>

Sure, I will do it.

-- 
Regards
Yafang


* Re: [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map
  2022-08-02  4:58   ` Alexei Starovoitov
@ 2022-08-02 13:47     ` Yafang Shao
  0 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-08-02 13:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM

On Tue, Aug 2, 2022 at 12:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jul 29, 2022 at 03:23:06PM +0000, Yafang Shao wrote:
> > Currently bpf_map_area_alloc() is used to allocate a container of struct
> > bpf_map or members in this container. To distinguish map creation
> > from the other members, let's split it into two different helpers,
> >   - bpf_map_container_alloc()
> >     Used to allocate a container of struct bpf_map, the container is as
> >     follows,
> >       struct bpf_map_container {
> >         struct bpf_map map;  // the map must be the first member
> >         ....
> >       };
> >     Pls. note that struct bpf_map_container is an abstract one, which
> >     can be struct bpf_array, struct bpf_bloom_filter, etc.
> >
> >     In this helper, bpf_map_save_memcg() is called to initialize the
> >     memcg-related data in the bpf map. These data will be cleared in
> >     bpf_map_container_free().
> >
> >   - bpf_map_area_alloc()
> >     Now it is used to allocate only the members in a container.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/linux/bpf.h  |  4 ++++
> >  kernel/bpf/syscall.c | 56 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 60 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 20c26aed7896..2d971b0eb24b 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1634,9 +1634,13 @@ void bpf_map_inc_with_uref(struct bpf_map *map);
> >  struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
> >  void bpf_map_put_with_uref(struct bpf_map *map);
> >  void bpf_map_put(struct bpf_map *map);
> > +void *bpf_map_container_alloc(u64 size, int numa_node);
> > +void *bpf_map_container_mmapable_alloc(u64 size, int numa_node,
> > +                                    u32 align, u32 offset);
> >  void *bpf_map_area_alloc(u64 size, int numa_node);
> >  void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
> >  void bpf_map_area_free(void *base);
> > +void bpf_map_container_free(void *base);
> >  bool bpf_map_write_active(const struct bpf_map *map);
> >  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >  int  generic_map_lookup_batch(struct bpf_map *map,
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 83c7136c5788..1a1a81a11b37 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -495,6 +495,62 @@ static void bpf_map_release_memcg(struct bpf_map *map)
> >  }
> >  #endif
> >
> > +/*
> > + * The returned pointer is a bpf_map container, as follows:
> > + *   struct bpf_map_container {
> > + *       struct bpf_map map;
> > + *       ...
> > + *   };
> > + *
> > + * It is used in the map creation path.
> > + */
> > +void *bpf_map_container_alloc(u64 size, int numa_node)
> > +{
> > +     struct bpf_map *map;
> > +     void *container;
> > +
> > +     container = __bpf_map_area_alloc(size, numa_node, false);
> > +     if (!container)
> > +             return NULL;
> > +
> > +     map = (struct bpf_map *)container;
> > +     bpf_map_save_memcg(map);
> > +
> > +     return container;
> > +}
> > +
> > +void *bpf_map_container_mmapable_alloc(u64 size, int numa_node, u32 align,
> > +                                    u32 offset)
> > +{
> > +     struct bpf_map *map;
> > +     void *container;
> > +     void *ptr;
> > +
> > +     /* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
> > +     ptr = __bpf_map_area_alloc(size, numa_node, true);
> > +     if (!ptr)
> > +             return NULL;
> > +
> > +     container = ptr + align - offset;
> > +     map = (struct bpf_map *)container;
> > +     bpf_map_save_memcg(map);
>
> This is very error prone.
> I don't think the container concept is necessary.
> bpf_map_area_alloc() can just take an extra memcg_fd argument.
>

Got it. I will change it.

-- 
Regards
Yafang


* Re: [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf
  2022-08-02 13:31     ` Yafang Shao
@ 2022-08-02 18:00       ` Andrii Nakryiko
  2022-08-03 13:27         ` Yafang Shao
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2022-08-02 18:00 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM

On Tue, Aug 2, 2022 at 6:31 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Aug 2, 2022 at 7:17 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Jul 29, 2022 at 8:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > Introduce new helper bpf_map_pages_alloc() for this memory allocation.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  include/linux/bpf.h  |  4 ++++
> > >  kernel/bpf/ringbuf.c | 27 +++++++++------------------
> > >  kernel/bpf/syscall.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 54 insertions(+), 18 deletions(-)
> > >
> >
> > [...]
> >
> > >         /* Each data page is mapped twice to allow "virtual"
> > >          * continuous read of samples wrapping around the end of ring
> > > @@ -95,16 +95,10 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
> > >         if (!pages)
> > >                 return NULL;
> > >
> > > -       for (i = 0; i < nr_pages; i++) {
> > > -               page = alloc_pages_node(numa_node, flags, 0);
> > > -               if (!page) {
> > > -                       nr_pages = i;
> > > -                       goto err_free_pages;
> > > -               }
> > > -               pages[i] = page;
> > > -               if (i >= nr_meta_pages)
> > > -                       pages[nr_data_pages + i] = page;
> > > -       }
> > > +       ptr = bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
> > > +                                 numa_node, flags, 0);
> > > +       if (!ptr)
> >
> > bpf_map_pages_alloc() has a weird and confusing interface. It fills
> > out pages (second argument) and also returns pages as void *. Why not
> > just return int error (0 or -ENOMEM)? You are discarding this ptr
> > anyways.
> >
>
> I will change it.
>
> >
> > But also thinking some more, bpf_map_pages_alloc() is very ringbuf
> > specific (which other map will have exactly the same meaning for
> > nr_meta_pages and nr_data_pages, where we also allocate 2 *
> > nr_data_pages, etc).
> >
> > I don't think it makes sense to expose it as a generic internal API.
> > Why not keep all that inside kernel/bpf/ringbuf.c instead?
> >
>
> Right, it is currently used only in ringbuf.c. I will keep it inside ringbuf.c.
>

In such case you might as well put pages = bpf_map_area_alloc(); part
into this function and return struct page ** as a result, so that
everything related to pages is handled as a single unit. And then
bpf_map_pages_free() will free not just each individual page, but also
struct page*[] array.

Also please call it something ringbuf specific, e.g.,
bpf_ringbuf_pages_{alloc,free}()?
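Something like this (rough sketch, untested), keeping the double
mapping of data pages exactly as the current code does:

	static struct page **bpf_ringbuf_pages_alloc(int nr_meta_pages,
						     int nr_data_pages,
						     int numa_node, gfp_t flags)
	{
		int nr_pages = nr_meta_pages + nr_data_pages;
		struct page **pages, *page;
		size_t array_size;
		int i;

		/* data pages appear twice in the array for the "virtual"
		 * wrap-around mapping, hence the extra nr_data_pages slots
		 */
		array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
		pages = bpf_map_area_alloc(array_size, numa_node);
		if (!pages)
			return NULL;

		for (i = 0; i < nr_pages; i++) {
			page = alloc_pages_node(numa_node, flags, 0);
			if (!page)
				goto err_free_pages;
			pages[i] = page;
			if (i >= nr_meta_pages)
				pages[nr_data_pages + i] = page;
		}
		return pages;

	err_free_pages:
		while (i--)
			__free_page(pages[i]);
		bpf_map_area_free(pages);
		return NULL;
	}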

>
> --
> Regards
> Yafang


* Re: [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf
  2022-08-02 18:00       ` Andrii Nakryiko
@ 2022-08-03 13:27         ` Yafang Shao
  0 siblings, 0 replies; 24+ messages in thread
From: Yafang Shao @ 2022-08-03 13:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM

On Wed, Aug 3, 2022 at 2:00 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Aug 2, 2022 at 6:31 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Aug 2, 2022 at 7:17 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Jul 29, 2022 at 8:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > Introduce new helper bpf_map_pages_alloc() for this memory allocation.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h  |  4 ++++
> > > >  kernel/bpf/ringbuf.c | 27 +++++++++------------------
> > > >  kernel/bpf/syscall.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 54 insertions(+), 18 deletions(-)
> > > >
> > >
> > > [...]
> > >
> > > >         /* Each data page is mapped twice to allow "virtual"
> > > >          * continuous read of samples wrapping around the end of ring
> > > > @@ -95,16 +95,10 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(struct bpf_map *map,
> > > >         if (!pages)
> > > >                 return NULL;
> > > >
> > > > -       for (i = 0; i < nr_pages; i++) {
> > > > -               page = alloc_pages_node(numa_node, flags, 0);
> > > > -               if (!page) {
> > > > -                       nr_pages = i;
> > > > -                       goto err_free_pages;
> > > > -               }
> > > > -               pages[i] = page;
> > > > -               if (i >= nr_meta_pages)
> > > > -                       pages[nr_data_pages + i] = page;
> > > > -       }
> > > > +       ptr = bpf_map_pages_alloc(map, pages, nr_meta_pages, nr_data_pages,
> > > > +                                 numa_node, flags, 0);
> > > > +       if (!ptr)
> > >
> > > bpf_map_pages_alloc() has a weird and confusing interface. It fills
> > > out pages (second argument) and also returns pages as void *. Why not
> > > just return int error (0 or -ENOMEM)? You are discarding this ptr
> > > anyways.
> > >
> >
> > I will change it.
> >
> > >
> > > But also thinking some more, bpf_map_pages_alloc() is very ringbuf
> > > specific (which other map will have exactly the same meaning for
> > > nr_meta_pages and nr_data_pages, where we also allocate 2 *
> > > nr_data_pages, etc).
> > >
> > > I don't think it makes sense to expose it as a generic internal API.
> > > Why not keep all that inside kernel/bpf/ringbuf.c instead?
> > >
> >
> > Right, it is currently used only in ringbuf.c. I will keep it inside ringbuf.c.
> >
>
> In such case you might as well put pages = bpf_map_area_alloc(); part
> into this function and return struct page ** as a result, so that
> everything related to pages is handled as a single unit. And then
> bpf_map_pages_free() will free not just each individual page, but also
> struct page*[] array.
>

Good suggestion. I will do it.

> Also please call it something ringbuf specific, e.g.,
> bpf_ringbuf_pages_{alloc,free}()?
>

It makes sense to me.

-- 
Regards
Yafang


end of thread

Thread overview: 24+ messages
2022-07-29 15:23 [RFC PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 02/15] bpf: Use bpf_map_area_free instread of kvfree Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 03/15] bpf: Make __GFP_NOWARN consistent in bpf map creation Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 04/15] bpf: Use bpf_map_area_alloc consistently on " Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 05/15] bpf: Introduce helpers for container of struct bpf_map Yafang Shao
2022-08-02  4:58   ` Alexei Starovoitov
2022-08-02 13:47     ` Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 06/15] bpf: Use bpf_map_container_alloc helpers in various bpf maps Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 07/15] bpf: Define bpf_map_get_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 08/15] bpf: Use scope-based charge for bpf_map_area_alloc Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 09/15] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 10/15] bpf: Use bpf_map_pages_alloc in ringbuf Yafang Shao
2022-08-01 23:16   ` Andrii Nakryiko
2022-08-02 13:31     ` Yafang Shao
2022-08-02 18:00       ` Andrii Nakryiko
2022-08-03 13:27         ` Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 11/15] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 12/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 13/15] bpf: Add new parameter into bpf_map_container_alloc Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 14/15] bpf: Add new map flag BPF_F_SELECTABLE_MEMCG Yafang Shao
2022-07-29 15:23 ` [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
2022-08-02  4:55   ` Alexei Starovoitov
2022-08-02 13:47     ` Yafang Shao
