netdev.vger.kernel.org archive mirror
* [PATCH 00/15] bpf: Introduce selectable memcg for bpf map
@ 2022-08-10 15:13 Yafang Shao
  2022-08-10 15:13 ` [PATCH 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Yafang Shao @ 2022-08-10 15:13 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

In our production environment, we may load, run and pin bpf programs and
maps in containers. For example, some of our networking bpf programs and
maps are loaded and pinned by a process running in a container in our
k8s environment. Other user applications also run in this container; they
watch the networking configurations from remote servers and update them
on the local host, log error events, monitor the traffic, and so on.
Sometimes we need to update these user applications to a new release, and
during the update we destroy the old container and then start a new
generation. In order not to interrupt the bpf programs during the update,
we pin the bpf programs and maps in bpffs. That is the background and use
case in our production environment.

After switching to memcg-based bpf memory accounting to limit the bpf
memory, some unexpected issues jumped out at us.
1. The memory usage is not consistent between the first generation and
new generations.
2. After the first generation is destroyed, the bpf memory can't be
limited if the bpf maps are not preallocated, because they will be
reparented.

This patchset tries to resolve these issues by introducing an
independent memcg to limit the bpf memory.

At bpf map creation, we can assign a specific memcg instead of using
the current memcg. That makes it flexible in containerized environments.
For example, if we want to limit the pinned bpf maps, we can use the
hierarchy below:

    Shared resources              Private resources 
                                    
     bpf-memcg                      k8s-memcg
     /        \                     /             
bpf-bar-memcg bpf-foo-memcg   srv-foo-memcg        
                  |               /        \
               (charged)     (not charged) (charged)                 
                  |           /              \
                  |          /                \
          bpf-foo-{progs, maps}              srv-foo

srv-foo loads and pins bpf-foo-{progs, maps}, but they are charged to an
independent memcg (bpf-foo-memcg) instead of srv-foo's memcg
(srv-foo-memcg).
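
The charging behaviour in the hierarchy above can be illustrated with a
small userspace model. This is only a sketch, not kernel code: struct
toy_memcg, toy_charge() and toy_map_create() are hypothetical names, and
the only assumption carried over from the kernel is that a charge to a
memcg is also accounted to each of its ancestors.

```c
#include <stddef.h>

/* Toy stand-in for a memory cgroup: a usage counter plus a parent link. */
struct toy_memcg {
	const char *name;
	size_t usage;
	struct toy_memcg *parent;
};

/* Charge `bytes` to `cg` and to every ancestor (hierarchical accounting). */
void toy_charge(struct toy_memcg *cg, size_t bytes)
{
	for (; cg; cg = cg->parent)
		cg->usage += bytes;
}

/*
 * Create a "map" of `bytes` bytes on behalf of a task in `current_cg`.
 * If `selected` is non-NULL the map is charged there; otherwise the
 * creator's own memcg is charged. Returns the memcg that was charged.
 */
struct toy_memcg *toy_map_create(struct toy_memcg *current_cg,
				 struct toy_memcg *selected,
				 size_t bytes)
{
	struct toy_memcg *target = selected ? selected : current_cg;

	toy_charge(target, bytes);
	return target;
}
```

With srv-foo-memcg as the creator's memcg and bpf-foo-memcg as the
selected one, the charge lands on bpf-foo-memcg and bpf-memcg, while
srv-foo-memcg and k8s-memcg stay untouched.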

Please note that there may be no process in bpf-foo-memcg, which means
it can currently be rmdir-ed by the root user. Meanwhile, we don't
forcefully destroy a memcg if it doesn't have any residents. So this
hierarchy is acceptable.

In order to make the memcg of bpf maps selectable, this patchset
introduces some memory allocation wrappers to allocate map-related
memory. These wrappers get the memcg from the map and then charge the
allocated pages or objects to it.
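
The wrapper pattern described above (take the memcg from the map, make
it the active charge target for the duration of one allocation, then
restore the previous target) can be sketched in userspace as follows.
All toy_* names are hypothetical and the accounting is a plain counter;
the real wrappers live in the patches themselves and are not reproduced
here.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_memcg {
	size_t usage;
};

struct toy_map {
	struct toy_memcg *memcg;	/* memcg saved when the map was set up */
};

/* Models a per-task charge override; NULL means "no override". */
static struct toy_memcg *toy_active_memcg;

/* Install `cg` as the active charge target and return the previous one. */
struct toy_memcg *toy_set_active_memcg(struct toy_memcg *cg)
{
	struct toy_memcg *old = toy_active_memcg;

	toy_active_memcg = cg;
	return old;
}

/* Zeroed, accounted allocation: charges whichever memcg is active now. */
void *toy_alloc_account(size_t size)
{
	void *p = calloc(1, size);

	if (p && toy_active_memcg)
		toy_active_memcg->usage += size;
	return p;
}

/* Map-aware wrapper: scope the charge to the map's memcg, then restore. */
void *toy_map_kzalloc(const struct toy_map *map, size_t size)
{
	struct toy_memcg *old = toy_set_active_memcg(map->memcg);
	void *p = toy_alloc_account(size);

	toy_set_active_memcg(old);
	return p;
}
```

Restoring the previous target inside the wrapper keeps the override
scoped to a single allocation, so unrelated allocations made by the same
task afterwards are not charged to the map's memcg.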

Currently it only supports bpf maps, and we can extend it to bpf progs
as well. It only supports cgroup2 now, but we can make an additional
change in cgroup_get_from_fd() to support cgroup1.
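
Since cgroup_get_from_fd() resolves a cgroup from a file descriptor,
userspace would first need an fd for the target cgroup directory. A
minimal sketch of that step is shown below; open_cgroup_dir() is a
hypothetical helper name, the path is whatever the caller's cgroup2
mount uses, and how the fd is then handed to map creation is
deliberately left out because it depends on the uapi change in this
series.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a cgroup directory and return its fd, or -1 with errno set.
 * O_DIRECTORY makes the call fail if `path` is not a directory.
 */
int open_cgroup_dir(const char *path)
{
	return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}
```

The caller owns the returned fd and should close() it once the map has
been created.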

Observability can also be supported in the next step, for example,
showing the bpf map's memcg in 'bpftool map show' or even showing which
maps are charged to a specific memcg in 'bpftool cgroup show'.
Furthermore, we may also show an accurate memory size of a bpf map
instead of an estimated memory size in 'bpftool map show' in the future.

RFC->v1:
- get rid of bpf_map container wrapper (Alexei)
- add the new field into the end of struct (Alexei)
- get rid of BPF_F_SELECTABLE_MEMCG (Alexei)
- save memcg in bpf_map_init_from_attr
- introduce bpf_ringbuf_pages_{alloc,free} and keep them inside
  kernel/bpf/ringbuf.c  (Andrii)

Yafang Shao (15):
  bpf: Remove unneeded memset in queue_stack_map creation
  bpf: Use bpf_map_area_free instead of kvfree
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Fix incorrect mem_cgroup_put
  bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM
  bpf: Call bpf_map_init_from_attr() immediately after map creation
  bpf: Save memcg in bpf_map_init_from_attr()
  bpf: Use scoped-based charge in bpf_map_area_alloc
  bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
  bpf: Use bpf_map_kzalloc in arraymap
  bpf: Use bpf_map_kvcalloc in bpf_local_storage
  mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  bpf: Add return value for bpf_map_init_from_attr
  bpf: Introduce selectable memcg for bpf map

 include/linux/bpf.h            |  43 ++++++++++++-
 include/linux/memcontrol.h     |  11 ++++
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/arraymap.c          |  34 ++++++-----
 kernel/bpf/bloom_filter.c      |  11 +++-
 kernel/bpf/bpf_local_storage.c |  17 ++++--
 kernel/bpf/bpf_struct_ops.c    |  19 +++---
 kernel/bpf/cpumap.c            |  17 ++++--
 kernel/bpf/devmap.c            |  30 ++++++----
 kernel/bpf/hashtab.c           |  26 ++++----
 kernel/bpf/local_storage.c     |  12 ++--
 kernel/bpf/lpm_trie.c          |  12 +++-
 kernel/bpf/offload.c           |  12 ++--
 kernel/bpf/queue_stack_maps.c  |  13 ++--
 kernel/bpf/reuseport_array.c   |  11 +++-
 kernel/bpf/ringbuf.c           | 104 ++++++++++++++++++++++----------
 kernel/bpf/stackmap.c          |  13 ++--
 kernel/bpf/syscall.c           | 133 ++++++++++++++++++++++++++++-------------
 mm/memcontrol.c                |  41 +++++++++++++
 net/core/sock_map.c            |  30 ++++++----
 net/xdp/xskmap.c               |  12 +++-
 tools/include/uapi/linux/bpf.h |   1 +
 tools/lib/bpf/bpf.c            |   3 +-
 tools/lib/bpf/bpf.h            |   3 +-
 tools/lib/bpf/gen_loader.c     |   2 +-
 tools/lib/bpf/libbpf.c         |   2 +
 tools/lib/bpf/skel_internal.h  |   2 +-
 27 files changed, 436 insertions(+), 179 deletions(-)

-- 
1.8.3.1



* [PATCH 01/15] bpf: Remove unneeded memset in queue_stack_map creation
  2022-08-10 15:13 [PATCH 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
@ 2022-08-10 15:13 ` Yafang Shao
  2022-08-10 15:13 ` [PATCH 02/15] bpf: Use bpf_map_area_free instead of kvfree Yafang Shao
  2022-08-10 15:21 ` [PATCH 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2 siblings, 0 replies; 4+ messages in thread
From: Yafang Shao @ 2022-08-10 15:13 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

__GFP_ZERO will clear the memory, so we don't need to memset it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/queue_stack_maps.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index a1c0794..8a5e060 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -78,8 +78,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
-	memset(qs, 0, sizeof(*qs));
-
 	bpf_map_init_from_attr(&qs->map, attr);
 
 	qs->size = size;
-- 
1.8.3.1
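
A userspace analogy for why the memset above is redundant: calloc()
plays the role of an allocation made with __GFP_ZERO, so the structure
is already zeroed when it is returned. struct toy_queue_stack and
toy_qs_alloc() are hypothetical names used only for this sketch.

```c
#include <stdlib.h>

struct toy_queue_stack {
	unsigned int head;
	unsigned int tail;
	unsigned int size;
};

/* Allocate a zeroed queue/stack header and set only the non-zero field. */
struct toy_queue_stack *toy_qs_alloc(unsigned int size)
{
	/* calloc() returns zero-filled memory, like a __GFP_ZERO allocation. */
	struct toy_queue_stack *qs = calloc(1, sizeof(*qs));

	if (!qs)
		return NULL;

	/* No memset(qs, 0, sizeof(*qs)) needed: head and tail are already 0. */
	qs->size = size;
	return qs;
}
```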



* [PATCH 02/15] bpf: Use bpf_map_area_free instead of kvfree
  2022-08-10 15:13 [PATCH 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-08-10 15:13 ` [PATCH 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
@ 2022-08-10 15:13 ` Yafang Shao
  2022-08-10 15:21 ` [PATCH 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2 siblings, 0 replies; 4+ messages in thread
From: Yafang Shao @ 2022-08-10 15:13 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

bpf_map_area_alloc() should be paired with bpf_map_area_free().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/ringbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index ded4fae..3fb54fe 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -116,7 +116,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 err_free_pages:
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 	return NULL;
 }
 
@@ -190,7 +190,7 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	vunmap(rb);
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 }
 
 static void ringbuf_map_free(struct bpf_map *map)
-- 
1.8.3.1
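
One way to see why the alloc and free sides should go through the same
pair of wrappers: as soon as the wrappers carry any bookkeeping, a free
that bypasses them leaves that bookkeeping unbalanced. The sketch below
is a userspace illustration under that assumption; the live-area counter
and the toy_* names are hypothetical and not part of the patch.

```c
#include <stdlib.h>

/* Number of areas handed out by toy_area_alloc() and not yet freed. */
static long toy_live_areas;

/* Allocate a zeroed area and record it in the bookkeeping. */
void *toy_area_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		toy_live_areas++;
	return p;
}

/* Matching free: undoes the bookkeeping before releasing the memory. */
void toy_area_free(void *p)
{
	if (!p)
		return;
	toy_live_areas--;
	free(p);
}
```

Calling plain free() on a pointer from toy_area_alloc() would release
the memory but leave toy_live_areas at 1, which is the kind of drift the
pairing rule avoids.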



* Re: [PATCH 00/15] bpf: Introduce selectable memcg for bpf map
  2022-08-10 15:13 [PATCH 00/15] bpf: Introduce selectable memcg for bpf map Yafang Shao
  2022-08-10 15:13 ` [PATCH 01/15] bpf: Remove unneeded memset in queue_stack_map creation Yafang Shao
  2022-08-10 15:13 ` [PATCH 02/15] bpf: Use bpf_map_area_free instead of kvfree Yafang Shao
@ 2022-08-10 15:21 ` Yafang Shao
  2 siblings, 0 replies; 4+ messages in thread
From: Yafang Shao @ 2022-08-10 15:21 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton
  Cc: netdev, bpf, Linux MM

Ah, this series is incomplete.
Please see the updated one:
https://lore.kernel.org/bpf/20220810151840.16394-1-laoar.shao@gmail.com/T/#t

--
Regards
Yafang

