From: Yafang Shao <laoar.shao@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org, tj@kernel.org,
dennis@kernel.org, cl@linux.com, akpm@linux-foundation.org,
penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com
Cc: linux-mm@kvack.org, bpf@vger.kernel.org,
Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next 0/9] mm, bpf: Add BPF into /proc/meminfo
Date: Mon, 12 Dec 2022 00:37:02 +0000
Message-ID: <20221212003711.24977-1-laoar.shao@gmail.com>

Currently there is no way to get the BPF memory usage directly; we can
only estimate it via bpftool or memcg, neither of which is reliable.
- bpftool
  `bpftool {map,prog} show` can show us the memlock of each map and
  prog, but the memlock value differs from the real memory size. The
  memlock of a bpf object is approximately
  `round_up(key_size + value_size, 8) * max_entries`,
  so 1) it doesn't apply to non-preallocated bpf maps, whose real
  memory size may grow or shrink dynamically; 2) the element size of
  some bpf maps is not `key_size + value_size`, for example the
  element size of a htab is
  `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`.
  That means the difference between the two values may be very large
  when key_size and value_size are small. For example, in my
  verification, the memlock and the real memory size of a preallocated
  hash map are:
$ grep BPF /proc/meminfo
BPF: 1026048 B <<< the size of preallocated memalloc pool
(create hash map)
$ bpftool map show
3: hash name count_map flags 0x0
key 4B value 4B max_entries 1048576 memlock 8388608B
$ grep BPF /proc/meminfo
BPF: 84919344 B
So the real memory size is $((84919344 - 1026048)) = 83893296 bytes,
while the memlock is only 8388608 bytes.
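For a rough sanity check of these numbers (the sizeof values below are
my assumptions for a typical x86-64 config, not taken from this
series):

  memlock:  round_up(4 + 4, 8) * 1048576                     =  8388608 B
  elements: (48 + round_up(4, 8) + round_up(4, 8)) * 1048576 = 67108864 B
            (assuming sizeof(struct htab_elem) == 48)
  buckets:  16 * 1048576                                     = 16777216 B
            (assuming sizeof(struct bucket) == 16)
  total:    67108864 + 16777216                              = 83886080 B

which is close to the measured 83893296 bytes, while the memlock
formula yields only the 8388608 bytes that bpftool reports.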
- memcg
  With memcg we only know that the BPF memory usage is less than
  memory.usage_in_bytes (or memory.current in cgroup v2). Worse, if
  the BPF objects are charged into the root memcg, we only know that
  the usage is less than $MemTotal :)
So we need a way to get the BPF memory usage, especially as more and
more bpf programs are running in production environments. The memory
used by BPF is not trivial, and it deserves a new item in
/proc/meminfo.
This patchset introduces a solution for calculating the BPF memory
usage. The solution is similar to how memory is charged into memcg, so
it is easy to understand. It counts three types of memory usage (see
the lifecycle sketch after this list):
- page
  via kmalloc, vmalloc, kmem_cache_alloc, or direct page allocation,
  and their families.
  When a page is allocated, we count its size and mark the head page,
  then check the head page when the page is freed.
- slab
  via kmalloc, kmem_cache_alloc, and their families.
  When a slab object is allocated, we mark the object within its slab
  and check it when the object is freed. That means we need extra
  memory to store per-object information for each slab.
- percpu
  via alloc_percpu and its family.
  When a percpu area is allocated, we mark the area within its percpu
  chunk and check it when the area is freed. That means we need extra
  memory to store per-area information for each percpu chunk.
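A minimal sketch of this mark-at-alloc/check-at-free lifecycle for the
page case (illustrative pseudo-C only; the helper names here are
assumptions, the real implementation lives in mm/active_vm.c of this
series):

  /* allocation path: record which consumer owns this page */
  static void active_vm_alloc_hook(struct page *page, unsigned int order)
  {
          if (active_vm_current_item())           /* assumed predicate */
                  mark_head_page(page, ACTIVE_VM_BPF,
                                 PAGE_SIZE << order);
  }

  /* freeing path: look up the mark, no annotation needed here */
  static void active_vm_free_hook(struct page *page, unsigned int order)
  {
          int item = read_and_clear_mark(page);   /* assumed helper */

          if (item == ACTIVE_VM_BPF)
                  active_vm_sub(item, PAGE_SIZE << order);
  }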
So we only need to annotate an allocation to add to the BPF memory
size; the subtraction is handled automatically when the memory is
freed. The annotation works in irq, softirq, and process context. To
avoid counting nested allocations, for example those of the percpu
backing allocator, we reuse __GFP_ACCOUNT to filter them out.
__GFP_ACCOUNT also keeps the count consistent with memcg accounting.
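To make the annotation model concrete, here is a minimal sketch of an
annotated allocation site (active_vm_item_set() and ACTIVE_VM_BPF are
assumed names, not necessarily the interface this series defines in
include/linux/active_vm.h):

  #include <linux/slab.h>
  #include <linux/active_vm.h>    /* header added by this series */

  static void *bpf_example_alloc(size_t size)
  {
          void *ptr;
          int old_item;

          /* Tag the current context as allocating BPF memory. */
          old_item = active_vm_item_set(ACTIVE_VM_BPF);

          /*
           * __GFP_ACCOUNT keeps the count consistent with memcg and
           * filters out nested allocations (e.g. the percpu backing
           * allocator), which allocate without this flag.
           */
          ptr = kzalloc(size, GFP_KERNEL | __GFP_ACCOUNT);

          /* Restore the previous item; freeing needs no annotation. */
          active_vm_item_set(old_item);
          return ptr;
  }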
To store this information for a slab or a page, we would need a new
member in struct page, but we can use a page extension instead, which
avoids changing the size of struct page. So a new page extension,
active_vm, is introduced. Each page and each slab allocated as BPF
memory has a struct active_vm. It is named active_vm because it can
easily be extended to other areas; for example, in the future we may
use it to count other kinds of memory usage.
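For readers unfamiliar with page extensions, per-page data is usually
reached the same way page_owner does it; a hedged sketch (the struct
layout and helper below are assumptions, not this series' code):

  #include <linux/page_ext.h>

  struct active_vm {
          long item;      /* which consumer (e.g. BPF) owns the memory */
  };

  extern struct page_ext_operations active_vm_ops;  /* assumed */

  static struct active_vm *get_active_vm(struct page_ext *page_ext)
  {
          /* page_ext_get(page)/page_ext_put() bracket the lookup */
          return (void *)page_ext + active_vm_ops.offset;
  }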
The new page extension active_vm can be disabled via CONFIG_ACTIVE_VM
at compile time or with the `active_vm=` kernel boot parameter.
Below is the result with this patchset applied:
$ grep BPF /proc/meminfo
BPF: 1002 kB
Currently only bpf maps are supported, and only SLUB is supported.
Future works:
- support bpf prog
- not sure whether SLAB needs to be supported
  (it seems SLAB will be deprecated)
- support per-map memory usage
- support per-memcg memory usage
Yafang Shao (9):
mm: Introduce active vm item
mm: Allow using active vm in all contexts
mm: percpu: Account active vm for percpu
mm: slab: Account active vm for slab
mm: Account active vm for page
bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
bpf: Use bpf_map_kzalloc in arraymap
bpf: Use bpf_map_kvcalloc in bpf_local_storage
bpf: Use active vm to account bpf map memory usage
fs/proc/meminfo.c | 3 +
include/linux/active_vm.h | 73 ++++++++++++
include/linux/bpf.h | 8 ++
include/linux/page_ext.h | 1 +
include/linux/sched.h | 5 +
kernel/bpf/arraymap.c | 16 +--
kernel/bpf/bpf_local_storage.c | 4 +-
kernel/bpf/memalloc.c | 5 +
kernel/bpf/ringbuf.c | 75 ++++++++----
kernel/bpf/syscall.c | 40 ++++++-
kernel/fork.c | 4 +
mm/Kconfig | 8 ++
mm/Makefile | 1 +
mm/active_vm.c | 203 +++++++++++++++++++++++++++++++++
mm/active_vm.h | 74 ++++++++++++
mm/page_alloc.c | 14 +++
mm/page_ext.c | 4 +
mm/percpu-internal.h | 3 +
mm/percpu.c | 43 +++++++
mm/slab.h | 7 ++
mm/slub.c | 2 +
21 files changed, 557 insertions(+), 36 deletions(-)
create mode 100644 include/linux/active_vm.h
create mode 100644 mm/active_vm.c
create mode 100644 mm/active_vm.h
--
2.30.1 (Apple Git-130)