From: Yafang Shao <laoar.shao@gmail.com>
To: paulmck@kernel.org
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, jolsa@kernel.org, tj@kernel.org,
	dennis@kernel.org, cl@linux.com, akpm@linux-foundation.org,
	penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
	roman.gushchin@linux.dev, linux-mm@kvack.org,
	bpf@vger.kernel.org, rcu@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC PATCH bpf-next 0/9] mm, bpf: Add BPF into /proc/meminfo
Date: Wed, 14 Dec 2022 18:46:52 +0800
Message-ID: <CALOAHbBxim-ahGQ8AQz5B4NCMFCza+Pzm9+jiQHPerMKHg_6Eg@mail.gmail.com>
In-Reply-To: <20221213192156.GS4001@paulmck-ThinkPad-P17-Gen-1>

On Wed, Dec 14, 2022 at 3:22 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Dec 13, 2022 at 04:52:09PM +0100, Vlastimil Babka wrote:
> > On 12/13/22 15:56, Hyeonggon Yoo wrote:
> > > On Tue, Dec 13, 2022 at 07:52:42PM +0800, Yafang Shao wrote:
> > >> On Tue, Dec 13, 2022 at 1:54 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >> >
> > >> > On 12/12/22 01:37, Yafang Shao wrote:
> > >> > > Currently there's no way to get the BPF memory usage; we can only
> > >> > > estimate it with bpftool or memcg, neither of which is reliable.
> > >> > >
> > >> > > - bpftool
> > >> > >   `bpftool {map,prog} show` can show us the memlock of each map and
> > >> > >   prog, but the memlock can differ greatly from the real memory size.
> > >> > >   The memlock of a bpf object is approximately
> > >> > >   `round_up(key_size + value_size, 8) * max_entries`,
> > >> > >   so 1) it doesn't apply to non-preallocated bpf maps, which may
> > >> > >   increase or decrease their real memory size dynamically, and 2) the
> > >> > >   element size of some bpf maps is not `key_size + value_size`; for
> > >> > >   example, the element size of htab is
> > >> > >   `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`.
> > >> > >   That means the difference between these two values can be very large
> > >> > >   when key_size and value_size are small. For example, in my
> > >> > >   verification, the memlock and real memory sizes of a preallocated
> > >> > >   hash map are:
> > >> > >
> > >> > >   $ grep BPF /proc/meminfo
> > >> > >   BPF:             1026048 B <<< the size of preallocated memalloc pool
> > >> > >
> > >> > >   (create hash map)
> > >> > >
> > >> > >   $ bpftool map show
> > >> > >   3: hash  name count_map  flags 0x0
> > >> > >           key 4B  value 4B  max_entries 1048576  memlock 8388608B
> > >> > >
> > >> > >   $ grep BPF /proc/meminfo
> > >> > >   BPF:            84919344 B
> > >> > >
> > >> > >   So the real memory size is $((84919344 - 1026048)) = 83893296
> > >> > >   bytes, while the memlock is only 8388608 bytes.
> > >> > >
> > >> > > - memcg
> > >> > >   With memcg we only know that the BPF memory usage is less than
> > >> > >   memory.usage_in_bytes (or memory.current in v2). Furthermore, we only
> > >> > >   know that the BPF memory usage is less than $MemTotal if the BPF
> > >> > >   object is charged to the root memcg :)
> > >> > >
> > >> > > So we need a way to get the BPF memory usage, especially as more and
> > >> > > more bpf programs are running in production environments. BPF memory
> > >> > > usage is not trivial, and it deserves a new item in /proc/meminfo.
> > >> > >
> > >> > > This patchset introduces a solution to calculate the BPF memory usage.
> > >> > > The solution is similar to how memory is charged to a memcg, so it is
> > >> > > easy to understand. It counts three types of memory usage:
> > >> > >  - page
> > >> > >    via kmalloc, vmalloc, kmem_cache_alloc, direct page allocation, and
> > >> > >    their families.
> > >> > >    When a page is allocated, we count its size and mark the head page,
> > >> > >    and then check the head page when the page is freed.
> > >> > >  - slab
> > >> > >    via kmalloc, kmem_cache_alloc, and their families.
> > >> > >    When a slab object is allocated, we mark the object in its slab and
> > >> > >    check it when the object is freed. That means we need extra memory
> > >> > >    to store the information of each object in a slab.
> > >> > >  - percpu
> > >> > >    via alloc_percpu and its family.
> > >> > >    When a percpu area is allocated, we mark the area in its percpu
> > >> > >    chunk and check it when the area is freed. That means we need extra
> > >> > >    memory to store the information of each area in a percpu chunk.
> > >> > >
> > >> > > So we only need to annotate the allocation to add the BPF memory size;
> > >> > > the subtraction of the BPF memory size is handled automatically at
> > >> > > freeing. We can annotate it in irq, softirq, or process context. To
> > >> > > avoid counting nested allocations, for example the percpu backing
> > >> > > allocator, we reuse __GFP_ACCOUNT to filter them out. __GFP_ACCOUNT
> > >> > > also makes the count consistent with memcg accounting.
> > >> >
> > >> > So you can't easily annotate the freeing places as well, to avoid the whole
> > >> > tracking infrastructure?
> > >>
> > >> The trouble is kfree_rcu().  For example,
> > >>     old_item = active_vm_item_set(ACTIVE_VM_BPF);
> > >>     kfree_rcu();
> > >>     active_vm_item_set(old_item);
> > >> If we want to pass ACTIVE_VM_BPF into the deferred RCU context, we
> > >> would have to change lots of code in the RCU subsystem. I'm not sure
> > >> it is worth it.
> > >
> > > (+Cc rcu folks)
> > >
> > > IMO adding a new kfree_rcu() variant for BPF that accounts BPF memory
> > > usage would be much less churn :)
> >
> > Alternatively, just account the bpf memory as freed already when calling
> > kfree_rcu()? I think the amount of memory "in flight" to be freed by rcu is
> > a separate issue (if it's actually an issue) and not something each
> > kfree_rcu() user should think about separately?
>
> If the in-flight memory really does need to be accounted for, then one
> straightforward approach is to use call_rcu() and do the first part of
> the needed accounting at the call_rcu() callsite and the rest of the
> accounting when the callback is invoked.  Or, if memory must be freed
> quickly even on ChromeOS and Android, use call_rcu_hurry() instead
> of call_rcu().
>

Right, call_rcu() can make it work; a rough sketch of the conversion
is below.
But I'm not sure whether every kfree_rcu() in kernel/bpf can be
replaced by call_rcu().
Alexei, any comments?
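
For illustration, here is a minimal, untested sketch of Paul's
suggestion: do the first part of the accounting at the call_rcu()
callsite and the rest when the callback is invoked.
active_vm_item_set() is from this patchset; struct bpf_obj and
bpf_obj_free_rcu() are placeholder names for illustration only, not
existing code in kernel/bpf:

    struct bpf_obj {
        struct rcu_head rcu;
        /* ... payload ... */
    };

    static void bpf_obj_free_rcu(struct rcu_head *rcu)
    {
        struct bpf_obj *obj = container_of(rcu, struct bpf_obj, rcu);
        int old_item;

        /* Second half of the accounting: attribute the free to BPF,
         * mirroring what the allocation path does.
         */
        old_item = active_vm_item_set(ACTIVE_VM_BPF);
        kfree(obj);
        active_vm_item_set(old_item);
    }

    /* At the callsite, instead of kfree_rcu(obj, rcu).  The first
     * half of the accounting (e.g., moving the size to an "in
     * flight" counter) could be done right here, per Paul.
     */
    call_rcu(&obj->rcu, bpf_obj_free_rcu);

These are the kfree_rcu() call sites that would need converting: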

$ grep -r "kfree_rcu" kernel/bpf/
kernel/bpf/local_storage.c:     kfree_rcu(new, rcu);
kernel/bpf/lpm_trie.c:          kfree_rcu(node, rcu);
kernel/bpf/lpm_trie.c:          kfree_rcu(parent, rcu);
kernel/bpf/lpm_trie.c:          kfree_rcu(node, rcu);
kernel/bpf/lpm_trie.c:  kfree_rcu(node, rcu);
kernel/bpf/bpf_inode_storage.c:         kfree_rcu(local_storage, rcu);
kernel/bpf/bpf_task_storage.c:          kfree_rcu(local_storage, rcu);
kernel/bpf/trampoline.c:        kfree_rcu(im, rcu);
kernel/bpf/core.c:      kfree_rcu(progs, rcu);
kernel/bpf/core.c:       * no need to call kfree_rcu(), just call kfree() directly.
kernel/bpf/core.c:              kfree_rcu(progs, rcu);
kernel/bpf/bpf_local_storage.c:  * kfree(), else do kfree_rcu().
kernel/bpf/bpf_local_storage.c:         kfree_rcu(local_storage, rcu);
kernel/bpf/bpf_local_storage.c:         kfree_rcu(selem, rcu);
kernel/bpf/bpf_local_storage.c:         kfree_rcu(selem, rcu);
kernel/bpf/bpf_local_storage.c:                 kfree_rcu(local_storage, rcu);
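
Alternatively, following Vlastimil's idea of accounting the memory as
freed already when kfree_rcu() is called, a small wrapper could keep
the churn low. This is only a sketch; bpf_kfree_rcu() and
active_vm_sub() are hypothetical names, not part of this patchset:

    /* Account the object as freed immediately, then let RCU free it. */
    #define bpf_kfree_rcu(ptr, rhf)                             \
    do {                                                        \
        active_vm_sub(ACTIVE_VM_BPF, ksize(ptr));               \
        kfree_rcu(ptr, rhf);                                    \
    } while (0)

That would leave the memory "in flight" between the kfree_rcu() call
and the actual free unaccounted, which, as Vlastimil said, may be a
separate issue.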

-- 
Regards
Yafang
