Re: [PATCH bpf-next 0/7] bpf, mm: bpf memory usage

From: Yafang Shao <laoar.shao@gmail.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	kpsingh@kernel.org, sdf@google.com, haoluo@google.com,
	jolsa@kernel.org, tj@kernel.org, dennis@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, vbabka@suse.cz,
	urezki@gmail.com, linux-mm@kvack.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next 0/7] bpf, mm: bpf memory usage
Date: Sun, 5 Feb 2023 12:03:05 +0800
Message-ID: <CALOAHbAjHqXGZH_p19aYTbqK=sE8ZaMxhVzAoTO4ZKSXLiyx-w@mail.gmail.com>
In-Reply-To: <63ddbfd9ae610_6bb1520861@john.notmuch>

On Sat, Feb 4, 2023 at 10:15 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Yafang Shao wrote:
> > Currently we can't get bpf memory usage reliably. bpftool now shows the
> > bpf memory footprint, which is different from bpf memory usage. The
> > difference between the footprint shown in bpftool and the memory
> > actually allocated by bpf can be quite large in some cases, for example,
> >
> > - non-preallocated bpf map
> >   The memory usage of a non-preallocated bpf map changes dynamically. The
> >   allocated element count can range from 0 to max entries, but the
> >   memory footprint in bpftool only shows a fixed number.
> > - bpf metadata consumes more memory than bpf element
> >   In some corner cases, the bpf metadata can consume a lot more memory
> >   than the bpf elements do. For example, it can happen when the element
> >   size is quite small.
>
> Just following up slightly on previous comment.
>
> The metadata should be fixed and knowable, correct?

The metadata of BPF itself is fixed, but the metadata of MM allocation
depends on the kernel configuration.
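
To illustrate why the MM side is not a fixed number, here is a minimal
sketch of how rounding a request up to an allocator size class adds
per-element overhead. The size classes and the helper names below are
assumptions for illustration only; they are not part of this patchset,
and the real set of size classes depends on the kernel configuration.

```c
#include <stddef.h>

/*
 * Illustrative size classes. ASSUMPTION: the real set of allocator
 * caches and the minimum object size depend on the kernel configuration
 * and architecture; these values are for demonstration only.
 */
static const size_t size_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096,
};

/* Return the smallest size class that fits @req, or 0 if none does. */
static size_t rounded_alloc_size(size_t req)
{
	size_t i;

	for (i = 0; i < sizeof(size_classes) / sizeof(size_classes[0]); i++) {
		if (size_classes[i] >= req)
			return size_classes[i];
	}
	return 0;
}

/* Bytes lost to rounding for one allocation of @req bytes. */
static size_t rounding_overhead(size_t req)
{
	size_t got = rounded_alloc_size(req);

	return got ? got - req : 0;
}
```

With these assumed classes, a 40-byte element would occupy a 64-byte
object, and a different set of classes would give a different result for
the same element.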

> What I'm getting at
> is if this can be calculated directly instead of through a BPF helper
> and walking the entire map.
>

As I explained in another thread, it doesn't walk the entire map.

> >
> > We need a way to get the bpf memory usage, especially as more and more
> > bpf programs will be running in production environments and thus the
> > bpf memory usage is not trivial.
>
> In our environments we track map usage so we always know how many entries
> are in a map. I don't think we use this to calculate memory footprint
> at the moment, but just for map usage. It seems, though, that once you
> have this, calculating memory footprint can be done out of band because
> element and overhead costs are fixed.
>
> >
> > This patchset introduces a new map ops ->map_mem_usage to get the memory
> > usage. In this ops, the memory usage is obtained from the pointers which
> > are already allocated by a bpf map. To keep the code simple, we ignore
> > some small pointers as their sizes are quite small compared with the
> > total usage.
> >
> > In order to get the memory size from the pointers, some generic mm helpers
> > are introduced first, for example, percpu_size(), vsize() and kvsize().
> >
> > This patchset only implements the bpf memory usage for hashtab. I will
> > extend it to other maps and bpf progs (bpf progs can dynamically allocate
> > memory via bpf_obj_new()) in the future.
>
> My preference would be to calculate this out of band. Walking a
> large map and doing it in a critical section to get the memory
> usage does not seem optimal.
>

I don't quite understand what you mean by calculating it out of band.
This patchset introduces a BPF helper which is used in bpftool, so it
is already out of band, right?
We should do it in bpftool, because sysadmins want a generic way
to get the system-wide bpf memory usage.
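
For reference, the gap between footprint and usage for a non-preallocated
map can be sketched as below. The struct and function names are
hypothetical and not part of the patchset, and the per-element cost is
assumed to be fixed here, which real allocations do not guarantee.

```c
#include <stddef.h>

/* Hypothetical description of a hash map, for illustration only. */
struct map_desc {
	size_t max_entries;	/* capacity requested at map creation */
	size_t cur_entries;	/* elements currently allocated */
	size_t elem_bytes;	/* per-element cost, ASSUMED fixed here */
};

/* Fixed footprint: computed as if every slot were populated. */
static size_t footprint_bytes(const struct map_desc *m)
{
	return m->max_entries * m->elem_bytes;
}

/* Usage of a non-preallocated map: only allocated elements count. */
static size_t usage_bytes(const struct map_desc *m)
{
	return m->cur_entries * m->elem_bytes;
}
```

A mostly empty map reports the same footprint as a full one, while its
usage tracks the number of elements actually allocated.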

-- 
Regards
Yafang