All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Tejun Heo <tj@kernel.org>,
	dennis@kernel.org, Chris Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	linux-mm <linux-mm@kvack.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo
Date: Wed, 18 Jan 2023 11:07:48 +0800	[thread overview]
Message-ID: <CALOAHbBVRvTkSxLin+9A20Wv0DZWz4epvNTY1jEaCTf7q0qWJA@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQJGF5Xthpn7D2DgHHvZz8+dnuz2xMi6yoSziuauXO7ncA@mail.gmail.com>

On Wed, Jan 18, 2023 at 1:25 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jan 13, 2023 at 3:53 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Fri, Jan 13, 2023 at 5:05 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jan 12, 2023 at 7:53 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > Currently there's no way to get BPF memory usage, while we can only
> > > > estimate the usage by bpftool or memcg, both of which are not reliable.
> > > >
> > > > - bpftool
> > > >   `bpftool {map,prog} show` can show us the memlock of each map and
> > > >   prog, but the memlock is vary from the real memory size. The memlock
> > > >   of a bpf object is approximately
> > > >   `round_up(key_size + value_size, 8) * max_entries`,
> > > >   so 1) it can't apply to the non-preallocated bpf map which may
> > > >   increase or decrease the real memory size dynamically. 2) the element
> > > >   size of some bpf map is not `key_size + value_size`, for example the
> > > >   element size of htab is
> > > >   `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`
> > > >   That said the differece between these two values may be very great if
> > > >   the key_size and value_size is small. For example in my verifaction,
> > > >   the size of memlock and real memory of a preallocated hash map are,
> > > >
> > > >   $ grep BPF /proc/meminfo
> > > >   BPF:                 350 kB  <<< the size of preallocated memalloc pool
> > > >
> > > >   (create hash map)
> > > >
> > > >   $ bpftool map show
> > > >   41549: hash  name count_map  flags 0x0
> > > >         key 4B  value 4B  max_entries 1048576  memlock 8388608B
> > > >
> > > >   $ grep BPF /proc/meminfo
> > > >   BPF:               82284 kB
> > > >
> > > >   So the real memory size is $((82284 - 350)) which is 81934 kB
> > > >   while the memlock is only 8192 kB.
> > >
> > > hashmap with key 4b and value 4b looks artificial to me,
> > > but since you're concerned with accuracy of bpftool reporting,
> > > please fix the estimation in bpf_map_memory_footprint().
> >
> > I thought bpf_map_memory_footprint() was deprecated, so I didn't try
> > to fix it before.
>
> It's not deprecated. It's trying to be accurate.
> See bpf_map_value_size().
> In the past we had to be precise when we calculated the required memory
> before we allocated and that was causing ongoing maintenance issues.
> Now bpf_map_memory_footprint() is an estimate for show_fdinfo.
> It can be made more accurate for this map with corner case key/value sizes.
>

Thanks for the clarification.

> > > You're correct that:
> > >
> > > > size of some bpf map is not `key_size + value_size`, for example the
> > > >   element size of htab is
> > > >   `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`
> > >
> > > So just teach bpf_map_memory_footprint() to do this more accurately.
> > > Add bucket size to it as well.
> > > Make it even more accurate with prealloc vs not.
> > > Much simpler change than adding run-time overhead to every alloc/free
> > > on bpf side.
> > >
> >
> > It seems that we'd better introduce ->memory_footprint for some
> > specific bpf maps. I will think about it.
>
> No. Don't build it into a replica of what we had before.
> Making existing bpf_map_memory_footprint() more accurate.
>

I just don't want to add many if-elses or switch-cases into
bpf_map_memory_footprint(), because I think it is a little ugly.
Introducing a new map ops could make it more clear.  For example,
static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
{
    unsigned long size;

    if (map->ops->map_mem_footprint)
        return map->ops->map_mem_footprint(map);

    size = round_up(map->key_size + bpf_map_value_size(map), 8);
    return round_up(map->max_entries * size, PAGE_SIZE);
}

> > > bpf side tracks all of its allocation. There is no need to do that
> > > in generic mm side.
> > > Exposing an aggregated single number if /proc/meminfo also looks wrong.
> >
> > Do you mean that we shouldn't expose it in /proc/meminfo ?
>
> We should not because it helps one particular use case only.
> Somebody else might want map mem info per container,
> then somebody would need it per user, etc.

It seems we should show memcg info and user info in bpftool map show.

> bpftool map show | awk
> solves all those cases without adding new uapi-s.

Makes sense to me.

-- 
Regards
Yafang

  reply	other threads:[~2023-01-18  3:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12 15:53 [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 01/11] mm: percpu: count memcg relevant memory only when kmemcg is enabled Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 02/11] mm: percpu: introduce percpu_size() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 03/11] mm: slab: rename obj_full_size() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 04/11] mm: slab: introduce ksize_full() Yafang Shao
2023-01-12 18:37   ` kernel test robot
2023-01-12 19:38   ` kernel test robot
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 05/11] mm: vmalloc: introduce vsize() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 06/11] mm: util: introduce kvsize() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 07/11] bpf: introduce new helpers bpf_ringbuf_pages_{alloc,free} Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 08/11] bpf: use bpf_map_kzalloc in arraymap Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 09/11] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 10/11] bpf: add and use bpf map free helpers Yafang Shao
2023-01-12 19:07   ` kernel test robot
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 11/11] bpf: introduce bpf memory statistics Yafang Shao
2023-01-13  0:41   ` kernel test robot
2023-01-12 21:05 ` [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo Alexei Starovoitov
2023-01-13 11:53   ` Yafang Shao
2023-01-17 17:25     ` Alexei Starovoitov
2023-01-18  3:07       ` Yafang Shao [this message]
2023-01-18  5:39         ` Alexei Starovoitov
2023-01-18  6:49           ` Yafang Shao
2023-01-26  5:45             ` Alexei Starovoitov
2023-01-28 11:49               ` Yafang Shao
2023-01-30 13:14                 ` Uladzislau Rezki
2023-01-31  6:28                   ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALOAHbBVRvTkSxLin+9A20Wv0DZWz4epvNTY1jEaCTf7q0qWJA@mail.gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=daniel@iogearbox.net \
    --cc=dennis@kernel.org \
    --cc=haoluo@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=sdf@google.com \
    --cc=songliubraving@fb.com \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.