From: Yafang Shao <laoar.shao@gmail.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Tejun Heo <tj@kernel.org>,
	dennis@kernel.org, Chris Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	linux-mm <linux-mm@kvack.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo
Date: Tue, 31 Jan 2023 14:28:22 +0800	[thread overview]
Message-ID: <CALOAHbB17gTQx4O0+b8oBLQ6ckSFHyS4qwqUn0kXj5U1Wqc3SA@mail.gmail.com> (raw)
In-Reply-To: <Y9fCvZIfvgO+nJ9E@pc636>

On Mon, Jan 30, 2023 at 9:14 PM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Sat, Jan 28, 2023 at 07:49:08PM +0800, Yafang Shao wrote:
> > On Thu, Jan 26, 2023 at 1:45 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Jan 17, 2023 at 10:49 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > I just don't want to add many if-elses or switch-cases into
> > > > > > bpf_map_memory_footprint(), because I think it is a little ugly.
> > > > > > Introducing a new map ops callback could make it clearer.  For example,
> > > > > > static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
> > > > > > {
> > > > > >     unsigned long size;
> > > > > >
> > > > > >     if (map->ops->map_mem_footprint)
> > > > > >         return map->ops->map_mem_footprint(map);
> > > > > >
> > > > > >     size = round_up(map->key_size + bpf_map_value_size(map), 8);
> > > > > >     return round_up(map->max_entries * size, PAGE_SIZE);
> > > > > > }
> > > > >
> > > > > It is also ugly, because bpf_map_value_size() already has an if-statement.
> > > > > I prefer to keep all estimates in one place.
> > > > > There is no need to be 100% accurate.
> > > >
> > > > Per my investigation, it can be made almost accurate with little effort.
> > > > Take the htab as an example:
> > > > static unsigned long htab_mem_footprint(const struct bpf_map *map)
> > > > {
> > > >     struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> > > >     unsigned long size = 0;
> > > >
> > > >     if (!htab_is_prealloc(htab)) {
> > > >         size += htab_elements_size(htab);
> > > >     }
> > > >     size += kvsize(htab->elems);
> > > >     size += percpu_size(htab->extra_elems);
> > > >     size += kvsize(htab->buckets);
> > > >     size += bpf_mem_alloc_size(&htab->pcpu_ma);
> > > >     size += bpf_mem_alloc_size(&htab->ma);
> > > >     if (htab->use_percpu_counter)
> > > >         size += percpu_size(htab->pcount.counters);
> > > >     size += percpu_size(htab->map_locked[0]) * HASHTAB_MAP_LOCK_COUNT;
> > > >     size += kvsize(htab);
> > > >     return size;
> > > > }
> > >
> > > Please don't.
> > > Above doesn't look maintainable.
> >
> > It is similar to htab_map_free(): these are exactly the pointers that
> > will be freed in map_free().
> > We just need to keep map_mem_footprint() in sync with map_free(), so it
> > won't be a maintenance problem.
> >
> > > Look at kvsize(htab). Do you really care about hundred bytes?
> > > Just accept that there will be a small constant difference
> > > between what show_fdinfo reports and the real memory.
> >
> > The point is we don't have a clear idea what the margin is.
> >
> > > You cannot make it 100%.
> > > There is kfence that will allocate 4k though you asked kmalloc(8).
> > >
> >
> > We already have ksize()[1], which accounts for the KFENCE case.
> >
> > [1]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/mm/slab_common.c#n1431
> >
> > > > We just need to get the real memory size from the pointer instead of
> > > > calculating the size again.
> > > > For a non-preallocated htab, it is a little troublesome to get the
> > > > element size (not the unit_size), but it won't be a big deal.
> > >
> > > You'd have to convince mm folks that kvsize() is worth doing.
> > > I don't think it will be easy.
> > >
> >
> > As I mentioned above, we already have ksize(), so we only need to
> > introduce vsize().  Per my understanding, we can simply use
> > vm_struct->size to get the vmalloc size; see patch #5 of this
> > patchset[2].
> >
> > Andrew, Uladzislau, Christoph,  do you have any comments on this newly
> > introduced vsize()[2] ?
> >
> > [2]. https://lore.kernel.org/bpf/20230112155326.26902-6-laoar.shao@gmail.com/
> >
> <snip>
> +/* Report full size of underlying allocation of a vmalloc'ed addr */
> +static inline size_t vsize(const void *addr)
> +{
> +       struct vm_struct *area;
> +
> +       if (!addr)
> +               return 0;
> +
> +       area = find_vm_area(addr);
> +       if (unlikely(!area))
> +               return 0;
> +
> +       return area->size;
> +}
> <snip>
>
> You cannot access the area after the lock is dropped. We do not have any
> ref counting for VA objects. Therefore it should be done like below:
>
>
> <snip>
>   spin_lock(&vmap_area_lock);
>   va = __find_vmap_area(addr, &vmap_area_root);
>   if (va && va->vm)
>     va_size = va->vm->size;
>   spin_unlock(&vmap_area_lock);
>
>   return va_size;
> <snip>
>

Ah, it has to take the global vmap_area_lock.  I missed that.
Many thanks for the detailed explanation.
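
If I follow your suggestion, the revised helper in mm/vmalloc.c would look
roughly like below (untested sketch; it reuses the vsize() name from the
RFC and the lock/rbtree names from your snippet):

<snip>
/* Report full size of the underlying allocation of a vmalloc'ed addr */
size_t vsize(const void *addr)
{
	struct vmap_area *va;
	size_t va_size = 0;

	if (!addr)
		return 0;

	/* Hold vmap_area_lock so the VA object cannot go away under us */
	spin_lock(&vmap_area_lock);
	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
	if (va && va->vm)
		va_size = va->vm->size;
	spin_unlock(&vmap_area_lock);

	return va_size;
}
<snip>

The lookup stays under vmap_area_lock for the whole read, so the vm_struct
cannot be freed while its size is being read.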

-- 
Regards
Yafang

Thread overview: 26+ messages
2023-01-12 15:53 [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 01/11] mm: percpu: count memcg relevant memory only when kmemcg is enabled Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 02/11] mm: percpu: introduce percpu_size() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 03/11] mm: slab: rename obj_full_size() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 04/11] mm: slab: introduce ksize_full() Yafang Shao
2023-01-12 18:37   ` kernel test robot
2023-01-12 19:38   ` kernel test robot
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 05/11] mm: vmalloc: introduce vsize() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 06/11] mm: util: introduce kvsize() Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 07/11] bpf: introduce new helpers bpf_ringbuf_pages_{alloc,free} Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 08/11] bpf: use bpf_map_kzalloc in arraymap Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 09/11] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 10/11] bpf: add and use bpf map free helpers Yafang Shao
2023-01-12 19:07   ` kernel test robot
2023-01-12 15:53 ` [RFC PATCH bpf-next v2 11/11] bpf: introduce bpf memory statistics Yafang Shao
2023-01-13  0:41   ` kernel test robot
2023-01-12 21:05 ` [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo Alexei Starovoitov
2023-01-13 11:53   ` Yafang Shao
2023-01-17 17:25     ` Alexei Starovoitov
2023-01-18  3:07       ` Yafang Shao
2023-01-18  5:39         ` Alexei Starovoitov
2023-01-18  6:49           ` Yafang Shao
2023-01-26  5:45             ` Alexei Starovoitov
2023-01-28 11:49               ` Yafang Shao
2023-01-30 13:14                 ` Uladzislau Rezki
2023-01-31  6:28                   ` Yafang Shao [this message]
