From: Daniel Borkmann <daniel@iogearbox.net>
To: Song Liu <songliubraving@fb.com>
Cc: Song Liu <song@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH bpf-next 2/2] bpf: flexible size for bpf_prog_pack
Date: Fri, 11 Feb 2022 15:35:31 +0100
Message-ID: <dd6dee71-94d7-5393-8fe6-c667938ebfac@iogearbox.net>
In-Reply-To: <A3FB68F3-34DC-4598-8C6B-145421DCE73E@fb.com>

On 2/10/22 5:51 PM, Song Liu wrote:
>> On Feb 10, 2022, at 12:25 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 2/10/22 7:41 AM, Song Liu wrote:
>>> bpf_prog_pack uses huge pages to reduce pressure on the instruction TLB.
>>> To guarantee allocating huge pages for bpf_prog_pack, it is necessary to
>>> allocate memory of size PMD_SIZE * num_online_nodes().
>>> On the other hand, if the system doesn't support huge pages, it is more
>>> efficient to allocate a PAGE_SIZE bpf_prog_pack.
>>> Address different scenarios with more flexible bpf_prog_pack_size().
>>> Signed-off-by: Song Liu <song@kernel.org>
>>> ---
>>>   kernel/bpf/core.c | 47 +++++++++++++++++++++++++++--------------------
>>>   1 file changed, 27 insertions(+), 20 deletions(-)
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index 42d96549a804..d961a1f07a13 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -814,46 +814,53 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
>>>    * allocator. The prog_pack allocator uses HPAGE_PMD_SIZE page (2MB on x86)
>>>    * to host BPF programs.
>>>    */
>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> -#define BPF_PROG_PACK_SIZE	HPAGE_PMD_SIZE
>>> -#else
>>> -#define BPF_PROG_PACK_SIZE	PAGE_SIZE
>>> -#endif
>>>   #define BPF_PROG_CHUNK_SHIFT	6
>>>   #define BPF_PROG_CHUNK_SIZE	(1 << BPF_PROG_CHUNK_SHIFT)
>>>   #define BPF_PROG_CHUNK_MASK	(~(BPF_PROG_CHUNK_SIZE - 1))
>>> -#define BPF_PROG_CHUNK_COUNT	(BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE)
>>>     struct bpf_prog_pack {
>>>   	struct list_head list;
>>>   	void *ptr;
>>> -	unsigned long bitmap[BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)];
>>> +	unsigned long bitmap[];
>>>   };
>>>   -#define BPF_PROG_MAX_PACK_PROG_SIZE	BPF_PROG_PACK_SIZE
>>>   #define BPF_PROG_SIZE_TO_NBITS(size)	(round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)
>>>     static DEFINE_MUTEX(pack_mutex);
>>>   static LIST_HEAD(pack_list);
>>>   +static inline int bpf_prog_pack_size(void)
>>> +{
>>> +	/* If vmap_allow_huge == true, use pack size of the smallest
>>> +	 * possible vmalloc huge page: PMD_SIZE * num_online_nodes().
>>> +	 * Otherwise, use pack size of PAGE_SIZE.
>>> +	 */
>>> +	return get_vmap_allow_huge() ? PMD_SIZE * num_online_nodes() : PAGE_SIZE;
>>> +}
>>
>> Imho, this is making too many assumptions about implementation details. Can't we
>> just add a new module_alloc*() API instead which internally guarantees allocating
>> huge pages when enabled/supported (e.g. with a __weak function as fallback)?
> 
> I agree that this is making too many assumptions. But a new module_alloc_huge()
> may not work, because we need the caller to know the proper size to ask for.
> (Or maybe I misunderstood your suggestion?)
> 
> How about we introduce something like
> 
>      /* minimal size to get huge pages from vmalloc. If not possible,
>       * return 0 (or -1?)
>       */
>      int vmalloc_hpage_min_size(void)
>      {
>          return vmap_allow_huge ? PMD_SIZE * num_online_nodes() : 0;
>      }

And that would live inside mm/vmalloc.c and be exported to users ...

>      /* minimal size to get huge pages from module_alloc */
>      int module_alloc_hpage_min_size(void)
>      {
>          return vmalloc_hpage_min_size();
>      }

... and this one as a wrapper in the module alloc infra with a __weak attr?
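
Roughly along these lines, as a sketch with the names from your proposal
(nothing of this exists in the tree yet): the generic __weak default would
report "no huge-page backing guaranteed", and an arch / mm implementation
could override it:

     /* Generic __weak default in the module alloc infra: no huge pages. */
     int __weak module_alloc_hpage_min_size(void)
     {
             return 0;
     }

     /* Possible override, forwarding the smallest size at which vmalloc
      * can actually back the allocation with huge pages.
      */
     int module_alloc_hpage_min_size(void)
     {
             return vmalloc_hpage_min_size();
     }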

>      static inline int bpf_prog_pack_size(void)
>      {
>          return module_alloc_hpage_min_size() ? : PAGE_SIZE;
>      }

Could probably work. It's not pretty, but at least the logic then lives in the
corresponding places, so it's not exposed / hard-coded inside bpf, and bpf isn't
relying on implementation details which could potentially break later on.

Thanks,
Daniel

Thread overview: 8+ messages
2022-02-10  6:41 [PATCH bpf-next 0/2] flexible size for bpf_prog_pack Song Liu
2022-02-10  6:41 ` [PATCH bpf-next 1/2] vmalloc: expose vmap_allow_huge via get_vmap_allow_huge() Song Liu
2022-02-10  6:41 ` [PATCH bpf-next 2/2] bpf: flexible size for bpf_prog_pack Song Liu
2022-02-10  8:25   ` Daniel Borkmann
2022-02-10 16:51     ` Song Liu
2022-02-11 14:35       ` Daniel Borkmann [this message]
2022-02-11 19:42         ` Song Liu
2022-03-01 23:01           ` Song Liu
