From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"song@kernel.org" <song@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Cc: "daniel@iogearbox.net" <daniel@iogearbox.net>,
	"Torvalds, Linus" <torvalds@linux-foundation.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"kernel-team@fb.com" <kernel-team@fb.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>
Subject: Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack
Date: Tue, 17 May 2022 19:15:01 +0000
Message-ID: <83a69976cb93e69c5ad7a9511b5e57c402eee19d.camel@intel.com>
In-Reply-To: <20220516054051.114490-6-song@kernel.org>

On Sun, 2022-05-15 at 22:40 -0700, Song Liu wrote:
> Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
> PMD_SIZE pages. This benefits system performance by reducing the iTLB
> miss rate. A benchmark of a real web service workload shows this
> change gives another ~0.2% performance boost on top of the PAGE_SIZE
> bpf_prog_pack (which improves system throughput by ~0.5%).

0.7% sounds good as a whole. How sure are you of that +0.2%? Was this a
big averaged test?

> 
> Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
> set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
> VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
> 
> [1] https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/
> Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

As I said before, I think this will work functionally. But I meant it
as a quick fix back when we were discussing how to patch this up to
keep the feature enabled upstream.

So now, should we make VM_FLUSH_RESET_PERMS work properly with huge
pages? The main benefit would be keeping the teardown of these kinds of
allocations consistent, for correctness reasons. The differences in TLB
flush minimization are probably less impactful given the caching
introduced here. At the very least, though, we should add (or should
already have had) some WARN for anyone trying to use it with huge
pages.
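Something along these lines could catch misuse until huge-page support
is actually done. This is only a sketch: the check would have to live
somewhere in mm/vmalloc.c's teardown path, and the exact flag name for
huge vmalloc mappings here is an assumption on my part:

```c
/* Hypothetical sketch, not the exact upstream code: warn if someone
 * sets VM_FLUSH_RESET_PERMS on a mapping that may use huge pages,
 * since the flush/reset logic only handles 4k mappings today.
 */
static void warn_flush_reset_perms_huge(struct vm_struct *area)
{
	WARN_ON_ONCE((area->flags & VM_FLUSH_RESET_PERMS) &&
		     (area->flags & VM_ALLOW_HUGE_VMAP));
}
```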

> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  kernel/bpf/core.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index cacd8684c3c4..b64d91fcb0ba 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
>  	void *ptr;
>  
>  	size = BPF_HPAGE_SIZE * num_online_nodes();
> -	ptr = module_alloc(size);
> +	ptr = module_alloc_huge(size);

This select_bpf_prog_pack_size() function has always seemed weird -
doing a big allocation and then immediately freeing it, just to probe
for huge page support. Can't it check a config option for vmalloc huge
page support instead?
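Roughly what I have in mind (a sketch only; vmalloc_huge_supported() is
a hypothetical helper, since the nohugevmalloc boot-param state is
currently private to mm/vmalloc.c):

```c
/* Sketch: decide the pack size from huge-vmalloc support rather than
 * doing a trial allocation and freeing it. CONFIG_HAVE_ARCH_HUGE_VMALLOC
 * is the real arch-support config; the runtime helper is assumed.
 */
static size_t select_bpf_prog_pack_size(void)
{
	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) &&
	    vmalloc_huge_supported())
		return BPF_HPAGE_SIZE * num_online_nodes();

	return PAGE_SIZE;
}
```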

>  
>  	/* Test whether we can get huge pages. If not just use PAGE_SIZE
>  	 * packs.
> @@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
>  		       GFP_KERNEL);
>  	if (!pack)
>  		return NULL;
> -	pack->ptr = module_alloc(bpf_prog_pack_size);
> +	pack->ptr = module_alloc_huge(bpf_prog_pack_size);
>  	if (!pack->ptr) {
>  		kfree(pack);
>  		return NULL;
> @@ -890,7 +890,6 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
>  	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
>  	list_add_tail(&pack->list, &pack_list);
>  
> -	set_vm_flush_reset_perms(pack->ptr);
>  	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>  	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>  	return pack;
> @@ -909,10 +908,9 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
>  
>  	if (size > bpf_prog_pack_size) {
>  		size = round_up(size, PAGE_SIZE);
> -		ptr = module_alloc(size);
> +		ptr = module_alloc_huge(size);
>  		if (ptr) {
>  			bpf_fill_ill_insns(ptr, size);
> -			set_vm_flush_reset_perms(ptr);
>  			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
>  			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
>  		}
> @@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>  
>  	mutex_lock(&pack_mutex);
>  	if (hdr->size > bpf_prog_pack_size) {
> +		set_memory_nx((unsigned long)hdr, hdr->size / PAGE_SIZE);
> +		set_memory_rw((unsigned long)hdr, hdr->size / PAGE_SIZE);
>  		module_memfree(hdr);
>  		goto out;
>  	}
> @@ -975,6 +975,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>  	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
>  				       bpf_prog_chunk_count(), 0) == 0) {
>  		list_del(&pack->list);
> +		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> +		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>  		module_memfree(pack->ptr);
>  		kfree(pack);
>  	}
