From: Song Liu <songliubraving@fb.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Song Liu <song@kernel.org>, bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"Peter Zijlstra" <peterz@infradead.org>, X86 ML <x86@kernel.org>
Subject: Re: [PATCH v6 bpf-next 6/7] bpf: introduce bpf_prog_pack allocator
Date: Mon, 24 Jan 2022 18:27:11 +0000	[thread overview]
Message-ID: <2AAC8B8C-96F1-400F-AFA6-D4AF41EC82F4@fb.com> (raw)
In-Reply-To: <adec88f9-b3e6-bfe4-c09e-54825a60f45d@linux.ibm.com>



> On Jan 24, 2022, at 4:29 AM, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> 
> 
> 
> On 1/23/22 02:03, Song Liu wrote:
>>> On Jan 21, 2022, at 6:12 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Fri, Jan 21, 2022 at 5:30 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jan 21, 2022, at 5:12 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>>> 
>>>>> On Fri, Jan 21, 2022 at 5:01 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>>>> This way, we need to allocate rw_image here and free it in
>>>>>> bpf_jit_comp.c. This feels a little weird to me, but I guess that
>>>>>> is still the cleanest solution for now.
>>>>> 
>>>>> You mean inside bpf_jit_binary_alloc?
>>>>> That won't be arch-independent.
>>>>> It needs to be split into a generic piece that stays in core.c
>>>>> plus callbacks like bpf_jit_fill_hole_t,
>>>>> or into multiple helpers with prep in between.
>>>>> Don't worry if all archs need to be touched.
>>>> 
>>>> How about we introduce a callback, bpf_jit_set_header_size_t? Then we
>>>> can split x86's jit_fill_hole() into two functions: one to fill the
>>>> hole, the other to set the size. The rest of the logic would stay the same.
>>>> 
>>>> Archs that do not use bpf_prog_pack won't need bpf_jit_set_header_size_t.
>>> 
>>> That's not any better.
>>> 
>>> Currently the choice of bpf_jit_binary_alloc_pack vs bpf_jit_binary_alloc
>>> leaks into arch bits and bpf_prog_pack_max_size() doesn't
>>> really make it generic.
>>> 
>>> Ideally all archs continue to use bpf_jit_binary_alloc()
>>> and the magic happens in generic code.
>>> If not, then please remove bpf_prog_pack_max_size(),
>>> since it doesn't provide much value, and pick a
>>> bpf_jit_binary_alloc_pack() signature that fits the x86 JIT better.
>>> It wouldn't need the bpf_jit_fill_hole_t callback at all.
>>> Please think it through so we don't need to redesign it
>>> when another arch decides to use huge pages for bpf progs.
>>> 
>>> cc-ing Ilya for ideas on how that would fit s390.
>> I guess we have a few different questions here:
>>
>> 1. Can we use bpf_jit_binary_alloc() for both regular pages and the
>>    shared huge page?
>>
>>    I think the answer is no, as bpf_jit_binary_alloc() allocates an rw
>>    buffer, and the arch calls bpf_jit_binary_lock_ro after JITing. The
>>    new allocator will return a slice of a shared huge page, which is
>>    locked RO before JITing.
>>
>> 2. The bpf_prog_pack_max_size() limitation.
>>
>>    I think this is the worst part of the current version of
>>    bpf_prog_pack, but it shouldn't be too hard to fix. I will remove
>>    this limitation in the next version.
>>
>> 3. How to set the proper header->size?
>>
>>    I guess we can introduce something similar to bpf_arch_text_poke()
>>    for this?
>>
>> My proposal for the next version is:
>>
>> 1. No changes to archs that do not use huge pages; they keep using
>>    bpf_jit_binary_alloc.
>>
>> 2. For x86_64 (and other archs that would support bpf programs on
>>    huge pages):
>>
>>    2.1 arch/bpf_jit_comp calls bpf_jit_binary_alloc_pack() to allocate
>>        an RO bpf_binary_header;
>>
>>    2.2 the arch allocates a temporary buffer for JIT. Once JIT is done,
>>        use text_poke_copy to copy the code to the RO bpf_binary_header.
> 
> Are arches expected to allocate rw buffers in different ways? If not,
> I would consider putting this into the common code as well. Then
> arch-specific code would do something like
> 
>  header = bpf_jit_binary_alloc_pack(size, &prg_buf, &prg_addr, ...);
>  ...
>  /*
>   * Generate code into prg_buf, the code should assume that its first
>   * byte is located at prg_addr.
>   */
>  ...
>  bpf_jit_binary_finalize_pack(header, prg_buf);
> 
> where bpf_jit_binary_finalize_pack() would copy prg_buf to header and
> free it.

I think this should work. 

We will need an API like bpf_arch_text_copy, which uses text_poke_copy()
for x86_64 and s390_kernel_write() for s390. We will use bpf_arch_text_copy
to
  1) write header->size;
  2) do the final copy in bpf_jit_binary_finalize_pack().

The signature of bpf_arch_text_copy is quite different from the existing
bpf_arch_text_poke, so I guess a new API is better.
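
To make that concrete, here is a rough sketch of what I have in mind
(the signatures are tentative and the weak default is just a placeholder,
so please treat this as a sketch rather than the final design):

  /*
   * Copy len bytes of JITed code into the read-only image, using
   * text_poke_copy() on x86_64 and s390_kernel_write() on s390.
   * Returns dst on success, or an ERR_PTR value on failure.
   */
  void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len)
  {
  	/* Archs that do not support bpf_prog_pack keep this default. */
  	return ERR_PTR(-ENOTSUPP);
  }

  int bpf_jit_binary_finalize_pack(struct bpf_binary_header *ro_header,
  				   void *rw_image, size_t size)
  {
  	void *ret;

  	/* Copy the temporary rw buffer into the RO huge-page slice. */
  	ret = bpf_arch_text_copy(ro_header, rw_image, size);

  	/* The rw buffer is no longer needed either way. */
  	kvfree(rw_image);
  	return IS_ERR(ret) ? PTR_ERR(ret) : 0;
  }

x86_64 would then override bpf_arch_text_copy() with a text_poke_copy()
based version, and s390 could do the same with s390_kernel_write().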

> 
> If this doesn't work, I also don't see any big problems in the scheme
> that you propose (especially if the bpf_prog_pack_max_size() limitation
> is gone).
> 
> [...]
> 
> Btw, are there any existing benchmarks that I can use to check whether
> this is worth enabling on s390?

Unfortunately, we don't have a benchmark to share. Most of our benchmarks
are shadow tests that cannot run outside of our production environment.
We have issues with iTLB misses for most of our big services. A typical
system may see hundreds of iTLB misses per million instructions, and some
sched_cls programs are often among the top triggers of these misses.
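
For a rough first-order measurement, something like the following should
work (the generic perf event aliases may map differently depending on the
CPU, so take the exact numbers with a grain of salt):

  perf stat -e iTLB-load-misses,instructions -a -- sleep 10

Dividing iTLB-load-misses by (instructions / 10^6) gives the misses per
million instructions figure mentioned above.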

Thanks,
Song
