All of lore.kernel.org
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Song Liu <song@kernel.org>, bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"Peter Zijlstra" <peterz@infradead.org>, X86 ML <x86@kernel.org>
Subject: Re: [PATCH v6 bpf-next 6/7] bpf: introduce bpf_prog_pack allocator
Date: Sat, 22 Jan 2022 00:23:24 +0000	[thread overview]
Message-ID: <7393B983-3295-4B14-9528-B7BD04A82709@fb.com> (raw)
In-Reply-To: <CAADnVQK6+gWTUDo2z1H6AE5_DtuBBetW+VTwwKz03tpVdfuoHA@mail.gmail.com>



> On Jan 21, 2022, at 3:55 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Fri, Jan 21, 2022 at 11:49 AM Song Liu <song@kernel.org> wrote:
>> 
>> +static struct bpf_binary_header *
>> +__bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>> +                      unsigned int alignment,
>> +                      bpf_jit_fill_hole_t bpf_fill_ill_insns,
>> +                      u32 round_up_to)
>> +{
>> +       struct bpf_binary_header *hdr;
>> +       u32 size, hole, start;
>> +
>> +       WARN_ON_ONCE(!is_power_of_2(alignment) ||
>> +                    alignment > BPF_IMAGE_ALIGNMENT);
>> +
>> +       /* Most of BPF filters are really small, but if some of them
>> +        * fill a page, allow at least 128 extra bytes to insert a
>> +        * random section of illegal instructions.
>> +        */
>> +       size = round_up(proglen + sizeof(*hdr) + 128, round_up_to);
>> +
>> +       if (bpf_jit_charge_modmem(size))
>> +               return NULL;
>> +       hdr = bpf_jit_alloc_exec(size);
>> +       if (!hdr) {
>> +               bpf_jit_uncharge_modmem(size);
>> +               return NULL;
>> +       }
>> +
>> +       /* Fill space with illegal/arch-dep instructions. */
>> +       bpf_fill_ill_insns(hdr, size);
>> +
>> +       hdr->size = size;
>> +       hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
>> +                    PAGE_SIZE - sizeof(*hdr));
> 
> It probably should be 'round_up_to' instead of PAGE_SIZE ?

Actually, some of these change is not longer needed after the following
change in v6:

  4. Change fall back round_up_to in bpf_jit_binary_alloc_pack() from
     BPF_PROG_MAX_PACK_PROG_SIZE to PAGE_SIZE.

My initial thought (last year) was if we allocate more than 2MB (either 
2.1MB or 3.9MB), we round up to 4MB to save page table entries. 
However, when I revisited this earlier today, I thought we should still
round up to PAGE_SIZE to save memory

Right now, I am not sure which way is better. What do you think? If we
round up to PAGE_SIZE, we don't need split out __bpf_jit_binary_alloc().

> 
>> +       start = (get_random_int() % hole) & ~(alignment - 1);
>> +
>> +       /* Leave a random number of instructions before BPF code. */
>> +       *image_ptr = &hdr->image[start];
>> +
>> +       return hdr;
>> +}
>> +
>> struct bpf_binary_header *
>> bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>                     unsigned int alignment,
>>                     bpf_jit_fill_hole_t bpf_fill_ill_insns)
>> +{
>> +       return __bpf_jit_binary_alloc(proglen, image_ptr, alignment,
>> +                                     bpf_fill_ill_insns, PAGE_SIZE);
>> +}
>> +
>> +struct bpf_binary_header *
>> +bpf_jit_binary_alloc_pack(unsigned int proglen, u8 **image_ptr,
>> +                         unsigned int alignment,
>> +                         bpf_jit_fill_hole_t bpf_fill_ill_insns)
>> {
>>        struct bpf_binary_header *hdr;
>>        u32 size, hole, start;
>> @@ -875,11 +1034,16 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>         * fill a page, allow at least 128 extra bytes to insert a
>>         * random section of illegal instructions.
>>         */
>> -       size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
>> +       size = round_up(proglen + sizeof(*hdr) + 128, BPF_PROG_CHUNK_SIZE);
>> +
>> +       /* for too big program, use __bpf_jit_binary_alloc. */
>> +       if (size > BPF_PROG_MAX_PACK_PROG_SIZE)
>> +               return __bpf_jit_binary_alloc(proglen, image_ptr, alignment,
>> +                                             bpf_fill_ill_insns, PAGE_SIZE);
>> 
>>        if (bpf_jit_charge_modmem(size))
>>                return NULL;
>> -       hdr = bpf_jit_alloc_exec(size);
>> +       hdr = bpf_prog_pack_alloc(size);
>>        if (!hdr) {
>>                bpf_jit_uncharge_modmem(size);
>>                return NULL;
>> @@ -888,9 +1052,8 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>        /* Fill space with illegal/arch-dep instructions. */
>>        bpf_fill_ill_insns(hdr, size);
>> 
>> -       hdr->size = size;
> 
> I'm missing where it's assigned.
> Looks like hdr->size stays zero, so free is never performed?

This is read only memory, so we set it in bpf_fill_ill_insns(). There was a 
comment in x86/bpf_jit_comp.c. I guess we also need a comment here. 

> 
>>        hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
>> -                    PAGE_SIZE - sizeof(*hdr));
>> +                    BPF_PROG_CHUNK_SIZE - sizeof(*hdr));
> 
> Before this change size - (proglen + sizeof(*hdr)) would
> be at least 128 and potentially bigger than PAGE_SIZE
> when extra 128 crossed page boundary.
> Hence min() was necessary with the 2nd arg being PAGE_SIZE - sizeof(*hdr).
> 
> With new code size - (proglen + sizeof(*hdr)) would
> be between 128 and 128+64
> while BPF_PROG_CHUNK_SIZE - sizeof(*hdr) is a constant less than 64.
> What is the point of min() ?

Yeah, I guess I didn't finish my math homework here. Will fix it in the
next version. 


  reply	other threads:[~2022-01-22  0:23 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-21 19:49 [PATCH v6 bpf-next 0/7] bpf_prog_pack allocator Song Liu
2022-01-21 19:49 ` [PATCH v6 bpf-next 1/7] x86/Kconfig: select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP Song Liu
2022-01-21 19:49 ` [PATCH v6 bpf-next 2/7] bpf: use bytes instead of pages for bpf_jit_[charge|uncharge]_modmem Song Liu
2022-01-21 19:49 ` [PATCH v6 bpf-next 3/7] bpf: use size instead of pages in bpf_binary_header Song Liu
2022-01-21 19:49 ` [PATCH v6 bpf-next 4/7] bpf: add a pointer of bpf_binary_header to bpf_prog Song Liu
2022-01-21 19:49 ` [PATCH v6 bpf-next 5/7] x86/alternative: introduce text_poke_copy Song Liu
2022-01-21 19:49 ` [PATCH v6 bpf-next 6/7] bpf: introduce bpf_prog_pack allocator Song Liu
2022-01-21 23:55   ` Alexei Starovoitov
2022-01-22  0:23     ` Song Liu [this message]
2022-01-22  0:41       ` Alexei Starovoitov
2022-01-22  1:01         ` Song Liu
2022-01-22  1:12           ` Alexei Starovoitov
2022-01-22  1:30             ` Song Liu
2022-01-22  2:12               ` Alexei Starovoitov
2022-01-23  1:03                 ` Song Liu
2022-01-24 12:29                   ` Ilya Leoshkevich
2022-01-24 18:27                     ` Song Liu
2022-01-25  5:21                       ` Alexei Starovoitov
2022-01-25  7:21                         ` Song Liu
2022-01-25 19:59                           ` Alexei Starovoitov
2022-01-25 22:25                             ` Song Liu
2022-01-25 22:48                               ` Alexei Starovoitov
2022-01-25 23:09                                 ` Song Liu
2022-01-26  0:38                                   ` Alexei Starovoitov
2022-01-26  0:50                                     ` Song Liu
2022-01-26  1:20                                       ` Alexei Starovoitov
2022-01-26  1:28                                         ` Song Liu
2022-01-26  1:31                                           ` Song Liu
2022-01-26  1:34                                             ` Alexei Starovoitov
2022-01-24 12:45                   ` Peter Zijlstra
2022-01-21 19:49 ` [PATCH v6 bpf-next 7/7] bpf, x86_64: use " Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7393B983-3295-4B14-9528-B7BD04A82709@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=song@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.