linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <song@kernel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>
Subject: Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
Date: Tue, 12 Apr 2022 14:00:35 -0700	[thread overview]
Message-ID: <CAPhsuW4pde5DTMNzRUHwPoxs6vh9AQPn+ny5-9cFVO4nVnYsGw@mail.gmail.com> (raw)
In-Reply-To: <0e8047d07def02db8ef33836ee37de616660045c.camel@intel.com>

On Tue, Apr 12, 2022 at 10:21 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> > @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
> >         bitmap_zero(pack->bitmap, bpf_prog_pack_size /
> > BPF_PROG_CHUNK_SIZE);
> >         list_add_tail(&pack->list, &pack_list);
> >
> > -       set_vm_flush_reset_perms(pack->ptr);
> >         set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size /
> > PAGE_SIZE);
> >         set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size /
> > PAGE_SIZE);
> >         return pack;
>
> Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
> It is kind of a fix for a different issue.
>
> Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
> not work with them, we should have some comments or warnings to that
> effect somewhere. Someone may try to pass the flags in together.

Good catch! I will add it in the next version.

>
> > @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct
> > bpf_binary_header *hdr)
> >         if (bitmap_find_next_zero_area(pack->bitmap,
> > bpf_prog_chunk_count(), 0,
> >                                        bpf_prog_chunk_count(), 0) ==
> > 0) {
> >                 list_del(&pack->list);
> > -               module_memfree(pack->ptr);
>
>
> > +               set_memory_nx((unsigned long)pack->ptr,
> > bpf_prog_pack_size / PAGE_SIZE);
> > +               set_memory_rw((unsigned long)pack->ptr,
> > bpf_prog_pack_size / PAGE_SIZE);
> > +               vfree(pack->ptr);
> >                 kfree(pack);
>
> Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
> should it call module_memfree() instead of vfree()?

Right. Let me sort that out. (Also, whether we introduce module_alloc_huge()
or not).

>
>
>
> Since there are bugs, simple, immediate fixes seem like the right thing
> to do, but I had a couple long term focused comments on this new
> feature:
>
> It would be nice if bpf and the other module_alloc() callers could
> share the same large pages. Meaning, ultimately that this whole thing
> should probably live outside of bpf. BPF tracing usages might benefit
> for example, and kprobes and ftrace are not too different than bpf
> progs from a text allocation perspective.

Agreed.

>
> I agree that the module's part is non-trivial. A while back I had tried
> to do something like bpf_prog_pack() that worked for all the
> module_alloc() callers. It had some modules changes to allow different
> permissions to go to different allocations so they could be made to
> share large pages:
>
> https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/
>
> I thought the existing kernel special permission allocation methods
> were just too brittle and intertwined to improve without a new
> interface. The hope was the new interface could wrap all the arch
> intricacies instead of leaving them exposed in the cross-arch callers.
>
> I wonder what you think of that general direction or if you have any
> follow up plans for this?

Since I am still learning the vmalloc/module_alloc code, I think I am
not really capable of commenting on the direction. From our use
cases, we do see performance hit due to large number of BPF
program fragmenting the page table. Kernel module, OTOH, is not
too big an issue for us, as we usually build hot modules into the
kernel. That being said, we are interested in making the huge page
interface general for BPF program and kernel module. We can
commit resources to this effort.

Thanks,
Song

  reply	other threads:[~2022-04-12 23:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220411233549.740157-1-song@kernel.org>
     [not found] ` <20220411233549.740157-4-song@kernel.org>
2022-04-11 23:52   ` [PATCH v2 bpf 3/3] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
     [not found] ` <20220411233549.740157-2-song@kernel.org>
2022-04-12  4:18   ` [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Christoph Hellwig
2022-04-12  6:00     ` Song Liu
2022-04-21  2:24       ` Nicholas Piggin
2022-04-21  3:35         ` Nicholas Piggin
     [not found] ` <20220411233549.740157-3-song@kernel.org>
2022-04-12  4:20   ` [PATCH v2 bpf 2/3] module: introduce module_alloc_huge Christoph Hellwig
2022-04-12  6:11     ` Song Liu
     [not found] ` <20220411233549.740157-5-song@kernel.org>
2022-04-12  4:20   ` [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Christoph Hellwig
2022-04-12  6:12     ` Song Liu
2022-04-12 17:20   ` Edgecombe, Rick P
2022-04-12 21:00     ` Song Liu [this message]
2022-04-13 15:51       ` Edgecombe, Rick P

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPhsuW4pde5DTMNzRUHwPoxs6vh9AQPn+ny5-9cFVO4nVnYsGw@mail.gmail.com \
    --to=song@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hch@infradead.org \
    --cc=imbrenda@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).