linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH v2 bpf 3/3] bpf: use module_alloc_huge for bpf_prog_pack
From: Song Liu @ 2022-04-11 23:52 UTC
  To: bpf, Linux-MM, open list
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Andrew Morton, rick.p.edgecombe, Christoph Hellwig, imbrenda,
	Luis Chamberlain

On Mon, Apr 11, 2022 at 4:41 PM Song Liu <song@kernel.org> wrote:
>
> module_alloc_huge for bpf_prog_pack so that BPF programs sit on PMD_SIZE
> pages. This benefits system performance by reducing iTLB miss rate.
>
> Signed-off-by: Song Liu <song@kernel.org>
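
To put the iTLB argument in numbers, here is a small user-space C sketch (not kernel code) that counts how many leaf mappings are needed to cover a region of JITed text with 4 KiB base pages versus 2 MiB PMD-sized pages. The page sizes are assumed x86-64 values, and the `SKETCH_*` names and the `mappings_needed()` helper are hypothetical; real iTLB hit rates also depend on the CPU's TLB layout.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86-64 page sizes: 4 KiB base pages, 2 MiB PMD-level pages. */
#define SKETCH_PAGE_SIZE ((size_t)4096)
#define SKETCH_PMD_SIZE  ((size_t)2 * 1024 * 1024)

/* Number of page_size mappings needed to cover text_bytes, rounded up.
 * Each mapping needs its own iTLB entry when it is in use. */
static size_t mappings_needed(size_t text_bytes, size_t page_size)
{
	return (text_bytes + page_size - 1) / page_size;
}
```

For one 2 MiB pack this gives 512 base-page mappings versus a single PMD mapping, which is the reduction the commit message relies on.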

I am really sorry for spamming the list twice. The first set experienced
a long lag, so I thought it didn't get through.

I also updated the commit log of 3/3, and this one is the latest version.

Thanks,
Song

> ---
>  kernel/bpf/core.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 13e9dbeeedf3..fd45bdd80a75 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
>         void *ptr;
>
>         size = BPF_HPAGE_SIZE * num_online_nodes();
> -       ptr = module_alloc(size);
> +       ptr = module_alloc_huge(size);
>
>         /* Test whether we can get huge pages. If not just use PAGE_SIZE
>          * packs.
> @@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
>                        GFP_KERNEL);
>         if (!pack)
>                 return NULL;
> -       pack->ptr = module_alloc(bpf_prog_pack_size);
> +       pack->ptr = module_alloc_huge(bpf_prog_pack_size);
>         if (!pack->ptr) {
>                 kfree(pack);
>                 return NULL;
> @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
>         bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
>         list_add_tail(&pack->list, &pack_list);
>
> -       set_vm_flush_reset_perms(pack->ptr);
>         set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>         set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>         return pack;
> @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>         if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
>                                        bpf_prog_chunk_count(), 0) == 0) {
>                 list_del(&pack->list);
> -               module_memfree(pack->ptr);
> +               set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> +               set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> +               vfree(pack->ptr);
>                 kfree(pack);
>         }
>  out:
> --
> 2.30.2
>
>
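
The pack bookkeeping in the diff (`bitmap_zero()` at allocation time, and the `bitmap_find_next_zero_area(..., 0, bpf_prog_chunk_count(), 0) == 0` test when freeing) can be modelled in user space roughly as below. This is a simplified sketch, not the kernel implementation: it uses one byte per chunk instead of a real bitmap, and the 64-byte chunk size, the 2 MiB pack size and every `sketch_*`/`pack_*` name are assumptions for illustration. The point it shows is that a pack is empty exactly when a free run as long as the whole pack starts at index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed geometry: a 2 MiB pack split into 64-byte chunks. */
#define SKETCH_CHUNK_SIZE 64
#define SKETCH_PACK_SIZE  (2 * 1024 * 1024)
#define SKETCH_NCHUNKS    (SKETCH_PACK_SIZE / SKETCH_CHUNK_SIZE)

struct sketch_pack {
	unsigned char used[SKETCH_NCHUNKS]; /* one byte per chunk for clarity */
};

/* First index i >= start such that chunks [i, i + nr) are all free, or
 * SKETCH_NCHUNKS if there is no such run. nr must be at least 1. */
static size_t find_zero_area(const struct sketch_pack *p, size_t start, size_t nr)
{
	size_t run = 0;

	for (size_t i = start; i < SKETCH_NCHUNKS; i++) {
		run = p->used[i] ? 0 : run + 1;
		if (run == nr)
			return i + 1 - nr;
	}
	return SKETCH_NCHUNKS;
}

/* Reserve enough chunks for 'bytes'; returns the first chunk or -1. */
static long pack_alloc(struct sketch_pack *p, size_t bytes)
{
	size_t nr = (bytes + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
	size_t pos;

	if (nr == 0 || nr > SKETCH_NCHUNKS)
		return -1;
	pos = find_zero_area(p, 0, nr);
	if (pos == SKETCH_NCHUNKS)
		return -1;
	memset(&p->used[pos], 1, nr);
	return (long)pos;
}

/* Release the chunks previously reserved for 'bytes' at 'pos'. */
static void pack_free(struct sketch_pack *p, size_t pos, size_t bytes)
{
	size_t nr = (bytes + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;

	memset(&p->used[pos], 0, nr);
}

/* Same idea as the "== 0" test in bpf_prog_pack_free(): the pack is
 * empty iff a free run covering every chunk starts at index 0. */
static bool pack_is_empty(const struct sketch_pack *p)
{
	return find_zero_area(p, 0, SKETCH_NCHUNKS) == 0;
}
```

When `pack_is_empty()` becomes true, the whole pack can be handed back to the underlying allocator, which is the point where the patch resets permissions and frees `pack->ptr`.
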


* Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
From: Christoph Hellwig @ 2022-04-12  4:18 UTC
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, andrii, akpm,
	rick.p.edgecombe, hch, imbrenda, mcgrof, Nicholas Piggin

On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
> Huge page backed vmalloc memory could benefit performance in many cases.
> Since some users of vmalloc may not be ready to handle huge pages,
> VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
> pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
> that may try to allocate >= PMD_SIZE pages, but are not ready to handle
> huge pages properly.

This is a good place to document what the problems are, and how they are
hard to track down (e.g. because the allocations are passed down I/O
stacks)

> 
> Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
> users that benefit from huge pages could ask for them specifically.
> 
> Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().

We still need to find out what the primary users of the large vmalloc
hashes were and convert them.

> +extern void *vmalloc_huge(unsigned long size) __alloc_size(1);

No need for the extern.

> +EXPORT_SYMBOL(vmalloc_huge);

EXPORT_SYMBOL_GPL for all advanced vmalloc functionality, please.
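
A minimal sketch of the policy change described in this patch, assuming a simplified model in which huge mappings are only considered for requests of at least PMD_SIZE. The flag values and function names are hypothetical and do not match the kernel's definitions; the real vmalloc path also checks architecture support before using huge mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define SKETCH_VM_NO_HUGE_VMAP    0x1u
#define SKETCH_VM_ALLOW_HUGE_VMAP 0x2u
#define SKETCH_PMD_SIZE ((size_t)2 * 1024 * 1024)

/* Opt-out policy: any allocation of at least PMD_SIZE gets huge pages
 * unless the caller remembered to pass the "no huge" flag. */
static bool huge_opt_out(size_t size, unsigned int flags)
{
	return size >= SKETCH_PMD_SIZE && !(flags & SKETCH_VM_NO_HUGE_VMAP);
}

/* Opt-in policy: only callers that explicitly ask for huge pages get them. */
static bool huge_opt_in(size_t size, unsigned int flags)
{
	return size >= SKETCH_PMD_SIZE && (flags & SKETCH_VM_ALLOW_HUGE_VMAP);
}
```

With opt-out, a caller that has never heard of the flag silently gets huge pages; with opt-in, the same caller keeps base pages unless it asks.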


* Re: [PATCH v2 bpf 2/3] module: introduce module_alloc_huge
From: Christoph Hellwig @ 2022-04-12  4:20 UTC
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, andrii, akpm,
	rick.p.edgecombe, hch, imbrenda, mcgrof

On Mon, Apr 11, 2022 at 04:35:47PM -0700, Song Liu wrote:
> Introduce module_alloc_huge, which allocates huge page backed memory in
> module memory space. The primary user of this memory is bpf_prog_pack
> (multiple BPF programs sharing a huge page).

I know I led you down this road first, but I wonder if we just want to
pass a flag to module_alloc instead.  This avoids duplicating all the
arch overrides.
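
One way to read the suggestion above, sketched in plain C with hypothetical names (`sketch_module_alloc_flags()`, `SKETCH_MODALLOC_HUGE`): architectures override a single flag-taking allocator, and the existing `module_alloc()`-style entry point becomes a thin wrapper. `malloc()` stands in for the real module-area allocator, and the `last_backend` variable exists only so the choice can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag; not an existing kernel API. */
#define SKETCH_MODALLOC_HUGE 0x1u

/* Records which path was taken: 0 = none, 1 = regular, 2 = huge. */
static int last_backend;

/* In the flag-based design an architecture overrides only this one
 * function and decides internally how to honour the flag. */
static void *sketch_module_alloc_flags(size_t size, unsigned int flags)
{
	last_backend = (flags & SKETCH_MODALLOC_HUGE) ? 2 : 1;
	return malloc(size); /* user-space stand-in for the real allocator */
}

/* Existing callers keep their signature through a thin wrapper. */
static void *sketch_module_alloc(size_t size)
{
	return sketch_module_alloc_flags(size, 0);
}
```

The trade-off Song raises in his reply still applies: every architecture override has to learn the new argument, whereas a separate helper can start out as a single `__weak` default.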


* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
From: Christoph Hellwig @ 2022-04-12  4:20 UTC
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, andrii, akpm,
	rick.p.edgecombe, hch, imbrenda, mcgrof

On Mon, Apr 11, 2022 at 04:35:49PM -0700, Song Liu wrote:
> Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that

That is only very indirectly true now.


* Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
From: Song Liu @ 2022-04-12  6:00 UTC
  To: Christoph Hellwig
  Cc: bpf, Linux-MM, open list, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Andrew Morton, rick.p.edgecombe, imbrenda,
	Luis Chamberlain, Nicholas Piggin

On Mon, Apr 11, 2022 at 9:18 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
> > Huge page backed vmalloc memory could benefit performance in many cases.
> > Since some users of vmalloc may not be ready to handle huge pages,
> > VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
> > pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
> > that may try to allocate >= PMD_SIZE pages, but are not ready to handle
> > huge pages properly.
>
> This is a good place to document what the problems are, and how they are
> hard to track down (e.g. because the allocations are passed down I/O
> stacks)

Will add it in v3.

>
> >
> > Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
> > users that benefit from huge pages could ask for them specifically.
> >
> > Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().
>
> We still need to find out what the primary users of the large vmalloc
> hashes were and convert them.

@ Claudio and Nicholas,

Could you please help identify users of large vmalloc? So far, I found
alloc_large_system_hash(), and something like the following seems to
work:

diff --git i/mm/page_alloc.c w/mm/page_alloc.c
index 6e5b4488a0c5..20d38b8482c4 100644
--- i/mm/page_alloc.c
+++ w/mm/page_alloc.c
@@ -8919,7 +8919,7 @@ void *__init alloc_large_system_hash(const char *tablename,
                                table = memblock_alloc_raw(size,
                                                           SMP_CACHE_BYTES);
                } else if (get_order(size) >= MAX_ORDER || hashdist) {
-                       table = __vmalloc(size, gfp_flags);
+                       table = __vmalloc_huge(size, gfp_flags);
                        virt = true;
                        if (table)
                                huge = is_vm_area_hugepages(table);
diff --git i/mm/vmalloc.c w/mm/vmalloc.c
index 7cc2be6a7554..cbadbe83e6a6 100644
--- i/mm/vmalloc.c
+++ w/mm/vmalloc.c
@@ -3253,6 +3253,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);

+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+{
+       return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+                                   gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+                                   NUMA_NO_NODE, __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(__vmalloc_huge);
+
 /**
  * vmalloc - allocate virtually contiguous memory
  * @size:    allocation size
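
The branch in the diff above sends a table to vmalloc when `get_order(size) >= MAX_ORDER` or when `hashdist` is set. A user-space sketch of that decision follows, assuming 4 KiB pages and `MAX_ORDER = 11` (a common x86-64 value in kernels of this period, where the largest buddy allocation is 4 MiB). `sketch_get_order()` is written to match the semantics of the kernel's `get_order()`; the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values: 4 KiB pages, MAX_ORDER = 11, so the largest buddy
 * allocation is order MAX_ORDER - 1 = 10, i.e. 4 MiB. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11

/* Smallest order such that (PAGE_SIZE << order) >= size; size must be > 0. */
static int sketch_get_order(size_t size)
{
	int order = 0;

	size = (size - 1) >> SKETCH_PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Simplified form of the vmalloc branch in alloc_large_system_hash():
 * use vmalloc when the buddy allocator cannot satisfy the request or the
 * table should be spread across NUMA nodes (hashdist). */
static bool hash_table_uses_vmalloc(size_t size, bool hashdist)
{
	return sketch_get_order(size) >= SKETCH_MAX_ORDER || hashdist;
}
```

On such a configuration any hash table larger than 4 MiB, or any table when `hashdist` is set, comes from vmalloc, which is why these tables are natural candidates for the `__vmalloc_huge()` conversion.
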


>
> > +extern void *vmalloc_huge(unsigned long size) __alloc_size(1);
>
> No need for the extern.
>
> > +EXPORT_SYMBOL(vmalloc_huge);
>
> EXPORT_SYMBOL_GPL for all advanced vmalloc functionality, please.

Will fix these in v3.

Thanks,
Song


* Re: [PATCH v2 bpf 2/3] module: introduce module_alloc_huge
From: Song Liu @ 2022-04-12  6:11 UTC
  To: Christoph Hellwig
  Cc: bpf, Linux-MM, open list, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Andrew Morton, rick.p.edgecombe, imbrenda,
	Luis Chamberlain

On Mon, Apr 11, 2022 at 9:20 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Apr 11, 2022 at 04:35:47PM -0700, Song Liu wrote:
> > Introduce module_alloc_huge, which allocates huge page backed memory in
> > module memory space. The primary user of this memory is bpf_prog_pack
> > (multiple BPF programs sharing a huge page).
>
> I know I led you down this road first, but I wonder if we just want to
> pass a flag to module_alloc instead.  This avoids duplicating all the
> arch overrides.

I don't think we will see many archs support bpf_prog_pack, so the __weak
version might be good enough for a long time. Adding an argument to
module_alloc seems like more trouble to me.

Thanks,
Song


* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
From: Song Liu @ 2022-04-12  6:12 UTC
  To: Christoph Hellwig
  Cc: bpf, Linux-MM, open list, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Andrew Morton, rick.p.edgecombe, imbrenda,
	Luis Chamberlain

On Mon, Apr 11, 2022 at 9:20 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Apr 11, 2022 at 04:35:49PM -0700, Song Liu wrote:
> > Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that
>
> That is only very indirectly true now.

Yeah, I realized I missed this part after sending it. Will fix.

Thanks,
Song


* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
From: Edgecombe, Rick P @ 2022-04-12 17:20 UTC
  To: linux-kernel, linux-mm, song, bpf
  Cc: daniel, andrii, hch, imbrenda, akpm, ast, mcgrof

On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
>         bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
>         list_add_tail(&pack->list, &pack_list);
>
> -       set_vm_flush_reset_perms(pack->ptr);
>         set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>         set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>         return pack;

Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
It is kind of a fix for a different issue.

Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
not work with them, we should have some comments or warnings to that
effect somewhere. Someone may try to pass the flags in together.
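
The requested warning could look something like the check below. This is a user-space sketch with hypothetical flag values and a hypothetical `sketch_check_vm_flags()` helper; in the kernel the equivalent would more plausibly be a `WARN_ON_ONCE()` in the vmalloc path, and clearing the huge-page flag instead of failing would be an equally reasonable policy.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag values for illustration only. */
#define SKETCH_VM_FLUSH_RESET_PERMS 0x1u
#define SKETCH_VM_ALLOW_HUGE_VMAP   0x2u

/* Report and reject a flag combination that the reset-permissions path
 * cannot handle for huge mappings. Returns 0 if the flags are usable,
 * -1 otherwise. fprintf() stands in for a kernel warning. */
static int sketch_check_vm_flags(unsigned int flags)
{
	if ((flags & SKETCH_VM_FLUSH_RESET_PERMS) &&
	    (flags & SKETCH_VM_ALLOW_HUGE_VMAP)) {
		fprintf(stderr, "VM_FLUSH_RESET_PERMS is not supported with huge vmap\n");
		return -1;
	}
	return 0;
}
```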

> @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>         if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
>                                        bpf_prog_chunk_count(), 0) == 0) {
>                 list_del(&pack->list);
> -               module_memfree(pack->ptr);


> +               set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> +               set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> +               vfree(pack->ptr);
>                 kfree(pack);

Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
should it call module_memfree() instead of vfree()?
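
The free path in this patch restores default permissions (`set_memory_nx()`, `set_memory_rw()`) before releasing the pack, and the question above is about pairing the release with the matching allocator. A rough user-space analogue using `mmap()`/`mprotect()`/`munmap()` is sketched below; it is not how the kernel does it, and the `sketch_pack_*` names are hypothetical. The allocation side mirrors `set_memory_ro()` + `set_memory_x()`, the free side mirrors `set_memory_nx()` + `set_memory_rw()`, and `munmap()` is used because it is the release call that pairs with `mmap()`.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map a region RW, then flip it to read-only + executable, roughly like
 * set_memory_ro() + set_memory_x() after the pack is set up. */
static void *sketch_pack_alloc(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	if (mprotect(p, size, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, size);
		return NULL;
	}
	return p;
}

/* Restore RW + non-executable (like set_memory_nx() + set_memory_rw()),
 * then release with the call that matches the allocator. */
static int sketch_pack_free(void *p, size_t size)
{
	if (mprotect(p, size, PROT_READ | PROT_WRITE) != 0)
		return -1;
	return munmap(p, size);
}
```

In user space `munmap()` would succeed without the `mprotect()` call; the step is kept only to mirror the ordering in the patch, where it presumably matters because the underlying pages go back to the page allocator.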



Since there are bugs, simple, immediate fixes seem like the right thing
to do, but I had a couple long term focused comments on this new
feature:

It would be nice if bpf and the other module_alloc() callers could
share the same large pages. Meaning, ultimately that this whole thing
should probably live outside of bpf. BPF tracing usages might benefit
for example, and kprobes and ftrace are not too different than bpf
progs from a text allocation perspective.

I agree that the module's part is non-trivial. A while back I had tried
to do something like bpf_prog_pack() that worked for all the
module_alloc() callers. It had some modules changes to allow different
permissions to go to different allocations so they could be made to
share large pages:

https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/

I thought the existing kernel special permission allocation methods
were just too brittle and intertwined to improve without a new
interface. The hope was the new interface could wrap all the arch
intricacies instead of leaving them exposed in the cross-arch callers.

I wonder what you think of that general direction or if you have any
follow up plans for this?



* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
From: Song Liu @ 2022-04-12 21:00 UTC
  To: Edgecombe, Rick P
  Cc: linux-kernel, linux-mm, bpf, daniel, andrii, hch, imbrenda, akpm,
	ast, mcgrof

On Tue, Apr 12, 2022 at 10:21 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> > @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
> >         bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
> >         list_add_tail(&pack->list, &pack_list);
> >
> > -       set_vm_flush_reset_perms(pack->ptr);
> >         set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> >         set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> >         return pack;
>
> Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
> It is kind of a fix for a different issue.
>
> Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
> not work with them, we should have some comments or warnings to that
> effect somewhere. Someone may try to pass the flags in together.

Good catch! I will add it in the next version.

>
> > @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
> >         if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
> >                                        bpf_prog_chunk_count(), 0) == 0) {
> >                 list_del(&pack->list);
> > -               module_memfree(pack->ptr);
>
>
> > +               set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> > +               set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> > +               vfree(pack->ptr);
> >                 kfree(pack);
>
> Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
> should it call module_memfree() instead of vfree()?

Right. Let me sort that out. (Also, whether we introduce module_alloc_huge()
or not).

>
>
>
> Since there are bugs, simple, immediate fixes seem like the right thing
> to do, but I had a couple long term focused comments on this new
> feature:
>
> It would be nice if bpf and the other module_alloc() callers could
> share the same large pages. Meaning, ultimately that this whole thing
> should probably live outside of bpf. BPF tracing usages might benefit
> for example, and kprobes and ftrace are not too different than bpf
> progs from a text allocation perspective.

Agreed.

>
> I agree that the module's part is non-trivial. A while back I had tried
> to do something like bpf_prog_pack() that worked for all the
> module_alloc() callers. It had some modules changes to allow different
> permissions to go to different allocations so they could be made to
> share large pages:
>
> https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/
>
> I thought the existing kernel special permission allocation methods
> were just too brittle and intertwined to improve without a new
> interface. The hope was the new interface could wrap all the arch
> intricacies instead of leaving them exposed in the cross-arch callers.
>
> I wonder what you think of that general direction or if you have any
> follow up plans for this?

Since I am still learning the vmalloc/module_alloc code, I think I am
not really capable of commenting on the direction. From our use
cases, we do see a performance hit due to a large number of BPF
programs fragmenting the page table. Kernel modules, OTOH, are not
too big an issue for us, as we usually build hot modules into the
kernel. That being said, we are interested in making the huge page
interface general for BPF programs and kernel modules. We can
commit resources to this effort.

Thanks,
Song


* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
From: Edgecombe, Rick P @ 2022-04-13 15:51 UTC
  To: song
  Cc: linux-kernel, daniel, bpf, hch, ast, rppt, linux-mm, andrii,
	akpm, mcgrof, imbrenda

CC Mike, who has been working on a direct map fragmentation solution [0].

On Tue, 2022-04-12 at 14:00 -0700, Song Liu wrote:
> Since I am still learning the vmalloc/module_alloc code, I think I am
> not really capable of commenting on the direction. From our use
> cases, we do see a performance hit due to a large number of BPF
> programs fragmenting the page table. Kernel modules, OTOH, are not
> too big an issue for us, as we usually build hot modules into the
> kernel. That being said, we are interested in making the huge page
> interface general for BPF programs and kernel modules. We can
> commit resources to this effort.

That sounds great. Please feel free to loop me in if you do.


[0] https://lore.kernel.org/lkml/20220127085608.306306-1-rppt@kernel.org/


* Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
From: Nicholas Piggin @ 2022-04-21  2:24 UTC
  To: Christoph Hellwig, Song Liu
  Cc: Andrew Morton, Andrii Nakryiko, Alexei Starovoitov, bpf,
	Daniel Borkmann, imbrenda, open list, Linux-MM, Luis Chamberlain,
	rick.p.edgecombe

Excerpts from Song Liu's message of April 12, 2022 4:00 pm:
> On Mon, Apr 11, 2022 at 9:18 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
>> > Huge page backed vmalloc memory could benefit performance in many cases.
>> > Since some users of vmalloc may not be ready to handle huge pages,
>> > VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
>> > pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
>> > that may try to allocate >= PMD_SIZE pages, but are not ready to handle
>> > huge pages properly.
>>
>> This is a good place to document what the problems are, and how they are
>> hard to track down (e.g. because the allocations are passed down I/O
>> stacks)
> 
> Will add it in v3.
> 
>>
>> >
>> > Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
>> > users that benefit from huge pages could ask for them specifically.
>> >
>> > Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().
>>
>> We still need to find out what the primary users of the large vmalloc
>> hashes were and convert them.
> 
> @ Claudio and Nicholas,
> 
> Could you please help identify users of large vmalloc? So far, I found
> alloc_large_system_hash(), and something like the following seems to
> work:

The large system hashes were the main ones I was interested in. IIRC 
there were a few more in some drivers or tracing things depending on
config but those are less important (to me at least).

Curious what the problem is though. powerpc so far has not required
any special case outside arch/powerpc/ for this so I would much
prefer x86 to fix itself rather than add APIs which non-arch code
really shouldn't need to know about.

Thanks,
Nick


* Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
From: Nicholas Piggin @ 2022-04-21  3:35 UTC
  To: Christoph Hellwig, Song Liu
  Cc: Andrew Morton, Andrii Nakryiko, Alexei Starovoitov, bpf,
	Daniel Borkmann, imbrenda, open list, Linux-MM, Luis Chamberlain,
	rick.p.edgecombe

Excerpts from Nicholas Piggin's message of April 21, 2022 12:24 pm:
> Excerpts from Song Liu's message of April 12, 2022 4:00 pm:
>> On Mon, Apr 11, 2022 at 9:18 PM Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>> On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
>>> > Huge page backed vmalloc memory could benefit performance in many cases.
>>> > Since some users of vmalloc may not be ready to handle huge pages,
>>> > VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
>>> > pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
>>> > that may try to allocate >= PMD_SIZE pages, but are not ready to handle
>>> > huge pages properly.
>>>
>>> This is a good place to document what the problems are, and how they are
>>> hard to track down (e.g. because the allocations are passed down I/O
>>> stacks)
>> 
>> Will add it in v3.
>> 
>>>
>>> >
>>> > Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
>>> > users that benefit from huge pages could ask for them specifically.
>>> >
>>> > Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().
>>>
>>> We still need to find out what the primary users of the large vmalloc
>>> hashes were and convert them.
>> 
>> @ Claudio and Nicholas,
>> 
>> Could you please help identify users of large vmalloc? So far, I found
>> alloc_large_system_hash(), and something like the following seems to
>> work:
> 
> The large system hashes were the main ones I was interested in. IIRC 
> there were a few more in some drivers or tracing things depending on
> config but those are less important (to me at least).

Oh, there is also a reverse map array in KVM, now that I think of it.

Thanks,
Nick


