All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
@ 2022-04-08 22:34 Song Liu
  2022-04-08 22:34 ` [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Song Liu @ 2022-04-08 22:34 UTC (permalink / raw)
  To: bpf, netdev, linux-mm
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, Song Liu

Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and use it for bpf_prog_pack has
caused some issues [1], as many users of vmalloc are not yet ready to
handle huge pages. To enable a more smooth transition to use huge page
backed vmalloc memory, this set replaces VM_NO_HUGE_VMAP flag with an new
opt-in flag, VM_ALLOW_HUGE_VMAP. More discussions about this topic can be
found at [2].

Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
Patch 2 uses VM_ALLOW_HUGE_VMAP in bpf_prog_pack.

[1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/

Song Liu (2):
  vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack

 arch/Kconfig                 |  6 ++----
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  2 +-
 include/linux/vmalloc.h      |  3 +--
 kernel/bpf/core.c            | 24 ++++++++++++++++++++----
 mm/vmalloc.c                 | 19 +------------------
 6 files changed, 26 insertions(+), 30 deletions(-)

--
2.30.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  2022-04-08 22:34 [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-08 22:34 ` Song Liu
  2022-04-09  5:28   ` Christoph Hellwig
  2022-04-08 22:34 ` [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
  2022-04-09 11:43 ` [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Thorsten Leemhuis
  2 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2022-04-08 22:34 UTC (permalink / raw)
  To: bpf, netdev, linux-mm
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, Song Liu

Huge page backed vmalloc memory could benefit performance in many cases.
Since some users of vmalloc may not be ready to handle huge pages,
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge
pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
that may try to allocate >= PMD_SIZE pages, but are not ready to handle
huge pages properly.

Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
users that benefit from huge pages could ask specificially.

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/Kconfig                 |  6 ++----
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  2 +-
 include/linux/vmalloc.h      |  3 +--
 mm/vmalloc.c                 | 19 +------------------
 5 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..31c4fdc4a4ba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,10 +854,8 @@ config HAVE_ARCH_HUGE_VMAP
 
 #
 #  Archs that select this would be capable of PMD-sized vmaps (i.e.,
-#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
-#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
-#  can be used to prohibit arch-specific allocations from using hugepages to
-#  help with this (e.g., modules may require it).
+#  arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
+#  must be used to enable allocations to use hugepages.
 #
 config HAVE_ARCH_HUGE_VMALLOC
 	depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 40a583e9d3c7..97a76a8619fb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
 	 * too.
 	 */
 	return __vmalloc_node_range(size, 1, start, end, gfp, prot,
-				    VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
+				    VM_FLUSH_RESET_PERMS,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
 
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..8afede243903 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -142,7 +142,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	 * using large pages for the virtual memory area.
 	 * This is a hardware limitation.
 	 */
-	kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
+	kvm->arch.pv.stor_var = vmalloc(vlen);
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3b1df7da402d..5fe76e6d1cc2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
 #define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
-#define VM_NO_HUGE_VMAP		0x00000400	/* force PAGE_SIZE pte mapping */
+#define VM_ALLOW_HUGE_VMAP	0x00000400      /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
 
 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
 	!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,6 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller) __alloc_size(1);
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
-void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..7e9b140f652d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3106,7 +3106,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		return NULL;
 	}
 
-	if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
+	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
 		unsigned long size_per_node;
 
 		/*
@@ -3272,23 +3272,6 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
-/**
- * vmalloc_no_huge - allocate virtually contiguous memory using small pages
- * @size:    allocation size
- *
- * Allocate enough non-huge pages to cover @size from the page level
- * allocator and map them into contiguous kernel virtual space.
- *
- * Return: pointer to the allocated memory or %NULL on error
- */
-void *vmalloc_no_huge(unsigned long size)
-{
-	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-				    GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
-				    NUMA_NO_NODE, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(vmalloc_no_huge);
-
 /**
  * vzalloc - allocate virtually contiguous memory with zero fill
  * @size:    allocation size
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
  2022-04-08 22:34 [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-08 22:34 ` [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-08 22:34 ` Song Liu
  2022-04-09  5:29   ` Christoph Hellwig
  2022-04-09 11:43 ` [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Thorsten Leemhuis
  2 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2022-04-08 22:34 UTC (permalink / raw)
  To: bpf, netdev, linux-mm
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, Song Liu

Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that
BPF programs sit on PMD_SIZE pages. This benefits system performance by
reducing iTLB miss rate.

Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/bpf/core.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..04214f4e64f1 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -851,13 +851,28 @@ static LIST_HEAD(pack_list);
 #define BPF_HPAGE_MASK PAGE_MASK
 #endif
 
+static void *bpf_prog_pack_vmalloc(unsigned long size)
+{
+#if defined(MODULES_VADDR)
+	unsigned long start = MODULES_VADDR;
+	unsigned long end = MODULES_END;
+#else
+	unsigned long start = VMALLOC_START;
+	unsigned long end = VMALLOC_END;
+#endif
+
+	return __vmalloc_node_range(size, PAGE_SIZE, start, end, GFP_KERNEL, PAGE_KERNEL,
+				    VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+				    NUMA_NO_NODE, __builtin_return_address(0));
+}
+
 static size_t select_bpf_prog_pack_size(void)
 {
 	size_t size;
 	void *ptr;
 
 	size = BPF_HPAGE_SIZE * num_online_nodes();
-	ptr = module_alloc(size);
+	ptr = bpf_prog_pack_vmalloc(size);
 
 	/* Test whether we can get huge pages. If not just use PAGE_SIZE
 	 * packs.
@@ -881,7 +896,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 		       GFP_KERNEL);
 	if (!pack)
 		return NULL;
-	pack->ptr = module_alloc(bpf_prog_pack_size);
+	pack->ptr = bpf_prog_pack_vmalloc(bpf_prog_pack_size);
 	if (!pack->ptr) {
 		kfree(pack);
 		return NULL;
@@ -889,7 +904,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
 	list_add_tail(&pack->list, &pack_list);
 
-	set_vm_flush_reset_perms(pack->ptr);
 	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	return pack;
@@ -970,7 +984,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
 				       bpf_prog_chunk_count(), 0) == 0) {
 		list_del(&pack->list);
-		module_memfree(pack->ptr);
+		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+		vfree(pack->ptr);
 		kfree(pack);
 	}
 out:
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  2022-04-08 22:34 ` [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-09  5:28   ` Christoph Hellwig
  2022-04-10  1:25     ` Song Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2022-04-09  5:28 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, linux-mm, ast, daniel, andrii, kernel-team, akpm,
	rick.p.edgecombe, hch, imbrenda

On Fri, Apr 08, 2022 at 03:34:42PM -0700, Song Liu wrote:
> Huge page backed vmalloc memory could benefit performance in many cases.
> Since some users of vmalloc may not be ready to handle huge pages,
> VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge
> pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
> that may try to allocate >= PMD_SIZE pages, but are not ready to handle
> huge pages properly.
> 
> Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
> users that benefit from huge pages could ask specificially.

Given that the huge page backing was added explicitly for some big boot
time allocated hashed,those should probably have the VM_ALLOW_HUGE_VMAP
added from the start (maybe not in this patch, but certainly in this
series).  We'll probably also need a vmalloc_huge interface for those.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
  2022-04-08 22:34 ` [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
@ 2022-04-09  5:29   ` Christoph Hellwig
  2022-04-10  1:34     ` Song Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2022-04-09  5:29 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, linux-mm, ast, daniel, andrii, kernel-team, akpm,
	rick.p.edgecombe, hch, imbrenda

On Fri, Apr 08, 2022 at 03:34:43PM -0700, Song Liu wrote:
> +static void *bpf_prog_pack_vmalloc(unsigned long size)
> +{
> +#if defined(MODULES_VADDR)
> +	unsigned long start = MODULES_VADDR;
> +	unsigned long end = MODULES_END;
> +#else
> +	unsigned long start = VMALLOC_START;
> +	unsigned long end = VMALLOC_END;
> +#endif
> +
> +	return __vmalloc_node_range(size, PAGE_SIZE, start, end, GFP_KERNEL, PAGE_KERNEL,
> +				    VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
> +				    NUMA_NO_NODE, __builtin_return_address(0));
> +}

Instead of having this magic in bpf I think a module_alloc_large would
seems like the better interface here.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
  2022-04-08 22:34 [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-08 22:34 ` [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-08 22:34 ` [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
@ 2022-04-09 11:43 ` Thorsten Leemhuis
  2022-04-10  1:36   ` Song Liu
  2 siblings, 1 reply; 11+ messages in thread
From: Thorsten Leemhuis @ 2022-04-09 11:43 UTC (permalink / raw)
  To: Song Liu, bpf, netdev, linux-mm
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch, imbrenda

Hi, this is your Linux kernel regression tracker.

On 09.04.22 00:34, Song Liu wrote:
> Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and use it for bpf_prog_pack has
> caused some issues [1], as many users of vmalloc are not yet ready to
> handle huge pages. To enable a more smooth transition to use huge page
> backed vmalloc memory, this set replaces VM_NO_HUGE_VMAP flag with an new
> opt-in flag, VM_ALLOW_HUGE_VMAP. More discussions about this topic can be
> found at [2].
> 
> Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
> Patch 2 uses VM_ALLOW_HUGE_VMAP in bpf_prog_pack.
> 
> [1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
> [2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/

These patches apparently fix a regression (one that's mentioned in your
[2]) that I tracked. Hence in the next iteration of your patches could
you please instead add a 'Link:' tag pointing to the report for anyone
wanting to look into the backstory in the future, as explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'? E.g. like this:

"Link:
https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/"

Not totally sure, but I guess it needs a Fixes tag as well specifying
the change that cause this regression (that's "fac54e2bfb5b"). The
documents mentioned above explain this, too. A "Reported-by" might be
appropriate as well.

In anyone wonders why I care: there are internal and publicly used tools
and scripts out there that reply on proper "Link" tags. I don't known
how many, but there is at least one public tool I'm running that cares:
regzbot, my regression tracking bot, which I use to track Linux kernel
regressions and generate the regression reports sent to Linus. Proper
"Link:" tags allow the bot to automatically connect regression reports
with fixes being posted or applied to resolve the particular regression
-- which makes regression tracking a whole lot easier and feasible for
the Linux kernel. That's why it's a great help for me if people set
proper "Link" tags.

While at it, let me tell regzbot about this thread:
#regzbot ^backmonitor:
https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  2022-04-09  5:28   ` Christoph Hellwig
@ 2022-04-10  1:25     ` Song Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2022-04-10  1:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Song Liu, bpf, Networking, Linux Memory Management List,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Andrew Morton, rick.p.edgecombe, imbrenda



> On Apr 8, 2022, at 10:28 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Fri, Apr 08, 2022 at 03:34:42PM -0700, Song Liu wrote:
>> Huge page backed vmalloc memory could benefit performance in many cases.
>> Since some users of vmalloc may not be ready to handle huge pages,
>> VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge
>> pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
>> that may try to allocate >= PMD_SIZE pages, but are not ready to handle
>> huge pages properly.
>> 
>> Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
>> users that benefit from huge pages could ask specificially.
> 
> Given that the huge page backing was added explicitly for some big boot
> time allocated hashed,those should probably have the VM_ALLOW_HUGE_VMAP
> added from the start (maybe not in this patch, but certainly in this
> series).  We'll probably also need a vmalloc_huge interface for those.

Will add it in v2. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
  2022-04-09  5:29   ` Christoph Hellwig
@ 2022-04-10  1:34     ` Song Liu
  2022-04-11  6:56       ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2022-04-10  1:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Song Liu, bpf, netdev, linux-mm, ast, daniel, andrii,
	Kernel Team, akpm, rick.p.edgecombe, imbrenda



> On Apr 8, 2022, at 10:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Fri, Apr 08, 2022 at 03:34:43PM -0700, Song Liu wrote:
>> +static void *bpf_prog_pack_vmalloc(unsigned long size)
>> +{
>> +#if defined(MODULES_VADDR)
>> +	unsigned long start = MODULES_VADDR;
>> +	unsigned long end = MODULES_END;
>> +#else
>> +	unsigned long start = VMALLOC_START;
>> +	unsigned long end = VMALLOC_END;
>> +#endif
>> +
>> +	return __vmalloc_node_range(size, PAGE_SIZE, start, end, GFP_KERNEL, PAGE_KERNEL,
>> +				    VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
>> +				    NUMA_NO_NODE, __builtin_return_address(0));
>> +}
> 
> Instead of having this magic in bpf I think a module_alloc_large would
> seems like the better interface here.

AFAICT, modules allocate a large piece of memory and put both text and
data on it, so modules cannot really use huge pages yet. 

OTOH, it is probably beneficial for the modules to use something 
similar to bpf_prog_pack, i.e., put text from multiple modules to a 
single huge page. Of course, this requires non-trivial work in both 
mm code and module code.

Given that 1) modules cannot use huge pages yet, and 2) module may
use differently (with sharing), I think adding module_alloc_large()
doesn't add much value at the moment. So we can just keep this logic
in BPF for now. 

Does this make sense?

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
  2022-04-09 11:43 ` [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Thorsten Leemhuis
@ 2022-04-10  1:36   ` Song Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2022-04-10  1:36 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Song Liu, bpf, netdev, linux-mm, ast, daniel, andrii,
	Kernel Team, akpm, rick.p.edgecombe, hch, imbrenda



> On Apr 9, 2022, at 4:43 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> 
> Hi, this is your Linux kernel regression tracker.
> 
> On 09.04.22 00:34, Song Liu wrote:
>> Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and use it for bpf_prog_pack has
>> caused some issues [1], as many users of vmalloc are not yet ready to
>> handle huge pages. To enable a more smooth transition to use huge page
>> backed vmalloc memory, this set replaces VM_NO_HUGE_VMAP flag with an new
>> opt-in flag, VM_ALLOW_HUGE_VMAP. More discussions about this topic can be
>> found at [2].
>> 
>> Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
>> Patch 2 uses VM_ALLOW_HUGE_VMAP in bpf_prog_pack.
>> 
>> [1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
>> [2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/
> 
> These patches apparently fix a regression (one that's mentioned in your
> [2]) that I tracked. Hence in the next iteration of your patches could
> you please instead add a 'Link:' tag pointing to the report for anyone
> wanting to look into the backstory in the future, as explained in
> 'Documentation/process/submitting-patches.rst' and
> 'Documentation/process/5.Posting.rst'? E.g. like this:
> 
> "Link:
> https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/"
> 
> Not totally sure, but I guess it needs a Fixes tag as well specifying
> the change that cause this regression (that's "fac54e2bfb5b"). The
> documents mentioned above explain this, too. A "Reported-by" might be
> appropriate as well.
> 
> In anyone wonders why I care: there are internal and publicly used tools
> and scripts out there that reply on proper "Link" tags. I don't known
> how many, but there is at least one public tool I'm running that cares:
> regzbot, my regression tracking bot, which I use to track Linux kernel
> regressions and generate the regression reports sent to Linus. Proper
> "Link:" tags allow the bot to automatically connect regression reports
> with fixes being posted or applied to resolve the particular regression
> -- which makes regression tracking a whole lot easier and feasible for
> the Linux kernel. That's why it's a great help for me if people set
> proper "Link" tags.
> 
> While at it, let me tell regzbot about this thread:
> #regzbot ^backmonitor:
> https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.

Thanks for the reminder. I will add the Fixes tag, and try to work with 
regzbot. 

Song


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
  2022-04-10  1:34     ` Song Liu
@ 2022-04-11  6:56       ` Christoph Hellwig
  2022-04-11 22:18         ` Song Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2022-04-11  6:56 UTC (permalink / raw)
  To: Song Liu
  Cc: Christoph Hellwig, Song Liu, bpf, netdev, linux-mm, ast, daniel,
	andrii, Kernel Team, akpm, rick.p.edgecombe, imbrenda

On Sun, Apr 10, 2022 at 01:34:50AM +0000, Song Liu wrote:
> OTOH, it is probably beneficial for the modules to use something 
> similar to bpf_prog_pack, i.e., put text from multiple modules to a 
> single huge page. Of course, this requires non-trivial work in both 
> mm code and module code.
> 
> Given that 1) modules cannot use huge pages yet, and 2) module may
> use differently (with sharing), I think adding module_alloc_large()
> doesn't add much value at the moment. So we can just keep this logic
> in BPF for now. 
> 
> Does this make sense?

I'm not intending to say modules should use the new helper.  But I'd much
prefer to keep all the MODULES_VADDR related bits self-contained in the
modules code and not splatter it over random other subsystems.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
  2022-04-11  6:56       ` Christoph Hellwig
@ 2022-04-11 22:18         ` Song Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2022-04-11 22:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Song Liu, bpf, netdev, linux-mm, ast, daniel, andrii,
	Kernel Team, akpm, rick.p.edgecombe, imbrenda



> On Apr 10, 2022, at 11:56 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Sun, Apr 10, 2022 at 01:34:50AM +0000, Song Liu wrote:
>> OTOH, it is probably beneficial for the modules to use something 
>> similar to bpf_prog_pack, i.e., put text from multiple modules to a 
>> single huge page. Of course, this requires non-trivial work in both 
>> mm code and module code.
>> 
>> Given that 1) modules cannot use huge pages yet, and 2) module may
>> use differently (with sharing), I think adding module_alloc_large()
>> doesn't add much value at the moment. So we can just keep this logic
>> in BPF for now. 
>> 
>> Does this make sense?
> 
> I'm not intending to say modules should use the new helper.  But I'd much
> prefer to keep all the MODULES_VADDR related bits self-contained in the
> modules code and not splatter it over random other subsystems.

Got it. Will add that in v2. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-04-11 22:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-08 22:34 [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-08 22:34 ` [PATCH bpf 1/2] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-09  5:28   ` Christoph Hellwig
2022-04-10  1:25     ` Song Liu
2022-04-08 22:34 ` [PATCH bpf 2/2] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
2022-04-09  5:29   ` Christoph Hellwig
2022-04-10  1:34     ` Song Liu
2022-04-11  6:56       ` Christoph Hellwig
2022-04-11 22:18         ` Song Liu
2022-04-09 11:43 ` [PATCH bpf 0/2] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Thorsten Leemhuis
2022-04-10  1:36   ` Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.