* [PATCH v3 bpf RESEND 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
@ 2022-04-14 19:59 Song Liu
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Song Liu @ 2022-04-14 19:59 UTC (permalink / raw)
  To: bpf, linux-mm, linux-kernel
  Cc: ast, daniel, kernel-team, akpm, rick.p.edgecombe, hch, imbrenda,
	mcgrof, Song Liu

Resend with a shorter CC list, as the original v3 didn't make it through
the mailing list. :(

Changes v2 => v3:
1. Use __vmalloc_huge in alloc_large_system_hash.
2. Use EXPORT_SYMBOL_GPL for new functions. (Christoph Hellwig)
3. Add more description about the issues and changes.(Christoph Hellwig,
   Rick Edgecombe).

Changes v1 => v2:
1. Add vmalloc_huge(). (Christoph Hellwig)
2. Add module_alloc_huge(). (Christoph Hellwig)
3. Add Fixes tag and Link tag. (Thorsten Leemhuis)

Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and using it for bpf_prog_pack has
caused some issues [1], as many users of vmalloc are not yet ready to
handle huge pages. To enable a smoother transition to huge page backed
vmalloc memory, this set replaces the VM_NO_HUGE_VMAP flag with a new
opt-in flag, VM_ALLOW_HUGE_VMAP. More discussion of this topic can be
found at [2].

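Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
Patch 2 uses __vmalloc_huge() in alloc_large_system_hash().
Patch 3 introduces module_alloc_huge().
Patch 4 uses module_alloc_huge() in bpf_prog_pack.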

[1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/

Song Liu (4):
  vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  page_alloc: use __vmalloc_huge for large system hash
  module: introduce module_alloc_huge
  bpf: use module_alloc_huge for bpf_prog_pack

 arch/Kconfig                 |  6 ++----
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  2 +-
 arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
 include/linux/moduleloader.h |  5 +++++
 include/linux/vmalloc.h      |  5 +++--
 kernel/bpf/core.c            |  7 ++++---
 kernel/module.c              |  5 +++++
 mm/page_alloc.c              |  2 +-
 mm/vmalloc.c                 | 34 ++++++++++++++++++++++++++++------
 10 files changed, 71 insertions(+), 18 deletions(-)

--
2.30.2
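
The switch from opt-out to opt-in described in the cover letter can be
sketched in user space as below. This is only a minimal model of the
condition that patch 1 changes in __vmalloc_node_range(); the DEMO_*
constants and demo_* helpers are hypothetical names with made-up values,
not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; they do not match the kernel headers. */
#define DEMO_VM_NO_HUGE_VMAP    0x1u  /* old scheme: caller opts out */
#define DEMO_VM_ALLOW_HUGE_VMAP 0x2u  /* new scheme: caller opts in  */

/* Old policy: huge mappings are tried unless the caller opted out. */
static bool demo_old_may_use_huge(bool arch_allows, unsigned int vm_flags)
{
	return arch_allows && !(vm_flags & DEMO_VM_NO_HUGE_VMAP);
}

/* New policy: huge mappings are tried only when the caller asked for them. */
static bool demo_new_may_use_huge(bool arch_allows, unsigned int vm_flags)
{
	return arch_allows && (vm_flags & DEMO_VM_ALLOW_HUGE_VMAP);
}

/* A caller that passes no flags is treated differently by each policy. */
static void demo_check_plain_caller(void)
{
	assert(demo_old_may_use_huge(true, 0));   /* old: huge pages by default  */
	assert(!demo_new_may_use_huge(true, 0));  /* new: small pages by default */
}
```

Under the new policy an unaudited vmalloc user keeps PAGE_SIZE mappings
without having to be found and tagged, which is the point of the change.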



* [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  2022-04-14 19:59 [PATCH v3 bpf RESEND 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-14 19:59 ` Song Liu
  2022-04-15  6:31   ` Christoph Hellwig
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Song Liu @ 2022-04-14 19:59 UTC (permalink / raw)
  To: bpf, linux-mm, linux-kernel
  Cc: ast, daniel, kernel-team, akpm, rick.p.edgecombe, hch, imbrenda,
	mcgrof, Song Liu

Huge page backed vmalloc memory could benefit performance in many cases.
However, some users of vmalloc may not be ready to handle huge pages for
various reasons: hardware constraints, potential page splits, etc.
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
pages. However, it is not easy to track down all the users that require
the opt-out, as the allocations are passed through different stacks and
may cause issues in different layers.

To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag,
VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages could ask
specifically.

Also, replace vmalloc_no_huge() with opt-in helpers vmalloc_huge(), and
__vmalloc_huge().

Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with
                      HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Signed-off-by: Song Liu <song@kernel.org>
---
 arch/Kconfig                 |  6 ++----
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  2 +-
 include/linux/vmalloc.h      |  5 +++--
 mm/vmalloc.c                 | 34 ++++++++++++++++++++++++++++------
 5 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..31c4fdc4a4ba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,10 +854,8 @@ config HAVE_ARCH_HUGE_VMAP
 
 #
 #  Archs that select this would be capable of PMD-sized vmaps (i.e.,
-#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
-#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
-#  can be used to prohibit arch-specific allocations from using hugepages to
-#  help with this (e.g., modules may require it).
+#  arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
+#  must be used to enable allocations to use hugepages.
 #
 config HAVE_ARCH_HUGE_VMALLOC
 	depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 40a583e9d3c7..97a76a8619fb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
 	 * too.
 	 */
 	return __vmalloc_node_range(size, 1, start, end, gfp, prot,
-				    VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
+				    VM_FLUSH_RESET_PERMS,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
 
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..8afede243903 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -142,7 +142,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	 * using large pages for the virtual memory area.
 	 * This is a hardware limitation.
 	 */
-	kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
+	kvm->arch.pv.stor_var = vmalloc(vlen);
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3b1df7da402d..20205c4e3b23 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
 #define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
-#define VM_NO_HUGE_VMAP		0x00000400	/* force PAGE_SIZE pte mapping */
+#define VM_ALLOW_HUGE_VMAP	0x00000400      /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
 
 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
 	!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller) __alloc_size(1);
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
-void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
+void *vmalloc_huge(unsigned long size) __alloc_size(1);
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..1dac30c0ea41 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3106,7 +3106,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		return NULL;
 	}
 
-	if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
+	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
 		unsigned long size_per_node;
 
 		/*
@@ -3273,21 +3273,43 @@ void *vmalloc(unsigned long size)
 EXPORT_SYMBOL(vmalloc);
 
 /**
- * vmalloc_no_huge - allocate virtually contiguous memory using small pages
+ * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
  * @size:    allocation size
  *
- * Allocate enough non-huge pages to cover @size from the page level
+ * Allocate enough pages to cover @size from the page level
+ * allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
+ *
+ * Return: pointer to the allocated memory or %NULL on error
+ */
+void *vmalloc_huge(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				    NUMA_NO_NODE, __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(vmalloc_huge);
+
+/**
+ * __vmalloc_huge - allocate virtually contiguous memory, allow huge pages
+ * @size:      allocation size
+ * @gfp_mask:  flags for the page level allocator
+ *
+ * Allocate enough pages to cover @size from the page level
  * allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_no_huge(unsigned long size)
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask)
 {
 	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-				    GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
+				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc_no_huge);
+EXPORT_SYMBOL_GPL(__vmalloc_huge);
 
 /**
  * vzalloc - allocate virtually contiguous memory with zero fill
-- 
2.30.2




* [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash
  2022-04-14 19:59 [PATCH v3 bpf RESEND 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-14 19:59 ` Song Liu
  2022-04-15  6:32   ` Christoph Hellwig
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge Song Liu
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
  3 siblings, 1 reply; 15+ messages in thread
From: Song Liu @ 2022-04-14 19:59 UTC (permalink / raw)
  To: bpf, linux-mm, linux-kernel
  Cc: ast, daniel, kernel-team, akpm, rick.p.edgecombe, hch, imbrenda,
	mcgrof, Song Liu

Use __vmalloc_huge() in alloc_large_system_hash() so that large system
hash (>= PMD_SIZE) could benefit from huge pages. Note that __vmalloc_huge
only allocates huge pages for systems with HAVE_ARCH_HUGE_VMALLOC.

Signed-off-by: Song Liu <song@kernel.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e5b4488a0c5..20d38b8482c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8919,7 +8919,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
 		} else if (get_order(size) >= MAX_ORDER || hashdist) {
-			table = __vmalloc(size, gfp_flags);
+			table = __vmalloc_huge(size, gfp_flags);
 			virt = true;
 			if (table)
 				huge = is_vm_area_hugepages(table);
-- 
2.30.2
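
The ">= PMD_SIZE" threshold mentioned in the commit message above can be
modelled roughly as follows. The sketch assumes, based on the
size_per_node local visible in the patch 1 hunk and on
select_bpf_prog_pack_size() in patch 4, that a request without a specific
node is divided by the number of online nodes before the comparison.
DEMO_PMD_SIZE (2 MiB, as on x86_64) and demo_huge_eligible() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PMD_SIZE ((size_t)2 << 20)  /* 2 MiB, assumed x86_64 PMD size */

/*
 * Model of the eligibility test: the request is split evenly across the
 * online nodes (assumption) and each share must cover at least one PMD.
 */
static bool demo_huge_eligible(size_t size, unsigned int online_nodes)
{
	assert(online_nodes > 0);  /* a running system has at least one node */
	return size / online_nodes >= DEMO_PMD_SIZE;
}
```

Under that assumption a 2 MiB request on a two-node machine would not
qualify, which would explain why bpf_prog_pack probes with
BPF_HPAGE_SIZE * num_online_nodes().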




* [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 19:59 [PATCH v3 bpf RESEND 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
@ 2022-04-14 19:59 ` Song Liu
  2022-04-14 20:34   ` Luis Chamberlain
  2022-04-15  6:32   ` Christoph Hellwig
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
  3 siblings, 2 replies; 15+ messages in thread
From: Song Liu @ 2022-04-14 19:59 UTC (permalink / raw)
  To: bpf, linux-mm, linux-kernel
  Cc: ast, daniel, kernel-team, akpm, rick.p.edgecombe, hch, imbrenda,
	mcgrof, Song Liu

Introduce module_alloc_huge, which allocates huge page backed memory in
module memory space. The primary user of this memory is bpf_prog_pack
(multiple BPF programs sharing a huge page).

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
 include/linux/moduleloader.h |  5 +++++
 kernel/module.c              |  5 +++++
 3 files changed, 31 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b98ffcf4d250..63f6a16c70dc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
 	return p;
 }
 
+void *module_alloc_huge(unsigned long size)
+{
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				 MODULES_VADDR + get_module_load_offset(),
+				 MODULES_END, gfp_mask, PAGE_KERNEL,
+				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+				 NUMA_NO_NODE, __builtin_return_address(0));
+	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}
+
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d34743a88938 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
    sections.  Returns NULL on failure. */
 void *module_alloc(unsigned long size);
 
+/* Allocator used for allocating memory in module memory space. If size is
+ * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
+ */
+void *module_alloc_huge(unsigned long size);
+
 /* Free memory returned from module_alloc. */
 void module_memfree(void *module_region);
 
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788fd965..b2c6cb682a7d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
 			NUMA_NO_NODE, __builtin_return_address(0));
 }
 
+void * __weak module_alloc_huge(unsigned long size)
+{
+	return vmalloc_huge(size);
+}
+
 bool __weak module_init_section(const char *name)
 {
 	return strstarts(name, ".init");
-- 
2.30.2




* [PATCH v3 bpf RESEND 4/4] bpf: use module_alloc_huge for bpf_prog_pack
  2022-04-14 19:59 [PATCH v3 bpf RESEND 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
                   ` (2 preceding siblings ...)
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge Song Liu
@ 2022-04-14 19:59 ` Song Liu
  3 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2022-04-14 19:59 UTC (permalink / raw)
  To: bpf, linux-mm, linux-kernel
  Cc: ast, daniel, kernel-team, akpm, rick.p.edgecombe, hch, imbrenda,
	mcgrof, Song Liu

Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
PMD_SIZE pages. This benefits system performance by reducing iTLB miss
rate.

Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
VM_FLUSH_RESET_PERMS does not work with huge pages yet.

Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/bpf/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..b2a634d0f842 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
 	void *ptr;
 
 	size = BPF_HPAGE_SIZE * num_online_nodes();
-	ptr = module_alloc(size);
+	ptr = module_alloc_huge(size);
 
 	/* Test whether we can get huge pages. If not just use PAGE_SIZE
 	 * packs.
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 		       GFP_KERNEL);
 	if (!pack)
 		return NULL;
-	pack->ptr = module_alloc(bpf_prog_pack_size);
+	pack->ptr = module_alloc_huge(bpf_prog_pack_size);
 	if (!pack->ptr) {
 		kfree(pack);
 		return NULL;
@@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
 	list_add_tail(&pack->list, &pack_list);
 
-	set_vm_flush_reset_perms(pack->ptr);
 	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	return pack;
@@ -970,6 +969,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
 				       bpf_prog_chunk_count(), 0) == 0) {
 		list_del(&pack->list);
+		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 		module_memfree(pack->ptr);
 		kfree(pack);
 	}
-- 
2.30.2
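
How several programs can share one pack, as described in patches 3 and 4,
can be sketched in user space as below. This is a simplified model, not
the kernel implementation: it uses a bool array in place of the kernel
bitmap and a 4 KiB pack instead of a PMD_SIZE one, and all demo_* names
and sizes are hypothetical. The first-fit search stands in for
bitmap_find_next_zero_area(), and demo_pack_empty() models the check in
bpf_prog_pack_free() that decides when the whole pack can be made
non-executable and writable again and then freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PACK_SIZE  4096u  /* small pack to keep the demo readable */
#define DEMO_CHUNK_SIZE 64u    /* allocation granularity within a pack */
#define DEMO_NCHUNKS    (DEMO_PACK_SIZE / DEMO_CHUNK_SIZE)

struct demo_pack {
	bool used[DEMO_NCHUNKS];  /* one entry per chunk */
};

/* Number of chunks needed to hold `bytes`, rounded up. */
static size_t demo_chunks_for(size_t bytes)
{
	return (bytes + DEMO_CHUNK_SIZE - 1) / DEMO_CHUNK_SIZE;
}

/* First-fit search for a free run; returns the start chunk or -1. */
static int demo_pack_alloc(struct demo_pack *pack, size_t bytes)
{
	size_t need = demo_chunks_for(bytes);
	size_t run = 0;

	if (need == 0 || need > DEMO_NCHUNKS)
		return -1;

	for (size_t i = 0; i < DEMO_NCHUNKS; i++) {
		run = pack->used[i] ? 0 : run + 1;
		if (run == need) {
			size_t start = i + 1 - need;

			for (size_t j = start; j <= i; j++)
				pack->used[j] = true;
			return (int)start;
		}
	}
	return -1;
}

/* Release the chunks of one program. */
static void demo_pack_free(struct demo_pack *pack, int start, size_t bytes)
{
	size_t need = demo_chunks_for(bytes);

	assert(start >= 0 && (size_t)start + need <= DEMO_NCHUNKS);
	for (size_t j = 0; j < need; j++)
		pack->used[(size_t)start + j] = false;
}

/* True when no program is left, i.e. the pack itself could be released. */
static bool demo_pack_empty(const struct demo_pack *pack)
{
	for (size_t i = 0; i < DEMO_NCHUNKS; i++)
		if (pack->used[i])
			return false;
	return true;
}
```

In the patch, the equivalent of demo_pack_empty() returning true is the
point where set_memory_nx() and set_memory_rw() are called before
module_memfree().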




* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge Song Liu
@ 2022-04-14 20:34   ` Luis Chamberlain
  2022-04-14 21:03     ` Song Liu
  2022-04-15  6:32   ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Luis Chamberlain @ 2022-04-14 20:34 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, kernel-team, akpm,
	rick.p.edgecombe, hch, imbrenda

On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> Introduce module_alloc_huge, which allocates huge page backed memory in
> module memory space. The primary user of this memory is bpf_prog_pack
> (multiple BPF programs sharing a huge page).
> 
> Signed-off-by: Song Liu <song@kernel.org>

See modules-next [0], as module.c has been chopped up as of late.
So if you want this to go through modules, this will need to be rebased
on that tree. Fortunately the amount of code in question does not
seem like much.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

  Luis


* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 20:34   ` Luis Chamberlain
@ 2022-04-14 21:03     ` Song Liu
  2022-04-14 21:11       ` Luis Chamberlain
  0 siblings, 1 reply; 15+ messages in thread
From: Song Liu @ 2022-04-14 21:03 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: bpf, Linux-MM, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Andrew Morton, Edgecombe, Rick P, Christoph Hellwig,
	imbrenda

Hi Luis,

On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > Introduce module_alloc_huge, which allocates huge page backed memory in
> > module memory space. The primary user of this memory is bpf_prog_pack
> > (multiple BPF programs sharing a huge page).
> >
> > Signed-off-by: Song Liu <song@kernel.org>
>
> See modules-next [0], as module.c has been chopped up as of late.
> So if you want this to go through modules, this will need to be rebased
> on that tree. Fortunately the amount of code in question does not
> seem like much.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

We are hoping to ship this with 5.18, as the set addresses some issues with
huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.

How about we ship module_alloc_huge() to 5.18 in module.c for now, and once
we update modules-next branch, I will send another patch to clean it up?

Thanks,
Song


* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 21:03     ` Song Liu
@ 2022-04-14 21:11       ` Luis Chamberlain
  2022-04-14 21:31         ` Song Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Luis Chamberlain @ 2022-04-14 21:11 UTC (permalink / raw)
  To: Song Liu, Linus Torvalds
  Cc: bpf, Linux-MM, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Andrew Morton, Edgecombe, Rick P, Christoph Hellwig,
	imbrenda

On Thu, Apr 14, 2022 at 02:03:17PM -0700, Song Liu wrote:
> Hi Luis,
> 
> On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > > Introduce module_alloc_huge, which allocates huge page backed memory in
> > > module memory space. The primary user of this memory is bpf_prog_pack
> > > (multiple BPF programs sharing a huge page).
> > >
> > > Signed-off-by: Song Liu <song@kernel.org>
> >
> > See modules-next [0], as module.c has been chopped up as of late.
> > So if you want this to go through modules, this will need to be rebased
> > on that tree. Fortunately the amount of code in question does not
> > seem like much.
> >
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> 
> We are hoping to ship this with 5.18, as the set addresses some issues with
> huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.
> 

Huh, you intend this to go in as a fix for v5.18 (already released) once
properly reviewed?  This seems quite large... for a fix.

> How about we ship module_alloc_huge() to 5.18 in module.c for now, and once
> we update modules-next branch, I will send another patch to clean it up?

I'd rather set the expectations right about getting such a large fix in
for v5.18. I haven't even sat down to review all the changes in light of
this, but at a cursory glance it seems rather "large" for a fix.

  Luis


* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 21:11       ` Luis Chamberlain
@ 2022-04-14 21:31         ` Song Liu
  2022-04-15 19:03           ` Luis Chamberlain
  0 siblings, 1 reply; 15+ messages in thread
From: Song Liu @ 2022-04-14 21:31 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Linus Torvalds, bpf, Linux-MM, open list, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Andrew Morton, Edgecombe, Rick P,
	Christoph Hellwig, imbrenda

On Thu, Apr 14, 2022 at 2:11 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Thu, Apr 14, 2022 at 02:03:17PM -0700, Song Liu wrote:
> > Hi Luis,
> >
> > On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > >
> > > On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > > > Introduce module_alloc_huge, which allocates huge page backed memory in
> > > > module memory space. The primary user of this memory is bpf_prog_pack
> > > > (multiple BPF programs sharing a huge page).
> > > >
> > > > Signed-off-by: Song Liu <song@kernel.org>
> > >
> > > See modules-next [0], as module.c has been chopped up as of late.
> > > So if you want this to go through modules, this will need to be rebased
> > > on that tree. Fortunately the amount of code in question does not
> > > seem like much.
> > >
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> >
> > We are hoping to ship this with 5.18, as the set addresses some issues with
> > huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.
> >
>
> Huh, you intend this to go in as a fix for v5.18 (already released) once
> properly reviewed?  This seems quite large... for a fix.
>
> > How about we ship module_alloc_huge() to 5.18 in module.c for now, and once
> > we update modules-next branch, I will send another patch to clean it up?
>
> I'd rather set the expectations right about getting such a large fix in
> for v5.18. I haven't even sat down to review all the changes in light of
> this, but at a cursory glance it seems rather "large" for a fix.

Yes, I agree this is a little too big for a fix. I guess we can discuss whether
some of the set needs to wait until 5.19.

Thanks,
Song


* Re: [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-15  6:31   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2022-04-15  6:31 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, kernel-team, akpm,
	rick.p.edgecombe, hch, imbrenda, mcgrof

On Thu, Apr 14, 2022 at 12:59:11PM -0700, Song Liu wrote:
> +void *vmalloc_huge(unsigned long size)
> +{
> +	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> +				    GFP_KERNEL, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
> +				    NUMA_NO_NODE, __builtin_return_address(0));
> +}
> +EXPORT_SYMBOL_GPL(vmalloc_huge);

It seems like this one isn't actually used in this series, so I'd
suggest to drop it.

> +
> +/**
> + * __vmalloc_huge - allocate virtually contiguous memory, allow huge pages
> + * @size:      allocation size
> + * @gfp_mask:  flags for the page level allocator
> + *
> + * Allocate enough pages to cover @size from the page level
>   * allocator and map them into contiguous kernel virtual space.
> + * If @size is greater than or equal to PMD_SIZE, allow using
> + * huge pages for the memory
>   *
>   * Return: pointer to the allocated memory or %NULL on error
>   */
> -void *vmalloc_no_huge(unsigned long size)
> +void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask)

And I'd just rename this vmalloc_huge.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
@ 2022-04-15  6:32   ` Christoph Hellwig
  2022-04-15 16:57     ` Song Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2022-04-15  6:32 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, kernel-team, akpm,
	rick.p.edgecombe, hch, imbrenda, mcgrof

On Thu, Apr 14, 2022 at 12:59:12PM -0700, Song Liu wrote:
> Use __vmalloc_huge() in alloc_large_system_hash() so that large system
> hash (>= PMD_SIZE) could benefit from huge pages. Note that __vmalloc_huge
> only allocates huge pages for systems with HAVE_ARCH_HUGE_VMALLOC.

Looks good (modulo the possible naming change suggested in patch 1):

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 19:59 ` [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge Song Liu
  2022-04-14 20:34   ` Luis Chamberlain
@ 2022-04-15  6:32   ` Christoph Hellwig
  2022-04-15 15:59     ` Song Liu
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2022-04-15  6:32 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-mm, linux-kernel, ast, daniel, kernel-team, akpm,
	rick.p.edgecombe, hch, imbrenda, mcgrof

On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> Introduce module_alloc_huge, which allocates huge page backed memory in
> module memory space. The primary user of this memory is bpf_prog_pack
> (multiple BPF programs sharing a huge page).
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
>  include/linux/moduleloader.h |  5 +++++
>  kernel/module.c              |  5 +++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index b98ffcf4d250..63f6a16c70dc 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
>  	return p;
>  }
>  
> +void *module_alloc_huge(unsigned long size)
> +{
> +	gfp_t gfp_mask = GFP_KERNEL;
> +	void *p;
> +
> +	if (PAGE_ALIGN(size) > MODULES_LEN)
> +		return NULL;
> +
> +	p = __vmalloc_node_range(size, MODULE_ALIGN,
> +				 MODULES_VADDR + get_module_load_offset(),
> +				 MODULES_END, gfp_mask, PAGE_KERNEL,
> +				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
> +				 NUMA_NO_NODE, __builtin_return_address(0));
> +	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
> +		vfree(p);
> +		return NULL;
> +	}
> +
> +	return p;
> +}
> +
>  #ifdef CONFIG_X86_32
>  int apply_relocate(Elf32_Shdr *sechdrs,
>  		   const char *strtab,
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 9e09d11ffe5b..d34743a88938 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
>     sections.  Returns NULL on failure. */
>  void *module_alloc(unsigned long size);
>  
> +/* Allocator used for allocating memory in module memory space. If size is
> + * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
> + */
> +void *module_alloc_huge(unsigned long size);
> +
>  /* Free memory returned from module_alloc. */
>  void module_memfree(void *module_region);
>  
> diff --git a/kernel/module.c b/kernel/module.c
> index 6cea788fd965..b2c6cb682a7d 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
>  			NUMA_NO_NODE, __builtin_return_address(0));
>  }
>  
> +void * __weak module_alloc_huge(unsigned long size)
> +{
> +	return vmalloc_huge(size);
> +}

Umm.  This should use the same parameters as module_alloc except for
also passing the new huge page flag.


* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-15  6:32   ` Christoph Hellwig
@ 2022-04-15 15:59     ` Song Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2022-04-15 15:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Song Liu, bpf, linux-mm, linux-kernel, ast, daniel, Kernel Team,
	akpm, rick.p.edgecombe, imbrenda, mcgrof



> On Apr 14, 2022, at 11:32 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
>> Introduce module_alloc_huge, which allocates huge page backed memory in
>> module memory space. The primary user of this memory is bpf_prog_pack
>> (multiple BPF programs sharing a huge page).
>> 
>> Signed-off-by: Song Liu <song@kernel.org>
>> ---
>> arch/x86/kernel/module.c | 21 +++++++++++++++++++++
>> include/linux/moduleloader.h | 5 +++++
>> kernel/module.c | 5 +++++
>> 3 files changed, 31 insertions(+)
>> 
>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>> index b98ffcf4d250..63f6a16c70dc 100644
>> --- a/arch/x86/kernel/module.c
>> +++ b/arch/x86/kernel/module.c
>> @@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
>> 	return p;
>> }
>> 
>> +void *module_alloc_huge(unsigned long size)
>> +{
>> +	gfp_t gfp_mask = GFP_KERNEL;
>> +	void *p;
>> +
>> +	if (PAGE_ALIGN(size) > MODULES_LEN)
>> +		return NULL;
>> +
>> +	p = __vmalloc_node_range(size, MODULE_ALIGN,
>> +				 MODULES_VADDR + get_module_load_offset(),
>> +				 MODULES_END, gfp_mask, PAGE_KERNEL,
>> +				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
>> +				 NUMA_NO_NODE, __builtin_return_address(0));
>> +	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
>> +		vfree(p);
>> +		return NULL;
>> +	}
>> +
>> +	return p;
>> +}
>> +
>> #ifdef CONFIG_X86_32
>> int apply_relocate(Elf32_Shdr *sechdrs,
>> 		 const char *strtab,
>> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
>> index 9e09d11ffe5b..d34743a88938 100644
>> --- a/include/linux/moduleloader.h
>> +++ b/include/linux/moduleloader.h
>> @@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
>> sections. Returns NULL on failure. */
>> void *module_alloc(unsigned long size);
>> 
>> +/* Allocator used for allocating memory in module memory space. If size is
>> + * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
>> + */
>> +void *module_alloc_huge(unsigned long size);
>> +
>> /* Free memory returned from module_alloc. */
>> void module_memfree(void *module_region);
>> 
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 6cea788fd965..b2c6cb682a7d 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
>> 			NUMA_NO_NODE, __builtin_return_address(0));
>> }
>> 
>> +void * __weak module_alloc_huge(unsigned long size)
>> +{
>> +	return vmalloc_huge(size);
>> +}
> 
> Umm. This should use the same parameters as module_alloc except for
> also passing the new huge page flag.

Will fix the set and send v4. 

Thanks,
Song



* Re: [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash
  2022-04-15  6:32   ` Christoph Hellwig
@ 2022-04-15 16:57     ` Song Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2022-04-15 16:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Song Liu, bpf, Linux Memory Management List, open list,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, akpm,
	rick.p.edgecombe, imbrenda, mcgrof

Hi Christoph, 

> On Apr 14, 2022, at 11:32 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Thu, Apr 14, 2022 at 12:59:12PM -0700, Song Liu wrote:
>> Use __vmalloc_huge() in alloc_large_system_hash() so that large system
>> hash (>= PMD_SIZE) could benefit from huge pages. Note that __vmalloc_huge
>> only allocates huge pages for systems with HAVE_ARCH_HUGE_VMALLOC.
> 
> Looks good (modulo the possible naming change suggested in patch 1):
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks for your kind review!

Could you please share your thoughts on shipping the set with 5.18 (or 
whether we should postpone it)? 

AFAICT, the only behavior change is that alloc_large_system_hash may now
return huge pages for size > PMD_SIZE on x86_64. I think this is
relatively safe, as it only affects large hashes and we are only at rc2.

Thanks,
Song
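
The opt-in behavior discussed in this message can be sketched as a small
userspace model. This is only an illustration, not the code in mm/vmalloc.c:
the MODEL_* constants (4 KiB base pages and 2 MiB PMD mappings, as assumed for
x86_64), the flag bit value, and the helpers pick_map_shift() and
map_entries() are all hypothetical names introduced here. The ">= PMD_SIZE"
threshold follows the patch 2 commit message quoted above; other details of
the real allocator are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86_64 values: 4 KiB base pages, 2 MiB PMD-level huge pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_SHIFT  21
#define MODEL_PMD_SIZE   (1UL << MODEL_PMD_SHIFT)

/* Hypothetical stand-in for the kernel flag; the bit value is illustrative. */
#define MODEL_VM_ALLOW_HUGE_VMAP 0x1UL

static_assert(MODEL_PMD_SHIFT > MODEL_PAGE_SHIFT,
              "a huge mapping must be larger than a base page");

/*
 * Simplified model of the mapping-size decision: a huge mapping is chosen
 * only when the architecture supports huge vmalloc, the caller opted in
 * with the flag, and the request is at least one PMD in size. Every other
 * caller keeps getting base pages.
 */
static unsigned int pick_map_shift(unsigned long size, unsigned long vm_flags,
                                   bool arch_huge_vmalloc)
{
    if (arch_huge_vmalloc && (vm_flags & MODEL_VM_ALLOW_HUGE_VMAP) &&
        size >= MODEL_PMD_SIZE)
        return MODEL_PMD_SHIFT;
    return MODEL_PAGE_SHIFT;
}

/* Number of mappings of size (1 << shift) needed to cover size bytes. */
static unsigned long map_entries(unsigned long size, unsigned int shift)
{
    return (size + (1UL << shift) - 1) >> shift;
}
```

In this model, an 8 MiB request without the flag stays on base pages and
needs 2048 mappings, while the same request with the flag set needs 4
PMD-sized mappings. Callers that never pass the flag see no change, which is
the point of making the flag opt-in rather than opt-out.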



* Re: [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge
  2022-04-14 21:31         ` Song Liu
@ 2022-04-15 19:03           ` Luis Chamberlain
  0 siblings, 0 replies; 15+ messages in thread
From: Luis Chamberlain @ 2022-04-15 19:03 UTC (permalink / raw)
  To: Song Liu
  Cc: Linus Torvalds, bpf, Linux-MM, open list, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Andrew Morton, Edgecombe, Rick P,
	Christoph Hellwig, imbrenda

On Thu, Apr 14, 2022 at 02:31:18PM -0700, Song Liu wrote:
> On Thu, Apr 14, 2022 at 2:11 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Thu, Apr 14, 2022 at 02:03:17PM -0700, Song Liu wrote:
> > > Hi Luis,
> > >
> > > On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > >
> > > > On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > > > > Introduce module_alloc_huge, which allocates huge page backed memory in
> > > > > module memory space. The primary user of this memory is bpf_prog_pack
> > > > > (multiple BPF programs sharing a huge page).
> > > > >
> > > > > Signed-off-by: Song Liu <song@kernel.org>
> > > >
> > > > See modules-next [0], as module.c has been chopped up as of late.
> > > > So if you want this to go through modules, this will need to be rebased
> > > > on that tree. Fortunately the amount of code in question does not
> > > > seem like much.
> > > >
> > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> > >
> > > We are hoping to ship this with 5.18, as the set addresses some issues with
> > > huge page backed vmalloc. I guess we cannot ship it via modules-next branch.
> > >
> >
> > Huh, you intend this to go in as a fix for v5.18 (already released) once
> > properly reviewed?  This seems quite large... for a fix.
> >
> > > How about we ship module_alloc_huge() to 5.18 in module.c for now, and once
> > > we update modules-next branch, I will send another patch to clean it up?
> >
> > I'd rather set the expectations right about getting such a large fix in
> > for v5.18. I haven't even sat down to review all the changes in light of
> > this, but at a cursory glance it seems to me rather "large" for a fix.
> 
> Yes, I agree this is a little too big for a fix. I guess we can discuss whether
> some of the set needs to wait until 5.19.

Doing a more thorough review of this now, and of when the other changes
landed, it seems this is a *large follow-up fix* for an optimization for when
tons of JIT eBPF programs are used. It's so large I can't be confident this
doesn't also go in with other holes or issues, or that the other stuff
already merged doesn't have some other issues as well. So I can't see
anything screaming for why this needs to go in for v5.18 other than it'd be
nice.

So my preference is for this to go through v5.19 as I see no rush.

  Luis


end of thread, other threads:[~2022-04-15 19:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
2022-04-14 19:59 [PATCH v3 bpf RESEND 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-14 19:59 ` [PATCH v3 bpf RESEND 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-15  6:31   ` Christoph Hellwig
2022-04-14 19:59 ` [PATCH v3 bpf RESEND 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
2022-04-15  6:32   ` Christoph Hellwig
2022-04-15 16:57     ` Song Liu
2022-04-14 19:59 ` [PATCH v3 bpf RESEND 3/4] module: introduce module_alloc_huge Song Liu
2022-04-14 20:34   ` Luis Chamberlain
2022-04-14 21:03     ` Song Liu
2022-04-14 21:11       ` Luis Chamberlain
2022-04-14 21:31         ` Song Liu
2022-04-15 19:03           ` Luis Chamberlain
2022-04-15  6:32   ` Christoph Hellwig
2022-04-15 15:59     ` Song Liu
2022-04-14 19:59 ` [PATCH v3 bpf RESEND 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
