* [PATCH bpf-next v1 0/5] vmalloc_exec for modules and BPF programs
@ 2022-10-31 21:58 Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec Song Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Song Liu @ 2022-10-31 21:58 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: akpm, x86, peterz, hch, rick.p.edgecombe, dave.hansen, urezki,
	mcgrof, kernel-team, Song Liu

This set enables BPF programs and BPF dispatchers to share huge pages via
three new APIs:
  vmalloc_exec()
  vfree_exec()
  vcopy_exec()

The idea is similar to Peter's suggestion in [1].

vmalloc_exec() manages a pool of PMD_SIZE RO+X memory and hands out
allocations from this pool to its users. vfree_exec() frees memory allocated
by vmalloc_exec(). vcopy_exec() updates memory allocated by vmalloc_exec().

Memory allocated by vmalloc_exec() is RO+X, so this does not violate W^X.
The caller has to update the content with a text_poke-like mechanism;
specifically, vcopy_exec() is provided to update memory allocated by
vmalloc_exec(). vcopy_exec() also makes sure each update stays within the
boundary of a single chunk allocated by vmalloc_exec(). Please refer to
patch 1/5 for more details.
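
To make the intended usage concrete, below is a minimal, hypothetical caller
built only from the three APIs above. This is a sketch, not part of the
series: the helper name install_text() and its error handling are
illustrative, and the 64-byte alignment simply mirrors what BPF uses.

static void *install_text(void *code, size_t len)
{
	/* RO+X memory from the shared huge-page pool; not directly writable */
	void *ro = vmalloc_exec(len, 64);
	void *ret;

	if (!ro)
		return NULL;

	/* the only way to fill the buffer is the text_poke-like copy */
	ret = vcopy_exec(ro, code, len);
	if (IS_ERR(ret)) {
		vfree_exec(ro);	/* return the chunk to the pool */
		return NULL;
	}
	return ro;
}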

Patch 3/5 uses these new APIs for BPF programs and the BPF dispatcher.

Patches 4/5 and 5/5 allow static kernel text (_stext to _etext) to share
PMD_SIZE pages with dynamic kernel text on x86_64. This is achieved by
mapping PMD_SIZE pages up to roundup(_etext, PMD_SIZE), and then using the
range from _etext to roundup(_etext, PMD_SIZE) for dynamic kernel text.

[1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
[2] RFC v1: https://lore.kernel.org/linux-mm/20220818224218.2399791-3-song@kernel.org/T/

Changes RFC v2 => PATCH v1:
1. Add vcopy_exec(), which updates memory allocated by vmalloc_exec(). It
   also ensures vcopy_exec() only updates memory from one single
   vmalloc_exec() call. (Christoph Hellwig)
2. Add arch_vcopy_exec() and arch_invalidate_exec() as wrappers for the
   text_poke()-like logic.
3. Drop changes for kernel modules and focus on BPF side changes.

Changes RFC v1 => RFC v2:
1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now
   work fine with BPF programs (patches 1, 2, 4), but the module side
   (patch 3) still needs some work.

Song Liu (5):
  vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec
  x86/alternative: support vmalloc_exec() and vfree_exec()
  bpf: use vmalloc_exec for bpf program and bpf dispatcher
  vmalloc: introduce register_text_tail_vm()
  x86: use register_text_tail_vm

 arch/x86/include/asm/pgtable_64_types.h |   1 +
 arch/x86/kernel/alternative.c           |  12 +
 arch/x86/mm/init_64.c                   |   4 +-
 arch/x86/net/bpf_jit_comp.c             |  23 +-
 include/linux/bpf.h                     |   3 -
 include/linux/filter.h                  |   5 -
 include/linux/vmalloc.h                 |   9 +
 kernel/bpf/core.c                       | 180 +-----------
 kernel/bpf/dispatcher.c                 |  11 +-
 mm/nommu.c                              |   7 +
 mm/vmalloc.c                            | 351 ++++++++++++++++++++++++
 11 files changed, 404 insertions(+), 202 deletions(-)

--
2.30.2



* [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec
  2022-10-31 21:58 [PATCH bpf-next v1 0/5] vmalloc_exec for modules and BPF programs Song Liu
@ 2022-10-31 21:58 ` Song Liu
  2022-11-01 11:54   ` Uladzislau Rezki
  2022-10-31 21:58 ` [PATCH bpf-next v1 2/5] x86/alternative: support vmalloc_exec() and vfree_exec() Song Liu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Song Liu @ 2022-10-31 21:58 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: akpm, x86, peterz, hch, rick.p.edgecombe, dave.hansen, urezki,
	mcgrof, kernel-team, Song Liu

vmalloc_exec is used to allocate memory to host dynamic kernel text
(modules, BPF programs, etc.) with huge pages. This is similar to the
proposal by Peter in [1].

A new vmap_area tree, free_text_area_*, is introduced in addition to
free_vmap_area_* and vmap_area_*. vmalloc_exec allocates pages from
free_text_area_*. When there isn't enough space left in free_text_area_*,
new PMD_SIZE page(s) are allocated from free_vmap_area_* and added to
free_text_area_*. To be more precise, the vmap_area is first added to the
vmap_area_* tree and then moved to free_text_area_*. This extra move
simplifies the logic of vmalloc_exec.

vmap_area entries in the free_text_area_* tree are backed with memory, but
we need subtree_max_size for tree operations. Therefore, the vm_struct for
these vmap_area entries is stored in a separate list, all_text_vm.

The new tree allows separate handling of < PAGE_SIZE allocations, as the
current vmalloc code mostly assumes PAGE_SIZE aligned allocations. This
version of vmalloc_exec can handle BPF programs, which use 64 byte aligned
allocations, and modules, which use PAGE_SIZE aligned allocations.

Memory allocated by vmalloc_exec() is set to RO+X before being returned to
the caller. Therefore, the caller cannot write directly to the memory.
Instead, the caller is required to use vcopy_exec() to update the memory.
For the safety and security of X memory, vcopy_exec() checks that the data
being updated always stays within memory allocated by one vmalloc_exec()
call. vcopy_exec() uses a text_poke-like mechanism and requires arch
support. Specifically, the arch needs to implement arch_vcopy_exec().

In vfree_exec(), the memory is first erased with arch_invalidate_exec().
Then, the memory is added to free_text_area_*. If this free creates a big
enough contiguous free range (> PMD_SIZE), vfree_exec() will try to free
the backing vm_struct.

[1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/

Signed-off-by: Song Liu <song@kernel.org>
---
 include/linux/vmalloc.h |   5 +
 mm/nommu.c              |  12 ++
 mm/vmalloc.c            | 318 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 335 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..9b2042313c12 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -154,6 +154,11 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
 void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+void *vmalloc_exec(unsigned long size, unsigned long align) __alloc_size(1);
+void *vcopy_exec(void *dst, void *src, size_t len);
+void vfree_exec(void *addr);
+void *arch_vcopy_exec(void *dst, void *src, size_t len);
+int arch_invalidate_exec(void *ptr, size_t len);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/nommu.c b/mm/nommu.c
index 214c70e1d059..8a1317247ef0 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -371,6 +371,18 @@ int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
 }
 EXPORT_SYMBOL(vm_map_pages_zero);
 
+void *vmalloc_exec(unsigned long size, unsigned long align)
+{
+	return NULL;
+}
+
+void *vcopy_exec(void *dst, void *src, size_t len)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+void vfree_exec(void *addr) { }
+
 /*
  *  sys_brk() for the most part doesn't need the global kernel
  *  lock, except when an application is doing something nasty
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ccaa461998f3..6f4c73e67191 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -72,6 +72,9 @@ early_param("nohugevmalloc", set_nohugevmalloc);
 static const bool vmap_allow_huge = false;
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
 
+#define PMD_ALIGN(addr) ALIGN(addr, PMD_SIZE)
+#define PMD_ALIGN_DOWN(addr) ALIGN_DOWN(addr, PMD_SIZE)
+
 bool is_vmalloc_addr(const void *x)
 {
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
@@ -769,6 +772,38 @@ static LIST_HEAD(free_vmap_area_list);
  */
 static struct rb_root free_vmap_area_root = RB_ROOT;
 
+/*
+ * free_text_area for vmalloc_exec()
+ */
+static DEFINE_SPINLOCK(free_text_area_lock);
+/*
+ * This linked list is used in pair with free_text_area_root.
+ * It gives O(1) access to prev/next to perform fast coalescing.
+ */
+static LIST_HEAD(free_text_area_list);
+
+/*
+ * This augment red-black tree represents the free text space.
+ * All vmap_area objects in this tree are sorted by va->va_start
+ * address. It is used for allocation and merging when a vmap
+ * object is released.
+ *
+ * Each vmap_area node contains a maximum available free block
+ * of its sub-tree, right or left. Therefore it is possible to
+ * find a lowest match of free area.
+ *
+ * vmap_area in this tree are backed by RO+X memory, but they do
+ * not have valid vm pointer (because we need subtree_max_size).
+ * The vm for these vmap_area are stored in all_text_vm.
+ */
+static struct rb_root free_text_area_root = RB_ROOT;
+
+/*
+ * List of vm_struct for free_text_area_root. This list is rarely
+ * accessed, so the O(N) complexity is not likely a real issue.
+ */
+struct vm_struct *all_text_vm;
+
 /*
  * Preload a CPU with one object for "no edge" split case. The
  * aim is to get rid of allocations from the atomic context, thus
@@ -3313,6 +3348,289 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
+#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
+#define VMALLOC_EXEC_START MODULES_VADDR
+#define VMALLOC_EXEC_END MODULES_END
+#else
+#define VMALLOC_EXEC_START VMALLOC_START
+#define VMALLOC_EXEC_END VMALLOC_END
+#endif
+
+static void move_vmap_to_free_text_tree(void *addr)
+{
+	struct vmap_area *va;
+
+	/* remove from vmap_area_root */
+	spin_lock(&vmap_area_lock);
+	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
+	if (WARN_ON_ONCE(!va)) {
+		spin_unlock(&vmap_area_lock);
+		return;
+	}
+	unlink_va(va, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
+
+	/* make the memory RO+X */
+	memset(addr, 0, va->va_end - va->va_start);
+	set_memory_ro(va->va_start, (va->va_end - va->va_start) >> PAGE_SHIFT);
+	set_memory_x(va->va_start, (va->va_end - va->va_start) >> PAGE_SHIFT);
+
+	/* add to all_text_vm */
+	va->vm->next = all_text_vm;
+	all_text_vm = va->vm;
+
+	/* add to free_text_area_root */
+	spin_lock(&free_text_area_lock);
+	merge_or_add_vmap_area_augment(va, &free_text_area_root, &free_text_area_list);
+	spin_unlock(&free_text_area_lock);
+}
+
+/**
+ * vmalloc_exec - allocate virtually contiguous RO+X memory
+ * @size:    allocation size
+ *
+ * This is used to allocate dynamic kernel text, such as module text, BPF
+ * programs, etc. The caller needs to use vcopy_exec() to update the memory
+ * allocated by vmalloc_exec().
+ *
+ * Return: pointer to the allocated memory or %NULL on error
+ */
+void *vmalloc_exec(unsigned long size, unsigned long align)
+{
+	struct vmap_area *va, *tmp;
+	unsigned long addr;
+	enum fit_type type;
+	int ret;
+
+	va = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, NUMA_NO_NODE);
+	if (unlikely(!va))
+		return NULL;
+
+again:
+	preload_this_cpu_lock(&free_text_area_lock, GFP_KERNEL, NUMA_NO_NODE);
+	tmp = find_vmap_lowest_match(&free_text_area_root, size, align, 1, false);
+
+	if (!tmp) {
+		unsigned long alloc_size;
+		void *ptr;
+
+		spin_unlock(&free_text_area_lock);
+
+		/*
+		 * Not enough continuous space in free_text_area_root, try
+		 * allocate more memory. The memory is first added to
+		 * vmap_area_root, and then moved to free_text_area_root.
+		 */
+		alloc_size = roundup(size, PMD_SIZE * num_online_nodes());
+		ptr = __vmalloc_node_range(alloc_size, PMD_SIZE, VMALLOC_EXEC_START,
+					   VMALLOC_EXEC_END, GFP_KERNEL, PAGE_KERNEL,
+					   VM_ALLOW_HUGE_VMAP | VM_NO_GUARD,
+					   NUMA_NO_NODE, __builtin_return_address(0));
+		if (unlikely(!ptr))
+			goto err_out;
+
+		move_vmap_to_free_text_tree(ptr);
+		goto again;
+	}
+
+	addr = roundup(tmp->va_start, align);
+	type = classify_va_fit_type(tmp, addr, size);
+	if (WARN_ON_ONCE(type == NOTHING_FIT))
+		goto err_out;
+
+	ret = adjust_va_to_fit_type(&free_text_area_root, &free_text_area_list,
+				    tmp, addr, size);
+	if (ret)
+		goto err_out;
+
+	spin_unlock(&free_text_area_lock);
+
+	va->va_start = addr;
+	va->va_end = addr + size;
+	va->vm = NULL;
+
+	spin_lock(&vmap_area_lock);
+	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+	spin_unlock(&vmap_area_lock);
+
+	return (void *)addr;
+
+err_out:
+	spin_unlock(&free_text_area_lock);
+	kmem_cache_free(vmap_area_cachep, va);
+	return NULL;
+}
+
+void __weak *arch_vcopy_exec(void *dst, void *src, size_t len)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+int __weak arch_invalidate_exec(void *ptr, size_t len)
+{
+	return -EOPNOTSUPP;
+}
+
+/**
+ * vcopy_exec - Copy text to RO+X memory allocated by vmalloc_exec()
+ * @dst:  pointer to memory allocated by vmalloc_exec()
+ * @src:  pointer to data being copied from
+ * @len:  number of bytes to be copied
+ *
+ * vcopy_exec() will only update memory allocated by a single vmalloc_exec()
+ * call. If dst + len goes beyond the boundary of one allocation,
+ * vcopy_exec() is aborted.
+ *
+ * Return: @dst on success, or ERR_PTR() on failure.
+ */
+void *vcopy_exec(void *dst, void *src, size_t len)
+{
+	struct vmap_area *va;
+
+	spin_lock(&vmap_area_lock);
+	va = __find_vmap_area((unsigned long)dst, &vmap_area_root);
+
+	/*
+	 * If no va, or va has a vm attached, this memory is not allocated
+	 * by vmalloc_exec().
+	 */
+	if (WARN_ON_ONCE(!va) || WARN_ON_ONCE(va->vm))
+		goto err_out;
+	if (WARN_ON_ONCE((unsigned long)dst + len > va->va_end))
+		goto err_out;
+
+	spin_unlock(&vmap_area_lock);
+
+	return arch_vcopy_exec(dst, src, len);
+
+err_out:
+	spin_unlock(&vmap_area_lock);
+	return ERR_PTR(-EINVAL);
+}
+
+static struct vm_struct *find_and_unlink_text_vm(unsigned long start, unsigned long end)
+{
+	struct vm_struct *vm, *prev_vm;
+
+	lockdep_assert_held(&free_text_area_lock);
+
+	vm = all_text_vm;
+	while (vm) {
+		unsigned long vm_addr = (unsigned long)vm->addr;
+
+		/* vm is within this free space, we can free it */
+		if ((vm_addr >= start) && ((vm_addr + vm->size) <= end))
+			goto unlink_vm;
+		vm = vm->next;
+	}
+	return NULL;
+
+unlink_vm:
+	if (all_text_vm == vm) {
+		all_text_vm = vm->next;
+	} else {
+		prev_vm = all_text_vm;
+		while (prev_vm->next != vm)
+			prev_vm = prev_vm->next;
+		prev_vm->next = vm->next;
+	}
+	return vm;
+}
+
+/**
+ * vfree_exec - Release memory allocated by vmalloc_exec()
+ * @addr:  Memory base address
+ *
+ * If @addr is NULL, no operation is performed.
+ */
+void vfree_exec(void *addr)
+{
+	unsigned long free_start, free_end, free_addr;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+
+	might_sleep();
+
+	if (!addr)
+		return;
+
+	spin_lock(&vmap_area_lock);
+	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
+	if (WARN_ON_ONCE(!va)) {
+		spin_unlock(&vmap_area_lock);
+		return;
+	}
+	WARN_ON_ONCE(va->vm);
+
+	unlink_va(va, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
+
+	/* Invalidate text in the region */
+	arch_invalidate_exec(addr, va->va_end - va->va_start);
+
+	spin_lock(&free_text_area_lock);
+	va = merge_or_add_vmap_area_augment(va,
+		&free_text_area_root, &free_text_area_list);
+
+	if (WARN_ON_ONCE(!va))
+		goto out;
+
+	free_start = PMD_ALIGN(va->va_start);
+	free_end = PMD_ALIGN_DOWN(va->va_end);
+
+	/*
+	 * Only try to free vm when there is at least one PMD_SIZE free
+	 * continuous memory.
+	 */
+	if (free_start >= free_end)
+		goto out;
+
+	/*
+	 * TODO: It is possible that multiple vm are ready to be freed
+	 * after one vfree_exec(). But we free at most one vm for now.
+	 */
+	vm = find_and_unlink_text_vm(free_start, free_end);
+	if (!vm)
+		goto out;
+
+	va = kmem_cache_alloc_node(vmap_area_cachep, GFP_ATOMIC, NUMA_NO_NODE);
+	if (unlikely(!va))
+		goto out_save_vm;
+
+	free_addr = __alloc_vmap_area(&free_text_area_root, &free_text_area_list,
+				      vm->size, 1, (unsigned long)vm->addr,
+				      (unsigned long)vm->addr + vm->size);
+
+	if (WARN_ON_ONCE(free_addr != (unsigned long)vm->addr))
+		goto out_save_vm;
+
+	va->va_start = (unsigned long)vm->addr;
+	va->va_end = va->va_start + vm->size;
+	va->vm = vm;
+	spin_unlock(&free_text_area_lock);
+
+	set_memory_nx(va->va_start, vm->size >> PAGE_SHIFT);
+	set_memory_rw(va->va_start, vm->size >> PAGE_SHIFT);
+
+	/* put the va to vmap_area_root, and then free it with vfree */
+	spin_lock(&vmap_area_lock);
+	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+	spin_unlock(&vmap_area_lock);
+
+	vfree(vm->addr);
+	return;
+
+out_save_vm:
+	/*
+	 * vm is removed from all_text_vm, but not freed. Add it back,
+	 * so that we can use or free it later.
+	 */
+	vm->next = all_text_vm;
+	all_text_vm = vm;
+out:
+	spin_unlock(&free_text_area_lock);
+}
+
 /**
  * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
  * @size:      allocation size
-- 
2.30.2




* [PATCH bpf-next v1 2/5] x86/alternative: support vmalloc_exec() and vfree_exec()
  2022-10-31 21:58 [PATCH bpf-next v1 0/5] vmalloc_exec for modules and BPF programs Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec Song Liu
@ 2022-10-31 21:58 ` Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 3/5] bpf: use vmalloc_exec for bpf program and bpf dispatcher Song Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2022-10-31 21:58 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: akpm, x86, peterz, hch, rick.p.edgecombe, dave.hansen, urezki,
	mcgrof, kernel-team, Song Liu

Implement arch_vcopy_exec() and arch_invalidate_exec() to support
vmalloc_exec.

arch_vcopy_exec() copies dynamic kernel text (such as BPF programs) to an
RO+X memory region allocated by vmalloc_exec().

arch_invalidate_exec() fills the memory with 0xcc when it is released by
vfree_exec().

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/kernel/alternative.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cadcea035e0..73d89774ace3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1270,6 +1270,18 @@ void *text_poke_copy(void *addr, const void *opcode, size_t len)
 	return addr;
 }
 
+void *arch_vcopy_exec(void *dst, void *src, size_t len)
+{
+	if (text_poke_copy(dst, src, len) == NULL)
+		return ERR_PTR(-EINVAL);
+	return dst;
+}
+
+int arch_invalidate_exec(void *ptr, size_t len)
+{
+	return IS_ERR_OR_NULL(text_poke_set(ptr, 0xcc, len));
+}
+
 /**
  * text_poke_set - memset into (an unused part of) RX memory
  * @addr: address to modify
-- 
2.30.2




* [PATCH bpf-next v1 3/5] bpf: use vmalloc_exec for bpf program and bpf dispatcher
  2022-10-31 21:58 [PATCH bpf-next v1 0/5] vmalloc_exec for modules and BPF programs Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 2/5] x86/alternative: support vmalloc_exec() and vfree_exec() Song Liu
@ 2022-10-31 21:58 ` Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 4/5] vmalloc: introduce register_text_tail_vm() Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 5/5] x86: use register_text_tail_vm Song Liu
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2022-10-31 21:58 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: akpm, x86, peterz, hch, rick.p.edgecombe, dave.hansen, urezki,
	mcgrof, kernel-team, Song Liu

Use vmalloc_exec, vfree_exec, and vcopy_exec instead of
bpf_prog_pack_alloc, bpf_prog_pack_free, and bpf_arch_text_copy.

vfree_exec doesn't require extra size information. Therefore, the free
and error handling paths can be simplified.

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c |  23 +----
 include/linux/bpf.h         |   3 -
 include/linux/filter.h      |   5 -
 kernel/bpf/core.c           | 180 +++---------------------------------
 kernel/bpf/dispatcher.c     |  11 +--
 5 files changed, 21 insertions(+), 201 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 36ffe67ad6e5..3db316b2f3a7 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -228,11 +228,6 @@ static void jit_fill_hole(void *area, unsigned int size)
 	memset(area, 0xcc, size);
 }
 
-int bpf_arch_text_invalidate(void *dst, size_t len)
-{
-	return IS_ERR_OR_NULL(text_poke_set(dst, 0xcc, len));
-}
-
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
 
@@ -2496,11 +2491,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		if (proglen <= 0) {
 out_image:
 			image = NULL;
-			if (header) {
-				bpf_arch_text_copy(&header->size, &rw_header->size,
-						   sizeof(rw_header->size));
+			if (header)
 				bpf_jit_binary_pack_free(header, rw_header);
-			}
+
 			/* Fall back to interpreter mode */
 			prog = orig_prog;
 			if (extra_pass) {
@@ -2550,8 +2543,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		if (!prog->is_func || extra_pass) {
 			/*
 			 * bpf_jit_binary_pack_finalize fails in two scenarios:
-			 *   1) header is not pointing to proper module memory;
-			 *   2) the arch doesn't support bpf_arch_text_copy().
+			 *   1) header is not pointing to memory allocated by
+			 *      vmalloc_exec;
+			 *   2) the arch doesn't support vcopy_exec().
 			 *
 			 * Both cases are serious bugs and justify WARN_ON.
 			 */
@@ -2597,13 +2591,6 @@ bool bpf_jit_supports_kfunc_call(void)
 	return true;
 }
 
-void *bpf_arch_text_copy(void *dst, void *src, size_t len)
-{
-	if (text_poke_copy(dst, src, len) == NULL)
-		return ERR_PTR(-EINVAL);
-	return dst;
-}
-
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9fd68b0b3e9c..1d42f58334d0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2654,9 +2654,6 @@ enum bpf_text_poke_type {
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *addr1, void *addr2);
 
-void *bpf_arch_text_copy(void *dst, void *src, size_t len);
-int bpf_arch_text_invalidate(void *dst, size_t len);
-
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6e3aed..98e28126c24b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1023,8 +1023,6 @@ extern long bpf_jit_limit_max;
 
 typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
 
-void bpf_jit_fill_hole_with_zero(void *area, unsigned int size);
-
 struct bpf_binary_header *
 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 		     unsigned int alignment,
@@ -1037,9 +1035,6 @@ void bpf_jit_free(struct bpf_prog *fp);
 struct bpf_binary_header *
 bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
 
-void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns);
-void bpf_prog_pack_free(struct bpf_binary_header *hdr);
-
 static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
 {
 	return list_empty(&fp->aux->ksym.lnode) ||
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index a0e762a2bf97..ca722078697b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -806,149 +806,6 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
 	return slot;
 }
 
-/*
- * BPF program pack allocator.
- *
- * Most BPF programs are pretty small. Allocating a hole page for each
- * program is sometime a waste. Many small bpf program also adds pressure
- * to instruction TLB. To solve this issue, we introduce a BPF program pack
- * allocator. The prog_pack allocator uses HPAGE_PMD_SIZE page (2MB on x86)
- * to host BPF programs.
- */
-#define BPF_PROG_CHUNK_SHIFT	6
-#define BPF_PROG_CHUNK_SIZE	(1 << BPF_PROG_CHUNK_SHIFT)
-#define BPF_PROG_CHUNK_MASK	(~(BPF_PROG_CHUNK_SIZE - 1))
-
-struct bpf_prog_pack {
-	struct list_head list;
-	void *ptr;
-	unsigned long bitmap[];
-};
-
-void bpf_jit_fill_hole_with_zero(void *area, unsigned int size)
-{
-	memset(area, 0, size);
-}
-
-#define BPF_PROG_SIZE_TO_NBITS(size)	(round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)
-
-static DEFINE_MUTEX(pack_mutex);
-static LIST_HEAD(pack_list);
-
-/* PMD_SIZE is not available in some special config, e.g. ARCH=arm with
- * CONFIG_MMU=n. Use PAGE_SIZE in these cases.
- */
-#ifdef PMD_SIZE
-#define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
-#else
-#define BPF_PROG_PACK_SIZE PAGE_SIZE
-#endif
-
-#define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE)
-
-static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns)
-{
-	struct bpf_prog_pack *pack;
-
-	pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)),
-		       GFP_KERNEL);
-	if (!pack)
-		return NULL;
-	pack->ptr = module_alloc(BPF_PROG_PACK_SIZE);
-	if (!pack->ptr) {
-		kfree(pack);
-		return NULL;
-	}
-	bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE);
-	bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE);
-	list_add_tail(&pack->list, &pack_list);
-
-	set_vm_flush_reset_perms(pack->ptr);
-	set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	return pack;
-}
-
-void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
-{
-	unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size);
-	struct bpf_prog_pack *pack;
-	unsigned long pos;
-	void *ptr = NULL;
-
-	mutex_lock(&pack_mutex);
-	if (size > BPF_PROG_PACK_SIZE) {
-		size = round_up(size, PAGE_SIZE);
-		ptr = module_alloc(size);
-		if (ptr) {
-			bpf_fill_ill_insns(ptr, size);
-			set_vm_flush_reset_perms(ptr);
-			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
-			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
-		}
-		goto out;
-	}
-	list_for_each_entry(pack, &pack_list, list) {
-		pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
-						 nbits, 0);
-		if (pos < BPF_PROG_CHUNK_COUNT)
-			goto found_free_area;
-	}
-
-	pack = alloc_new_pack(bpf_fill_ill_insns);
-	if (!pack)
-		goto out;
-
-	pos = 0;
-
-found_free_area:
-	bitmap_set(pack->bitmap, pos, nbits);
-	ptr = (void *)(pack->ptr) + (pos << BPF_PROG_CHUNK_SHIFT);
-
-out:
-	mutex_unlock(&pack_mutex);
-	return ptr;
-}
-
-void bpf_prog_pack_free(struct bpf_binary_header *hdr)
-{
-	struct bpf_prog_pack *pack = NULL, *tmp;
-	unsigned int nbits;
-	unsigned long pos;
-
-	mutex_lock(&pack_mutex);
-	if (hdr->size > BPF_PROG_PACK_SIZE) {
-		module_memfree(hdr);
-		goto out;
-	}
-
-	list_for_each_entry(tmp, &pack_list, list) {
-		if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) {
-			pack = tmp;
-			break;
-		}
-	}
-
-	if (WARN_ONCE(!pack, "bpf_prog_pack bug\n"))
-		goto out;
-
-	nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size);
-	pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT;
-
-	WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size),
-		  "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n");
-
-	bitmap_clear(pack->bitmap, pos, nbits);
-	if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
-				       BPF_PROG_CHUNK_COUNT, 0) == 0) {
-		list_del(&pack->list);
-		module_memfree(pack->ptr);
-		kfree(pack);
-	}
-out:
-	mutex_unlock(&pack_mutex);
-}
-
 static atomic_long_t bpf_jit_current;
 
 /* Can be overridden by an arch's JIT compiler if it has a custom,
@@ -1048,6 +905,9 @@ void bpf_jit_binary_free(struct bpf_binary_header *hdr)
 	bpf_jit_uncharge_modmem(size);
 }
 
+#define BPF_PROG_EXEC_ALIGN	64
+#define BPF_PROG_EXEC_MASK	(~(BPF_PROG_EXEC_ALIGN - 1))
+
 /* Allocate jit binary from bpf_prog_pack allocator.
  * Since the allocated memory is RO+X, the JIT engine cannot write directly
  * to the memory. To solve this problem, a RW buffer is also allocated at
@@ -1070,11 +930,11 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 		     alignment > BPF_IMAGE_ALIGNMENT);
 
 	/* add 16 bytes for a random section of illegal instructions */
-	size = round_up(proglen + sizeof(*ro_header) + 16, BPF_PROG_CHUNK_SIZE);
+	size = round_up(proglen + sizeof(*ro_header) + 16, BPF_PROG_EXEC_ALIGN);
 
 	if (bpf_jit_charge_modmem(size))
 		return NULL;
-	ro_header = bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
+	ro_header = vmalloc_exec(size, BPF_PROG_EXEC_ALIGN);
 	if (!ro_header) {
 		bpf_jit_uncharge_modmem(size);
 		return NULL;
@@ -1082,8 +942,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 
 	*rw_header = kvmalloc(size, GFP_KERNEL);
 	if (!*rw_header) {
-		bpf_arch_text_copy(&ro_header->size, &size, sizeof(size));
-		bpf_prog_pack_free(ro_header);
+		vfree_exec(ro_header);
 		bpf_jit_uncharge_modmem(size);
 		return NULL;
 	}
@@ -1093,7 +952,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 	(*rw_header)->size = size;
 
 	hole = min_t(unsigned int, size - (proglen + sizeof(*ro_header)),
-		     BPF_PROG_CHUNK_SIZE - sizeof(*ro_header));
+		     BPF_PROG_EXEC_ALIGN - sizeof(*ro_header));
 	start = (get_random_int() % hole) & ~(alignment - 1);
 
 	*image_ptr = &ro_header->image[start];
@@ -1109,12 +968,12 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
 {
 	void *ptr;
 
-	ptr = bpf_arch_text_copy(ro_header, rw_header, rw_header->size);
+	ptr = vcopy_exec(ro_header, rw_header, rw_header->size);
 
 	kvfree(rw_header);
 
 	if (IS_ERR(ptr)) {
-		bpf_prog_pack_free(ro_header);
+		vfree_exec(ro_header);
 		return PTR_ERR(ptr);
 	}
 	return 0;
@@ -1124,18 +983,13 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
  *   1) when the program is freed after;
  *   2) when the JIT engine fails (before bpf_jit_binary_pack_finalize).
  * For case 2), we need to free both the RO memory and the RW buffer.
- *
- * bpf_jit_binary_pack_free requires proper ro_header->size. However,
- * bpf_jit_binary_pack_alloc does not set it. Therefore, ro_header->size
- * must be set with either bpf_jit_binary_pack_finalize (normal path) or
- * bpf_arch_text_copy (when jit fails).
  */
 void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header,
 			      struct bpf_binary_header *rw_header)
 {
-	u32 size = ro_header->size;
+	u32 size = rw_header ? rw_header->size : ro_header->size;
 
-	bpf_prog_pack_free(ro_header);
+	vfree_exec(ro_header);
 	kvfree(rw_header);
 	bpf_jit_uncharge_modmem(size);
 }
@@ -1146,7 +1000,7 @@ bpf_jit_binary_pack_hdr(const struct bpf_prog *fp)
 	unsigned long real_start = (unsigned long)fp->bpf_func;
 	unsigned long addr;
 
-	addr = real_start & BPF_PROG_CHUNK_MASK;
+	addr = real_start & BPF_PROG_EXEC_MASK;
 	return (void *)addr;
 }
 
@@ -2736,16 +2590,6 @@ int __weak bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	return -ENOTSUPP;
 }
 
-void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len)
-{
-	return ERR_PTR(-ENOTSUPP);
-}
-
-int __weak bpf_arch_text_invalidate(void *dst, size_t len)
-{
-	return -ENOTSUPP;
-}
-
 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 EXPORT_SYMBOL(bpf_stats_enabled_key);
 
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index fa64b80b8bca..7ca8138fb8fa 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -120,11 +120,11 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
 	tmp = d->num_progs ? d->rw_image + noff : NULL;
 	if (new) {
 		/* Prepare the dispatcher in d->rw_image. Then use
-		 * bpf_arch_text_copy to update d->image, which is RO+X.
+		 * vcopy_exec to update d->image, which is RO+X.
 		 */
 		if (bpf_dispatcher_prepare(d, new, tmp))
 			return;
-		if (IS_ERR(bpf_arch_text_copy(new, tmp, PAGE_SIZE / 2)))
+		if (IS_ERR(vcopy_exec(new, tmp, PAGE_SIZE / 2)))
 			return;
 	}
 
@@ -146,15 +146,12 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
 
 	mutex_lock(&d->mutex);
 	if (!d->image) {
-		d->image = bpf_prog_pack_alloc(PAGE_SIZE, bpf_jit_fill_hole_with_zero);
+		d->image = vmalloc_exec(PAGE_SIZE, PAGE_SIZE /* align */);
 		if (!d->image)
 			goto out;
 		d->rw_image = bpf_jit_alloc_exec(PAGE_SIZE);
 		if (!d->rw_image) {
-			u32 size = PAGE_SIZE;
-
-			bpf_arch_text_copy(d->image, &size, sizeof(size));
-			bpf_prog_pack_free((struct bpf_binary_header *)d->image);
+			vfree_exec((struct bpf_binary_header *)d->image);
 			d->image = NULL;
 			goto out;
 		}
-- 
2.30.2




* [PATCH bpf-next v1 4/5] vmalloc: introduce register_text_tail_vm()
  2022-10-31 21:58 [PATCH bpf-next v1 0/5] vmalloc_exec for modules and BPF programs Song Liu
                   ` (2 preceding siblings ...)
  2022-10-31 21:58 ` [PATCH bpf-next v1 3/5] bpf: use vmalloc_exec for bpf program and bpf dispatcher Song Liu
@ 2022-10-31 21:58 ` Song Liu
  2022-10-31 21:58 ` [PATCH bpf-next v1 5/5] x86: use register_text_tail_vm Song Liu
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2022-10-31 21:58 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: akpm, x86, peterz, hch, rick.p.edgecombe, dave.hansen, urezki,
	mcgrof, kernel-team, Song Liu

Allow arch code to register some memory to be used by vmalloc_exec().
One possible use case is to allocate PMD pages for kernel text up to
PMD_ALIGN(_etext), and use the range (_etext, PMD_ALIGN(_etext)) for
vmalloc_exec. Currently, only one such region is supported.

Signed-off-by: Song Liu <song@kernel.org>
---
 mm/vmalloc.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6f4c73e67191..46f2b7e56670 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,9 @@ static const bool vmap_allow_huge = false;
 #define PMD_ALIGN(addr) ALIGN(addr, PMD_SIZE)
 #define PMD_ALIGN_DOWN(addr) ALIGN_DOWN(addr, PMD_SIZE)
 
+static struct vm_struct text_tail_vm;
+static struct vmap_area text_tail_va;
+
 bool is_vmalloc_addr(const void *x)
 {
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
@@ -653,6 +656,8 @@ int is_vmalloc_or_module_addr(const void *x)
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
 	if (addr >= MODULES_VADDR && addr < MODULES_END)
 		return 1;
+	if (addr >= text_tail_va.va_start && addr < text_tail_va.va_end)
+		return 1;
 #endif
 	return is_vmalloc_addr(x);
 }
@@ -2437,6 +2442,34 @@ static void vmap_init_free_space(void)
 	}
 }
 
+/*
+ * register_text_tail_vm() allows arch code to register memory regions
+ * for vmalloc_exec. Unlike regular memory regions used by vmalloc_exec,
+ * this region is never freed by vfree_exec.
+ *
+ * One possible use case is to allocate PMD pages for kernel text up to
+ * PMD_ALIGN(_etext), and use (_etext, PMD_ALIGN(_etext)) for vmalloc_exec.
+ */
+void register_text_tail_vm(unsigned long start, unsigned long end)
+{
+	struct vmap_area *va;
+
+	/* only support one region */
+	if (WARN_ON_ONCE(text_tail_vm.addr))
+		return;
+
+	va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+	if (WARN_ON_ONCE(!va))
+		return;
+	text_tail_vm.addr = (void *)start;
+	text_tail_vm.size = end - start;
+	text_tail_va.va_start = start;
+	text_tail_va.va_end = end;
+	text_tail_va.vm = &text_tail_vm;
+	memcpy(va, &text_tail_va, sizeof(*va));
+	insert_vmap_area_augment(va, NULL, &free_text_area_root, &free_text_area_list);
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
-- 
2.30.2




* [PATCH bpf-next v1 5/5] x86: use register_text_tail_vm
  2022-10-31 21:58 [PATCH bpf-next v1 0/5] vmalloc_exec for modules and BPF programs Song Liu
                   ` (3 preceding siblings ...)
  2022-10-31 21:58 ` [PATCH bpf-next v1 4/5] vmalloc: introduce register_text_tail_vm() Song Liu
@ 2022-10-31 21:58 ` Song Liu
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2022-10-31 21:58 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: akpm, x86, peterz, hch, rick.p.edgecombe, dave.hansen, urezki,
	mcgrof, kernel-team, Song Liu

Allocate 2MB pages up to round_up(_etext, 2MB), and register the memory
range [round_up(_etext, 4KB), round_up(_etext, 2MB)] with
register_text_tail_vm() so that this part of memory can be used for dynamic
kernel text (BPF programs, etc.).

Here is an example:

[root@eth50-1 ~]# grep _etext /proc/kallsyms
ffffffff82202a08 T _etext

[root@eth50-1 ~]# grep bpf_prog_ /proc/kallsyms  | tail -n 3
ffffffff8220f920 t bpf_prog_cc61a5364ac11d93_handle__sched_wakeup       [bpf]
ffffffff8220fa28 t bpf_prog_cc61a5364ac11d93_handle__sched_wakeup_new   [bpf]
ffffffff8220fad4 t bpf_prog_3bf73fa16f5e3d92_handle__sched_switch       [bpf]

[root@eth50-1 ~]#  grep 0xffffffff82200000 /sys/kernel/debug/page_tables/kernel
0xffffffff82200000-0xffffffff82400000     2M     ro   PSE         x  pmd

ffffffff82200000-ffffffff82400000 is a 2MB page serving both kernel text
and BPF programs.
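
Working through the numbers above (a back-of-the-envelope check, assuming
4KB pages and 2MB PMDs on x86_64; not output from the patch):

  round_up(_etext, 4KB) = round_up(0xffffffff82202a08, 0x1000)   = 0xffffffff82203000
  round_up(_etext, 2MB) = round_up(0xffffffff82202a08, 0x200000) = 0xffffffff82400000

so register_text_tail_vm() hands the ~2036KB tail of this 2MB page to
vmalloc_exec(), which is where the bpf_prog_* symbols listed above land.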

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/include/asm/pgtable_64_types.h | 1 +
 arch/x86/mm/init_64.c                   | 4 +++-
 include/linux/vmalloc.h                 | 4 ++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 04f36063ad54..c0f9cceb109a 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -101,6 +101,7 @@ extern unsigned int ptrs_per_p4d;
 #define PUD_MASK	(~(PUD_SIZE - 1))
 #define PGDIR_SIZE	(_AC(1, UL) << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE - 1))
+#define PMD_ALIGN(x)	(((unsigned long)(x) + (PMD_SIZE - 1)) & PMD_MASK)
 
 /*
  * See Documentation/x86/x86_64/mm.rst for a description of the memory map.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3f040c6e5d13..5b42fc0c6099 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1373,7 +1373,7 @@ void mark_rodata_ro(void)
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long rodata_start = PFN_ALIGN(__start_rodata);
 	unsigned long end = (unsigned long)__end_rodata_hpage_align;
-	unsigned long text_end = PFN_ALIGN(_etext);
+	unsigned long text_end = PMD_ALIGN(_etext);
 	unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 	unsigned long all_end;
 
@@ -1414,6 +1414,8 @@ void mark_rodata_ro(void)
 				(void *)rodata_end, (void *)_sdata);
 
 	debug_checkwx();
+	register_text_tail_vm(PFN_ALIGN((unsigned long)_etext),
+			      PMD_ALIGN((unsigned long)_etext));
 }
 
 int kern_addr_valid(unsigned long addr)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9b2042313c12..7365cf9c4e7f 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -132,11 +132,15 @@ extern void vm_unmap_aliases(void);
 #ifdef CONFIG_MMU
 extern void __init vmalloc_init(void);
 extern unsigned long vmalloc_nr_pages(void);
+void register_text_tail_vm(unsigned long start, unsigned long end);
 #else
 static inline void vmalloc_init(void)
 {
 }
 static inline unsigned long vmalloc_nr_pages(void) { return 0; }
+static inline void register_text_tail_vm(unsigned long start, unsigned long end)
+{
+}
 #endif
 
 extern void *vmalloc(unsigned long size) __alloc_size(1);
-- 
2.30.2




* Re: [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec
  2022-10-31 21:58 ` [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec Song Liu
@ 2022-11-01 11:54   ` Uladzislau Rezki
  2022-11-01 15:06     ` Song Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Uladzislau Rezki @ 2022-11-01 11:54 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-mm, akpm, x86, peterz, hch, rick.p.edgecombe,
	dave.hansen, urezki, mcgrof, kernel-team

On Mon, Oct 31, 2022 at 02:58:30PM -0700, Song Liu wrote:
> vmalloc_exec is used to allocate memory to host dynamic kernel text
> (modules, BPF programs, etc.) with huge pages. This is similar to the
> proposal by Peter in [1].
> 
> A new tree of vmap_area, free_text_area_* tree, is introduced in addition
> to free_vmap_area_* and vmap_area_*. vmalloc_exec allocates pages from
> free_text_area_*. When there isn't enough space left in free_text_area_*,
> new PMD_SIZE page(s) is allocated from free_vmap_area_* and added to
> free_text_area_*. To be more accurate, the vmap_area is first added to
> vmap_area_* tree and then moved to free_text_area_*. This extra move
> simplifies the logic of vmalloc_exec.
> 
> vmap_area in free_text_area_* tree are backed with memory, but we need
> subtree_max_size for tree operations. Therefore, vm_struct for these
> vmap_area are stored in a separate list, all_text_vm.
> 
> The new tree allows separate handling of < PAGE_SIZE allocations, as
> current vmalloc code mostly assumes PAGE_SIZE aligned allocations. This
> version of vmalloc_exec can handle bpf programs, which uses 64 byte
> aligned allocations), and modules, which uses PAGE_SIZE aligned
> allocations.
> 
> Memory allocated by vmalloc_exec() is set to RO+X before returning to the
> caller. Therefore, the caller cannot write directly write to the memory.
> Instead, the caller is required to use vcopy_exec() to update the memory.
> For the safety and security of X memory, vcopy_exec() checks the data
> being updated always in the memory allocated by one vmalloc_exec() call.
> vcopy_exec() uses text_poke like mechanism and requires arch support.
> Specifically, the arch need to implement arch_vcopy_exec().
> 
> In vfree_exec(), the memory is first erased with arch_invalidate_exec().
> Then, the memory is added to free_text_area_*. If this free creates big
> enough continuous free space (> PMD_SIZE), vfree_exec() will try to free
> the backing vm_struct.
> 
> [1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  include/linux/vmalloc.h |   5 +
>  mm/nommu.c              |  12 ++
>  mm/vmalloc.c            | 318 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 335 insertions(+)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..9b2042313c12 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -154,6 +154,11 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
>  		int node, const void *caller) __alloc_size(1);
>  void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
> +void *vmalloc_exec(unsigned long size, unsigned long align) __alloc_size(1);
> +void *vcopy_exec(void *dst, void *src, size_t len);
> +void vfree_exec(void *addr);
> +void *arch_vcopy_exec(void *dst, void *src, size_t len);
> +int arch_invalidate_exec(void *ptr, size_t len);
>  
>  extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
>  extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 214c70e1d059..8a1317247ef0 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -371,6 +371,18 @@ int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
>  }
>  EXPORT_SYMBOL(vm_map_pages_zero);
>  
> +void *vmalloc_exec(unsigned long size, unsigned long align)
> +{
> +	return NULL;
> +}
> +
> +void *vcopy_exec(void *dst, void *src, size_t len)
> +{
> +	return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +void vfree_exec(const void *addr) { }
> +
>  /*
>   *  sys_brk() for the most part doesn't need the global kernel
>   *  lock, except when an application is doing something nasty
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ccaa461998f3..6f4c73e67191 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -72,6 +72,9 @@ early_param("nohugevmalloc", set_nohugevmalloc);
>  static const bool vmap_allow_huge = false;
>  #endif	/* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
>  
> +#define PMD_ALIGN(addr) ALIGN(addr, PMD_SIZE)
> +#define PMD_ALIGN_DOWN(addr) ALIGN_DOWN(addr, PMD_SIZE)
> +
>  bool is_vmalloc_addr(const void *x)
>  {
>  	unsigned long addr = (unsigned long)kasan_reset_tag(x);
> @@ -769,6 +772,38 @@ static LIST_HEAD(free_vmap_area_list);
>   */
>  static struct rb_root free_vmap_area_root = RB_ROOT;
>  
> +/*
> + * free_text_area for vmalloc_exec()
> + */
> +static DEFINE_SPINLOCK(free_text_area_lock);
> +/*
> + * This linked list is used in pair with free_text_area_root.
> + * It gives O(1) access to prev/next to perform fast coalescing.
> + */
> +static LIST_HEAD(free_text_area_list);
> +
> +/*
> + * This augment red-black tree represents the free text space.
> + * All vmap_area objects in this tree are sorted by va->va_start
> + * address. It is used for allocation and merging when a vmap
> + * object is released.
> + *
> + * Each vmap_area node contains a maximum available free block
> + * of its sub-tree, right or left. Therefore it is possible to
> + * find a lowest match of free area.
> + *
> + * vmap_area in this tree are backed by RO+X memory, but they do
> + * not have valid vm pointer (because we need subtree_max_size).
> + * The vm for these vmap_area are stored in all_text_vm.
> + */
> +static struct rb_root free_text_area_root = RB_ROOT;
> +
> +/*
> + * List of vm_struct for free_text_area_root. This list is rarely
> + * accessed, so the O(N) complexity is not likely a real issue.
> + */
> +struct vm_struct *all_text_vm;
> +
>  /*
>   * Preload a CPU with one object for "no edge" split case. The
>   * aim is to get rid of allocations from the atomic context, thus
> @@ -3313,6 +3348,289 @@ void *vmalloc(unsigned long size)
>  }
>  EXPORT_SYMBOL(vmalloc);
>  
> +#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
> +#define VMALLOC_EXEC_START MODULES_VADDR
> +#define VMALLOC_EXEC_END MODULES_END
> +#else
> +#define VMALLOC_EXEC_START VMALLOC_START
> +#define VMALLOC_EXEC_END VMALLOC_END
> +#endif
> +
> +static void move_vmap_to_free_text_tree(void *addr)
> +{
> +	struct vmap_area *va;
> +
> +	/* remove from vmap_area_root */
> +	spin_lock(&vmap_area_lock);
> +	va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
> +	if (WARN_ON_ONCE(!va)) {
> +		spin_unlock(&vmap_area_lock);
> +		return;
> +	}
> +	unlink_va(va, &vmap_area_root);
> +	spin_unlock(&vmap_area_lock);
> +
> +	/* make the memory RO+X */
> +	memset(addr, 0, va->va_end - va->va_start);
> +	set_memory_ro(va->va_start, (va->va_end - va->va_start) >> PAGE_SHIFT);
> +	set_memory_x(va->va_start, (va->va_end - va->va_start) >> PAGE_SHIFT);
> +
> +	/* add to all_text_vm */
> +	va->vm->next = all_text_vm;
> +	all_text_vm = va->vm;
> +
> +	/* add to free_text_area_root */
> +	spin_lock(&free_text_area_lock);
> +	merge_or_add_vmap_area_augment(va, &free_text_area_root, &free_text_area_list);
> +	spin_unlock(&free_text_area_lock);
> +}
> +
> +/**
> + * vmalloc_exec - allocate virtually contiguous RO+X memory
> + * @size:    allocation size
> + *
> + * This is used to allocate dynamic kernel text, such as module text, BPF
> + * programs, etc. User need to use text_poke to update the memory allocated
> + * by vmalloc_exec.
> + *
> + * Return: pointer to the allocated memory or %NULL on error
> + */
> +void *vmalloc_exec(unsigned long size, unsigned long align)
> +{
> +	struct vmap_area *va, *tmp;
> +	unsigned long addr;
> +	enum fit_type type;
> +	int ret;
> +
> +	va = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, NUMA_NO_NODE);
> +	if (unlikely(!va))
> +		return NULL;
> +
> +again:
> +	preload_this_cpu_lock(&free_text_area_lock, GFP_KERNEL, NUMA_NO_NODE);
> +	tmp = find_vmap_lowest_match(&free_text_area_root, size, align, 1, false);
> +
> +	if (!tmp) {
> +		unsigned long alloc_size;
> +		void *ptr;
> +
> +		spin_unlock(&free_text_area_lock);
> +
> +		/*
> +		 * Not enough continuous space in free_text_area_root, try
> +		 * allocate more memory. The memory is first added to
> +		 * vmap_area_root, and then moved to free_text_area_root.
> +		 */
> +		alloc_size = roundup(size, PMD_SIZE * num_online_nodes());
> +		ptr = __vmalloc_node_range(alloc_size, PMD_SIZE, VMALLOC_EXEC_START,
> +					   VMALLOC_EXEC_END, GFP_KERNEL, PAGE_KERNEL,
> +					   VM_ALLOW_HUGE_VMAP | VM_NO_GUARD,
> +					   NUMA_NO_NODE, __builtin_return_address(0));
> +		if (unlikely(!ptr))
> +			goto err_out;
> +
> +		move_vmap_to_free_text_tree(ptr);
> +		goto again;
>
It is yet another allocator built on top of vmalloc, so there are 4 then.
Could you please avoid doing it? I do not find it to be something that is
reasonable.

--
Uladzislau Rezki


* Re: [PATCH bpf-next v1 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec
  2022-11-01 11:54   ` Uladzislau Rezki
@ 2022-11-01 15:06     ` Song Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2022-11-01 15:06 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: bpf, linux-mm, akpm, x86, peterz, hch, rick.p.edgecombe,
	dave.hansen, mcgrof, kernel-team

On Tue, Nov 1, 2022 at 4:54 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 02:58:30PM -0700, Song Liu wrote:
> > vmalloc_exec is used to allocate memory to host dynamic kernel text
> > (modules, BPF programs, etc.) with huge pages. This is similar to the
> > proposal by Peter in [1].
> >
> > A new tree of vmap_area, free_text_area_* tree, is introduced in addition
> > to free_vmap_area_* and vmap_area_*. vmalloc_exec allocates pages from
> > free_text_area_*. When there isn't enough space left in free_text_area_*,
> > new PMD_SIZE page(s) is allocated from free_vmap_area_* and added to
> > free_text_area_*. To be more accurate, the vmap_area is first added to
> > vmap_area_* tree and then moved to free_text_area_*. This extra move
> > simplifies the logic of vmalloc_exec.
> >
> > vmap_area in free_text_area_* tree are backed with memory, but we need
> > subtree_max_size for tree operations. Therefore, vm_struct for these
> > vmap_area are stored in a separate list, all_text_vm.
> >
> > The new tree allows separate handling of < PAGE_SIZE allocations, as
> > current vmalloc code mostly assumes PAGE_SIZE aligned allocations. This
> > version of vmalloc_exec can handle bpf programs, which uses 64 byte
> > aligned allocations), and modules, which uses PAGE_SIZE aligned
> > allocations.
> >
> > Memory allocated by vmalloc_exec() is set to RO+X before returning to the
> > caller. Therefore, the caller cannot write directly write to the memory.
> > Instead, the caller is required to use vcopy_exec() to update the memory.
> > For the safety and security of X memory, vcopy_exec() checks the data
> > being updated always in the memory allocated by one vmalloc_exec() call.
> > vcopy_exec() uses text_poke like mechanism and requires arch support.
> > Specifically, the arch need to implement arch_vcopy_exec().
> >
> > In vfree_exec(), the memory is first erased with arch_invalidate_exec().
> > Then, the memory is added to free_text_area_*. If this free creates big
> > enough continuous free space (> PMD_SIZE), vfree_exec() will try to free
> > the backing vm_struct.
> >
> > [1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
> >
> > Signed-off-by: Song Liu <song@kernel.org>
> > ---
> >  include/linux/vmalloc.h |   5 +
> >  mm/nommu.c              |  12 ++
> >  mm/vmalloc.c            | 318 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 335 insertions(+)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 096d48aa3437..9b2042313c12 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -154,6 +154,11 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >  void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
> >               int node, const void *caller) __alloc_size(1);
> >  void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
> > +void *vmalloc_exec(unsigned long size, unsigned long align) __alloc_size(1);
> > +void *vcopy_exec(void *dst, void *src, size_t len);
> > +void vfree_exec(void *addr);
> > +void *arch_vcopy_exec(void *dst, void *src, size_t len);
> > +int arch_invalidate_exec(void *ptr, size_t len);
> >
> >  extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
> >  extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
> > diff --git a/mm/nommu.c b/mm/nommu.c
> > index 214c70e1d059..8a1317247ef0 100644
> > --- a/mm/nommu.c
> > +++ b/mm/nommu.c
> > @@ -371,6 +371,18 @@ int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
> >  }
> >  EXPORT_SYMBOL(vm_map_pages_zero);
> >
> > +void *vmalloc_exec(unsigned long size, unsigned long align)
> > +{
> > +     return NULL;
> > +}
> > +
> > +void *vcopy_exec(void *dst, void *src, size_t len)
> > +{
> > +     return ERR_PTR(-EOPNOTSUPP);
> > +}
> > +
> > +void vfree_exec(const void *addr) { }
> > +
> >  /*
> >   *  sys_brk() for the most part doesn't need the global kernel
> >   *  lock, except when an application is doing something nasty
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index ccaa461998f3..6f4c73e67191 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -72,6 +72,9 @@ early_param("nohugevmalloc", set_nohugevmalloc);
> >  static const bool vmap_allow_huge = false;
> >  #endif       /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
> >
> > +#define PMD_ALIGN(addr) ALIGN(addr, PMD_SIZE)
> > +#define PMD_ALIGN_DOWN(addr) ALIGN_DOWN(addr, PMD_SIZE)
> > +
> >  bool is_vmalloc_addr(const void *x)
> >  {
> >       unsigned long addr = (unsigned long)kasan_reset_tag(x);
> > @@ -769,6 +772,38 @@ static LIST_HEAD(free_vmap_area_list);
> >   */
> >  static struct rb_root free_vmap_area_root = RB_ROOT;
> >
> > +/*
> > + * free_text_area for vmalloc_exec()
> > + */
> > +static DEFINE_SPINLOCK(free_text_area_lock);
> > +/*
> > + * This linked list is used in pair with free_text_area_root.
> > + * It gives O(1) access to prev/next to perform fast coalescing.
> > + */
> > +static LIST_HEAD(free_text_area_list);
> > +
> > +/*
> > + * This augment red-black tree represents the free text space.
> > + * All vmap_area objects in this tree are sorted by va->va_start
> > + * address. It is used for allocation and merging when a vmap
> > + * object is released.
> > + *
> > + * Each vmap_area node contains a maximum available free block
> > + * of its sub-tree, right or left. Therefore it is possible to
> > + * find a lowest match of free area.
> > + *
> > + * vmap_area in this tree are backed by RO+X memory, but they do
> > + * not have valid vm pointer (because we need subtree_max_size).
> > + * The vm for these vmap_area are stored in all_text_vm.
> > + */
> > +static struct rb_root free_text_area_root = RB_ROOT;
> > +
> > +/*
> > + * List of vm_struct for free_text_area_root. This list is rarely
> > + * accessed, so the O(N) complexity is not likely a real issue.
> > + */
> > +struct vm_struct *all_text_vm;
> > +
> >  /*
> >   * Preload a CPU with one object for "no edge" split case. The
> >   * aim is to get rid of allocations from the atomic context, thus
> > @@ -3313,6 +3348,289 @@ void *vmalloc(unsigned long size)
> >  }
> >  EXPORT_SYMBOL(vmalloc);
> >
> > +#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
> > +#define VMALLOC_EXEC_START MODULES_VADDR
> > +#define VMALLOC_EXEC_END MODULES_END
> > +#else
> > +#define VMALLOC_EXEC_START VMALLOC_START
> > +#define VMALLOC_EXEC_END VMALLOC_END
> > +#endif
> > +
> > +static void move_vmap_to_free_text_tree(void *addr)
> > +{
> > +     struct vmap_area *va;
> > +
> > +     /* remove from vmap_area_root */
> > +     spin_lock(&vmap_area_lock);
> > +     va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
> > +     if (WARN_ON_ONCE(!va)) {
> > +             spin_unlock(&vmap_area_lock);
> > +             return;
> > +     }
> > +     unlink_va(va, &vmap_area_root);
> > +     spin_unlock(&vmap_area_lock);
> > +
> > +     /* make the memory RO+X */
> > +     memset(addr, 0, va->va_end - va->va_start);
> > +     set_memory_ro(va->va_start, (va->va_end - va->va_start) >> PAGE_SHIFT);
> > +     set_memory_x(va->va_start, (va->va_end - va->va_start) >> PAGE_SHIFT);
> > +
> > +     /* add to all_text_vm */
> > +     va->vm->next = all_text_vm;
> > +     all_text_vm = va->vm;
> > +
> > +     /* add to free_text_area_root */
> > +     spin_lock(&free_text_area_lock);
> > +     merge_or_add_vmap_area_augment(va, &free_text_area_root, &free_text_area_list);
> > +     spin_unlock(&free_text_area_lock);
> > +}
> > +
> > +/**
> > + * vmalloc_exec - allocate virtually contiguous RO+X memory
> > + * @size:    allocation size
> > + *
> > + * This is used to allocate dynamic kernel text, such as module text, BPF
> > + * programs, etc. User need to use text_poke to update the memory allocated
> > + * by vmalloc_exec.
> > + *
> > + * Return: pointer to the allocated memory or %NULL on error
> > + */
> > +void *vmalloc_exec(unsigned long size, unsigned long align)
> > +{
> > +     struct vmap_area *va, *tmp;
> > +     unsigned long addr;
> > +     enum fit_type type;
> > +     int ret;
> > +
> > +     va = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, NUMA_NO_NODE);
> > +     if (unlikely(!va))
> > +             return NULL;
> > +
> > +again:
> > +     preload_this_cpu_lock(&free_text_area_lock, GFP_KERNEL, NUMA_NO_NODE);
> > +     tmp = find_vmap_lowest_match(&free_text_area_root, size, align, 1, false);
> > +
> > +     if (!tmp) {
> > +             unsigned long alloc_size;
> > +             void *ptr;
> > +
> > +             spin_unlock(&free_text_area_lock);
> > +
> > +             /*
> > +              * Not enough continuous space in free_text_area_root, try
> > +              * allocate more memory. The memory is first added to
> > +              * vmap_area_root, and then moved to free_text_area_root.
> > +              */
> > +             alloc_size = roundup(size, PMD_SIZE * num_online_nodes());
> > +             ptr = __vmalloc_node_range(alloc_size, PMD_SIZE, VMALLOC_EXEC_START,
> > +                                        VMALLOC_EXEC_END, GFP_KERNEL, PAGE_KERNEL,
> > +                                        VM_ALLOW_HUGE_VMAP | VM_NO_GUARD,
> > +                                        NUMA_NO_NODE, __builtin_return_address(0));
> > +             if (unlikely(!ptr))
> > +                     goto err_out;
> > +
> > +             move_vmap_to_free_text_tree(ptr);
> > +             goto again;
> >
> It is yet another allocator built on top of vmalloc. So there are 4 then.
> Could you please avoid of doing it? I do not find it as something that is
> reasonable.

Could you please elaborate why this is not reasonable? Or, what would
be a more reasonable alternative?

Thanks,
Song
