* [PATCH v3 bpf-next 0/8] bpf_prog_pack followup
@ 2022-05-20  3:15 Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 1/8] bpf: fill new bpf_prog_pack with illegal instructions Song Liu
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

Changes v2 => v3:
1. Fix issues reported by kernel test robot <lkp@intel.com>.

Changes v1 => v2:
1. Add WARN to set_vm_flush_reset_perms() on huge pages. (Rick Edgecombe)
2. Simplify select_bpf_prog_pack_size. (Rick Edgecombe)

As of 5.18-rc6, x86_64 uses bpf_prog_pack on 4kB pages. This set contains
a few followups:
  1/8 - 3/8 fill the unused part of bpf_prog_pack with illegal instructions.
  4/8 - 5/8 enable bpf_prog_pack on 2MB pages.

The primary goal of bpf_prog_pack is to reduce iTLB miss rate and reduce
direct memory mapping fragmentation. This leads to non-trivial performance
improvements.

For our web service production benchmark, bpf_prog_pack on 4kB pages
gives 0.5% to 0.7% more throughput than not using bpf_prog_pack.
bpf_prog_pack on 2MB pages gives 0.6% to 0.9% more throughput than not
using bpf_prog_pack. Note that 0.5% is a huge improvement for our fleet.
I believe this is also significant for other companies with many
thousands of servers.

bpf_prog_pack on 2MB pages may use slightly more memory on systems
without many BPF programs. However, such memory overhead (<2MB) is
within the noise for modern x86_64 systems.

Song Liu (8):
  bpf: fill new bpf_prog_pack with illegal instructions
  x86/alternative: introduce text_poke_set
  bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack
  module: introduce module_alloc_huge
  bpf: use module_alloc_huge for bpf_prog_pack
  vmalloc: WARN for set_vm_flush_reset_perms() on huge pages
  vmalloc: introduce huge_vmalloc_supported
  bpf: simplify select_bpf_prog_pack_size

 arch/x86/include/asm/text-patching.h |  1 +
 arch/x86/kernel/alternative.c        | 67 +++++++++++++++++++++++-----
 arch/x86/kernel/module.c             | 21 +++++++++
 arch/x86/net/bpf_jit_comp.c          |  5 +++
 include/linux/bpf.h                  |  1 +
 include/linux/moduleloader.h         |  5 +++
 include/linux/vmalloc.h              |  7 +++
 kernel/bpf/core.c                    | 43 ++++++++++--------
 kernel/module.c                      |  8 ++++
 mm/vmalloc.c                         |  5 +++
 10 files changed, 134 insertions(+), 29 deletions(-)

--
2.30.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 1/8] bpf: fill new bpf_prog_pack with illegal instructions
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 2/8] x86/alternative: introduce text_poke_set Song Liu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

bpf_prog_pack enables sharing huge pages among multiple BPF programs.
These pages are marked as executable before the JIT engine fills them
with BPF programs. To make these pages safe, fill the whole
bpf_prog_pack with illegal instructions before making it executable.

Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator")
Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/bpf/core.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9cc91f0f3115..2d0c9d4696ad 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -873,7 +873,7 @@ static size_t select_bpf_prog_pack_size(void)
 	return size;
 }
 
-static struct bpf_prog_pack *alloc_new_pack(void)
+static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
 	struct bpf_prog_pack *pack;
 
@@ -886,6 +886,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 		kfree(pack);
 		return NULL;
 	}
+	bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size);
 	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
 	list_add_tail(&pack->list, &pack_list);
 
@@ -895,7 +896,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 	return pack;
 }
 
-static void *bpf_prog_pack_alloc(u32 size)
+static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
 	unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size);
 	struct bpf_prog_pack *pack;
@@ -910,6 +911,7 @@ static void *bpf_prog_pack_alloc(u32 size)
 		size = round_up(size, PAGE_SIZE);
 		ptr = module_alloc(size);
 		if (ptr) {
+			bpf_fill_ill_insns(ptr, size);
 			set_vm_flush_reset_perms(ptr);
 			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
 			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
@@ -923,7 +925,7 @@ static void *bpf_prog_pack_alloc(u32 size)
 			goto found_free_area;
 	}
 
-	pack = alloc_new_pack();
+	pack = alloc_new_pack(bpf_fill_ill_insns);
 	if (!pack)
 		goto out;
 
@@ -1102,7 +1104,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 
 	if (bpf_jit_charge_modmem(size))
 		return NULL;
-	ro_header = bpf_prog_pack_alloc(size);
+	ro_header = bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
 	if (!ro_header) {
 		bpf_jit_uncharge_modmem(size);
 		return NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 2/8] x86/alternative: introduce text_poke_set
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 1/8] bpf: fill new bpf_prog_pack with illegal instructions Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-22  5:38   ` Hyeonggon Yoo
  2022-05-20  3:15 ` [PATCH v3 bpf-next 3/8] bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack Song Liu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

Introduce a memset-like API for text_poke. This will be used to fill
the unused RX memory with illegal instructions.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/include/asm/text-patching.h |  1 +
 arch/x86/kernel/alternative.c        | 67 +++++++++++++++++++++++-----
 2 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index d20ab0921480..1cc15528ce29 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -45,6 +45,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
 extern void text_poke_sync(void);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
+extern void *text_poke_set(void *addr, int c, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d374cb3cf024..7563b5bc8328 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -994,7 +994,21 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
-static void *__text_poke(void *addr, const void *opcode, size_t len)
+static void text_poke_memcpy(void *dst, const void *src, size_t len)
+{
+	memcpy(dst, src, len);
+}
+
+static void text_poke_memset(void *dst, const void *src, size_t len)
+{
+	int c = *(const int *)src;
+
+	memset(dst, c, len);
+}
+
+typedef void text_poke_f(void *dst, const void *src, size_t len);
+
+static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t len)
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	struct page *pages[2] = {NULL};
@@ -1059,7 +1073,7 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
 	prev = use_temporary_mm(poking_mm);
 
 	kasan_disable_current();
-	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+	func((u8 *)poking_addr + offset_in_page(addr), src, len);
 	kasan_enable_current();
 
 	/*
@@ -1087,11 +1101,13 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
 			   (cross_page_boundary ? 2 : 1) * PAGE_SIZE,
 			   PAGE_SHIFT, false);
 
-	/*
-	 * If the text does not match what we just wrote then something is
-	 * fundamentally screwy; there's nothing we can really do about that.
-	 */
-	BUG_ON(memcmp(addr, opcode, len));
+	if (func == text_poke_memcpy) {
+		/*
+		 * If the text does not match what we just wrote then something is
+		 * fundamentally screwy; there's nothing we can really do about that.
+		 */
+		BUG_ON(memcmp(addr, src, len));
+	}
 
 	local_irq_restore(flags);
 	pte_unmap_unlock(ptep, ptl);
@@ -1118,7 +1134,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
 {
 	lockdep_assert_held(&text_mutex);
 
-	return __text_poke(addr, opcode, len);
+	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
 /**
@@ -1137,7 +1153,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
  */
 void *text_poke_kgdb(void *addr, const void *opcode, size_t len)
 {
-	return __text_poke(addr, opcode, len);
+	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
 /**
@@ -1167,7 +1183,38 @@ void *text_poke_copy(void *addr, const void *opcode, size_t len)
 
 		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
 
-		__text_poke((void *)ptr, opcode + patched, s);
+		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
+		patched += s;
+	}
+	mutex_unlock(&text_mutex);
+	return addr;
+}
+
+/**
+ * text_poke_set - memset into (an unused part of) RX memory
+ * @addr: address to modify
+ * @c: the byte to fill the area with
+ * @len: length to set, can be more than 2x PAGE_SIZE
+ *
+ * This is useful to overwrite unused regions of RX memory with illegal
+ * instructions.
+ */
+void *text_poke_set(void *addr, int c, size_t len)
+{
+	unsigned long start = (unsigned long)addr;
+	size_t patched = 0;
+
+	if (WARN_ON_ONCE(core_kernel_text(start)))
+		return NULL;
+
+	mutex_lock(&text_mutex);
+	while (patched < len) {
+		unsigned long ptr = start + patched;
+		size_t s;
+
+		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
+
+		__text_poke(text_poke_memset, (void *)ptr, (void *)&c, s);
 		patched += s;
 	}
 	mutex_unlock(&text_mutex);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 3/8] bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 1/8] bpf: fill new bpf_prog_pack with illegal instructions Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 2/8] x86/alternative: introduce text_poke_set Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 4/8] module: introduce module_alloc_huge Song Liu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

Introduce bpf_arch_text_invalidate and use it to fill the unused part
of bpf_prog_pack with illegal instructions when a BPF program is freed.

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c | 5 +++++
 include/linux/bpf.h         | 1 +
 kernel/bpf/core.c           | 8 ++++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a2b6d197c226..f298b18a9a3d 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -228,6 +228,11 @@ static void jit_fill_hole(void *area, unsigned int size)
 	memset(area, 0xcc, size);
 }
 
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+	return IS_ERR_OR_NULL(text_poke_set(dst, 0xcc, len));
+}
+
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c107392b0ba7..f6dfa416f892 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2364,6 +2364,7 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *addr1, void *addr2);
 
 void *bpf_arch_text_copy(void *dst, void *src, size_t len);
+int bpf_arch_text_invalidate(void *dst, size_t len);
 
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2d0c9d4696ad..cacd8684c3c4 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -968,6 +968,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 	nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size);
 	pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT;
 
+	WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size),
+		  "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n");
+
 	bitmap_clear(pack->bitmap, pos, nbits);
 	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
 				       bpf_prog_chunk_count(), 0) == 0) {
@@ -2740,6 +2743,11 @@ void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len)
 	return ERR_PTR(-ENOTSUPP);
 }
 
+int __weak bpf_arch_text_invalidate(void *dst, size_t len)
+{
+	return -ENOTSUPP;
+}
+
 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 EXPORT_SYMBOL(bpf_stats_enabled_key);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 4/8] module: introduce module_alloc_huge
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
                   ` (2 preceding siblings ...)
  2022-05-20  3:15 ` [PATCH v3 bpf-next 3/8] bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu, Stephen Rothwell

Introduce module_alloc_huge, which allocates huge page backed memory in
module memory space. The primary user of this memory is bpf_prog_pack
(multiple BPF programs sharing a huge page).

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Song Liu <song@kernel.org>

---
Note: This conflicts with the module.c => module/ split in modules-next.
Current patch is for module.c in the bpf-next tree. After the split,
__weak module_alloc_huge() should be added to kernel/module/main.c.
---
 arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
 include/linux/moduleloader.h |  5 +++++
 kernel/module.c              |  8 ++++++++
 3 files changed, 34 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b98ffcf4d250..63f6a16c70dc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
 	return p;
 }
 
+void *module_alloc_huge(unsigned long size)
+{
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				 MODULES_VADDR + get_module_load_offset(),
+				 MODULES_END, gfp_mask, PAGE_KERNEL,
+				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+				 NUMA_NO_NODE, __builtin_return_address(0));
+	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}
+
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d34743a88938 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
    sections.  Returns NULL on failure. */
 void *module_alloc(unsigned long size);
 
+/* Allocator used for allocating memory in module memory space. If size is
+ * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
+ */
+void *module_alloc_huge(unsigned long size);
+
 /* Free memory returned from module_alloc. */
 void module_memfree(void *module_region);
 
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788fd965..2af20ac3209c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2839,6 +2839,14 @@ void * __weak module_alloc(unsigned long size)
 			NUMA_NO_NODE, __builtin_return_address(0));
 }
 
+void * __weak module_alloc_huge(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL, PAGE_KERNEL_EXEC,
+				    VM_FLUSH_RESET_PERMS | VM_ALLOW_HUGE_VMAP,
+				    NUMA_NO_NODE, __builtin_return_address(0));
+}
+
 bool __weak module_init_section(const char *name)
 {
 	return strstarts(name, ".init");
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
                   ` (3 preceding siblings ...)
  2022-05-20  3:15 ` [PATCH v3 bpf-next 4/8] module: introduce module_alloc_huge Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-21  1:00   ` Luis Chamberlain
  2022-05-20  3:15 ` [PATCH v3 bpf-next 6/8] vmalloc: WARN for set_vm_flush_reset_perms() on huge pages Song Liu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
PMD_SIZE pages. This benefits system performance by reducing the iTLB
miss rate. A benchmark of a real web service workload shows this change
gives another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
(which improves system throughput by ~0.5%).

Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]

[1] https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/bpf/core.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index cacd8684c3c4..b64d91fcb0ba 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
 	void *ptr;
 
 	size = BPF_HPAGE_SIZE * num_online_nodes();
-	ptr = module_alloc(size);
+	ptr = module_alloc_huge(size);
 
 	/* Test whether we can get huge pages. If not just use PAGE_SIZE
 	 * packs.
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
 		       GFP_KERNEL);
 	if (!pack)
 		return NULL;
-	pack->ptr = module_alloc(bpf_prog_pack_size);
+	pack->ptr = module_alloc_huge(bpf_prog_pack_size);
 	if (!pack->ptr) {
 		kfree(pack);
 		return NULL;
@@ -890,7 +890,6 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
 	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
 	list_add_tail(&pack->list, &pack_list);
 
-	set_vm_flush_reset_perms(pack->ptr);
 	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	return pack;
@@ -909,10 +908,9 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
 
 	if (size > bpf_prog_pack_size) {
 		size = round_up(size, PAGE_SIZE);
-		ptr = module_alloc(size);
+		ptr = module_alloc_huge(size);
 		if (ptr) {
 			bpf_fill_ill_insns(ptr, size);
-			set_vm_flush_reset_perms(ptr);
 			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
 			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
 		}
@@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 
 	mutex_lock(&pack_mutex);
 	if (hdr->size > bpf_prog_pack_size) {
+		set_memory_nx((unsigned long)hdr, hdr->size / PAGE_SIZE);
+		set_memory_rw((unsigned long)hdr, hdr->size / PAGE_SIZE);
 		module_memfree(hdr);
 		goto out;
 	}
@@ -975,6 +975,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
 				       bpf_prog_chunk_count(), 0) == 0) {
 		list_del(&pack->list);
+		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 		module_memfree(pack->ptr);
 		kfree(pack);
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 6/8] vmalloc: WARN for set_vm_flush_reset_perms() on huge pages
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
                   ` (4 preceding siblings ...)
  2022-05-20  3:15 ` [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 7/8] vmalloc: introduce huge_vmalloc_supported Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 8/8] bpf: simplify select_bpf_prog_pack_size Song Liu
  7 siblings, 0 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

VM_FLUSH_RESET_PERMS is not yet ready for huge pages. Add a WARN to
catch misuse early.

Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
---
 include/linux/vmalloc.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index b159c2789961..5e0d0a60d9d5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -238,6 +238,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
 {
 	struct vm_struct *vm = find_vm_area(addr);
 
+	WARN_ON_ONCE(is_vm_area_hugepages(addr));
 	if (vm)
 		vm->flags |= VM_FLUSH_RESET_PERMS;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 7/8] vmalloc: introduce huge_vmalloc_supported
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
                   ` (5 preceding siblings ...)
  2022-05-20  3:15 ` [PATCH v3 bpf-next 6/8] vmalloc: WARN for set_vm_flush_reset_perms() on huge pages Song Liu
@ 2022-05-20  3:15 ` Song Liu
  2022-05-20  3:15 ` [PATCH v3 bpf-next 8/8] bpf: simplify select_bpf_prog_pack_size Song Liu
  7 siblings, 0 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

huge_vmalloc_supported() exposes vmap_allow_huge so that users of the
vmalloc APIs can know whether vmalloc will return huge pages.

Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
---
 include/linux/vmalloc.h | 6 ++++++
 mm/vmalloc.c            | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 5e0d0a60d9d5..22e81c1813bd 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -242,11 +242,17 @@ static inline void set_vm_flush_reset_perms(void *addr)
 	if (vm)
 		vm->flags |= VM_FLUSH_RESET_PERMS;
 }
+bool huge_vmalloc_supported(void);
 
 #else
 static inline void set_vm_flush_reset_perms(void *addr)
 {
 }
+
+static inline bool huge_vmalloc_supported(void)
+{
+	return false;
+}
 #endif
 
 /* for /proc/kcore */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 07da85ae825b..d3b11317b025 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -72,6 +72,11 @@ early_param("nohugevmalloc", set_nohugevmalloc);
 static const bool vmap_allow_huge = false;
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
 
+bool huge_vmalloc_supported(void)
+{
+	return vmap_allow_huge;
+}
+
 bool is_vmalloc_addr(const void *x)
 {
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 8/8] bpf: simplify select_bpf_prog_pack_size
  2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
                   ` (6 preceding siblings ...)
  2022-05-20  3:15 ` [PATCH v3 bpf-next 7/8] vmalloc: introduce huge_vmalloc_supported Song Liu
@ 2022-05-20  3:15 ` Song Liu
  7 siblings, 0 replies; 17+ messages in thread
From: Song Liu @ 2022-05-20  3:15 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-mm
  Cc: ast, daniel, peterz, mcgrof, torvalds, rick.p.edgecombe,
	kernel-team, Song Liu

Use huge_vmalloc_supported to simplify select_bpf_prog_pack_size, so
that we no longer allocate huge pages just to probe support and free
them immediately.

Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/bpf/core.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b64d91fcb0ba..b5dcc8f182b3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -854,22 +854,15 @@ static LIST_HEAD(pack_list);
 static size_t select_bpf_prog_pack_size(void)
 {
 	size_t size;
-	void *ptr;
-
-	size = BPF_HPAGE_SIZE * num_online_nodes();
-	ptr = module_alloc_huge(size);
 
-	/* Test whether we can get huge pages. If not just use PAGE_SIZE
-	 * packs.
-	 */
-	if (!ptr || !is_vm_area_hugepages(ptr)) {
+	if (huge_vmalloc_supported()) {
+		size = BPF_HPAGE_SIZE * num_online_nodes();
+		bpf_prog_pack_mask = BPF_HPAGE_MASK;
+	} else {
 		size = PAGE_SIZE;
 		bpf_prog_pack_mask = PAGE_MASK;
-	} else {
-		bpf_prog_pack_mask = BPF_HPAGE_MASK;
 	}
 
-	vfree(ptr);
 	return size;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-20  3:15 ` [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
@ 2022-05-21  1:00   ` Luis Chamberlain
  2022-05-21  1:20     ` Luis Chamberlain
  2022-05-21  3:20     ` Edgecombe, Rick P
  0 siblings, 2 replies; 17+ messages in thread
From: Luis Chamberlain @ 2022-05-21  1:00 UTC (permalink / raw)
  To: Song Liu, Rick Edgecombe, Arnd Bergmann
  Cc: linux-kernel, bpf, linux-mm, ast, daniel, peterz, torvalds, kernel-team

On Thu, May 19, 2022 at 08:15:45PM -0700, Song Liu wrote:
> Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
> set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
> VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
> 
> [1] https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/
> Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Song Liu <song@kernel.org>
> ---

Rick,

although VM_FLUSH_RESET_PERMS is rather new, my concern here is that
we're essentially enabling sloppy users to grow without also addressing
what happens if we have to take the leash back to support
VM_FLUSH_RESET_PERMS properly. If the hack to support this on
architectures other than x86 is as simple as the one you have in
vm_remove_mappings() today:

	if (flush_reset && !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
		set_memory_nx(addr, area->nr_pages);
		set_memory_rw(addr, area->nr_pages);
	}

then I suppose this isn't a big deal. I'm just concerned here this being
a slippery slope of sloppiness leading to something which we will
regret later.

My intuition tells me this shouldn't be a big issue, but I just want to
confirm.

  Luis

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index cacd8684c3c4..b64d91fcb0ba 100644
> @@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>  
>  	mutex_lock(&pack_mutex);
>  	if (hdr->size > bpf_prog_pack_size) {
> +		set_memory_nx((unsigned long)hdr, hdr->size / PAGE_SIZE);
> +		set_memory_rw((unsigned long)hdr, hdr->size / PAGE_SIZE);
>  		module_memfree(hdr);
>  		goto out;
>  	}
> @@ -975,6 +975,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>  	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
>  				       bpf_prog_chunk_count(), 0) == 0) {
>  		list_del(&pack->list);
> +		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> +		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>  		module_memfree(pack->ptr);
>  		kfree(pack);
>  	}
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-21  1:00   ` Luis Chamberlain
@ 2022-05-21  1:20     ` Luis Chamberlain
  2022-05-21  3:20     ` Edgecombe, Rick P
  1 sibling, 0 replies; 17+ messages in thread
From: Luis Chamberlain @ 2022-05-21  1:20 UTC (permalink / raw)
  To: Song Liu, Rick Edgecombe, Arnd Bergmann, Davidlohr Bueso,
	Borislav Petkov
  Cc: linux-kernel, bpf, linux-mm, ast, daniel, peterz, torvalds, kernel-team

On Fri, May 20, 2022 at 06:00:57PM -0700, Luis Chamberlain wrote:
> On Thu, May 19, 2022 at 08:15:45PM -0700, Song Liu wrote:
> > Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
> > PMD_SIZE pages. This benefits system performance by reducing iTLB miss
> > rate. Benchmark of a real web service workload shows this change gives
> > another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
> > (which improve system throughput by ~0.5%).

Also, this seems like a missed opportunity to show iTLB misses in more
detail. If there was a selftest to stress the bpf JIT, you could use perf
and enable anyone to quantify gains. Dave hinted at some ideas with perf:

perf stat -e cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/,cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/,cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/,cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/,cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/,cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/ some_bpf_jit_test

  Luis

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-21  1:00   ` Luis Chamberlain
  2022-05-21  1:20     ` Luis Chamberlain
@ 2022-05-21  3:20     ` Edgecombe, Rick P
  2022-05-21 20:06       ` Luis Chamberlain
  1 sibling, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2022-05-21  3:20 UTC (permalink / raw)
  To: song, arnd, mcgrof
  Cc: linux-kernel, daniel, peterz, ast, bpf, kernel-team, linux-mm,
	Torvalds, Linus

On Fri, 2022-05-20 at 18:00 -0700, Luis Chamberlain wrote:
> although VM_FLUSH_RESET_PERMS is rather new my concern here is we're
> essentially enabling sloppy users to grow without also addressing
> what if we have to take the leash back to support
> VM_FLUSH_RESET_PERMS
> properly? If the hack to support this on other architectures other
> than
> x86 is as simple as the one you have in vm_remove_mappings() today:
> 
>         if (flush_reset &&
> !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
>                 set_memory_nx(addr, area->nr_pages);
>                 set_memory_rw(addr, area->nr_pages);
>         }
> 
> then I suppose this isn't a big deal. I'm just concerned here this
> being
> a slippery slope of sloppiness leading to something which we will
> regret later.
> 
> My intuition tells me this shouldn't be a big issue, but I just want
> to
> confirm.

Yea, I commented the same concern on the last thread:

https://lore.kernel.org/lkml/83a69976cb93e69c5ad7a9511b5e57c402eee19d.camel@intel.com/

Song said he plans to make kprobes and ftrace work with this new
allocator. If that happens VM_FLUSH_RESET_PERMS would only have one
user - modules. Care to chime in with your plans for modules? If there
are actual near term plans to keep working on this,
VM_FLUSH_RESET_PERMS might be changed again or turn into something
else. Like if we are about to re-think everything, then it doesn't
matter as much to fix what would then be old.

Besides not fixing VM_FLUSH_RESET_PERMS/hibernate though, I think this
allocator still feels a little rough. For example I don't think we
actually know how much the huge mappings are helping. It is also
allocating memory in a big chunk from a single node and reusing it,
where before we were allocating based on numa node for each jit. Would
some users suffer from that? Maybe it's obvious to others, but I would
have expected to see more discussion of MM things like that.

But I like the general direction of caching and using text_poke() to write
the jits a lot. However it works, it seems to make a big impact in at
least some workloads.

So yea, seems sloppy, but probably (...I guess?) more good for users
than sloppy for us.


* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-21  3:20     ` Edgecombe, Rick P
@ 2022-05-21 20:06       ` Luis Chamberlain
  2022-05-24 17:40         ` Edgecombe, Rick P
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Chamberlain @ 2022-05-21 20:06 UTC (permalink / raw)
  To: Edgecombe, Rick P, Christoph Hellwig, Davidlohr Bueso
  Cc: song, arnd, linux-kernel, daniel, peterz, ast, bpf, kernel-team,
	linux-mm, Torvalds, Linus

On Sat, May 21, 2022 at 03:20:28AM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-05-20 at 18:00 -0700, Luis Chamberlain wrote:
> > although VM_FLUSH_RESET_PERMS is rather new my concern here is we're
> > essentially enabling sloppy users to grow without also addressing
> > what if we have to take the leash back to support
> > VM_FLUSH_RESET_PERMS
> > properly? If the hack to support this on other architectures other
> > than
> > x86 is as simple as the one you have in vm_remove_mappings() today:
> > 
> >         if (flush_reset &&
> > !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> >                 set_memory_nx(addr, area->nr_pages);
> >                 set_memory_rw(addr, area->nr_pages);
> >         }
> > 
> > then I suppose this isn't a big deal. I'm just concerned here this
> > being
> > a slippery slope of sloppiness leading to something which we will
> > regret later.
> > 
> > My intuition tells me this shouldn't be a big issue, but I just
> > to
> > confirm.
> 
> Yea, I commented the same concern on the last thread:
> 
> https://lore.kernel.org/lkml/83a69976cb93e69c5ad7a9511b5e57c402eee19d.camel@intel.com/
> 
> Song said he plans to make kprobes and ftrace work with this new
> allocator. If that happens VM_FLUSH_RESET_PERMS would only have one
> user - modules. Care to chime in with your plans for modules?

My plans are to not break things and to slowly tidy things up. If
you see linux-next, things are at least starting to be split in
nice pieces. With time, clean that further so as to not break things.
You were the one who added VM_FLUSH_RESET_PERMS, wasn't that to deal
with secmem stuff? So wouldn't you know better what you recommend for it?

Seeing all this, given module_alloc() users are growing and seeing
the tiny bit of growth of use in this space, I'd think we should
rename module_alloc() to vmalloc_exec(), and likewise the same for
module_memfree() to vmalloc_exec_free(). But it would be our first
__weak vmalloc, and not sure if that's looked down upon.

> If there
> are actual near term plans to keep working on this,
> VM_FLUSH_RESET_PERMS might be changed again or turn into something
> else. Like if we are about to re-think everything, then it doesn't
> matter as much to fix what would then be old.

I think it's up to you as you added it and I'm not looking to add
any bells or whistles, just tidy things up *slowly*.

> Besides not fixing VM_FLUSH_RESET_PERMS/hibernate though, I think this
> allocator still feels a little rough. For example I don't think we
> actually know how much the huge mappings are helping.

Right, 100% agreed. The performance numbers provided are nice but
they are not anything folks can reproduce at all. I hinted towards
perf stuff which could be used and enable other users later to also
use similar stats to showcase its value if they want to move to
huge pages.

It is a side note, and perhaps a stupid question, as I don't grok mm,
but I'm perplexed about the fact that if the value is seen so high towards
huge pages for exec stuff in kernel, wouldn't there be a few folks who
might want to try this for regular exec stuff? Wouldn't there be much
bigger gains there?

> It is also
> allocating memory in a big chunk from a single node and reusing it,
> where before we were allocating based on numa node for each jit. Would
> some users suffer from that? Maybe it's obvious to others, but I would
> have expected to see more discussion of MM things like that.

Curious, why was it moved to use a single node?

> But I like the general direction of caching and using text_poke() to write
> the jits a lot. However it works, it seems to make a big impact in at
> least some workloads.
> 
> So yea, seems sloppy, but probably (...I guess?) more good for users
> than sloppy for us.

The impact of sloppiness lies in possible odd bugs later and trying to
decipher what was being done. So I do have concerns with the immediate
tribal knowledge incurred by the current implementation. What is your
own roadmap for VM_FLUSH_RESET_PERMS? Sounds like a possible future
re-do?

  Luis


* Re: [PATCH v3 bpf-next 2/8] x86/alternative: introduce text_poke_set
  2022-05-20  3:15 ` [PATCH v3 bpf-next 2/8] x86/alternative: introduce text_poke_set Song Liu
@ 2022-05-22  5:38   ` Hyeonggon Yoo
  0 siblings, 0 replies; 17+ messages in thread
From: Hyeonggon Yoo @ 2022-05-22  5:38 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, bpf, linux-mm, ast, daniel, peterz, mcgrof,
	torvalds, rick.p.edgecombe, kernel-team

On Thu, May 19, 2022 at 08:15:42PM -0700, Song Liu wrote:
> Introduce a memset-like API for text_poke. This will be used to fill the
> unused RX memory with illegal instructions.
> 
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  arch/x86/include/asm/text-patching.h |  1 +
>  arch/x86/kernel/alternative.c        | 67 +++++++++++++++++++++++-----
>  2 files changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
> index d20ab0921480..1cc15528ce29 100644
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -45,6 +45,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern void text_poke_sync(void);
>  extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
>  extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_set(void *addr, int c, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
>  extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
>  
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index d374cb3cf024..7563b5bc8328 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -994,7 +994,21 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
>  __ro_after_init struct mm_struct *poking_mm;
>  __ro_after_init unsigned long poking_addr;
>  
> -static void *__text_poke(void *addr, const void *opcode, size_t len)
> +static void text_poke_memcpy(void *dst, const void *src, size_t len)
> +{
> +	memcpy(dst, src, len);
> +}
> +
> +static void text_poke_memset(void *dst, const void *src, size_t len)
> +{
> +	int c = *(const int *)src;
> +
> +	memset(dst, c, len);
> +}
> +
> +typedef void text_poke_f(void *dst, const void *src, size_t len);
> +
> +static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t len)
>  {
>  	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
>  	struct page *pages[2] = {NULL};
> @@ -1059,7 +1073,7 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
>  	prev = use_temporary_mm(poking_mm);
>  
>  	kasan_disable_current();
> -	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
> +	func((u8 *)poking_addr + offset_in_page(addr), src, len);
>  	kasan_enable_current();
>  
>  	/*
> @@ -1087,11 +1101,13 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
>  			   (cross_page_boundary ? 2 : 1) * PAGE_SIZE,
>  			   PAGE_SHIFT, false);
>  
> -	/*
> -	 * If the text does not match what we just wrote then something is
> -	 * fundamentally screwy; there's nothing we can really do about that.
> -	 */
> -	BUG_ON(memcmp(addr, opcode, len));
> +	if (func == text_poke_memcpy) {
> +		/*
> +		 * If the text does not match what we just wrote then something is
> +		 * fundamentally screwy; there's nothing we can really do about that.
> +		 */
> +		BUG_ON(memcmp(addr, src, len));

Maybe something like this?

	} else if (func == text_poke_memset) {
		WARN_ON or BUG_ON(memchr_inv(addr, *((const int *)src), len));
	}

Thanks,
Hyeonggon

>  
>  	local_irq_restore(flags);
>  	pte_unmap_unlock(ptep, ptl);
> @@ -1118,7 +1134,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
>  {
>  	lockdep_assert_held(&text_mutex);
>  
> -	return __text_poke(addr, opcode, len);
> +	return __text_poke(text_poke_memcpy, addr, opcode, len);
>  }
>  
>  /**
> @@ -1137,7 +1153,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
>   */
>  void *text_poke_kgdb(void *addr, const void *opcode, size_t len)
>  {
> -	return __text_poke(addr, opcode, len);
> +	return __text_poke(text_poke_memcpy, addr, opcode, len);
>  }
>  
>  /**
> @@ -1167,7 +1183,38 @@ void *text_poke_copy(void *addr, const void *opcode, size_t len)
>  
>  		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
>  
> -		__text_poke((void *)ptr, opcode + patched, s);
> +		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
> +		patched += s;
> +	}
> +	mutex_unlock(&text_mutex);
> +	return addr;
> +}
> +
> +/**
> + * text_poke_set - memset into (an unused part of) RX memory
> + * @addr: address to modify
> + * @c: the byte to fill the area with
> + * @len: length to set, could be more than 2x PAGE_SIZE
> + *
> + * This is useful to overwrite unused regions of RX memory with illegal
> + * instructions.
> + */
> +void *text_poke_set(void *addr, int c, size_t len)
> +{
> +	unsigned long start = (unsigned long)addr;
> +	size_t patched = 0;
> +
> +	if (WARN_ON_ONCE(core_kernel_text(start)))
> +		return NULL;
> +
> +	mutex_lock(&text_mutex);
> +	while (patched < len) {
> +		unsigned long ptr = start + patched;
> +		size_t s;
> +
> +		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
> +
> +		__text_poke(text_poke_memset, (void *)ptr, (void *)&c, s);
>  		patched += s;
>  	}
>  	mutex_unlock(&text_mutex);
> -- 
> 2.30.2
> 
> 


* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-21 20:06       ` Luis Chamberlain
@ 2022-05-24 17:40         ` Edgecombe, Rick P
  2022-05-24 22:08           ` Luis Chamberlain
  0 siblings, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2022-05-24 17:40 UTC (permalink / raw)
  To: hch, mcgrof, dave
  Cc: linux-kernel, daniel, peterz, ast, bpf, kernel-team, linux-mm,
	song, Torvalds, Linus, arnd

On Sat, 2022-05-21 at 13:06 -0700, Luis Chamberlain wrote:
> On Sat, May 21, 2022 at 03:20:28AM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-05-20 at 18:00 -0700, Luis Chamberlain wrote:
> > > although VM_FLUSH_RESET_PERMS is rather new my concern here is
> > > we're
> > > essentially enabling sloppy users to grow without also addressing
> > > what if we have to take the leash back to support
> > > VM_FLUSH_RESET_PERMS
> > > properly? If the hack to support this on other architectures
> > > other
> > > than
> > > x86 is as simple as the one you have in vm_remove_mappings() today:
> > > 
> > >         if (flush_reset &&
> > > !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> > >                 set_memory_nx(addr, area->nr_pages);
> > >                 set_memory_rw(addr, area->nr_pages);
> > >         }
> > > 
> > > then I suppose this isn't a big deal. I'm just concerned here
> > > this
> > > being
> > > a slippery slope of sloppiness leading to something which we will
> > > regret later.
> > > 
> > > My intuition tells me this shouldn't be a big issue, but I just
> > > want
> > > to
> > > confirm.
> > 
> > Yea, I commented the same concern on the last thread:
> > 
> > 
https://lore.kernel.org/lkml/83a69976cb93e69c5ad7a9511b5e57c402eee19d.camel@intel.com/
> > 
> > Song said he plans to make kprobes and ftrace work with this new
> > allocator. If that happens VM_FLUSH_RESET_PERMS would only have one
> > user - modules. Care to chime in with your plans for modules?
> 
> My plans are to not break things and to slowly tidy things up. If
> you see linux-next, things are at least starting to be split in
> > nice pieces. With time, clean that further so as to not break things.
> You were the one who added VM_FLUSH_RESET_PERMS, wasn't that to deal
> with secmem stuff? So wouldn't you know better what you recommend for
> it?

It was originally to correct some W^X issues. If a vmalloc was freed
with X permission it caused some exposure. The security side could be
fixed with copious set_memory() calls in just the right order, but
there was a suggestion to make vmalloc handle it so it could be done
more efficiently and callers would not have to know the details for at
least that part of the operation. This prog pack stuff is already more
efficient with respect to TLB flushes. So while VM_FLUSH_RESET_PERMS
could still improve it slightly, the situation is now probably better
than it was pre-VM_FLUSH_RESET_PERMS anyway. So that mostly leaves the
problem of some special knowledge leaking back into the callers.

With a next solution it would hopefully be handled differently still,
using the unmapped page stuff Mike Rapoport was working on.

> 
> Seeing all this, given module_alloc() users are growing and seeing
> the tiny bit of growth of use in this space, I'd think we should
> rename module_alloc() to vmalloc_exec(), and likewise the same for
> module_memfree() to vmalloc_exec_free(). But it would be our first
> __weak vmalloc, and not sure if that's looked down upon.

A rename seems good to me. Module space is really just dynamically
allocated text space now. There used to be a vmalloc_exec() that
allocated text in vmalloc space, so maybe the name should have
something to denote that it goes into the special arch specific text
space.

> 
> > If there
> > are actual near term plans to keep working on this,
> > VM_FLUSH_RESET_PERMS might be changed again or turn into something
> > else. Like if we are about to re-think everything, then it doesn't
> > matter as much to fix what would then be old.
> 
> I think it's up to you as you added it and I'm not looking to add
> any bells or whistles, just tidy things up *slowly*.
> 
> > Besides not fixing VM_FLUSH_RESET_PERMS/hibernate though, I think
> > this
> > allocator still feels a little rough. For example I don't think we
> > actually know how much the huge mappings are helping.
> 
> Right, 100% agreed. The performance numbers provided are nice but
> they are not anything folks can reproduce at all. I hinted towards
> perf stuff which could be used and enable other users later to also
> use similar stats to showcase its value if they want to move to
> huge pages.
> 
> It is a side note, and perhaps a stupid question, as I don't grok mm,
> but I'm perplexed about the fact that if the value is seen so high
> towards
> huge pages for exec stuff in kernel, wouldn't there be a few folks
> who
> might want to try this for regular exec stuff? Wouldn't there be much
> bigger gains there?

Core kernel text is already 2MB mapped, on x86 at least. It indeed
helps performance. I'd like to see about 2MB module text. I can only
assume that it would help performance though. Some people wiser than me
in performance stuff suggested it should be tested to actually know.

> 
> > It is also
> > allocating memory in a big chunk from a single node and reusing it,
> > where before we were allocating based on numa node for each jit.
> > Would
> > some users suffer from that? Maybe it's obvious to others, but I
> > would
> > have expected to see more discussion of MM things like that.
> 
> Curious, why was it moved to use a single node?

To allocate from the closest node you need to have per-node caches.
When I tried to do something similar to this with the grouped page
cache, it was suggested that per-node caches should be required. I
never benchmarked the difference though.

> 
> > But I like the general direction of caching and using text_poke() to
> > write
> > the jits a lot. However it works, it seems to make a big impact in
> > at
> > least some workloads.
> > 
> > So yea, seems sloppy, but probably (...I guess?) more good for
> > users
> > than sloppy for us.
> 
> The impact of sloppiness lies in possible odd bugs later and trying
> to
> decipher what was being done. So I do have concerns with the
> immediate
> tribal knowledge incurred by the current implementation. What is your

I am also bothered by it. I'm glad to hear someone else cares. I can
think about doing it more incrementally. The problem is you kind of
need to know if you can integrate with all the module_alloc() users and
get sane behavior on the backend, to tell if your new interface is
actually any good.

This is pretty much how I think we can:
 - remove all special knowledge from callers
 - support all module_alloc() callers
 - do things more efficiently on x86
 - support all the arch specific extra capabilities that I know about

https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r

It's why I shrug a little about writing caller code with special
knowledge in it. It's not really possible to avoid it completely with
the current interfaces IMO.

> What is your
> own roadmap for VM_FLUSH_RESET_PERMS? Sounds like a possible future
> re-do?

If it were me, I would start back with that RFC and try to move the
allocation side forward too. I haven't seen anything since that makes
me think it was the wrong direction. But I have employer tasks that
take priority unfortunately. If anyone else wants to take a shot at it,
I can help review. Otherwise, hopefully I can get back to it someday.



* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-24 17:40         ` Edgecombe, Rick P
@ 2022-05-24 22:08           ` Luis Chamberlain
  2022-05-25  6:01             ` hch
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Chamberlain @ 2022-05-24 22:08 UTC (permalink / raw)
  To: Edgecombe, Rick P, Christophe Leroy
  Cc: hch, dave, linux-kernel, daniel, peterz, ast, bpf, kernel-team,
	linux-mm, song, Torvalds, Linus, arnd, Adam Manzanares

On Tue, May 24, 2022 at 05:40:53PM +0000, Edgecombe, Rick P wrote:
> On Sat, 2022-05-21 at 13:06 -0700, Luis Chamberlain wrote:
> > On Sat, May 21, 2022 at 03:20:28AM +0000, Edgecombe, Rick P wrote:
> > > On Fri, 2022-05-20 at 18:00 -0700, Luis Chamberlain wrote:
> > > > although VM_FLUSH_RESET_PERMS is rather new my concern here is
> > > > we're
> > > > essentially enabling sloppy users to grow without also addressing
> > > > what if we have to take the leash back to support
> > > > VM_FLUSH_RESET_PERMS
> > > > properly? If the hack to support this on other architectures
> > > > other
> > > > than
> > > > x86 is as simple as the one you have in vm_remove_mappings() today:
> > > > 
> > > >         if (flush_reset &&
> > > > !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> > > >                 set_memory_nx(addr, area->nr_pages);
> > > >                 set_memory_rw(addr, area->nr_pages);
> > > >         }
> > > > 
> > > > then I suppose this isn't a big deal. I'm just concerned here
> > > > this
> > > > being
> > > > a slippery slope of sloppiness leading to something which we will
> > > > regret later.
> > > > 
> > > > My intution tells me this shouldn't be a big issue, but I just
> > > > want
> > > > to
> > > > confirm.
> > > 
> > > Yea, I commented the same concern on the last thread:
> > > 
> > > 
> https://lore.kernel.org/lkml/83a69976cb93e69c5ad7a9511b5e57c402eee19d.camel@intel.com/
> > > 
> > > Song said he plans to make kprobes and ftrace work with this new
> > > allocator. If that happens VM_FLUSH_RESET_PERMS would only have one
> > > user - modules. Care to chime in with your plans for modules?
> > 
> > My plans are to not break things and to slowly tidy things up. If
> > you see linux-next, things are at least starting to be split in
> > nice pieces. With time, clean that further so as to not break things.
> > You were the one who added VM_FLUSH_RESET_PERMS, wasn't that to deal
> > with secmem stuff? So wouldn't you know better what you recommend for
> > it?
> 
> It was originally to correct some W^X issues. If a vmalloc was freed
> with X permission it caused some exposure.

Perhaps clarifying this in the docs would help, as (taking a time
machine back) it was not clear on patch review. But that's speaking as
a vm-outsider.

> The security side could be
> fixed with copious set_memory() calls in just the right order, but
> there was a suggestion to make vmalloc handle it so it could be done
> more efficiently and callers would not have to know the details for at
> least that part of the operation.

Makes sense, also given there are more users than just modules using
module_alloc() now.

> This prog pack stuff is already more
> efficient with respect to TLB flushes. So while VM_FLUSH_RESET_PERMS
> could still improve it slightly, the situation is now probably better
> than it was pre-VM_FLUSH_RESET_PERMS anyway. So that mostly leaves the
> problem of some special knowledge leaking back into the callers.

OK that I think is a good summary then of the impact of not having
this generalized.

> With a next solution it would hopefully be handled differently still,
> using the unmapped page stuff Mike Rapoport was working on.

Thanks for the heads ups.

> > Seeing all this, given module_alloc() users are growing and seeing
> > the tiny bit of growth of use in this space, I'd think we should
> > rename module_alloc() to vmalloc_exec(), and likewise the same for
> > module_memfree() to vmalloc_exec_free(). But it would be our first
> > __weak vmalloc, and not sure if that's looked down upon.
> 
> A rename seems good to me. Module space is really just dynamically
> allocated text space now. There used to be a vmalloc_exec() that
> allocated text in vmalloc space, 

Yes I saw that but it was generic and it did not do the arch-specific
override, and so that is why Christoph ripped it out and open coded
it in its only user, module_alloc().

> so maybe the name should have
> something to denote that it goes into the special arch specific text
> space.

On the arch side, other precedents I see in vmalloc space are
vm_pgprot_modify(), which calls pgprot_modify(), which an arch can
override. I think we'll have to just keep the __weak approach
behind module_alloc(); we can strive for that post v5.19.

> > > If there
> > > are actual near term plans to keep working on this,
> > > VM_FLUSH_RESET_PERMS might be changed again or turn into something
> > > else. Like if we are about to re-think everything, then it doesn't
> > > matter as much to fix what would then be old.
> > 
> > I think it's up to you as you added it and I'm not looking to add
> > any bells or whistles, just tidy things up *slowly*.
> > 
> > > Besides not fixing VM_FLUSH_RESET_PERMS/hibernate though, I think
> > > this
> > > allocator still feels a little rough. For example I don't think we
> > > actually know how much the huge mappings are helping.
> > 
> > Right, 100% agreed. The performance numbers provided are nice but
> > they are not anything folks can reproduce at all. I hinted towards
> > perf stuff which could be used and enable other users later to also
> > use similar stats to showcase its value if they want to move to
> > huge pages.
> > 
> > It is a side note, and perhaps a stupid question, as I don't grok mm,
> > but I'm perplexed about the fact that if the value is seen so high
> > towards
> > huge pages for exec stuff in kernel, wouldn't there be a few folks
> > who
> > might want to try this for regular exec stuff? Wouldn't there be much
> > bigger gains there?
> 
> Core kernel text is already 2MB mapped, on x86 at least. It indeed
> helps performance. I'd like to see about 2MB module text.

Yeah that would make sense *if* the arch supports it. I went and read
your 2020 "[PATCH RFC 00/10] New permission vmalloc interface" and
I suspect some new work on modules will help with your goals.

There are some optimizations architectures will be able to do for
v5.19+ by selecting ARCH_WANTS_MODULES_DATA_IN_VMALLOC so that
module data uses vmalloc instead. This can be for two reasons:

1) On some architectures (like book3s/32) it is not possible to protect
against execution on a page basis. The exec stuff can be mapped by
different arch segment sizes (on book3s/32 that is 256M segments).
By default the module area is in an Exec segment while vmalloc area is in a
NoExec segment. Using vmalloc lets you muck with module data as noexec
on those architectures whereas before you could not.

2) By pushing more module data to vmalloc you also increase the probability
that module text remains within a closer distance from kernel core text,
which reduces trampolines; this has been reported on arm first and
powerpc folks are following that lead.

So I suspect that using ARCH_WANTS_MODULES_DATA_IN_VMALLOC plays well
with your idea to separate at least the allocated space. The
optimizations seem to be for exec and to zap this as well and
hence your goal to use 2 MiB pages and fancy hacks for this.

Yes, generalizing this for all architectures will be hard, but I think
we can get enough arch folks to chime in for at least a generic mechanism.

> I can only
> assume that it would help performance though. Some people wiser than me
> in performance stuff suggested it should be tested to actually know.

Folks speak of performance but I don't think we have generic baselines.
For kernel image text I suppose we can use boot times. For tracepoints /
ftrace and eBPF JIT and the rest I suppose we can use something like what
Dave suggested:

[0] https://lore.kernel.org/all/Yog+d+oR5TtPp2cs@bombadil.infradead.org/

But that still leaves the question of what workload to run perf with.

Perhaps we just need a generic kernel module_alloc() abuser selftest
which stresses the hell out of things with some variability in the ways
in which we want to do allocations (2 MiB pages, test different archs,
etc). I can work on that if folks think this can be useful as I don't
think we have anything generic at the moment.

> > > It is also
> > > allocating memory in a big chunk from a single node and reusing it,
> > > where before we were allocating based on numa node for each jit.
> > > Would
> > > some users suffer from that? Maybe it's obvious to others, but I
> > > would
> > > have expected to see more discussion of MM things like that.
> > 
> > Curious, why was it moved to use a single node?
> 
> To allocate from the closest node you need to have per-node caches.
> When I tried to do something similar to this with the grouped page
> cache, having per-node caches was suggested should be required. I never
> benchmarked the difference though.

Sounds like tribal knowledge... 

> > > But I like the general direction of caching and using text_poke() to
> > > write
> > > the jits a lot. However it works, it seems to make a big impact in
> > > at
> > > least some workloads.
> > > 
> > > So yea, seems sloppy, but probably (...I guess?) more good for
> > > users
> > > than sloppy for us.
> > 
> > The impact of sloppiness lies in possible odd bugs later and trying
> > to
> > decipher what was being done. So I do have concerns with the
> > immediate
> > tribal knowledge incurred by the current implementation.
> 
> I am also bothered by it. I'm glad to hear someone else cares. I can
> think about doing it more incrementally. The problem is you kind of
> need to know if you can integrate with all the module_alloc() users and
> get sane behavior on the backend, to tell if your new interface is
> actually any good.
> 
> This is pretty much how I think we can:
>  - remove all special knowledge from callers
>  - support all module_alloc() callers
>  - do things more efficiently on x86
>  - support all the arch specific extra capabilities that I know about
> 
> https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r

Consider me interested, I'm not a fan of hacks and requiring developers to
pick up on random tribal knowledge.

> It's why I shrug a little about writing caller code with special
> knowledge in it. It's not really possible to avoid it completely with
> the current interfaces IMO.

Yeah makes sense.

> > What is your
> > own roadmap for VM_FLUSH_RESET_PERMS? Sounds like a possible future
> > re-do?
> 
> If it were me, I would start back with that RFC and try to move the
> allocation side forward too. I haven't seen anything since that makes
> me think it was the wrong direction.

Did you get enough arch folks involved?

> But I have employer tasks that
> take priority unfortunately. If anyone else wants to take a shot at it,
> I can help review. Otherwise, hopefully I can get back to it someday.

Sure, understood.

I can perhaps help on module_alloc() sefltest, and make module_alloc()
generic. Happy to review patches too.

  Luis


* Re: [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack
  2022-05-24 22:08           ` Luis Chamberlain
@ 2022-05-25  6:01             ` hch
  0 siblings, 0 replies; 17+ messages in thread
From: hch @ 2022-05-25  6:01 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Edgecombe, Rick P, Christophe Leroy, hch, dave, linux-kernel,
	daniel, peterz, ast, bpf, kernel-team, linux-mm, song, Torvalds,
	Linus, arnd, Adam Manzanares

On Tue, May 24, 2022 at 03:08:12PM -0700, Luis Chamberlain wrote:
> > A rename seems good to me. Module space is really just dynamically
> > allocated text space now. There used to be a vmalloc_exec() that
> > allocated text in vmalloc space, 
> 
> Yes I saw that but it was generic and it did not do the arch-specific
> override, and so that is why Christoph ripped it out and open coded
> it on the only user, on module_alloc().

It is also because random code does not have any business allocating
executable memory.  Executable memory in the kernel is basically for
modules and module-like code like eBPF, and no one else has any business
allocating pages with the execute bit set (or the NX bit not set).


end of thread, other threads:[~2022-05-25  6:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20  3:15 [PATCH v3 bpf-next 0/8] bpf_prog_pack followup Song Liu
2022-05-20  3:15 ` [PATCH v3 bpf-next 1/8] bpf: fill new bpf_prog_pack with illegal instructions Song Liu
2022-05-20  3:15 ` [PATCH v3 bpf-next 2/8] x86/alternative: introduce text_poke_set Song Liu
2022-05-22  5:38   ` Hyeonggon Yoo
2022-05-20  3:15 ` [PATCH v3 bpf-next 3/8] bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack Song Liu
2022-05-20  3:15 ` [PATCH v3 bpf-next 4/8] module: introduce module_alloc_huge Song Liu
2022-05-20  3:15 ` [PATCH v3 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
2022-05-21  1:00   ` Luis Chamberlain
2022-05-21  1:20     ` Luis Chamberlain
2022-05-21  3:20     ` Edgecombe, Rick P
2022-05-21 20:06       ` Luis Chamberlain
2022-05-24 17:40         ` Edgecombe, Rick P
2022-05-24 22:08           ` Luis Chamberlain
2022-05-25  6:01             ` hch
2022-05-20  3:15 ` [PATCH v3 bpf-next 6/8] vmalloc: WARN for set_vm_flush_reset_perms() on huge pages Song Liu
2022-05-20  3:15 ` [PATCH v3 bpf-next 7/8] vmalloc: introduce huge_vmalloc_supported Song Liu
2022-05-20  3:15 ` [PATCH v3 bpf-next 8/8] bpf: simplify select_bpf_prog_pack_size Song Liu
