linux-next.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
       [not found] <65e9e748.a70a0220.606f7.53c0@mx.google.com>
@ 2024-03-07 16:36 ` Mark Brown
  2024-03-07 18:16   ` Song Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Brown @ 2024-03-07 16:36 UTC (permalink / raw)
  To: Puranjay Mohan, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Will Deacon
  Cc: kernelci-results, bot, linux-arm-kernel, linux-mm, bpf, linux-next

[-- Attachment #1: Type: text/plain, Size: 17605 bytes --]

On Thu, Mar 07, 2024 at 08:12:00AM -0800, KernelCI bot wrote:

The KernelCI bisection bot found a boot regression n today's -next on
qemu arm64 UEFI platforms with 64K pages which was bisected to commit
1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory management").
We OOM quite early in boot:

<6>[    0.174998] DMI: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
<4>[    0.189015] swapper/0 invoked oom-killer: gfp_mask=0x2cc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), order=0, oom_score_adj=0
<4>[    0.189835] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc7-next-20240307 #1
<4>[    0.190093] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015

...


<4>[    0.192209]  vmalloc+0x84/0x94
<4>[    0.192301]  bpf_jit_alloc_exec+0x10/0x1c
<4>[    0.192402]  bpf_prog_pack_alloc+0x120/0x26c
<4>[    0.192505]  bpf_jit_binary_pack_alloc+0x6c/0x128
<4>[    0.192611]  bpf_int_jit_compile+0x5f4/0x69c
<4>[    0.192718]  bpf_prog_select_runtime+0x11c/0x194
<4>[    0.192832]  bpf_migrate_filter+0x120/0x164
<4>[    0.192940]  bpf_prog_create+0x124/0x154
<4>[    0.193045]  ptp_classifier_init+0x40/0x70
<4>[    0.193152]  sock_init+0xa4/0xbc

It just seems to be these platforms that are affected, the same
platforms without UEFI are fine and no other platforms seem to be
showing similar behaviour.  A full log from one of the failed boots can
be seen here:

   https://storage.kernelci.org/next/master/next-20240307/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/gcc-10/lab-broonie/baseline-qemu_arm64-virt-gicv2-uefi.txt

I've left the full report from the bot below, including links to other
boot logs and further information as well as a Reported-by for the bot.

> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
> 
> Summary:
>   Start:      1843e16d2df9 Add linux-next specific files for 20240307
>   Plain log:  https://storage.kernelci.org/next/master/next-20240307/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/gcc-10/lab-collabora/baseline-qemu_arm64-virt-gicv3-uefi.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20240307/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/gcc-10/lab-collabora/baseline-qemu_arm64-virt-gicv3-uefi.html
>   Result:     1dad391daef1 bpf, arm64: use bpf_prog_pack for memory management
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     qemu_arm64-virt-gicv3-uefi
>   CPU arch:   arm64
>   Lab:        lab-collabora
>   Compiler:   gcc-10
>   Config:     defconfig+CONFIG_ARM64_64K_PAGES=y
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> -------------------------------------------------------------------------------
> commit 1dad391daef129e01e28206b8d586608ff026548
> Author: Puranjay Mohan <puranjay12@gmail.com>
> Date:   Wed Feb 28 14:18:24 2024 +0000
> 
>     bpf, arm64: use bpf_prog_pack for memory management
>     
>     Use bpf_jit_binary_pack_alloc for memory management of JIT binaries in
>     ARM64 BPF JIT. The bpf_jit_binary_pack_alloc creates a pair of RW and RX
>     buffers. The JIT writes the program into the RW buffer. When the JIT is
>     done, the program is copied to the final RX buffer
>     with bpf_jit_binary_pack_finalize.
>     
>     Implement bpf_arch_text_copy() and bpf_arch_text_invalidate() for ARM64
>     JIT as these functions are required by bpf_jit_binary_pack allocator.
>     
>     Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
>     Acked-by: Song Liu <song@kernel.org>
>     Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>     Link: https://lore.kernel.org/r/20240228141824.119877-3-puranjay12@gmail.com
>     Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 20720ec346b8..5afc7a525eca 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -76,6 +76,7 @@ struct jit_ctx {
>  	int *offset;
>  	int exentry_idx;
>  	__le32 *image;
> +	__le32 *ro_image;
>  	u32 stack_size;
>  	int fpb_offset;
>  };
> @@ -205,6 +206,14 @@ static void jit_fill_hole(void *area, unsigned int size)
>  		*ptr++ = cpu_to_le32(AARCH64_BREAK_FAULT);
>  }
>  
> +int bpf_arch_text_invalidate(void *dst, size_t len)
> +{
> +	if (!aarch64_insn_set(dst, AARCH64_BREAK_FAULT, len))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static inline int epilogue_offset(const struct jit_ctx *ctx)
>  {
>  	int to = ctx->epilogue_offset;
> @@ -746,7 +755,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
>  				 struct jit_ctx *ctx,
>  				 int dst_reg)
>  {
> -	off_t offset;
> +	off_t ins_offset;
> +	off_t fixup_offset;
>  	unsigned long pc;
>  	struct exception_table_entry *ex;
>  
> @@ -763,12 +773,17 @@ static int add_exception_handler(const struct bpf_insn *insn,
>  		return -EINVAL;
>  
>  	ex = &ctx->prog->aux->extable[ctx->exentry_idx];
> -	pc = (unsigned long)&ctx->image[ctx->idx - 1];
> +	pc = (unsigned long)&ctx->ro_image[ctx->idx - 1];
>  
> -	offset = pc - (long)&ex->insn;
> -	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
> +	/*
> +	 * This is the relative offset of the instruction that may fault from
> +	 * the exception table itself. This will be written to the exception
> +	 * table and if this instruction faults, the destination register will
> +	 * be set to '0' and the execution will jump to the next instruction.
> +	 */
> +	ins_offset = pc - (long)&ex->insn;
> +	if (WARN_ON_ONCE(ins_offset >= 0 || ins_offset < INT_MIN))
>  		return -ERANGE;
> -	ex->insn = offset;
>  
>  	/*
>  	 * Since the extable follows the program, the fixup offset is always
> @@ -777,12 +792,25 @@ static int add_exception_handler(const struct bpf_insn *insn,
>  	 * bits. We don't need to worry about buildtime or runtime sort
>  	 * modifying the upper bits because the table is already sorted, and
>  	 * isn't part of the main exception table.
> +	 *
> +	 * The fixup_offset is set to the next instruction from the instruction
> +	 * that may fault. The execution will jump to this after handling the
> +	 * fault.
>  	 */
> -	offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
> -	if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, offset))
> +	fixup_offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
> +	if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, fixup_offset))
>  		return -ERANGE;
>  
> -	ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, offset) |
> +	/*
> +	 * The offsets above have been calculated using the RO buffer but we
> +	 * need to use the R/W buffer for writes.
> +	 * switch ex to rw buffer for writing.
> +	 */
> +	ex = (void *)ctx->image + ((void *)ex - (void *)ctx->ro_image);
> +
> +	ex->insn = ins_offset;
> +
> +	ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
>  		    FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
>  
>  	ex->type = EX_TYPE_BPF;
> @@ -1550,7 +1578,8 @@ static inline void bpf_flush_icache(void *start, void *end)
>  
>  struct arm64_jit_data {
>  	struct bpf_binary_header *header;
> -	u8 *image;
> +	u8 *ro_image;
> +	struct bpf_binary_header *ro_header;
>  	struct jit_ctx ctx;
>  };
>  
> @@ -1559,12 +1588,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  	int image_size, prog_size, extable_size, extable_align, extable_offset;
>  	struct bpf_prog *tmp, *orig_prog = prog;
>  	struct bpf_binary_header *header;
> +	struct bpf_binary_header *ro_header;
>  	struct arm64_jit_data *jit_data;
>  	bool was_classic = bpf_prog_was_classic(prog);
>  	bool tmp_blinded = false;
>  	bool extra_pass = false;
>  	struct jit_ctx ctx;
>  	u8 *image_ptr;
> +	u8 *ro_image_ptr;
>  
>  	if (!prog->jit_requested)
>  		return orig_prog;
> @@ -1591,8 +1622,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  	}
>  	if (jit_data->ctx.offset) {
>  		ctx = jit_data->ctx;
> -		image_ptr = jit_data->image;
> +		ro_image_ptr = jit_data->ro_image;
> +		ro_header = jit_data->ro_header;
>  		header = jit_data->header;
> +		image_ptr = (void *)header + ((void *)ro_image_ptr
> +						 - (void *)ro_header);
>  		extra_pass = true;
>  		prog_size = sizeof(u32) * ctx.idx;
>  		goto skip_init_ctx;
> @@ -1637,18 +1671,27 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  	/* also allocate space for plt target */
>  	extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align);
>  	image_size = extable_offset + extable_size;
> -	header = bpf_jit_binary_alloc(image_size, &image_ptr,
> -				      sizeof(u32), jit_fill_hole);
> -	if (header == NULL) {
> +	ro_header = bpf_jit_binary_pack_alloc(image_size, &ro_image_ptr,
> +					      sizeof(u32), &header, &image_ptr,
> +					      jit_fill_hole);
> +	if (!ro_header) {
>  		prog = orig_prog;
>  		goto out_off;
>  	}
>  
>  	/* 2. Now, the actual pass. */
>  
> +	/*
> +	 * Use the image(RW) for writing the JITed instructions. But also save
> +	 * the ro_image(RX) for calculating the offsets in the image. The RW
> +	 * image will be later copied to the RX image from where the program
> +	 * will run. The bpf_jit_binary_pack_finalize() will do this copy in the
> +	 * final step.
> +	 */
>  	ctx.image = (__le32 *)image_ptr;
> +	ctx.ro_image = (__le32 *)ro_image_ptr;
>  	if (extable_size)
> -		prog->aux->extable = (void *)image_ptr + extable_offset;
> +		prog->aux->extable = (void *)ro_image_ptr + extable_offset;
>  skip_init_ctx:
>  	ctx.idx = 0;
>  	ctx.exentry_idx = 0;
> @@ -1656,9 +1699,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  	build_prologue(&ctx, was_classic, prog->aux->exception_cb);
>  
>  	if (build_body(&ctx, extra_pass)) {
> -		bpf_jit_binary_free(header);
>  		prog = orig_prog;
> -		goto out_off;
> +		goto out_free_hdr;
>  	}
>  
>  	build_epilogue(&ctx, prog->aux->exception_cb);
> @@ -1666,34 +1708,44 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  
>  	/* 3. Extra pass to validate JITed code. */
>  	if (validate_ctx(&ctx)) {
> -		bpf_jit_binary_free(header);
>  		prog = orig_prog;
> -		goto out_off;
> +		goto out_free_hdr;
>  	}
>  
>  	/* And we're done. */
>  	if (bpf_jit_enable > 1)
>  		bpf_jit_dump(prog->len, prog_size, 2, ctx.image);
>  
> -	bpf_flush_icache(header, ctx.image + ctx.idx);
> -
>  	if (!prog->is_func || extra_pass) {
>  		if (extra_pass && ctx.idx != jit_data->ctx.idx) {
>  			pr_err_once("multi-func JIT bug %d != %d\n",
>  				    ctx.idx, jit_data->ctx.idx);
> -			bpf_jit_binary_free(header);
>  			prog->bpf_func = NULL;
>  			prog->jited = 0;
>  			prog->jited_len = 0;
> +			goto out_free_hdr;
> +		}
> +		if (WARN_ON(bpf_jit_binary_pack_finalize(prog, ro_header,
> +							 header))) {
> +			/* ro_header has been freed */
> +			ro_header = NULL;
> +			prog = orig_prog;
>  			goto out_off;
>  		}
> -		bpf_jit_binary_lock_ro(header);
> +		/*
> +		 * The instructions have now been copied to the ROX region from
> +		 * where they will execute. Now the data cache has to be cleaned to
> +		 * the PoU and the I-cache has to be invalidated for the VAs.
> +		 */
> +		bpf_flush_icache(ro_header, ctx.ro_image + ctx.idx);
>  	} else {
>  		jit_data->ctx = ctx;
> -		jit_data->image = image_ptr;
> +		jit_data->ro_image = ro_image_ptr;
>  		jit_data->header = header;
> +		jit_data->ro_header = ro_header;
>  	}
> -	prog->bpf_func = (void *)ctx.image;
> +
> +	prog->bpf_func = (void *)ctx.ro_image;
>  	prog->jited = 1;
>  	prog->jited_len = prog_size;
>  
> @@ -1714,6 +1766,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  		bpf_jit_prog_release_other(prog, prog == orig_prog ?
>  					   tmp : orig_prog);
>  	return prog;
> +
> +out_free_hdr:
> +	if (header) {
> +		bpf_arch_text_copy(&ro_header->size, &header->size,
> +				   sizeof(header->size));
> +		bpf_jit_binary_pack_free(ro_header, header);
> +	}
> +	goto out_off;
>  }
>  
>  bool bpf_jit_supports_kfunc_call(void)
> @@ -1721,6 +1781,13 @@ bool bpf_jit_supports_kfunc_call(void)
>  	return true;
>  }
>  
> +void *bpf_arch_text_copy(void *dst, void *src, size_t len)
> +{
> +	if (!aarch64_insn_copy(dst, src, len))
> +		return ERR_PTR(-EINVAL);
> +	return dst;
> +}
> +
>  u64 bpf_jit_alloc_exec_limit(void)
>  {
>  	return VMALLOC_END - VMALLOC_START;
> @@ -2359,3 +2426,27 @@ bool bpf_jit_supports_exceptions(void)
>  	 */
>  	return true;
>  }
> +
> +void bpf_jit_free(struct bpf_prog *prog)
> +{
> +	if (prog->jited) {
> +		struct arm64_jit_data *jit_data = prog->aux->jit_data;
> +		struct bpf_binary_header *hdr;
> +
> +		/*
> +		 * If we fail the final pass of JIT (from jit_subprogs),
> +		 * the program may not be finalized yet. Call finalize here
> +		 * before freeing it.
> +		 */
> +		if (jit_data) {
> +			bpf_arch_text_copy(&jit_data->ro_header->size, &jit_data->header->size,
> +					   sizeof(jit_data->header->size));
> +			kfree(jit_data);
> +		}
> +		hdr = bpf_jit_binary_pack_hdr(prog);
> +		bpf_jit_binary_pack_free(hdr, NULL);
> +		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
> +	}
> +
> +	bpf_prog_unlock_free(prog);
> +}
> -------------------------------------------------------------------------------
> 
> 
> Git bisection log:
> 
> -------------------------------------------------------------------------------
> git bisect start
> # good: [67be068d31d423b857ffd8c34dbcc093f8dfff76] Merge tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
> git bisect good 67be068d31d423b857ffd8c34dbcc093f8dfff76
> # bad: [1843e16d2df9d98427ef8045589571749d627cf7] Add linux-next specific files for 20240307
> git bisect bad 1843e16d2df9d98427ef8045589571749d627cf7
> # bad: [1a0a33e22e715a4cc7bb2709cfda8e83b8c756b2] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> git bisect bad 1a0a33e22e715a4cc7bb2709cfda8e83b8c756b2
> # good: [5a0d6b0465ab18fefa8abc9afbabce19e7de2fe5] Merge branch 'nfsd-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
> git bisect good 5a0d6b0465ab18fefa8abc9afbabce19e7de2fe5
> # good: [a5fcf74d80bec9948701ff0f7529ae96a0c4a41c] inet: annotate data-races around ifa->ifa_valid_lft
> git bisect good a5fcf74d80bec9948701ff0f7529ae96a0c4a41c
> # good: [7aaab3e1fcee995b70d344687fa0f33b0b9d33f7] Merge branch 'hwmon-next' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
> git bisect good 7aaab3e1fcee995b70d344687fa0f33b0b9d33f7
> # good: [dfa2c103738019033bad44d8be7919ad8bc19dd6] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
> git bisect good dfa2c103738019033bad44d8be7919ad8bc19dd6
> # bad: [564ae6794ec5f050bb6df4446820777e7253f960] mptcp: get addr in userspace pm list
> git bisect bad 564ae6794ec5f050bb6df4446820777e7253f960
> # good: [2c21a0f67c8ce334b8a58332e8c2d71694bef0ab] Merge branch 'Support PTR_MAYBE_NULL for struct_ops arguments.'
> git bisect good 2c21a0f67c8ce334b8a58332e8c2d71694bef0ab
> # bad: [4ab597d2962195ca4f01a429f9b2ea87ee684be3] net: bareudp: Remove generic .ndo_get_stats64
> git bisect bad 4ab597d2962195ca4f01a429f9b2ea87ee684be3
> # good: [dfe6625df48ec54c6dc9b86d361f26962d09de88] bpf: introduce in_sleepable() helper
> git bisect good dfe6625df48ec54c6dc9b86d361f26962d09de88
> # bad: [f2e81192e07e87897ff1296c96775eceea8f582a] bpftool: Add an example for struct_ops map and shadow type.
> git bisect bad f2e81192e07e87897ff1296c96775eceea8f582a
> # bad: [1dad391daef129e01e28206b8d586608ff026548] bpf, arm64: use bpf_prog_pack for memory management
> git bisect bad 1dad391daef129e01e28206b8d586608ff026548
> # good: [22fc0e80aeb5c0c1377e6c02d7248f8fbf5df7fc] bpf, arm64: support exceptions
> git bisect good 22fc0e80aeb5c0c1377e6c02d7248f8fbf5df7fc
> # good: [451c3cab9a65e656c3b3d106831fc02d56b8c34a] arm64: patching: implement text_poke API
> git bisect good 451c3cab9a65e656c3b3d106831fc02d56b8c34a
> # first bad commit: [1dad391daef129e01e28206b8d586608ff026548] bpf, arm64: use bpf_prog_pack for memory management
> -------------------------------------------------------------------------------
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Groups.io Links: You receive all messages sent to this group.
> View/Reply Online (#52295): https://groups.io/g/kernelci-results/message/52295
> Mute This Topic: https://groups.io/mt/104790392/1131744
> Group Owner: kernelci-results+owner@groups.io
> Unsubscribe: https://groups.io/g/kernelci-results/unsub [broonie@kernel.org]
> -=-=-=-=-=-=-=-=-=-=-=-
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
  2024-03-07 16:36 ` next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi Mark Brown
@ 2024-03-07 18:16   ` Song Liu
  2024-03-07 18:34     ` Mark Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Song Liu @ 2024-03-07 18:16 UTC (permalink / raw)
  To: Mark Brown
  Cc: Puranjay Mohan, Catalin Marinas, Alexei Starovoitov, Will Deacon,
	kernelci-results, bot, linux-arm-kernel, linux-mm, bpf,
	linux-next

On Thu, Mar 7, 2024 at 8:36 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Mar 07, 2024 at 08:12:00AM -0800, KernelCI bot wrote:
>
> The KernelCI bisection bot found a boot regression n today's -next on
> qemu arm64 UEFI platforms with 64K pages which was bisected to commit
> 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory management").
> We OOM quite early in boot:

IIUC, 64kB pages means 512MB PMD. I think that's indeed too big. We
will need some logic to limit such cases.

Song

>
> <6>[    0.174998] DMI: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> <4>[    0.189015] swapper/0 invoked oom-killer: gfp_mask=0x2cc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), order=0, oom_score_adj=0
> <4>[    0.189835] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc7-next-20240307 #1
> <4>[    0.190093] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>
> ...
>
>
> <4>[    0.192209]  vmalloc+0x84/0x94
> <4>[    0.192301]  bpf_jit_alloc_exec+0x10/0x1c
> <4>[    0.192402]  bpf_prog_pack_alloc+0x120/0x26c
> <4>[    0.192505]  bpf_jit_binary_pack_alloc+0x6c/0x128
> <4>[    0.192611]  bpf_int_jit_compile+0x5f4/0x69c
> <4>[    0.192718]  bpf_prog_select_runtime+0x11c/0x194
> <4>[    0.192832]  bpf_migrate_filter+0x120/0x164
> <4>[    0.192940]  bpf_prog_create+0x124/0x154
> <4>[    0.193045]  ptp_classifier_init+0x40/0x70
> <4>[    0.193152]  sock_init+0xa4/0xbc
>
> It just seems to be these platforms that are affected, the same
> platforms without UEFI are fine and no other platforms seem to be
> showing similar behaviour.  A full log from one of the failed boots can
> be seen here:
>
>    https://storage.kernelci.org/next/master/next-20240307/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/gcc-10/lab-broonie/baseline-qemu_arm64-virt-gicv2-uefi.txt
>
> I've left the full report from the bot below, including links to other
> boot logs and further information as well as a Reported-by for the bot.
>
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> > * This automated bisection report was sent to you on the basis  *
> > * that you may be involved with the breaking commit it has      *
> > * found.  No manual investigation has been done to verify it,   *
> > * and the root cause of the problem may be somewhere else.      *
> > *                                                               *
> > * If you do send a fix, please include this trailer:            *
> > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> > *                                                               *
> > * Hope this helps!                                              *
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> >
> > next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
> >
> > Summary:
> >   Start:      1843e16d2df9 Add linux-next specific files for 20240307
> >   Plain log:  https://storage.kernelci.org/next/master/next-20240307/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/gcc-10/lab-collabora/baseline-qemu_arm64-virt-gicv3-uefi.txt
> >   HTML log:   https://storage.kernelci.org/next/master/next-20240307/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/gcc-10/lab-collabora/baseline-qemu_arm64-virt-gicv3-uefi.html
> >   Result:     1dad391daef1 bpf, arm64: use bpf_prog_pack for memory management
> >
> > Checks:
> >   revert:     PASS
> >   verify:     PASS
> >
> > Parameters:
> >   Tree:       next
> >   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> >   Branch:     master
> >   Target:     qemu_arm64-virt-gicv3-uefi
> >   CPU arch:   arm64
> >   Lab:        lab-collabora
> >   Compiler:   gcc-10
> >   Config:     defconfig+CONFIG_ARM64_64K_PAGES=y
> >   Test case:  baseline.login
> >
> > Breaking commit found:
> >
> > -------------------------------------------------------------------------------
> > commit 1dad391daef129e01e28206b8d586608ff026548
> > Author: Puranjay Mohan <puranjay12@gmail.com>
> > Date:   Wed Feb 28 14:18:24 2024 +0000
> >
> >     bpf, arm64: use bpf_prog_pack for memory management
> >
> >     Use bpf_jit_binary_pack_alloc for memory management of JIT binaries in
> >     ARM64 BPF JIT. The bpf_jit_binary_pack_alloc creates a pair of RW and RX
> >     buffers. The JIT writes the program into the RW buffer. When the JIT is
> >     done, the program is copied to the final RX buffer
> >     with bpf_jit_binary_pack_finalize.
> >
> >     Implement bpf_arch_text_copy() and bpf_arch_text_invalidate() for ARM64
> >     JIT as these functions are required by bpf_jit_binary_pack allocator.
> >
> >     Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
> >     Acked-by: Song Liu <song@kernel.org>
> >     Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> >     Link: https://lore.kernel.org/r/20240228141824.119877-3-puranjay12@gmail.com
> >     Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >
> > diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> > index 20720ec346b8..5afc7a525eca 100644
> > --- a/arch/arm64/net/bpf_jit_comp.c
> > +++ b/arch/arm64/net/bpf_jit_comp.c
> > @@ -76,6 +76,7 @@ struct jit_ctx {
> >       int *offset;
> >       int exentry_idx;
> >       __le32 *image;
> > +     __le32 *ro_image;
> >       u32 stack_size;
> >       int fpb_offset;
> >  };
> > @@ -205,6 +206,14 @@ static void jit_fill_hole(void *area, unsigned int size)
> >               *ptr++ = cpu_to_le32(AARCH64_BREAK_FAULT);
> >  }
> >
> > +int bpf_arch_text_invalidate(void *dst, size_t len)
> > +{
> > +     if (!aarch64_insn_set(dst, AARCH64_BREAK_FAULT, len))
> > +             return -EINVAL;
> > +
> > +     return 0;
> > +}
> > +
> >  static inline int epilogue_offset(const struct jit_ctx *ctx)
> >  {
> >       int to = ctx->epilogue_offset;
> > @@ -746,7 +755,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
> >                                struct jit_ctx *ctx,
> >                                int dst_reg)
> >  {
> > -     off_t offset;
> > +     off_t ins_offset;
> > +     off_t fixup_offset;
> >       unsigned long pc;
> >       struct exception_table_entry *ex;
> >
> > @@ -763,12 +773,17 @@ static int add_exception_handler(const struct bpf_insn *insn,
> >               return -EINVAL;
> >
> >       ex = &ctx->prog->aux->extable[ctx->exentry_idx];
> > -     pc = (unsigned long)&ctx->image[ctx->idx - 1];
> > +     pc = (unsigned long)&ctx->ro_image[ctx->idx - 1];
> >
> > -     offset = pc - (long)&ex->insn;
> > -     if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
> > +     /*
> > +      * This is the relative offset of the instruction that may fault from
> > +      * the exception table itself. This will be written to the exception
> > +      * table and if this instruction faults, the destination register will
> > +      * be set to '0' and the execution will jump to the next instruction.
> > +      */
> > +     ins_offset = pc - (long)&ex->insn;
> > +     if (WARN_ON_ONCE(ins_offset >= 0 || ins_offset < INT_MIN))
> >               return -ERANGE;
> > -     ex->insn = offset;
> >
> >       /*
> >        * Since the extable follows the program, the fixup offset is always
> > @@ -777,12 +792,25 @@ static int add_exception_handler(const struct bpf_insn *insn,
> >        * bits. We don't need to worry about buildtime or runtime sort
> >        * modifying the upper bits because the table is already sorted, and
> >        * isn't part of the main exception table.
> > +      *
> > +      * The fixup_offset is set to the next instruction from the instruction
> > +      * that may fault. The execution will jump to this after handling the
> > +      * fault.
> >        */
> > -     offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
> > -     if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, offset))
> > +     fixup_offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
> > +     if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, fixup_offset))
> >               return -ERANGE;
> >
> > -     ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, offset) |
> > +     /*
> > +      * The offsets above have been calculated using the RO buffer but we
> > +      * need to use the R/W buffer for writes.
> > +      * switch ex to rw buffer for writing.
> > +      */
> > +     ex = (void *)ctx->image + ((void *)ex - (void *)ctx->ro_image);
> > +
> > +     ex->insn = ins_offset;
> > +
> > +     ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
> >                   FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
> >
> >       ex->type = EX_TYPE_BPF;
> > @@ -1550,7 +1578,8 @@ static inline void bpf_flush_icache(void *start, void *end)
> >
> >  struct arm64_jit_data {
> >       struct bpf_binary_header *header;
> > -     u8 *image;
> > +     u8 *ro_image;
> > +     struct bpf_binary_header *ro_header;
> >       struct jit_ctx ctx;
> >  };
> >
> > @@ -1559,12 +1588,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >       int image_size, prog_size, extable_size, extable_align, extable_offset;
> >       struct bpf_prog *tmp, *orig_prog = prog;
> >       struct bpf_binary_header *header;
> > +     struct bpf_binary_header *ro_header;
> >       struct arm64_jit_data *jit_data;
> >       bool was_classic = bpf_prog_was_classic(prog);
> >       bool tmp_blinded = false;
> >       bool extra_pass = false;
> >       struct jit_ctx ctx;
> >       u8 *image_ptr;
> > +     u8 *ro_image_ptr;
> >
> >       if (!prog->jit_requested)
> >               return orig_prog;
> > @@ -1591,8 +1622,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >       }
> >       if (jit_data->ctx.offset) {
> >               ctx = jit_data->ctx;
> > -             image_ptr = jit_data->image;
> > +             ro_image_ptr = jit_data->ro_image;
> > +             ro_header = jit_data->ro_header;
> >               header = jit_data->header;
> > +             image_ptr = (void *)header + ((void *)ro_image_ptr
> > +                                              - (void *)ro_header);
> >               extra_pass = true;
> >               prog_size = sizeof(u32) * ctx.idx;
> >               goto skip_init_ctx;
> > @@ -1637,18 +1671,27 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >       /* also allocate space for plt target */
> >       extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align);
> >       image_size = extable_offset + extable_size;
> > -     header = bpf_jit_binary_alloc(image_size, &image_ptr,
> > -                                   sizeof(u32), jit_fill_hole);
> > -     if (header == NULL) {
> > +     ro_header = bpf_jit_binary_pack_alloc(image_size, &ro_image_ptr,
> > +                                           sizeof(u32), &header, &image_ptr,
> > +                                           jit_fill_hole);
> > +     if (!ro_header) {
> >               prog = orig_prog;
> >               goto out_off;
> >       }
> >
> >       /* 2. Now, the actual pass. */
> >
> > +     /*
> > +      * Use the image(RW) for writing the JITed instructions. But also save
> > +      * the ro_image(RX) for calculating the offsets in the image. The RW
> > +      * image will be later copied to the RX image from where the program
> > +      * will run. The bpf_jit_binary_pack_finalize() will do this copy in the
> > +      * final step.
> > +      */
> >       ctx.image = (__le32 *)image_ptr;
> > +     ctx.ro_image = (__le32 *)ro_image_ptr;
> >       if (extable_size)
> > -             prog->aux->extable = (void *)image_ptr + extable_offset;
> > +             prog->aux->extable = (void *)ro_image_ptr + extable_offset;
> >  skip_init_ctx:
> >       ctx.idx = 0;
> >       ctx.exentry_idx = 0;
> > @@ -1656,9 +1699,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >       build_prologue(&ctx, was_classic, prog->aux->exception_cb);
> >
> >       if (build_body(&ctx, extra_pass)) {
> > -             bpf_jit_binary_free(header);
> >               prog = orig_prog;
> > -             goto out_off;
> > +             goto out_free_hdr;
> >       }
> >
> >       build_epilogue(&ctx, prog->aux->exception_cb);
> > @@ -1666,34 +1708,44 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >
> >       /* 3. Extra pass to validate JITed code. */
> >       if (validate_ctx(&ctx)) {
> > -             bpf_jit_binary_free(header);
> >               prog = orig_prog;
> > -             goto out_off;
> > +             goto out_free_hdr;
> >       }
> >
> >       /* And we're done. */
> >       if (bpf_jit_enable > 1)
> >               bpf_jit_dump(prog->len, prog_size, 2, ctx.image);
> >
> > -     bpf_flush_icache(header, ctx.image + ctx.idx);
> > -
> >       if (!prog->is_func || extra_pass) {
> >               if (extra_pass && ctx.idx != jit_data->ctx.idx) {
> >                       pr_err_once("multi-func JIT bug %d != %d\n",
> >                                   ctx.idx, jit_data->ctx.idx);
> > -                     bpf_jit_binary_free(header);
> >                       prog->bpf_func = NULL;
> >                       prog->jited = 0;
> >                       prog->jited_len = 0;
> > +                     goto out_free_hdr;
> > +             }
> > +             if (WARN_ON(bpf_jit_binary_pack_finalize(prog, ro_header,
> > +                                                      header))) {
> > +                     /* ro_header has been freed */
> > +                     ro_header = NULL;
> > +                     prog = orig_prog;
> >                       goto out_off;
> >               }
> > -             bpf_jit_binary_lock_ro(header);
> > +             /*
> > +              * The instructions have now been copied to the ROX region from
> > +              * where they will execute. Now the data cache has to be cleaned to
> > +              * the PoU and the I-cache has to be invalidated for the VAs.
> > +              */
> > +             bpf_flush_icache(ro_header, ctx.ro_image + ctx.idx);
> >       } else {
> >               jit_data->ctx = ctx;
> > -             jit_data->image = image_ptr;
> > +             jit_data->ro_image = ro_image_ptr;
> >               jit_data->header = header;
> > +             jit_data->ro_header = ro_header;
> >       }
> > -     prog->bpf_func = (void *)ctx.image;
> > +
> > +     prog->bpf_func = (void *)ctx.ro_image;
> >       prog->jited = 1;
> >       prog->jited_len = prog_size;
> >
> > @@ -1714,6 +1766,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >               bpf_jit_prog_release_other(prog, prog == orig_prog ?
> >                                          tmp : orig_prog);
> >       return prog;
> > +
> > +out_free_hdr:
> > +     if (header) {
> > +             bpf_arch_text_copy(&ro_header->size, &header->size,
> > +                                sizeof(header->size));
> > +             bpf_jit_binary_pack_free(ro_header, header);
> > +     }
> > +     goto out_off;
> >  }
> >
> >  bool bpf_jit_supports_kfunc_call(void)
> > @@ -1721,6 +1781,13 @@ bool bpf_jit_supports_kfunc_call(void)
> >       return true;
> >  }
> >
> > +void *bpf_arch_text_copy(void *dst, void *src, size_t len)
> > +{
> > +     if (!aarch64_insn_copy(dst, src, len))
> > +             return ERR_PTR(-EINVAL);
> > +     return dst;
> > +}
> > +
> >  u64 bpf_jit_alloc_exec_limit(void)
> >  {
> >       return VMALLOC_END - VMALLOC_START;
> > @@ -2359,3 +2426,27 @@ bool bpf_jit_supports_exceptions(void)
> >        */
> >       return true;
> >  }
> > +
> > +void bpf_jit_free(struct bpf_prog *prog)
> > +{
> > +     if (prog->jited) {
> > +             struct arm64_jit_data *jit_data = prog->aux->jit_data;
> > +             struct bpf_binary_header *hdr;
> > +
> > +             /*
> > +              * If we fail the final pass of JIT (from jit_subprogs),
> > +              * the program may not be finalized yet. Call finalize here
> > +              * before freeing it.
> > +              */
> > +             if (jit_data) {
> > +                     bpf_arch_text_copy(&jit_data->ro_header->size, &jit_data->header->size,
> > +                                        sizeof(jit_data->header->size));
> > +                     kfree(jit_data);
> > +             }
> > +             hdr = bpf_jit_binary_pack_hdr(prog);
> > +             bpf_jit_binary_pack_free(hdr, NULL);
> > +             WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
> > +     }
> > +
> > +     bpf_prog_unlock_free(prog);
> > +}
> > -------------------------------------------------------------------------------
> >
> >
> > Git bisection log:
> >
> > -------------------------------------------------------------------------------
> > git bisect start
> > # good: [67be068d31d423b857ffd8c34dbcc093f8dfff76] Merge tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
> > git bisect good 67be068d31d423b857ffd8c34dbcc093f8dfff76
> > # bad: [1843e16d2df9d98427ef8045589571749d627cf7] Add linux-next specific files for 20240307
> > git bisect bad 1843e16d2df9d98427ef8045589571749d627cf7
> > # bad: [1a0a33e22e715a4cc7bb2709cfda8e83b8c756b2] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> > git bisect bad 1a0a33e22e715a4cc7bb2709cfda8e83b8c756b2
> > # good: [5a0d6b0465ab18fefa8abc9afbabce19e7de2fe5] Merge branch 'nfsd-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
> > git bisect good 5a0d6b0465ab18fefa8abc9afbabce19e7de2fe5
> > # good: [a5fcf74d80bec9948701ff0f7529ae96a0c4a41c] inet: annotate data-races around ifa->ifa_valid_lft
> > git bisect good a5fcf74d80bec9948701ff0f7529ae96a0c4a41c
> > # good: [7aaab3e1fcee995b70d344687fa0f33b0b9d33f7] Merge branch 'hwmon-next' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
> > git bisect good 7aaab3e1fcee995b70d344687fa0f33b0b9d33f7
> > # good: [dfa2c103738019033bad44d8be7919ad8bc19dd6] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
> > git bisect good dfa2c103738019033bad44d8be7919ad8bc19dd6
> > # bad: [564ae6794ec5f050bb6df4446820777e7253f960] mptcp: get addr in userspace pm list
> > git bisect bad 564ae6794ec5f050bb6df4446820777e7253f960
> > # good: [2c21a0f67c8ce334b8a58332e8c2d71694bef0ab] Merge branch 'Support PTR_MAYBE_NULL for struct_ops arguments.'
> > git bisect good 2c21a0f67c8ce334b8a58332e8c2d71694bef0ab
> > # bad: [4ab597d2962195ca4f01a429f9b2ea87ee684be3] net: bareudp: Remove generic .ndo_get_stats64
> > git bisect bad 4ab597d2962195ca4f01a429f9b2ea87ee684be3
> > # good: [dfe6625df48ec54c6dc9b86d361f26962d09de88] bpf: introduce in_sleepable() helper
> > git bisect good dfe6625df48ec54c6dc9b86d361f26962d09de88
> > # bad: [f2e81192e07e87897ff1296c96775eceea8f582a] bpftool: Add an example for struct_ops map and shadow type.
> > git bisect bad f2e81192e07e87897ff1296c96775eceea8f582a
> > # bad: [1dad391daef129e01e28206b8d586608ff026548] bpf, arm64: use bpf_prog_pack for memory management
> > git bisect bad 1dad391daef129e01e28206b8d586608ff026548
> > # good: [22fc0e80aeb5c0c1377e6c02d7248f8fbf5df7fc] bpf, arm64: support exceptions
> > git bisect good 22fc0e80aeb5c0c1377e6c02d7248f8fbf5df7fc
> > # good: [451c3cab9a65e656c3b3d106831fc02d56b8c34a] arm64: patching: implement text_poke API
> > git bisect good 451c3cab9a65e656c3b3d106831fc02d56b8c34a
> > # first bad commit: [1dad391daef129e01e28206b8d586608ff026548] bpf, arm64: use bpf_prog_pack for memory management
> > -------------------------------------------------------------------------------
> >
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Groups.io Links: You receive all messages sent to this group.
> > View/Reply Online (#52295): https://groups.io/g/kernelci-results/message/52295
> > Mute This Topic: https://groups.io/mt/104790392/1131744
> > Group Owner: kernelci-results+owner@groups.io
> > Unsubscribe: https://groups.io/g/kernelci-results/unsub [broonie@kernel.org]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
  2024-03-07 18:16   ` Song Liu
@ 2024-03-07 18:34     ` Mark Brown
  2024-03-07 19:02       ` Puranjay Mohan
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Brown @ 2024-03-07 18:34 UTC (permalink / raw)
  To: Song Liu
  Cc: Puranjay Mohan, Catalin Marinas, Alexei Starovoitov, Will Deacon,
	kernelci-results, bot, linux-arm-kernel, linux-mm, bpf,
	linux-next

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Thu, Mar 07, 2024 at 10:16:21AM -0800, Song Liu wrote:
> On Thu, Mar 7, 2024 at 8:36 AM Mark Brown <broonie@kernel.org> wrote:

> > The KernelCI bisection bot found a boot regression n today's -next on
> > qemu arm64 UEFI platforms with 64K pages which was bisected to commit
> > 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory management").
> > We OOM quite early in boot:

> IIUC, 64kB pages means 512MB PMD. I think that's indeed too big. We
> will need some logic to limit such cases.

These qemu instances are only configured with 1G of RAM so that's rather
large indeed.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
  2024-03-07 18:34     ` Mark Brown
@ 2024-03-07 19:02       ` Puranjay Mohan
  2024-03-07 23:00         ` Song Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Puranjay Mohan @ 2024-03-07 19:02 UTC (permalink / raw)
  To: Mark Brown
  Cc: Song Liu, Catalin Marinas, Alexei Starovoitov, Will Deacon,
	kernelci-results, bot, linux-arm-kernel, linux-mm, bpf,
	linux-next

On Thu, Mar 7, 2024 at 7:34 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Mar 07, 2024 at 10:16:21AM -0800, Song Liu wrote:
> > On Thu, Mar 7, 2024 at 8:36 AM Mark Brown <broonie@kernel.org> wrote:
>
> > > The KernelCI bisection bot found a boot regression n today's -next on
> > > qemu arm64 UEFI platforms with 64K pages which was bisected to commit
> > > 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory management").
> > > We OOM quite early in boot:
>
> > IIUC, 64kB pages means 512MB PMD. I think that's indeed too big. We
> > will need some logic to limit such cases.

As far as I understand, we need the prog pack to be PMD sized so it is
allocated as a huge page
and if we limit this then vmalloc() will not allocate a huge page and
the performance benefit will be lost.

>
> These qemu instances are only configured with 1G of RAM so that's rather
> large indeed.

I was able to reproduce this without UEFI as well, I used 600MB in
place of 1G. Prog pack tries to
allocate 512 MB and this causes the OOM panic.

Can we implement this in a way where if the memory can't be allocated
then we fallback to allocating less
memory rather than panicking. I don't know enough memory management to
know how it would be done.

Thanks,
Puranjay

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
  2024-03-07 19:02       ` Puranjay Mohan
@ 2024-03-07 23:00         ` Song Liu
  2024-03-08 12:12           ` Puranjay Mohan
  0 siblings, 1 reply; 6+ messages in thread
From: Song Liu @ 2024-03-07 23:00 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: Mark Brown, Catalin Marinas, Alexei Starovoitov, Will Deacon,
	kernelci-results, bot, linux-arm-kernel, linux-mm, bpf,
	linux-next

On Thu, Mar 7, 2024 at 11:02 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
>
> On Thu, Mar 7, 2024 at 7:34 PM Mark Brown <broonie@kernel.org> wrote:
> >
> > On Thu, Mar 07, 2024 at 10:16:21AM -0800, Song Liu wrote:
> > > On Thu, Mar 7, 2024 at 8:36 AM Mark Brown <broonie@kernel.org> wrote:
> >
> > > > The KernelCI bisection bot found a boot regression n today's -next on
> > > > qemu arm64 UEFI platforms with 64K pages which was bisected to commit
> > > > 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory management").
> > > > We OOM quite early in boot:
> >
> > > IIUC, 64kB pages means 512MB PMD. I think that's indeed too big. We
> > > will need some logic to limit such cases.
>
> As far as I understand, we need the prog pack to be PMD sized so it is
> allocated as a huge page
> and if we limit this then vmalloc() will not allocate a huge page and
> the performance benefit will be lost.

bpf_prog_pack gives benefits without using PMD pages. For arm64
with 64kB page, even bpf_prog_pack of 64kB can fit multiple bpf
programs in it. OTOH, 512MB is really big.

How about we do something like the following?

Thanks,
Song

diff --git i/kernel/bpf/core.c w/kernel/bpf/core.c
index 9ee4536d0a09..1fe05c280e31 100644
--- i/kernel/bpf/core.c
+++ w/kernel/bpf/core.c
@@ -888,7 +888,15 @@ static LIST_HEAD(pack_list);
  * CONFIG_MMU=n. Use PAGE_SIZE in these cases.
  */
 #ifdef PMD_SIZE
-#define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
+  /* PMD_SIZE is really big for some archs. It doesn't make sense to
+   * reserve too much memory in one allocation. Cap BPF_PROG_PACK_SIZE to
+   * 2MiB * num_possible_nodes().
+   */
+  #if PMD_SIZE <= (1 << 21)
+    #define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
+  #else
+    #define BPF_PROG_PACK_SIZE ((1 << 21) * num_possible_nodes())
+  #endif
 #else
 #define BPF_PROG_PACK_SIZE PAGE_SIZE
 #endif

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi
  2024-03-07 23:00         ` Song Liu
@ 2024-03-08 12:12           ` Puranjay Mohan
  0 siblings, 0 replies; 6+ messages in thread
From: Puranjay Mohan @ 2024-03-08 12:12 UTC (permalink / raw)
  To: Song Liu
  Cc: Mark Brown, Catalin Marinas, Alexei Starovoitov, Will Deacon,
	kernelci-results, bot, linux-arm-kernel, linux-mm, bpf,
	linux-next

Hi Song,

>
> bpf_prog_pack gives benefits without using PMD pages. For arm64
> with 64kB page, even bpf_prog_pack of 64kB can fit multiple bpf
> programs in it. OTOH, 512MB is really big.
>
> How about we do something like the following?
>
> Thanks,
> Song
>
> diff --git i/kernel/bpf/core.c w/kernel/bpf/core.c
> index 9ee4536d0a09..1fe05c280e31 100644
> --- i/kernel/bpf/core.c
> +++ w/kernel/bpf/core.c
> @@ -888,7 +888,15 @@ static LIST_HEAD(pack_list);
>   * CONFIG_MMU=n. Use PAGE_SIZE in these cases.
>   */
>  #ifdef PMD_SIZE
> -#define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
> +  /* PMD_SIZE is really big for some archs. It doesn't make sense to
> +   * reserve too much memory in one allocation. Cap BPF_PROG_PACK_SIZE to
> +   * 2MiB * num_possible_nodes().
> +   */
> +  #if PMD_SIZE <= (1 << 21)
> +    #define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
> +  #else
> +    #define BPF_PROG_PACK_SIZE ((1 << 21) * num_possible_nodes())
> +  #endif
>  #else
>  #define BPF_PROG_PACK_SIZE PAGE_SIZE
>  #endif

I have sent a patch with the above:
https://lore.kernel.org/all/20240308120712.88122-1-puranjay12@gmail.com/
Thanks for helping with this.

I have tested that this patch fixes this issue.

Thanks,
Puranjay

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-03-08 12:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <65e9e748.a70a0220.606f7.53c0@mx.google.com>
2024-03-07 16:36 ` next/master bisection: baseline.login on qemu_arm64-virt-gicv3-uefi Mark Brown
2024-03-07 18:16   ` Song Liu
2024-03-07 18:34     ` Mark Brown
2024-03-07 19:02       ` Puranjay Mohan
2024-03-07 23:00         ` Song Liu
2024-03-08 12:12           ` Puranjay Mohan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).