* [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Kumar Kartikeya Dwivedi @ 2022-02-16  9:13 UTC
  To: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Hao Luo

When commit e6ac2450d6de ("bpf: Support bpf program calling kernel
function") added kfunc support, it defined reg2btf_ids as a cheap way to
translate the verifier reg type to the appropriate btf_vmlinux BTF ID.
However, commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with
PTR_TO_XXX | PTR_MAYBE_NULL") moved __BPF_REG_TYPE_MAX from the last
member of the bpf_reg_type enum to right after the base register types,
and defined the other variants using type flag composition. As a result,
using reg->type directly to index into reg2btf_ids may no longer fall
within the __BPF_REG_TYPE_MAX range, and can hence lead to an
out-of-bounds access and a kernel crash when the bad pointer is
dereferenced.
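
For example (a minimal sketch, not part of the fix itself), a composed
type now indexes past the end of the array:

  enum bpf_reg_type t = PTR_TO_TCP_SOCK | PTR_MAYBE_NULL; /* value > __BPF_REG_TYPE_MAX */
  u32 *btf_id = reg2btf_ids[t]; /* out-of-bounds read, garbage pointer */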

[   20.448584] BUG: unable to handle page fault for address: 0000000000524156
[   20.449149] #PF: supervisor read access in kernel mode
[   20.449537] #PF: error_code(0x0000) - not-present page
[   20.450025] PGD 0 P4D 0
[   20.450303] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[   20.450808] CPU: 0 PID: 323 Comm: test_verifier Tainted: G    B   W   E     5.17.0-rc2+ #286
[   20.451567] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[   20.452364] RIP: 0010:btf_check_func_arg_match+0x651/0xe10
[   20.452868] Code: ea 04 80 fa 01 0f 87 2a 06 00 00 48 c7 c7 a0 16 f8 a5 e8 82 6d 15 00 48 8b 05 1b 3d aa 04 48 89 df 48 89 04 24 e8 2f 6c 15 00 <8b> 2b 89 ac 24 c0 00 00 00 89 ee 48 8b 2c 24 48 8d 94 24 c0 00 00
[   20.454693] RSP: 0018:ffff88800727f4f8 EFLAGS: 00010282
[   20.455232] RAX: 0000000000000000 RBX: 0000000000524156 RCX: ffffffffa14dd991
[   20.455963] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000524156
[   20.456683] RBP: ffffffffa398c6d4 R08: ffffffffa14dd991 R09: 0000000000000000
[   20.457297] R10: fffffbfff484f31c R11: 0000000000000001 R12: ffff888007bf8600
[   20.457917] R13: ffff88800c2f6078 R14: 0000000000000000 R15: 0000000000000001
[   20.458596] FS:  00007fe06ae70740(0000) GS:ffff88808cc00000(0000) knlGS:0000000000000000
[   20.459303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.459817] CR2: 0000000000524156 CR3: 000000000902a004 CR4: 0000000000770ef0
[   20.460527] PKRU: 55555554
[   20.460746] Call Trace:
[   20.460938]  <TASK>
[   20.461120]  ? btf_kfunc_id_set_contains+0x100/0x100
[   20.461534]  ? btf_kfunc_id_set_contains+0xd6/0x100
[   20.461996]  ? btf_try_get_module+0xc0/0xc0
[   20.462361]  do_check_common+0x3961/0x5490
[   20.462726]  ? bpf_check+0x27e0/0x4c60
[   20.463036]  ? check_helper_call+0x3050/0x3050
[   20.463435]  ? lock_is_held_type+0xe4/0x140
[   20.463873]  ? lockdep_hardirqs_on_prepare+0x129/0x220
[   20.464402]  ? kasan_quarantine_put+0x9c/0x1f0
[   20.464815]  ? lockdep_hardirqs_on+0x7e/0x100
[   20.465253]  ? kasan_quarantine_put+0x9c/0x1f0
[   20.465580]  ? slab_free_freelist_hook+0x7d/0x150
[   20.465961]  ? bpf_check+0x3ae1/0x4c60
[   20.466262]  ? kfree+0xbe/0x2d0
[   20.466518]  bpf_check+0x3da4/0x4c60
[   20.466800]  ? bpf_get_btf_vmlinux+0x50/0x50
[   20.467152]  ? lock_is_held_type+0xe4/0x140
[   20.467471]  ? lock_release+0x238/0x410
[   20.467761]  ? ktime_get_with_offset+0x3c/0xf0
[   20.468168]  ? lock_downgrade+0x390/0x390
[   20.468578]  ? __might_fault+0x57/0xb0
[   20.468974]  ? memset+0x20/0x40
[   20.469314]  bpf_prog_load+0x7b3/0xfb0
[   20.469714]  ? __bpf_prog_put.constprop.0+0x120/0x120
[   20.470235]  ? lock_is_held_type+0xe4/0x140
[   20.470641]  ? avc_has_perm+0x7e/0x110
[   20.470982]  ? avc_has_perm_noaudit+0x250/0x250
[   20.471461]  __sys_bpf+0x121b/0x3350
[   20.471826]  ? bpf_raw_tracepoint_open+0x3e0/0x3e0
[   20.472291]  ? __do_fault+0x210/0x210
[   20.472675]  ? __handle_mm_fault+0x15a0/0x1c20
[   20.473138]  ? lock_is_held_type+0xe4/0x140
[   20.473589]  ? up_write+0x270/0x270
[   20.473964]  ? mark_held_locks+0x24/0x90
[   20.474383]  __x64_sys_bpf+0x44/0x50
[   20.474765]  do_syscall_64+0x59/0x80
[   20.475066]  ? asm_exc_page_fault+0x1e/0x30
[   20.475439]  ? asm_exc_page_fault+0x8/0x30
[   20.475801]  ? lockdep_hardirqs_on+0x7e/0x100
[   20.476252]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   20.476778] RIP: 0033:0x7fe06af6e18d
[   20.477159] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48
[   20.479034] RSP: 002b:00007ffcf84bbc78 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   20.479602] RAX: ffffffffffffffda RBX: 00007fe06b16fc40 RCX: 00007fe06af6e18d
[   20.480187] RDX: 0000000000000090 RSI: 00007ffcf84bbd80 RDI: 0000000000000005
[   20.480727] RBP: 00007ffcf84bbc90 R08: 00007ffcf84bbeb0 R09: 00007ffcf84bbd80
[   20.481441] R10: 0000000000000007 R11: 0000000000000202 R12: 00007fe06b11f7b0
[   20.482127] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   20.482628]  </TASK>
[   20.482810] Modules linked in: crc32_pclmul(E) intel_rapl_msr(E) intel_rapl_common(E) rapl(E) ghash_clmulni_intel(E) crct10dif_pclmul(E) crc32c_intel(E) serio_raw(E)
[   20.484185] CR2: 0000000000524156
[   20.484441] ---[ end trace 0000000000000000 ]---
[   20.484900] RIP: 0010:btf_check_func_arg_match+0x651/0xe10
[   20.485401] Code: ea 04 80 fa 01 0f 87 2a 06 00 00 48 c7 c7 a0 16 f8 a5 e8 82 6d 15 00 48 8b 05 1b 3d aa 04 48 89 df 48 89 04 24 e8 2f 6c 15 00 <8b> 2b 89 ac 24 c0 00 00 00 89 ee 48 8b 2c 24 48 8d 94 24 c0 00 00
[   20.487173] RSP: 0018:ffff88800727f4f8 EFLAGS: 00010282
[   20.487709] RAX: 0000000000000000 RBX: 0000000000524156 RCX: ffffffffa14dd991
[   20.488393] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000524156
[   20.489045] RBP: ffffffffa398c6d4 R08: ffffffffa14dd991 R09: 0000000000000000
[   20.489696] R10: fffffbfff484f31c R11: 0000000000000001 R12: ffff888007bf8600
[   20.490377] R13: ffff88800c2f6078 R14: 0000000000000000 R15: 0000000000000001
[   20.491065] FS:  00007fe06ae70740(0000) GS:ffff88808cc00000(0000) knlGS:0000000000000000
[   20.491782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.492272] CR2: 0000000000524156 CR3: 000000000902a004 CR4: 0000000000770ef0
[   20.492925] PKRU: 55555554

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Hao Luo <haoluo@google.com>
Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/btf.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e16dafeb2450..416345798e0a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5568,13 +5568,21 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 	return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }

-static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
+static u32 *reg2btf_ids(enum bpf_reg_type type)
+{
+	switch (type) {
 #ifdef CONFIG_NET
-	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
-	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
-	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
+	case PTR_TO_SOCKET:
+		return &btf_sock_ids[BTF_SOCK_TYPE_SOCK];
+	case PTR_TO_SOCK_COMMON:
+		return &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON];
+	case PTR_TO_TCP_SOCK:
+		return &btf_sock_ids[BTF_SOCK_TYPE_TCP];
 #endif
-};
+	default:
+		return NULL;
+	}
+}

 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
@@ -5688,7 +5696,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 			}
 			if (check_ptr_off_reg(env, reg, regno))
 				return -EINVAL;
-		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids[reg->type])) {
+		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids(reg->type))) {
 			const struct btf_type *reg_ref_t;
 			const struct btf *reg_btf;
 			const char *reg_ref_tname;
@@ -5706,7 +5714,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				reg_ref_id = reg->btf_id;
 			} else {
 				reg_btf = btf_vmlinux;
-				reg_ref_id = *reg2btf_ids[reg->type];
+				reg_ref_id = *reg2btf_ids(reg->type);
 			}

 			reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id,
--
2.35.1



* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Alexei Starovoitov @ 2022-02-16 15:45 UTC
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo

On Wed, Feb 16, 2022 at 1:13 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> When commit e6ac2450d6de ("bpf: Support bpf program calling kernel
> function") added kfunc support, it defined reg2btf_ids as a cheap way to
> translate the verifier reg type to the appropriate btf_vmlinux BTF ID,
> however commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with
> PTR_TO_XXX | PTR_MAYBE_NULL") moved the __BPF_REG_TYPE_MAX from the last
> member of bpf_reg_type enum to after the base register types, and
> defined other variants using type flag composition. However, now, the
> direct usage of reg->type to index into reg2btf_ids may no longer fall
> into __BPF_REG_TYPE_MAX range, and hence lead to out of bounds access
> and kernel crash on dereference of bad pointer.
...
> [   20.488393] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000524156
> [   20.489045] RBP: ffffffffa398c6d4 R08: ffffffffa14dd991 R09: 0000000000000000
> [   20.489696] R10: fffffbfff484f31c R11: 0000000000000001 R12: ffff888007bf8600
> [   20.490377] R13: ffff88800c2f6078 R14: 0000000000000000 R15: 0000000000000001
> [   20.491065] FS:  00007fe06ae70740(0000) GS:ffff88808cc00000(0000) knlGS:0000000000000000
> [   20.491782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   20.492272] CR2: 0000000000524156 CR3: 000000000902a004 CR4: 0000000000770ef0
> [   20.492925] PKRU: 55555554

Please do not include a full kernel dump in the commit log.
It provides no value.
The first paragraph was enough.

> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Hao Luo <haoluo@google.com>
> Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/btf.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index e16dafeb2450..416345798e0a 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5568,13 +5568,21 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
>         return btf_check_func_type_match(log, btf1, t1, btf2, t2);
>  }
>
> -static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
> +static u32 *reg2btf_ids(enum bpf_reg_type type)
> +{
> +       switch (type) {
>  #ifdef CONFIG_NET
> -       [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> -       [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> -       [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
> +       case PTR_TO_SOCKET:
> +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK];
> +       case PTR_TO_SOCK_COMMON:
> +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON];
> +       case PTR_TO_TCP_SOCK:
> +               return &btf_sock_ids[BTF_SOCK_TYPE_TCP];
>  #endif
> -};
> +       default:
> +               return NULL;
> +       }
> +}
>
>  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
>  static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
> @@ -5688,7 +5696,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>                         }
>                         if (check_ptr_off_reg(env, reg, regno))
>                                 return -EINVAL;
> -               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids[reg->type])) {
> +               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids(reg->type))) {

Just use reg2btf_ids[base_type(reg->type)] instead?


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Kumar Kartikeya Dwivedi @ 2022-02-16 17:33 UTC
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo

On Wed, Feb 16, 2022 at 09:15:47PM IST, Alexei Starovoitov wrote:
> On Wed, Feb 16, 2022 at 1:13 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > When commit e6ac2450d6de ("bpf: Support bpf program calling kernel
> > function") added kfunc support, it defined reg2btf_ids as a cheap way to
> > translate the verifier reg type to the appropriate btf_vmlinux BTF ID,
> > however commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with
> > PTR_TO_XXX | PTR_MAYBE_NULL") moved the __BPF_REG_TYPE_MAX from the last
> > member of bpf_reg_type enum to after the base register types, and
> > defined other variants using type flag composition. However, now, the
> > direct usage of reg->type to index into reg2btf_ids may no longer fall
> > into __BPF_REG_TYPE_MAX range, and hence lead to out of bounds access
> > and kernel crash on dereference of bad pointer.
> ...
> > [   20.488393] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000524156
> > [   20.489045] RBP: ffffffffa398c6d4 R08: ffffffffa14dd991 R09: 0000000000000000
> > [   20.489696] R10: fffffbfff484f31c R11: 0000000000000001 R12: ffff888007bf8600
> > [   20.490377] R13: ffff88800c2f6078 R14: 0000000000000000 R15: 0000000000000001
> > [   20.491065] FS:  00007fe06ae70740(0000) GS:ffff88808cc00000(0000) knlGS:0000000000000000
> > [   20.491782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   20.492272] CR2: 0000000000524156 CR3: 000000000902a004 CR4: 0000000000770ef0
> > [   20.492925] PKRU: 55555554
>
> Please do not include a full kernel dump in the commit log.
> It provides no value.
> The first paragraph was enough.
>

Ok, won't include next time.

> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Cc: Hao Luo <haoluo@google.com>
> > Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  kernel/bpf/btf.c | 22 +++++++++++++++-------
> >  1 file changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index e16dafeb2450..416345798e0a 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -5568,13 +5568,21 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
> >         return btf_check_func_type_match(log, btf1, t1, btf2, t2);
> >  }
> >
> > -static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
> > +static u32 *reg2btf_ids(enum bpf_reg_type type)
> > +{
> > +       switch (type) {
> >  #ifdef CONFIG_NET
> > -       [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> > -       [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> > -       [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
> > +       case PTR_TO_SOCKET:
> > +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK];
> > +       case PTR_TO_SOCK_COMMON:
> > +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON];
> > +       case PTR_TO_TCP_SOCK:
> > +               return &btf_sock_ids[BTF_SOCK_TYPE_TCP];
> >  #endif
> > -};
> > +       default:
> > +               return NULL;
> > +       }
> > +}
> >
> >  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
> >  static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
> > @@ -5688,7 +5696,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >                         }
> >                         if (check_ptr_off_reg(env, reg, regno))
> >                                 return -EINVAL;
> > -               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids[reg->type])) {
> > +               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids(reg->type))) {
>
> Just use reg2btf_ids[base_type(reg->type)] instead?

That would be incorrect, I think; then we'd allow e.g. PTR_TO_TCP_SOCK_OR_NULL
and treat it as PTR_TO_TCP_SOCK, while the current code only wants to permit
non-NULL variants for these three.

--
Kartikeya


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Alexei Starovoitov @ 2022-02-16 18:19 UTC
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo

On Wed, Feb 16, 2022 at 9:33 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Feb 16, 2022 at 09:15:47PM IST, Alexei Starovoitov wrote:
> > On Wed, Feb 16, 2022 at 1:13 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > When commit e6ac2450d6de ("bpf: Support bpf program calling kernel
> > > function") added kfunc support, it defined reg2btf_ids as a cheap way to
> > > translate the verifier reg type to the appropriate btf_vmlinux BTF ID,
> > > however commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with
> > > PTR_TO_XXX | PTR_MAYBE_NULL") moved the __BPF_REG_TYPE_MAX from the last
> > > member of bpf_reg_type enum to after the base register types, and
> > > defined other variants using type flag composition. However, now, the
> > > direct usage of reg->type to index into reg2btf_ids may no longer fall
> > > into __BPF_REG_TYPE_MAX range, and hence lead to out of bounds access
> > > and kernel crash on dereference of bad pointer.
> > ...
> > > [   20.488393] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000524156
> > > [   20.489045] RBP: ffffffffa398c6d4 R08: ffffffffa14dd991 R09: 0000000000000000
> > > [   20.489696] R10: fffffbfff484f31c R11: 0000000000000001 R12: ffff888007bf8600
> > > [   20.490377] R13: ffff88800c2f6078 R14: 0000000000000000 R15: 0000000000000001
> > > [   20.491065] FS:  00007fe06ae70740(0000) GS:ffff88808cc00000(0000) knlGS:0000000000000000
> > > [   20.491782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   20.492272] CR2: 0000000000524156 CR3: 000000000902a004 CR4: 0000000000770ef0
> > > [   20.492925] PKRU: 55555554
> >
> > Please do not include a full kernel dump in the commit log.
> > It provides no value.
> > The first paragraph was enough.
> >
>
> Ok, won't include next time.
>
> > > Cc: Martin KaFai Lau <kafai@fb.com>
> > > Cc: Hao Luo <haoluo@google.com>
> > > Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  kernel/bpf/btf.c | 22 +++++++++++++++-------
> > >  1 file changed, 15 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index e16dafeb2450..416345798e0a 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -5568,13 +5568,21 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
> > >         return btf_check_func_type_match(log, btf1, t1, btf2, t2);
> > >  }
> > >
> > > -static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
> > > +static u32 *reg2btf_ids(enum bpf_reg_type type)
> > > +{
> > > +       switch (type) {
> > >  #ifdef CONFIG_NET
> > > -       [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> > > -       [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> > > -       [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
> > > +       case PTR_TO_SOCKET:
> > > +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK];
> > > +       case PTR_TO_SOCK_COMMON:
> > > +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON];
> > > +       case PTR_TO_TCP_SOCK:
> > > +               return &btf_sock_ids[BTF_SOCK_TYPE_TCP];
> > >  #endif
> > > -};
> > > +       default:
> > > +               return NULL;
> > > +       }
> > > +}
> > >
> > >  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
> > >  static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
> > > @@ -5688,7 +5696,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > >                         }
> > >                         if (check_ptr_off_reg(env, reg, regno))
> > >                                 return -EINVAL;
> > > -               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids[reg->type])) {
> > > +               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids(reg->type))) {
> >
> > Just use reg2btf_ids[base_type(reg->type)] instead?
>
> That would be incorrect I think, then we'd allow e.g. PTR_TO_TCP_SOCK_OR_NULL
> and treat it as PTR_TO_TCP_SOCK, while current code only wants to permit
> non-NULL variants for these three.

add && !type_flag(reg->type) ?
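
i.e. roughly (a sketch, keeping the array):

} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
	   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {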

But, first, please describe how you found it.
Tried to pass PTR_TO_BTF_ID_OR_NULL into kfunc?
Do you have some other changes in the verifier?


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Kumar Kartikeya Dwivedi @ 2022-02-16 18:29 UTC
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo

On Wed, Feb 16, 2022 at 11:49:54PM IST, Alexei Starovoitov wrote:
> On Wed, Feb 16, 2022 at 9:33 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Wed, Feb 16, 2022 at 09:15:47PM IST, Alexei Starovoitov wrote:
> > > On Wed, Feb 16, 2022 at 1:13 AM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > When commit e6ac2450d6de ("bpf: Support bpf program calling kernel
> > > > function") added kfunc support, it defined reg2btf_ids as a cheap way to
> > > > translate the verifier reg type to the appropriate btf_vmlinux BTF ID,
> > > > however commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with
> > > > PTR_TO_XXX | PTR_MAYBE_NULL") moved the __BPF_REG_TYPE_MAX from the last
> > > > member of bpf_reg_type enum to after the base register types, and
> > > > defined other variants using type flag composition. However, now, the
> > > > direct usage of reg->type to index into reg2btf_ids may no longer fall
> > > > into __BPF_REG_TYPE_MAX range, and hence lead to out of bounds access
> > > > and kernel crash on dereference of bad pointer.
> > > ...
> > > > [   20.488393] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000524156
> > > > [   20.489045] RBP: ffffffffa398c6d4 R08: ffffffffa14dd991 R09: 0000000000000000
> > > > [   20.489696] R10: fffffbfff484f31c R11: 0000000000000001 R12: ffff888007bf8600
> > > > [   20.490377] R13: ffff88800c2f6078 R14: 0000000000000000 R15: 0000000000000001
> > > > [   20.491065] FS:  00007fe06ae70740(0000) GS:ffff88808cc00000(0000) knlGS:0000000000000000
> > > > [   20.491782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   20.492272] CR2: 0000000000524156 CR3: 000000000902a004 CR4: 0000000000770ef0
> > > > [   20.492925] PKRU: 55555554
> > >
> > > Please do not include a full kernel dump in the commit log.
> > > It provides no value.
> > > The first paragraph was enough.
> > >
> >
> > Ok, won't include next time.
> >
> > > > Cc: Martin KaFai Lau <kafai@fb.com>
> > > > Cc: Hao Luo <haoluo@google.com>
> > > > Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  kernel/bpf/btf.c | 22 +++++++++++++++-------
> > > >  1 file changed, 15 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index e16dafeb2450..416345798e0a 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -5568,13 +5568,21 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
> > > >         return btf_check_func_type_match(log, btf1, t1, btf2, t2);
> > > >  }
> > > >
> > > > -static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
> > > > +static u32 *reg2btf_ids(enum bpf_reg_type type)
> > > > +{
> > > > +       switch (type) {
> > > >  #ifdef CONFIG_NET
> > > > -       [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> > > > -       [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> > > > -       [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
> > > > +       case PTR_TO_SOCKET:
> > > > +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK];
> > > > +       case PTR_TO_SOCK_COMMON:
> > > > +               return &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON];
> > > > +       case PTR_TO_TCP_SOCK:
> > > > +               return &btf_sock_ids[BTF_SOCK_TYPE_TCP];
> > > >  #endif
> > > > -};
> > > > +       default:
> > > > +               return NULL;
> > > > +       }
> > > > +}
> > > >
> > > >  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
> > > >  static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
> > > > @@ -5688,7 +5696,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > > >                         }
> > > >                         if (check_ptr_off_reg(env, reg, regno))
> > > >                                 return -EINVAL;
> > > > -               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids[reg->type])) {
> > > > +               } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || reg2btf_ids(reg->type))) {
> > >
> > > Just use reg2btf_ids[base_type(reg->type)] instead?
> >
> > That would be incorrect I think, then we'd allow e.g. PTR_TO_TCP_SOCK_OR_NULL
> > and treat it as PTR_TO_TCP_SOCK, while current code only wants to permit
> > non-NULL variants for these three.
>
> add && !type_flag(reg->type) ?
>

Ok, we can do that. WRT Hao's suggestion, we do allow NULL for the
ptr_to_mem_ok case, so doing it for all of check_func_arg_match won't work.

> But, first, please describe how you found it.
> Tried to pass PTR_TO_BTF_ID_OR_NULL into kfunc?
> Do you have some other changes in the verifier?

Yes, I was working on [0] and tried to come up with a test case where the
verifier printed the bad register type being passed (one marked with a new
flag), but noticed that it would fall outside the __BPF_REG_TYPE_MAX limit
during the kfunc check. Also, it seems that on a non-KASAN build it sometimes
doesn't actually crash, depending on what the value at that offset is.

  [0]: https://github.com/kkdwivedi/linux/commits/btf-ptr-in-map

I was planning to send a verifier test exercising this, but it seems
fixup_kfunc_btf_id support for test_verifier.c is not in the bpf tree yet.
When it is merged I will provide a small test case; it is basically this on
bpf-next:

diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index 829be2b9e08e..5f26007ceef1 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -1,3 +1,22 @@
+{
+       "calls: trigger reg->type > __BPF_REG_TYPE_MAX",
+       .insns = {
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_EXIT_INSN(),
+       },
+       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+       .result = REJECT,
+       .errstr = "XXX",
+       .fixup_kfunc_btf_id = {
+               { "bpf_kfunc_call_test_acquire", 3 },
+               { "bpf_kfunc_call_test_release", 5 },
+       },
+},
 {

Sending it right now, I think, may lead to flaky CI, so we can wait.

If all of this makes sense, should I respin?

--
Kartikeya


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Alexei Starovoitov @ 2022-02-16 19:49 UTC
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo

On Wed, Feb 16, 2022 at 11:59:54PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > Just use reg2btf_ids[base_type(reg->type)] instead?
> > >
> > > That would be incorrect I think, then we'd allow e.g. PTR_TO_TCP_SOCK_OR_NULL
> > > and treat it as PTR_TO_TCP_SOCK, while current code only wants to permit
> > > non-NULL variants for these three.
> >
> > add && !type_flag(reg->type) ?
> >
> 
> Ok, we can do that. WRT Hao's suggestion, we do allow NULL for ptr_to_mem_ok
> case, so doing it for all of check_func_arg_match won't work.

Right.

> > But, first, please describe how you found it.
> > Tried to pass PTR_TO_BTF_ID_OR_NULL into kfunc?
> > Do you have some other changes in the verifier?
> 
> Yes, was working on [0], tried to come up with a test case where verifier
> printed bad register type being passed (one marked with a new flag), but noticed
> that it would fall out of __BPF_REG_TYPE_MAX limit during kfunc check. Also, it
> seems on non-KASAN build it actually doesn't crash sometimes, depends on what
> the value at that offset is.
> 
>   [0]: https://github.com/kkdwivedi/linux/commits/btf-ptr-in-map

Nice. Thanks a ton. I did a quick look. Seems to be ready to post on the list?
Can you add .c tests and post?

Only one suggestion so far:
Could you get rid of the callback in btf_find_struct_field?
The callbacks obscure the control flow and make the code harder to follow.
Do refactor btf_find_struct_field, but call btf_find_ptr_to_btf_id directly from it.
Then you wouldn't need this ugly part:
/* While callback cleans up after itself, later iterations can still
 * return error without calling cb, so call free function again.
 */
if (ret < 0)
  bpf_map_free_ptr_off_tab(map);

I've been thinking about the api for refcnted PTR_TO_BTF_ID in the map too.
Regarding your issue with cmpxchg, I've come up with two helpers:
BPF_CALL_3(bpf_kptr_try_set, void **, kptr, void *, ptr, int, refcnt_off)
{
        /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
         * refcount_inc() has to be done before cmpxchg() because
         * another cpu might do bpf_kptr_xchg+release.
         */
        refcount_inc((refcount_t *)(ptr + refcnt_off));
        if (cmpxchg(kptr, NULL, ptr)) {
                /* refcnt >= 2 here */
                refcount_dec((refcount_t *)(ptr + refcnt_off));
                return -EBUSY;
        }
        return 0;
}

and

BPF_CALL_2(bpf_kptr_get, void **, kptr, int, refcnt_off)
{
        void *ptr = READ_ONCE(kptr);

        if (!ptr)
                return 0;
        /* ptr->refcnt could be == 0 if another cpu did
         * ptr2 = bpf_kptr_xchg();
         * bpf_*_release(ptr2);
         */
        if (!refcount_inc_not_zero((refcount_t *)(ptr + refcnt_off)))
                return 0;
        return (long) ptr;
}


and instead of recognizing xchg insn directly in the verifier
I went with the helper:
BPF_CALL_2(bpf_kptr_xchg, void **, kptr, void *, ptr)
{
        /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
         * or ptr == NULL.
         * returns ptr_to_btf_id with refcnt >= 1 or NULL
         */
        return (long) xchg(kptr, ptr);
}
It was easier to deal with.
I don't have a strong preference for helper vs insn though.
Let's brainstorm after you post patches.
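
For example, prog-side usage would look roughly like this (a sketch; val->ct
is a placeholder map value field and bpf_ct_release() a placeholder release
kfunc):

struct nf_conn *ct;

ct = bpf_kptr_xchg(&val->ct, NULL); /* take ownership, slot becomes NULL */
if (ct)
	bpf_ct_release(ct); /* drop the reference we now own */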

Having xchg (or bpf_kptr_xchg) as the only operation seems too limiting.
For example, we might want to remember a struct task_struct or struct net
in a map for faster access.
The __bpf_nf_ct_lookup is doing get_net_ns_by_id().
idr_find isn't fast enough for XDP.
We can store refcnted 'struct net *' for faster lookup.
Then multiple cpus will be reading that pointer from the map, but if only 'xchg'
is available that use case won't work. One cpu will win the race.
So we need bpf_kptr_get(), which will be an acquire helper.
The bpf prog will do an explicit put_net(ptr); which will be a release kfunc.
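
Roughly (a sketch; refcnt_off here stands for whatever offset applies to the
struct in question):

struct net *net = (struct net *)bpf_kptr_get(&val->net, refcnt_off);

if (net) {
	/* refcnt >= 1 here, safe to use and pass into helpers */
	put_net(net); /* release kfunc when done */
}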

Simply reading a refcnted ptr_to_btf_id from a map is PTR_UNTRUSTED too for
sleepable bpf progs.
Such pointers are under rcu_read_lock in non-sleepable progs, but it still feels
safer to always enforce refcnt inc/dec when accessing them to pass into helpers.

> I was planning to send a verifier test exercising this but it seems
> fixup_kfunc_btf_id support for test_verifier.c is not in bpf tree yet, so when

Obviously we have to wait until the merge window.

> it is merged I will provide a small test case, it is basically this on bpf-next:
> 
> diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
> index 829be2b9e08e..5f26007ceef1 100644
> --- a/tools/testing/selftests/bpf/verifier/calls.c
> +++ b/tools/testing/selftests/bpf/verifier/calls.c
> @@ -1,3 +1,22 @@
> +{
> +       "calls: trigger reg->type > __BPF_REG_TYPE_MAX",
> +       .insns = {
> +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
> +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> +       .result = REJECT,
> > +       .errstr = "XXX",
> +       .fixup_kfunc_btf_id = {
> +               { "bpf_kfunc_call_test_acquire", 3 },
> +               { "bpf_kfunc_call_test_release", 5 },
> +       },
> +},
>  {
> 
> Sending it rn I think may lead to flaky CI, so we can wait.

Right. Let's wait until bpf tree merges into bpf-next.

> If all of this makes sense, should I respin?

Please do.
We can get bpf PR out asap after that.


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Kumar Kartikeya Dwivedi @ 2022-02-16 20:58 UTC
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo, Toke Høiland-Jørgensen

On Thu, Feb 17, 2022 at 01:19:56AM IST, Alexei Starovoitov wrote:
> On Wed, Feb 16, 2022 at 11:59:54PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > Just use reg2btf_ids[base_type(reg->type)] instead?
> > > >
> > > > That would be incorrect I think, then we'd allow e.g. PTR_TO_TCP_SOCK_OR_NULL
> > > > and treat it as PTR_TO_TCP_SOCK, while current code only wants to permit
> > > > non-NULL variants for these three.
> > >
> > > add && !type_flag(reg->type) ?
> > >
> >
> > Ok, we can do that. WRT Hao's suggestion, we do allow NULL for ptr_to_mem_ok
> > case, so doing it for all of check_func_arg_match won't work.
>
> Right.
>
> > > But, first, please describe how you found it.
> > > Tried to pass PTR_TO_BTF_ID_OR_NULL into kfunc?
> > > Do you have some other changes in the verifier?
> >
> > Yes, was working on [0], tried to come up with a test case where verifier
> > printed bad register type being passed (one marked with a new flag), but noticed
> > that it would fall out of __BPF_REG_TYPE_MAX limit during kfunc check. Also, it
> > seems on non-KASAN build it actually doesn't crash sometimes, depends on what
> > the value at that offset is.
> >
> >   [0]: https://github.com/kkdwivedi/linux/commits/btf-ptr-in-map
>
> Nice. Thanks a ton. I did a quick look. Seems to be ready to post on the list?
> Can you add .c tests and post?
>

Thanks for the quick review!

Yes, I should be able to post by this weekend if I don't run into any other
problems/bugs in my code.

> Only one suggestion so far:
> Could you get rid of callback in btf_find_struct_field ?
> The callbacks obscure the control flow and make the code harder to follow.
> Do refactor btf_find_struct_field, but call btf_find_ptr_to_btf_id directly from it.
> Then you wouldn't need this ugly part:
> /* While callback cleans up after itself, later iterations can still
>  * return error without calling cb, so call free function again.
>  */
> if (ret < 0)
>   bpf_map_free_ptr_off_tab(map);
>

Ok, makes sense, will change.

> I've been thinking about the api for refcnted PTR_TO_BTF_ID in the map too.
> Regarding your issue with cmpxchg. I've come up with two helpers:
>
> BPF_CALL_3(bpf_kptr_try_set, void **, kptr, void *, ptr, int, refcnt_off)
> {
>         /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
>          * refcount_inc() has to be done before cmpxchg() because
>          * another cpu might do bpf_kptr_xchg+release.
>          */
>         refcount_inc((refcount_t *)(ptr + refcnt_off));
>         if (cmpxchg(kptr, NULL, ptr)) {
>                 /* refcnt >= 2 here */
>                 refcount_dec((refcount_t *)(ptr + refcnt_off));
>                 return -EBUSY;
>         }
>         return 0;
> }
>
> and
>
> BPF_CALL_2(bpf_kptr_get, void **, kptr, int, refcnt_off)
> {
>         void *ptr = READ_ONCE(kptr);
>
>         if (!ptr)
>                 return 0;
>         /* ptr->refcnt could be == 0 if another cpu did
>          * ptr2 = bpf_kptr_xchg();
>          * bpf_*_release(ptr2);
>          */
>         if (!refcount_inc_not_zero((refcount_t *)(ptr + refcnt_off)))
>                 return 0;
>         return (long) ptr;
> }
>

Good point. I did think about this problem. Right now, when you load a
referenced pointer from the BPF map, we mark it as PTR_UNTRUSTED, so you can
dereference it but not pass it into a helper like this. My thinking was that
the BPF user will already have an additional check (e.g. the pointer being
reachable from a hash table key) before accessing it, and that the xchg can
happen after the other program has removed the value's global visibility (by
deleting it from the table). And for RCU objects, dereferencing PTR_UNTRUSTED
still works and reads valid data; readers can also skip the object by
observing ptr->refcnt == 0 for the dying case.

So in the fast path case, as long as you don't need to pass BTF ID into
helpers, you don't have to increment the refcount.
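
i.e. something like (a sketch; the status read is just an example of a plain
untrusted deref):

struct nf_conn *ct = val->ct; /* PTR_TO_BTF_ID | PTR_UNTRUSTED */
unsigned long status;

if (ct)
	status = READ_ONCE(ct->status); /* deref ok, no refcnt taken */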

As you note, this is not enough; e.g. we might want to pass a referenced
pointer into helpers without removing it from the map. In that case such a
bpf_kptr_get helper makes sense, but I think 1) it needs to be inlined,
because refcount_off will only be easily available at verification time (from
BTF), 2) we cannot guarantee that it keeps working for the same object across
kernel versions, which ties the kernel into a stability guarantee regarding
internal changes, 3) we can only support it for a subset of objects (those
that have RCU protection), which we can ideally detect from BTF itself, by
marking them as such, or using a flag during destructor registration (which is
done for the ref'd pointer case).

>
> and instead of recognizing xchg insn directly in the verifier
> I went with the helper:
> BPF_CALL_2(bpf_kptr_xchg, void **, kptr, void *, ptr)
> {
>         /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
>          * or ptr == NULL.
>          * returns ptr_to_btf_id with refcnt >= 1 or NULL
>          */
>         return (long) xchg(kptr, ptr);
> }
> It was easier to deal with.
> I don't have a strong preference for helper vs insn though.
> Let's brainstorm after you post patches.
>
> xchg or bpf_kptr_xchg only operation seems to be too limiting.

Right, one other use case Toke and I were exploring is using this support to
enable XDP queueing. We can have a normal PIFO map and allow embedding
xdp_frame in it. Then xchg is used during enqueue and dequeue from the map,
and the user can have additional stuff in the map value (like embedding both
bpf_timer and xdp_frame, or extra scalars as data), which should allow more
composition/flexibility.
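
e.g. a map value like (a sketch):

struct val {
	struct bpf_timer timer;
	struct xdp_frame *frame; /* referenced pointer, moved with xchg */
	__u64 data;
};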

To also reduce the overhead of this xchg on the fast path, we can consider
relaxing the atomicity restriction for per-CPU maps instead, and add a
bpf_move_ptr helper that is non-atomic but ensures the invariant that the
value in a per-CPU map is either refcounted or NULL, and owned by only the
program or the map at any point in time. Then the overhead of moving object
ownership is also eliminated, and the helper itself can be inlined to a load
from the map and a store to the map (of the new pointer or NULL).

From my notes:

* Introduce per-CPU map support, allow non-atomic update for ref'd pointer, and
  enforce consistent value using a ptr = bpf_move_ptr(&map_val_ptr->ptr, ptr);

  To move value out of map: ptr = bpf_move_ptr(&val->ptr, NULL)
  To move value into the map: ptr = bpf_move_ptr(&val->ptr, ptr)

In both cases, reference state for R0 (with PTR_MAYBE_NULL set) is created, and
since a per-CPU map is only accessible to that CPU, we don't need additional
protections, as long as we also block userspace bpf_map_update_elem in this
case, or only permit storing such pointers in per-CPU maps that set the
BPF_MAP_F_RDONLY_USER flag.
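
The helper itself would boil down to something like (a sketch; non-atomic by
design, since the value is only visible to the owning CPU):

BPF_CALL_2(bpf_move_ptr, void **, kptr, void *, ptr)
{
	void *old = *kptr; /* plain load/store, no other CPU can race */

	*kptr = ptr;
	return (long) old;
}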

Will revisit this in detail when I post it; the code for this is missing from
GitHub.

> For example we might want to remember struct task_struct or struct net
> in a map for faster access.
> The __bpf_nf_ct_lookup is doing get_net_ns_by_id().
> idr_find isn't fast enough for XDP.
> We can store refcnted 'struct net *' for faster lookup.
> Then multiple cpus will be reading that pointer the map, but if only 'xchg' is
> available that use case won't work. One cpu will win the race.
> So we need bpf_kptr_get() will be an acquire helper.
> The bpf prog will do an explicit put_net(ptr); which will be a release kfunc.
>
> Simply reading refcnted ptr_to_btf_id from a map is PTR_UNTRUSTED too for sleepable bpf progs.

It is already marked PTR_UNTRUSTED for sleepable too, IIUC, but based on the
above I guess we can remove the flag for objects which are RCU protected, in
the case of non-sleepable progs. We also need to prevent the kptr_get helper
from being called from sleepable progs.

> Such pointers are under rcu_read_lock in non-sleepable, but it still feels safer
> to always enforce refcnt inc/dec when accessing them to pass into helpers.
>
> > I was planning to send a verifier test exercising this but it seems
> > fixup_kfunc_btf_id support for test_verifier.c is not in bpf tree yet, so when
>
> Obviously we have to wait until the merge window.
>
> > it is merged I will provide a small test case, it is basically this on bpf-next:
> >
> > diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
> > index 829be2b9e08e..5f26007ceef1 100644
> > --- a/tools/testing/selftests/bpf/verifier/calls.c
> > +++ b/tools/testing/selftests/bpf/verifier/calls.c
> > @@ -1,3 +1,22 @@
> > +{
> > +       "calls: trigger reg->type > __BPF_REG_TYPE_MAX",
> > +       .insns = {
> > +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
> > +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
> > +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
> > +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
> > +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> > +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
> > +       BPF_EXIT_INSN(),
> > +       },
> > +       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> > +       .result = REJECT,
> > +       .errstr = "XXX",
> > +       .fixup_kfunc_btf_id = {
> > +               { "bpf_kfunc_call_test_acquire", 3 },
> > +               { "bpf_kfunc_call_test_release", 5 },
> > +       },
> > +},
> >  {
> >
> > Sending it rn I think may lead to flaky CI, so we can wait.
>
> Right. Let's wait until bpf tree merges into bpf-next.
>
> > If all of this makes sense, should I respin?
>
> Please do.
> We can get bpf PR out asap after that.

Sent, but it looks like I missed the bpf: prefix in haste :/. Please fix it up
while applying; otherwise I can respin again.

--
Kartikeya


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Alexei Starovoitov @ 2022-02-16 21:44 UTC
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo, Toke Høiland-Jørgensen

On Thu, Feb 17, 2022 at 02:28:42AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Thu, Feb 17, 2022 at 01:19:56AM IST, Alexei Starovoitov wrote:
> > On Wed, Feb 16, 2022 at 11:59:54PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > > Just use reg2btf_ids[base_type(reg->type)] instead?
> > > > >
> > > > > That would be incorrect I think, then we'd allow e.g. PTR_TO_TCP_SOCK_OR_NULL
> > > > > and treat it as PTR_TO_TCP_SOCK, while current code only wants to permit
> > > > > non-NULL variants for these three.
> > > >
> > > > add && !type_flag(reg->type) ?
> > > >
> > >
> > > Ok, we can do that. WRT Hao's suggestion, we do allow NULL for ptr_to_mem_ok
> > > case, so doing it for all of check_func_arg_match won't work.
> >
> > Right.
> >
> > > > But, first, please describe how you found it.
> > > > Tried to pass PTR_TO_BTF_ID_OR_NULL into kfunc?
> > > > Do you have some other changes in the verifier?
> > >
> > > Yes, was working on [0], tried to come up with a test case where verifier
> > > printed bad register type being passed (one marked with a new flag), but noticed
> > > that it would fall out of __BPF_REG_TYPE_MAX limit during kfunc check. Also, it
> > > seems on non-KASAN build it actually doesn't crash sometimes, depends on what
> > > the value at that offset is.
> > >
> > >   [0]: https://github.com/kkdwivedi/linux/commits/btf-ptr-in-map
> >
> > Nice. Thanks a ton. I did a quick look. Seems to be ready to post on the list?
> > Can you add .c tests and post?
> >
> 
> Thanks for the quick review!
> 
> Yes, I should be able to post by this weekend if I don't run into any other
> problems/bugs in my code.

The sooner the better, so we can merge our efforts.

> > Only one suggestion so far:
> > Could you get rid of callback in btf_find_struct_field ?
> > The callbacks obscure the control flow and make the code harder to follow.
> > Do refactor btf_find_struct_field, but call btf_find_ptr_to_btf_id directly from it.
> > Then you wouldn't need this ugly part:
> > /* While callback cleans up after itself, later iterations can still
> >  * return error without calling cb, so call free function again.
> >  */
> > if (ret < 0)
> >   bpf_map_free_ptr_off_tab(map);
> >
> 
> Ok, makes sense, will change.
> 
> > I've been thinking about the api for refcnted PTR_TO_BTF_ID in the map too.
> > Regarding your issue with cmpxchg. I've come up with two helpers:
> >
> > BPF_CALL_3(bpf_kptr_try_set, void **, kptr, void *, ptr, int, refcnt_off)
> > {
> >         /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
> >          * refcount_inc() has to be done before cmpxchg() because
> >          * another cpu might do bpf_kptr_xchg+release.
> >          */
> >         refcount_inc((refcount_t *)(ptr + refcnt_off));
> >         if (cmpxchg(kptr, NULL, ptr)) {
> >                 /* refcnt >= 2 here */
> >                 refcount_dec((refcount_t *)(ptr + refcnt_off));
> >                 return -EBUSY;
> >         }
> >         return 0;
> > }
> >
> > and
> >
> > BPF_CALL_2(bpf_kptr_get, void **, kptr, int, refcnt_off)
> > {
> >         void *ptr = READ_ONCE(kptr);
> >
> >         if (!ptr)
> >                 return 0;
> >         /* ptr->refcnt could be == 0 if another cpu did
> >          * ptr2 = bpf_kptr_xchg();
> >          * bpf_*_release(ptr2);
> >          */
> >         if (!refcount_inc_not_zero((refcount_t *)(ptr + refcnt_off)))
> >                 return 0;
> >         return (long) ptr;
> > }
> >
> 
> Good point. I did think about this problem. Right now when you load a referenced
> pointer from the BPF map, we mark it as PTR_UNTRUSTED, so you can dereference

you mean UNreferenced pointer, right? Since that's what your patch is doing.

> but not pass into a helper like this, so my thinking was that BPF user will
> already have an additional check (e.g. pointer being reachable from hash table
> key) before accessing it, and that the xchg can happen after the other program
> has removed the value's visibility globally (by deleting it from table). And for
> RCU objects dereference of PTR_UNTRUSTED still works and reads valid data, they
> can also skip reading by observing ptr->refcnt == 0 for dying case.

I was wondering whether we can allow LDX to return PTR_TO_BTF_ID | PTR_UNTRUSTED
and have the explicit refcount_inc_not_zero done in the bpf program, by exposing
functions like maybe_get_net() as kfuncs and marking them as 'acquire' functions.
After the acquire the pointer would lose the PTR_UNTRUSTED flag.
That's another option instead of doing bpf_kptr_get().
The advantage of bpf_kptr_get() is that it's one uniform way of operating on
kernel pointers; no need to remember which kfunc to call.
But on the other hand an explicit acquire call is more flexible.
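
e.g. (a sketch; maybe_get_net() returns NULL once the refcnt has already
dropped to zero):

struct net *net = val->net; /* LDX: PTR_TO_BTF_ID | PTR_UNTRUSTED */

if (net) {
	net = maybe_get_net(net); /* acquire kfunc */
	if (net) {
		/* PTR_UNTRUSTED cleared, refcnt held */
		put_net(net); /* release kfunc */
	}
}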

> So in case of fast path as long as you don't need to pass BTF ID into helpers,
> you don't have to increment refcount.

pass btf_id into helpers? What do you mean? Pass ptr_to_btf_id into helpers?
Right. If it's PTR_UNTRUSTED then having such a struct as read-only would be ok,
but that is a narrow use case.
All kernel structs are beefy. They have a ton of stuff and would typically be
passed into something else. (struct nf_conn *)->status is the only example
I can think of where refcnt-ing is not needed.

The refcnt inc/dec is very fast when it's not contended.

> As you note, this is not enough, e.g. we might want to pass a referenced pointer
> into helpers without removing it from the map. In that case such bpf_kptr_get
> helper makes sense, but I think 1) It needs to be inlined, because refcount_off
> will only be available easily at verification time (from BTF), 

Sorry, that bit wasn't obvious. The refcnt_off argument to kptr_try_set and
kptr_get will be provided by the verifier. The prog cannot pass it on its own.
The verifier will map the btf_id of arg2 to offsetof(struct nf_conn,
ct_general.use) and patch R3 with that constant before the call insn.

So inlining of kptr_try_set and kptr_get is not needed.
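
i.e. something like this in the fixup pass (a sketch):

/* rewrite the call site so R3 carries the refcount offset */
insn_buf[0] = BPF_MOV64_IMM(BPF_REG_3,
			    offsetof(struct nf_conn, ct_general.use));
insn_buf[1] = *insn; /* the original call to bpf_kptr_get */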

> 2) We cannot
> guarantee that it keeps working for same object across kernel versions, that
> ties kernel into a stability guarantee reg. internal changes, 

See explanation above. It will work across kernels. bpf progs don't need to change.

> 3) We can only
> support it for subset of objects (that have RCU protection), which we can
> ideally detect from BTF itself, by marking them as such, or using a flag during
> destructor registration (which is done for ref'd pointer case).

The kptr_try_set, kptr_get and kptr_xchg helpers will work for sleepable bpf
progs too; rcu protection is not a requirement.
Since the btf_id -> offsetof(refcnt variable) mapping is done on the kernel
side, it will work as an allowlist for kernel structs too.
If btf_id_of(some kernel struct) is not there, then such a btf_id cannot be
persisted into the map value.

> 
> >
> > and instead of recognizing xchg insn directly in the verifier
> > I went with the helper:
> > BPF_CALL_2(bpf_kptr_xchg, void **, kptr, void *, ptr)
> > {
> >         /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
> >          * or ptr == NULL.
> >          * returns ptr_to_btf_id with refcnt >= 1 or NULL
> >          */
> >         return (long) xchg(kptr, ptr);
> > }
> > It was easier to deal with.
> > I don't have a strong preference for helper vs insn though.
> > Let's brainstorm after you post patches.
> >
> > xchg or bpf_kptr_xchg only operation seems to be too limiting.
> 
> Right, one other usecase me and Toke were exploring is using this support to
> enable XDP queueing. We can have a normal PIFO map and allow embedding xdp_frame
> in it. Then xchg is used during enqueue and dequeue from the map, and user can
> have additional stuff in map value (like embedding both bpf_timer and xdp_frame,
> or extra scalars as data), which should allow more composition/flexibility.
> 
> To also reduce the overhead of this xchg on fast path, we can consider relaxing
> atomic restriction for per-CPU maps instead, then add a bpf_move_ptr helper that
> is non-atomic, but ensures invariant that value in per-CPU map is either
> refcounted or NULL, and only in program or map at any point of time. Then the
> overhead of moving object ownership is also eliminated, and helper itself can be
> inlined to a load from the map and a store to the map (of the new pointer or
> NULL).
> 
> From my notes:
> 
>   6 * Introduce per-CPU map support, allow non-atomic update for ref'd pointer, and
>   7   enforce consistent value using a ptr = bpf_move_ptr(&map_val_ptr->ptr, ptr);
>   8
>   9   To move value out of map: ptr = bpf_move_ptr(&val->ptr, NULL)
>  10   To move value into the map: ptr = bpf_move_ptr(&val->ptr, ptr)
> 
> In both cases, reference state for R0 (with PTR_MAYBE_NULL set) is created, and
> since per-CPU map is only accessible to that CPU, we don't need additional
> protections, as long as we also block userspace bpf_map_update_elem in this
> case, or only permit storing such pointers with per-CPU map setting
> BPF_MAP_F_RDONLY_USER flag.
> 
> Will revisit this in detail when I post it, code for this is missing from
> GitHub.

Please hold off on that.
It looks like premature optimization to me.
xchg is fast when it's not contended.
Adding all that per-cpu map logic won't make it faster. A percpu array lookup
will cost more than the xchg.
You might end up doing all that just to realize that it's slower.
Let's land the main infra first.

> > For example we might want to remember struct task_struct or struct net
> > in a map for faster access.
> > The __bpf_nf_ct_lookup is doing get_net_ns_by_id().
> > idr_find isn't fast enough for XDP.
> > We can store refcnted 'struct net *' for faster lookup.
> > Then multiple cpus will be reading that pointer the map, but if only 'xchg' is
> > available that use case won't work. One cpu will win the race.
> > So we need bpf_kptr_get() will be an acquire helper.
> > The bpf prog will do an explicit put_net(ptr); which will be a release kfunc.
> >
> > Simply reading refcnted ptr_to_btf_id from a map is PTR_UNTRUSTED too for sleepable bpf progs.
> 
> It is already marked PTR_UNTRUSTED for sleepable too, IIUC, but based on above I
> guess we can remove the flag for objects which are RCU protected, in case of
> non-sleepable progs. 

Probably not. Even a non-sleepable prog under rcu_read_lock shouldn't be able
to pass a 'struct tcp_sock *' from the map into helpers.
That sock can have refcnt==0. Even under rcu it's dangerous.

> We also need to prevent the kptr_get helper from being
> called from sleepable progs.

why is that?


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
From: Kumar Kartikeya Dwivedi @ 2022-02-16 22:23 UTC
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo, Toke Høiland-Jørgensen

On Thu, Feb 17, 2022 at 03:14:05AM IST, Alexei Starovoitov wrote:
> On Thu, Feb 17, 2022 at 02:28:42AM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Feb 17, 2022 at 01:19:56AM IST, Alexei Starovoitov wrote:
> > > On Wed, Feb 16, 2022 at 11:59:54PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > > > Just use reg2btf_ids[base_type(reg->type)] instead?
> > > > > >
> > > > > > That would be incorrect, I think; then we'd allow e.g.
> > > > > > PTR_TO_TCP_SOCK_OR_NULL and treat it as PTR_TO_TCP_SOCK, while the current
> > > > > > code only wants to permit non-NULL variants for these three.
> > > > >
> > > > > add && !type_flag(reg->type) ?
> > > > >
> > > >
> > > > Ok, we can do that. WRT Hao's suggestion, we do allow NULL for ptr_to_mem_ok
> > > > case, so doing it for all of check_func_arg_match won't work.
> > >
> > > Right.
> > >
> > > > > But, first, please describe how you found it.
> > > > > Tried to pass PTR_TO_BTF_ID_OR_NULL into kfunc?
> > > > > Do you have some other changes in the verifier?
> > > >
> > > > Yes, I was working on [0] and tried to come up with a test case where the
> > > > verifier printed a bad register type being passed (one marked with a new
> > > > flag), but noticed that it would fall outside the __BPF_REG_TYPE_MAX limit
> > > > during the kfunc check. Also, it seems that on a non-KASAN build it sometimes
> > > > doesn't actually crash, depending on what the value at that offset is.
> > > >
> > > >   [0]: https://github.com/kkdwivedi/linux/commits/btf-ptr-in-map
> > >
> > > Nice. Thanks a ton. I did a quick look. Seems to be ready to post on the list?
> > > Can you add .c tests and post?
> > >
> >
> > Thanks for the quick review!
> >
> > Yes, I should be able to post by this weekend if I don't run into any other
> > problems/bugs in my code.
>
> The sooner the better, so we can merge our efforts sooner.
>
> > > Only one suggestion so far:
> > > Could you get rid of callback in btf_find_struct_field ?
> > > The callbacks obscure the control flow and make the code harder to follow.
> > > Do refactor btf_find_struct_field, but call btf_find_ptr_to_btf_id directly from it.
> > > Then you wouldn't need this ugly part:
> > > /* While callback cleans up after itself, later iterations can still
> > >  * return error without calling cb, so call free function again.
> > >  */
> > > if (ret < 0)
> > >   bpf_map_free_ptr_off_tab(map);
> > >
> >
> > Ok, makes sense, will change.
> >
> > > I've been thinking about the api for refcnted PTR_TO_BTF_ID in the map too.
> > > Regarding your issue with cmpxchg. I've come up with two helpers:
> > >
> > > BPF_CALL_3(bpf_kptr_try_set, void **, kptr, void *, ptr, int, refcnt_off)
> > > {
> > >         /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
> > >          * refcount_inc() has to be done before cmpxchg() because
> > >          * another cpu might do bpf_kptr_xchg+release.
> > >          */
> > >         refcount_inc((refcount_t *)(ptr + refcnt_off));
> > >         if (cmpxchg(kptr, NULL, ptr)) {
> > >                 /* refcnt >= 2 here */
> > >                 refcount_dec((refcount_t *)(ptr + refcnt_off));
> > >                 return -EBUSY;
> > >         }
> > >         return 0;
> > > }
> > >
> > > and
> > >
> > > BPF_CALL_2(bpf_kptr_get, void **, kptr, int, refcnt_off)
> > > {
> > >         void *ptr = READ_ONCE(*kptr);
> > >
> > >         if (!ptr)
> > >                 return 0;
> > >         /* ptr->refcnt could be == 0 if another cpu did
> > >          * ptr2 = bpf_kptr_xchg();
> > >          * bpf_*_release(ptr2);
> > >          */
> > >         if (!refcount_inc_not_zero((refcount_t *)(ptr + refcnt_off)))
> > >                 return 0;
> > >         return (long) ptr;
> > > }
> > >
> >
> > Good point. I did think about this problem. Right now when you load a referenced
> > pointer from the BPF map, we mark it as PTR_UNTRUSTED, so you can dereference
>
> you mean UNreferenced pointer, right? Since that's what your patch is doing.
>

You caught a bug :). Instead of...

	else if (!ref_ptr)
		reg_flags |= PTR_UNTRUSTED;

it should simply be 'else'; then, for a non-percpu BTF ID pointer and a non-user
BTF ID pointer, only XCHG (from a referenced pointer) unsets PTR_UNTRUSTED,
while both CMPXCHG and LDX have the untrusted bit set during mark_btf_ld_reg.
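
Roughly, the corrected flag logic would read (a sketch; the surrounding
conditions are from my WIP tree, so the names here are approximate):

	if (percpu_ptr)
		reg_flags |= MEM_PERCPU;
	else if (user_ptr)
		reg_flags |= MEM_USER;
	else
		reg_flags |= PTR_UNTRUSTED;

i.e. the untrusted bit is set by default, and only the XCHG path clears it.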

> > but not pass it into a helper like this, so my thinking was that the BPF user
> > will already have an additional check (e.g. the pointer being reachable from
> > the hash table key) before accessing it, and that the xchg can happen after
> > the other program has removed the value's visibility globally (by deleting it
> > from the table). And for RCU objects, dereference of PTR_UNTRUSTED still works
> > and reads valid data; they can also skip reading by observing
> > ptr->refcnt == 0 for the dying case.
>
> I was thinking whether we can allow LDX to return PTR_TO_BTF_ID | PTR_UNTRUSTED
> and have an explicit refcount_inc_not_zero done in the bpf program by exposing
> functions like maybe_get_net() as kfuncs and marking them as 'acquire' functions.

Ack, the code does not agree (see above), but I will fix it to be like this.

> After acquire the pointer would lose PTR_UNTRUSTED flag.

Right, we can allow passing PTR_TO_BTF_ID | PTR_UNTRUSTED only to these
specific inc_not_zero helpers.
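
For instance, in the kfunc arg checking code, something along these lines
(a sketch; is_acquire_kfunc is a made-up name for whatever mechanism ends up
marking these inc_not_zero-style kfuncs):

	if (type_flag(reg->type) & PTR_UNTRUSTED) {
		/* an untrusted PTR_TO_BTF_ID may only be passed to an
		 * acquire (inc_not_zero-style) kfunc
		 */
		if (!is_acquire_kfunc(func_id))
			return -EACCES;
	}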

> That's another option instead of doing bpf_kptr_get().
> The advantage of bpf_kptr_get() that it's one way of operating with kernel pointers.
> No need to remember which kfunc to call.
> But on the other side explicit acquire call is more flexible.
>

Right, I'm leaning towards an explicit call as well, at least for now, using an
unstable kfunc.
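
The prog side would then look something like this (a sketch; it assumes
maybe_get_net() is exposed as an acquire kfunc and put_net() as a release
kfunc, as you suggested above, plus a hypothetical map value layout):

	struct map_value *v = bpf_map_lookup_elem(&map, &key);
	struct net *net;

	if (!v)
		return 0;
	net = v->net;              /* LDX: PTR_TO_BTF_ID | PTR_UNTRUSTED */
	if (!net)
		return 0;
	net = maybe_get_net(net);  /* acquire kfunc clears PTR_UNTRUSTED */
	if (!net)
		return 0;
	/* ... use net as a trusted pointer ... */
	put_net(net);              /* release kfunc */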

> > So in case of fast path as long as you don't need to pass BTF ID into helpers,
> > you don't have to increment refcount.
>
> pass btf_id into helpers? What do you mean? Pass ptr_to_btf_id into helpers?

Sorry, yes.

> Right. If it's PTR_UNTRUSTED then having such a struct as read-only would be
> ok, but that is a narrow use case.
> All kernel structs are beefy. They have a ton of stuff and would typically be
> passed into something else. (struct nf_conn *)->status is the only example
> I can think of where refcnt-ing is not needed.
>
> The refcnt inc/dec is very fast when it's not contended.
>
> > As you note, this is not enough; e.g. we might want to pass a referenced
> > pointer into helpers without removing it from the map. In that case such a
> > bpf_kptr_get helper makes sense, but I think 1) it needs to be inlined,
> > because refcount_off will only be easily available at verification time
> > (from BTF),
>
> Sorry that bit wasn't obvious. The refcnt_off argument to kptr_try_set and kptr_get
> will be provided by the verifier. The prog cannot pass it on its own.
> The verifier will do btf_id of arg2 -> offsetof(struct nf_conn, ct_general.use)
> and patch R3 with that constant before the call insn.
>
> So inlining of kptr_try_set and kptr_get is not needed.
>

Ah, right, we can pass a hidden parameter. I was under the impression you'd want
to hardcode the offset and inline the call at verification/JIT time itself,
though we can always change how it's implemented later. It makes no difference
in terms of performance, I think.
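
Something like this in the fixup pass, then (a sketch; BPF_FUNC_kptr_try_set
and the btf_id -> offset lookup are made up, while the patching pattern
follows what the verifier already does for other helpers):

	if (insn->imm == BPF_FUNC_kptr_try_set) {
		/* arg2's BTF ID -> its refcount offset, e.g.
		 * offsetof(struct nf_conn, ct_general.use)
		 */
		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_3, refcnt_off);
		insn_buf[1] = *insn;
		new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 2);
		...
	}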

> > 2) We cannot
> > guarantee that it keeps working for the same object across kernel versions;
> > that ties the kernel into a stability guarantee regarding internal changes,
>
> See explanation above. It will work across kernels. bpf progs don't need to change.
>
> > 3) We can only
> > support it for a subset of objects (those that have RCU protection), which we
> > can ideally detect from BTF itself, by marking them as such, or using a flag
> > during destructor registration (which is done for the ref'd pointer case).
>
> The kptr_try_set, kptr_get, kptr_xchg will work for sleepable bpf progs too.
> rcu protection is not a requirement.
> Since the btf_id -> offsetof(refcnt variable) mapping is done on the kernel
> side, it will work as an allowlist for kernel structs too.
> If btf_id_of(some kernel struct) is not there, then such a btf_id cannot be
> persisted into the map value.
>
> >
> > >
> > > and instead of recognizing xchg insn directly in the verifier
> > > I went with the helper:
> > > BPF_CALL_2(bpf_kptr_xchg, void **, kptr, void *, ptr)
> > > {
> > >         /* ptr is ptr_to_btf_id returned from bpf_*_lookup() with ptr->refcnt >= 1
> > >          * or ptr == NULL.
> > >          * returns ptr_to_btf_id with refcnt >= 1 or NULL
> > >          */
> > >         return (long) xchg(kptr, ptr);
> > > }
> > > It was easier to deal with.
> > > I don't have a strong preference for helper vs insn though.
> > > Let's brainstorm after you post patches.
> > >
> > > xchg or bpf_kptr_xchg only operation seems to be too limiting.
> >
> > Right, one other use case Toke and I were exploring is using this support to
> > enable XDP queueing. We can have a normal PIFO map and allow embedding
> > xdp_frame in it. Then xchg is used during enqueue and dequeue from the map,
> > and the user can have additional stuff in the map value (like embedding both
> > bpf_timer and xdp_frame, or extra scalars as data), which should allow more
> > composition/flexibility.
> >
> > To also reduce the overhead of this xchg on the fast path, we can consider
> > relaxing the atomic restriction for per-CPU maps instead, then add a
> > bpf_move_ptr helper that is non-atomic, but ensures the invariant that the
> > value in a per-CPU map is either a refcounted pointer or NULL, and is owned
> > by either the program or the map at any point in time. Then the overhead of
> > moving object ownership is also eliminated, and the helper itself can be
> > inlined to a load from the map and a store to the map (of the new pointer or
> > NULL).
> >
> > From my notes:
> >
> >   6 * Introduce per-CPU map support, allow non-atomic update for ref'd pointer, and
> >   7   enforce consistent value using a ptr = bpf_move_ptr(&map_val_ptr->ptr, ptr);
> >   8
> >   9   To move value out of map: ptr = bpf_move_ptr(&val->ptr, NULL)
> >  10   To move value into the map: ptr = bpf_move_ptr(&val->ptr, ptr)
> >
> > In both cases, reference state for R0 (with PTR_MAYBE_NULL set) is created,
> > and since a per-CPU map value is only accessible to that CPU, we don't need
> > additional protections, as long as we also block userspace
> > bpf_map_update_elem in this case, or only permit storing such pointers when
> > the per-CPU map sets the BPF_MAP_F_RDONLY_USER flag.
> >
> > Will revisit this in detail when I post it, code for this is missing from
> > GitHub.
>
> Please hold on to that.
> It looks like premature optimization to me.
> xchg is fast when it's not contended.
> Adding all that per-cpu map logic won't make it faster. A percpu array lookup
> will cost more than xchg.
> You might end up doing all that just to realize that it's slower.
> Let's land the main infra first.
>

Ok, will wait.

> > > For example we might want to remember struct task_struct or struct net
> > > in a map for faster access.
> > > The __bpf_nf_ct_lookup is doing get_net_ns_by_id().
> > > idr_find isn't fast enough for XDP.
> > > We can store refcnted 'struct net *' for faster lookup.
> > > Then multiple cpus will be reading that pointer from the map, but if only
> > > 'xchg' is available that use case won't work. One cpu will win the race.
> > > So we need bpf_kptr_get(), which will be an acquire helper.
> > > The bpf prog will do an explicit put_net(ptr); which will be a release kfunc.
> > >
> > > Simply reading refcnted ptr_to_btf_id from a map is PTR_UNTRUSTED too for sleepable bpf progs.
> >
> > It is already marked PTR_UNTRUSTED for sleepable progs too, IIUC, but based
> > on the above I guess we can remove the flag for objects which are RCU
> > protected, in the case of non-sleepable progs.
>
> Probably not. Even a non-sleepable prog under rcu_read_lock shouldn't be able
> to pass a 'struct tcp_sock *' from the map into helpers.
> That sock can have refcnt==0. Even under rcu it's dangerous.
>

Good point, so instead of helpers doing the inc_not_zero each time, the prog
can do it once and pass the pointer into other helpers.

> > We also need to prevent the kptr_get helper from being
> > called from sleepable progs.
>
> why is that?

My confusion comes from the fact that I don't think the normal
call_rcu/synchronize_rcu grace period for the object waits for RCU Tasks Trace
readers. In a sleepable prog, that same xchg+release can happen while we hold
an untrusted pointer from LDX inside the RCU Tasks Trace section. Then I think
kptr_get may be wrong, because the object may already have been freed without
waiting for the sleepable prog to call rcu_read_unlock_trace.

Please correct me if I'm wrong, or misunderstood this case.

Or did you mean we will fix this for objects when exposing them to sleepable
progs, so that we can make kptr_get work?
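
If the latter, I imagine the object's free path would have to chain both grace
periods before actually freeing, roughly like this (a sketch;
call_rcu_tasks_trace() and call_rcu() are the real RCU APIs, but this free
path and struct obj are hypothetical):

	static void obj_free_rcu(struct rcu_head *rcu)
	{
		struct obj *o = container_of(rcu, struct obj, rcu);

		kfree(o);
	}

	static void obj_free_tasks_trace_rcu(struct rcu_head *rcu)
	{
		/* RCU Tasks Trace readers are done; now also wait for
		 * regular RCU readers before freeing
		 */
		call_rcu(rcu, obj_free_rcu);
	}

	/* on final release: */
	call_rcu_tasks_trace(&o->rcu, obj_free_tasks_trace_rcu);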

--
Kartikeya


* Re: [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX
  2022-02-16 22:23               ` Kumar Kartikeya Dwivedi
@ 2022-02-16 23:06                 ` Alexei Starovoitov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2022-02-16 23:06 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Hao Luo, Toke Høiland-Jørgensen

On Thu, Feb 17, 2022 at 03:53:41AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > >
> > >
> > > Good point. I did think about this problem. Right now when you load a referenced
> > > pointer from the BPF map, we mark it as PTR_UNTRUSTED, so you can dereference
> >
> > you mean UNreferenced pointer, right? Since that's what your patch is doing.
> >
> 
> You caught a bug :). Instead of...
> 
> 	else if (!ref_ptr)
> 		reg_flags |= PTR_UNTRUSTED;
> 
> it should simply be 'else'; then, for a non-percpu BTF ID pointer and a non-user
> BTF ID pointer, only XCHG (from a referenced pointer) unsets PTR_UNTRUSTED,
> while both CMPXCHG and LDX have the untrusted bit set during mark_btf_ld_reg.

btw please drop CMPXCHG for now.
The "} else if (is_cmpxchg_insn(insn)) {" part looks incomplete.
The verifier cannot support acquire/release semantics, so the value
of cmpxchg is dubious.
Let's brainstorm after the main pieces land.

> > >
> > > It is already marked PTR_UNTRUSTED for sleepable progs too, IIUC, but based
> > > on the above I guess we can remove the flag for objects which are RCU
> > > protected, in the case of non-sleepable progs.
> >
> > Probably not. Even a non-sleepable prog under rcu_read_lock shouldn't be able
> > to pass a 'struct tcp_sock *' from the map into helpers.
> > That sock can have refcnt==0. Even under rcu it's dangerous.
> >
> 
> Good point, so instead of helpers doing the inc_not_zero each time, the prog
> can do it once and pass the pointer into other helpers.

The kptr_get helper will do it once too.
So it's the same overhead.

> > > We also need to prevent the kptr_get helper from being
> > > called from sleepable progs.
> >
> > why is that?
> 
> My confusion comes from the fact that I don't think the normal
> call_rcu/synchronize_rcu grace period for the object waits for RCU Tasks Trace
> readers. In a sleepable prog, that same xchg+release can happen while we hold
> an untrusted pointer from LDX inside the RCU Tasks Trace section. Then I think
> kptr_get may be wrong, because the object may already have been freed without
> waiting for the sleepable prog to call rcu_read_unlock_trace.

Right. If we go with the helper approach we can do:
BPF_CALL_2(bpf_kptr_get, void **, kptr, int, refcnt_off)
{
        void *ptr;

        rcu_read_lock();
        ptr = READ_ONCE(*kptr);

        if (!ptr)
                goto out;
        /* ptr->refcnt could be == 0 if another cpu did
         * ptr2 = bpf_kptr_xchg();
         * bpf_*_release(ptr2);
         */
        if (!refcount_inc_not_zero((refcount_t *)(ptr + refcnt_off)))
                goto out;
        rcu_read_unlock();
        return (long) ptr;
out:
        rcu_read_unlock();
        return 0;
}

and it will work in sleepable progs.
But if we allow direct LDX to return PTR_TO_BTF_ID | PTR_UNTRUSTED
then we need to support explicit bpf_rcu_read_lock|unlock() done
inside the bpf prog, which is unnecessary overhead for everyone but PREEMPT=y.
Plus a bunch of additional complexity in the verifier to support
an explicit RCU section.
Pros and cons.
Anyway, post your patches and we'll take it from there.

I've been thinking about
obj = bpf_obj_alloc(bpf_core_type_id_local(struct my_obj), BPF_TYPE_ID_LOCAL);
...
bpf_obj_put(obj);
on top of refcnted ptr_to_btf_id stored into maps.
That's another huge discussion.



Thread overview: 10+ messages
2022-02-16  9:13 [PATCH bpf] bpf: Fix crash due to OOB access when reg->type > __BPF_REG_TYPE_MAX Kumar Kartikeya Dwivedi
2022-02-16 15:45 ` Alexei Starovoitov
2022-02-16 17:33   ` Kumar Kartikeya Dwivedi
2022-02-16 18:19     ` Alexei Starovoitov
2022-02-16 18:29       ` Kumar Kartikeya Dwivedi
2022-02-16 19:49         ` Alexei Starovoitov
2022-02-16 20:58           ` Kumar Kartikeya Dwivedi
2022-02-16 21:44             ` Alexei Starovoitov
2022-02-16 22:23               ` Kumar Kartikeya Dwivedi
2022-02-16 23:06                 ` Alexei Starovoitov
