All of lore.kernel.org
 help / color / mirror / Atom feed
* Calling kfuncs in modules - BTF mismatch?
@ 2022-11-11 14:21 Toke Høiland-Jørgensen
  2022-11-11 18:08 ` Lorenzo Bianconi
  0 siblings, 1 reply; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-11 14:21 UTC (permalink / raw)
  To: Lorenzo Bianconi, Kumar Kartikeya Dwivedi, Andrii Nakryiko; +Cc: bpf, Jiri Benc

Hi everyone

There seems to be some issue with BTF mismatch when trying to run the
bpf_ct_set_nat_info() kfunc from a module. I was under the impression
that this is supposed to work, so is there some kind of BTF dedup issue
here or something?

Steps to reproduce:

1. Compile kernel with nf_conntrack built-in and run selftests;
   './test_progs -a bpf_nf' works

2. Change the kernel config so nf_conntrack is build as a module

3. Start the test kernel and manually modprobe nf_conntrack and nf_nat

4. Run ./test_progs -a bpf_nf; this now fails with an error like:

kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init

Anyone has any ideas what's going on here, and how to fix it?

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-11 14:21 Calling kfuncs in modules - BTF mismatch? Toke Høiland-Jørgensen
@ 2022-11-11 18:08 ` Lorenzo Bianconi
  2022-11-13 18:04   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2022-11-11 18:08 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf, Jiri Benc,
	Lorenzo Bianconi

>
> Hi everyone
>
> There seems to be some issue with BTF mismatch when trying to run the
> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
> that this is supposed to work, so is there some kind of BTF dedup issue
> here or something?
>
> Steps to reproduce:
>
> 1. Compile kernel with nf_conntrack built-in and run selftests;
>    './test_progs -a bpf_nf' works
>
> 2. Change the kernel config so nf_conntrack is build as a module
>
> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
>
> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>
> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init

This week Kumar and I took a look at this issue and we ended up
identifying a duplication of nf_conn___init structure. In particular:

[~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
[110941] STRUCT 'nf_conn___init' size=248 vlen=1
[~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
net/netfilter/nf_nat.ko format raw | grep nf_conn__
[107488] STRUCT 'nf_conn___init' size=248 vlen=1

Is it the root cause of the problem?

Regards,
Lorenzo

>
> Anyone has any ideas what's going on here, and how to fix it?
>
> -Toke
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-11 18:08 ` Lorenzo Bianconi
@ 2022-11-13 18:04   ` Toke Høiland-Jørgensen
  2022-11-29 15:00     ` Artem Savkov
  2022-11-29 16:21     ` Alan Maguire
  0 siblings, 2 replies; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-13 18:04 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf, Jiri Benc,
	Lorenzo Bianconi

Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:

>>
>> Hi everyone
>>
>> There seems to be some issue with BTF mismatch when trying to run the
>> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
>> that this is supposed to work, so is there some kind of BTF dedup issue
>> here or something?
>>
>> Steps to reproduce:
>>
>> 1. Compile kernel with nf_conntrack built-in and run selftests;
>>    './test_progs -a bpf_nf' works
>>
>> 2. Change the kernel config so nf_conntrack is build as a module
>>
>> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
>>
>> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>>
>> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
>
> This week Kumar and I took a look at this issue and we ended up
> identifying a duplication of nf_conn___init structure. In particular:
>
> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
> [110941] STRUCT 'nf_conn___init' size=248 vlen=1
> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> net/netfilter/nf_nat.ko format raw | grep nf_conn__
> [107488] STRUCT 'nf_conn___init' size=248 vlen=1
>
> Is it the root cause of the problem?

It certainly seems to be related to it, at least. Amending the log
message to include the BTF object IDs of the two versions shows that the
register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
expects it to point to nf_nat.ko.

Not sure what's the right fix for this? Should libbpf be smart enough to
pull the kfunc arg ID from the same BTF ID as the function itself? Or
should the kernel compare structs and allow things if they're identical?
Andrii, WDYT?

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-13 18:04   ` Toke Høiland-Jørgensen
@ 2022-11-29 15:00     ` Artem Savkov
  2022-11-29 20:12       ` Toke Høiland-Jørgensen
  2022-11-29 16:21     ` Alan Maguire
  1 sibling, 1 reply; 10+ messages in thread
From: Artem Savkov @ 2022-11-29 15:00 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenzo Bianconi, Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf,
	Jiri Benc, Lorenzo Bianconi

On Sun, Nov 13, 2022 at 07:04:43PM +0100, Toke Høiland-Jørgensen wrote:
> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
> 
> >>
> >> Hi everyone
> >>
> >> There seems to be some issue with BTF mismatch when trying to run the
> >> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
> >> that this is supposed to work, so is there some kind of BTF dedup issue
> >> here or something?
> >>
> >> Steps to reproduce:
> >>
> >> 1. Compile kernel with nf_conntrack built-in and run selftests;
> >>    './test_progs -a bpf_nf' works
> >>
> >> 2. Change the kernel config so nf_conntrack is build as a module
> >>
> >> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
> >>
> >> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
> >>
> >> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
> >
> > This week Kumar and I took a look at this issue and we ended up
> > identifying a duplication of nf_conn___init structure. In particular:
> >
> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> > net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
> > [110941] STRUCT 'nf_conn___init' size=248 vlen=1
> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> > net/netfilter/nf_nat.ko format raw | grep nf_conn__
> > [107488] STRUCT 'nf_conn___init' size=248 vlen=1
> >
> > Is it the root cause of the problem?
> 
> It certainly seems to be related to it, at least. Amending the log
> message to include the BTF object IDs of the two versions shows that the
> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> expects it to point to nf_nat.ko.
> 
> Not sure what's the right fix for this? Should libbpf be smart enough to
> pull the kfunc arg ID from the same BTF ID as the function itself? Or

Verifier chose the ID based on where the variable was allocated, which
is bpf_(skb|xdp)_ct_alloc() from nf_conntrack.ko. Assuming btf id based
on usage location might be error prone.

> should the kernel compare structs and allow things if they're identical?
> Andrii, WDYT?

This works but might make verifier slower.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 195d24316750..562d2c15906d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -24,6 +24,7 @@
 #include <linux/bpf_lsm.h>
 #include <linux/btf_ids.h>
 #include <linux/poison.h>
+#include "../tools/lib/bpf/relo_core.h"

 #include "disasm.h"

@@ -8236,7 +8237,8 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,

        reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
        reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
-       if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
+       if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match) && \
+                       !__bpf_core_types_match(reg_btf, reg_ref_id, meta->btf, ref_id, false, 10)) {
                verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
                        meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
                        btf_type_str(reg_ref_t), reg_ref_tname);

-- 
Regards,
  Artem


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-13 18:04   ` Toke Høiland-Jørgensen
  2022-11-29 15:00     ` Artem Savkov
@ 2022-11-29 16:21     ` Alan Maguire
  2022-11-29 19:41       ` Arnaldo Carvalho de Melo
  2022-11-29 20:12       ` Toke Høiland-Jørgensen
  1 sibling, 2 replies; 10+ messages in thread
From: Alan Maguire @ 2022-11-29 16:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Lorenzo Bianconi,
	Arnaldo Carvalho de Melo
  Cc: Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf, Jiri Benc,
	Lorenzo Bianconi

On 13/11/2022 18:04, Toke Høiland-Jørgensen wrote:
> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
> 
>>>
>>> Hi everyone
>>>
>>> There seems to be some issue with BTF mismatch when trying to run the
>>> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
>>> that this is supposed to work, so is there some kind of BTF dedup issue
>>> here or something?
>>>
>>> Steps to reproduce:
>>>
>>> 1. Compile kernel with nf_conntrack built-in and run selftests;
>>>    './test_progs -a bpf_nf' works
>>>
>>> 2. Change the kernel config so nf_conntrack is build as a module
>>>
>>> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
>>>
>>> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>>>
>>> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
>>
>> This week Kumar and I took a look at this issue and we ended up
>> identifying a duplication of nf_conn___init structure. In particular:
>>
>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>> net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
>> [110941] STRUCT 'nf_conn___init' size=248 vlen=1
>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>> net/netfilter/nf_nat.ko format raw | grep nf_conn__
>> [107488] STRUCT 'nf_conn___init' size=248 vlen=1
>>
>> Is it the root cause of the problem?
> 
> It certainly seems to be related to it, at least. Amending the log
> message to include the BTF object IDs of the two versions shows that the
> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> expects it to point to nf_nat.ko.
> 
> Not sure what's the right fix for this? Should libbpf be smart enough to
> pull the kfunc arg ID from the same BTF ID as the function itself? Or
> should the kernel compare structs and allow things if they're identical?
> Andrii, WDYT?
> 

There were some dedup issues fixed recently in pahole
and libbpf; since dwarves libbpf hasn't been synced with
libbpf recently as far as I can see it won't have the fix 
for [1]; I suspect it may help with dedup-ing here. Would 
probably be worth trying rebuilding dwarves with a libbpf 
with [1] applied and seeing if the dedup issue goes away
before we go any further. If it fixes the issue, would it
be worth updating the libbpf that dwarves uses Arnaldo?
I saw some pretty large improvements in removing
redundant definitions.

Thanks!

Alan

https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com/


> -Toke
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-29 16:21     ` Alan Maguire
@ 2022-11-29 19:41       ` Arnaldo Carvalho de Melo
  2022-11-29 20:12       ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-11-29 19:41 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Toke Høiland-Jørgensen, Lorenzo Bianconi,
	Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf, Jiri Benc,
	Lorenzo Bianconi

Em Tue, Nov 29, 2022 at 04:21:23PM +0000, Alan Maguire escreveu:
> On 13/11/2022 18:04, Toke Høiland-Jørgensen wrote:
> > Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
> > 
> >>>
> >>> Hi everyone
> >>>
> >>> There seems to be some issue with BTF mismatch when trying to run the
> >>> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
> >>> that this is supposed to work, so is there some kind of BTF dedup issue
> >>> here or something?
> >>>
> >>> Steps to reproduce:
> >>>
> >>> 1. Compile kernel with nf_conntrack built-in and run selftests;
> >>>    './test_progs -a bpf_nf' works
> >>>
> >>> 2. Change the kernel config so nf_conntrack is build as a module
> >>>
> >>> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
> >>>
> >>> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
> >>>
> >>> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
> >>
> >> This week Kumar and I took a look at this issue and we ended up
> >> identifying a duplication of nf_conn___init structure. In particular:
> >>
> >> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> >> net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
> >> [110941] STRUCT 'nf_conn___init' size=248 vlen=1
> >> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> >> net/netfilter/nf_nat.ko format raw | grep nf_conn__
> >> [107488] STRUCT 'nf_conn___init' size=248 vlen=1
> >>
> >> Is it the root cause of the problem?
> > 
> > It certainly seems to be related to it, at least. Amending the log
> > message to include the BTF object IDs of the two versions shows that the
> > register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> > expects it to point to nf_nat.ko.
> > 
> > Not sure what's the right fix for this? Should libbpf be smart enough to
> > pull the kfunc arg ID from the same BTF ID as the function itself? Or
> > should the kernel compare structs and allow things if they're identical?
> > Andrii, WDYT?
> > 
> 
> There were some dedup issues fixed recently in pahole
> and libbpf; since dwarves libbpf hasn't been synced with
> libbpf recently as far as I can see it won't have the fix 
> for [1]; I suspect it may help with dedup-ing here. Would 
> probably be worth trying rebuilding dwarves with a libbpf 
> with [1] applied and seeing if the dedup issue goes away
> before we go any further. If it fixes the issue, would it
> be worth updating the libbpf that dwarves uses Arnaldo?
> I saw some pretty large improvements in removing
> redundant definitions.

Feel free to send a patch updating the libbpf copy in pahole, so that I
can go on testing it.

I'm working on supporting DW_TAG_imported_unit and split DWARF (.dwz) so
as soon as I get that properly tested and make sure it doesn't regresses
wrt BTF encoding I'll cut a new version.

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-29 16:21     ` Alan Maguire
  2022-11-29 19:41       ` Arnaldo Carvalho de Melo
@ 2022-11-29 20:12       ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-29 20:12 UTC (permalink / raw)
  To: Alan Maguire, Lorenzo Bianconi, Arnaldo Carvalho de Melo
  Cc: Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf, Jiri Benc,
	Lorenzo Bianconi

Alan Maguire <alan.maguire@oracle.com> writes:

> On 13/11/2022 18:04, Toke Høiland-Jørgensen wrote:
>> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
>> 
>>>>
>>>> Hi everyone
>>>>
>>>> There seems to be some issue with BTF mismatch when trying to run the
>>>> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
>>>> that this is supposed to work, so is there some kind of BTF dedup issue
>>>> here or something?
>>>>
>>>> Steps to reproduce:
>>>>
>>>> 1. Compile kernel with nf_conntrack built-in and run selftests;
>>>>    './test_progs -a bpf_nf' works
>>>>
>>>> 2. Change the kernel config so nf_conntrack is build as a module
>>>>
>>>> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
>>>>
>>>> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>>>>
>>>> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
>>>
>>> This week Kumar and I took a look at this issue and we ended up
>>> identifying a duplication of nf_conn___init structure. In particular:
>>>
>>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>>> net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
>>> [110941] STRUCT 'nf_conn___init' size=248 vlen=1
>>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>>> net/netfilter/nf_nat.ko format raw | grep nf_conn__
>>> [107488] STRUCT 'nf_conn___init' size=248 vlen=1
>>>
>>> Is it the root cause of the problem?
>> 
>> It certainly seems to be related to it, at least. Amending the log
>> message to include the BTF object IDs of the two versions shows that the
>> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
>> expects it to point to nf_nat.ko.
>> 
>> Not sure what's the right fix for this? Should libbpf be smart enough to
>> pull the kfunc arg ID from the same BTF ID as the function itself? Or
>> should the kernel compare structs and allow things if they're identical?
>> Andrii, WDYT?
>> 
>
> There were some dedup issues fixed recently in pahole
> and libbpf; since dwarves libbpf hasn't been synced with
> libbpf recently as far as I can see it won't have the fix 
> for [1]; I suspect it may help with dedup-ing here. Would 
> probably be worth trying rebuilding dwarves with a libbpf 
> with [1] applied and seeing if the dedup issue goes away
> before we go any further. If it fixes the issue, would it
> be worth updating the libbpf that dwarves uses Arnaldo?
> I saw some pretty large improvements in removing
> redundant definitions.

I don't think it's a deduplication issue; the type is simply not defined
in vmlinux, so each module has its own "version" of it. And since each
module's BTF is semantically independent of other modules the
duplication is expected in this case.

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-29 15:00     ` Artem Savkov
@ 2022-11-29 20:12       ` Toke Høiland-Jørgensen
  2022-12-01  1:09         ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-11-29 20:12 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Lorenzo Bianconi, Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf,
	Jiri Benc, Lorenzo Bianconi, Alexei Starovoitov

Artem Savkov <asavkov@redhat.com> writes:

> On Sun, Nov 13, 2022 at 07:04:43PM +0100, Toke Høiland-Jørgensen wrote:
>> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
>> 
>> >>
>> >> Hi everyone
>> >>
>> >> There seems to be some issue with BTF mismatch when trying to run the
>> >> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
>> >> that this is supposed to work, so is there some kind of BTF dedup issue
>> >> here or something?
>> >>
>> >> Steps to reproduce:
>> >>
>> >> 1. Compile kernel with nf_conntrack built-in and run selftests;
>> >>    './test_progs -a bpf_nf' works
>> >>
>> >> 2. Change the kernel config so nf_conntrack is build as a module
>> >>
>> >> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
>> >>
>> >> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>> >>
>> >> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
>> >
>> > This week Kumar and I took a look at this issue and we ended up
>> > identifying a duplication of nf_conn___init structure. In particular:
>> >
>> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>> > net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
>> > [110941] STRUCT 'nf_conn___init' size=248 vlen=1
>> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>> > net/netfilter/nf_nat.ko format raw | grep nf_conn__
>> > [107488] STRUCT 'nf_conn___init' size=248 vlen=1
>> >
>> > Is it the root cause of the problem?
>> 
>> It certainly seems to be related to it, at least. Amending the log
>> message to include the BTF object IDs of the two versions shows that the
>> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
>> expects it to point to nf_nat.ko.
>> 
>> Not sure what's the right fix for this? Should libbpf be smart enough to
>> pull the kfunc arg ID from the same BTF ID as the function itself? Or
>
> Verifier chose the ID based on where the variable was allocated, which
> is bpf_(skb|xdp)_ct_alloc() from nf_conntrack.ko. Assuming btf id based
> on usage location might be error prone.

Yeah, I think we need something more robust.

>> should the kernel compare structs and allow things if they're identical?
>> Andrii, WDYT?
>
> This works but might make verifier slower.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 195d24316750..562d2c15906d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -24,6 +24,7 @@
>  #include <linux/bpf_lsm.h>
>  #include <linux/btf_ids.h>
>  #include <linux/poison.h>
> +#include "../tools/lib/bpf/relo_core.h"
>
>  #include "disasm.h"
>
> @@ -8236,7 +8237,8 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>
>         reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
>         reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
> -       if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
> +       if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match) && \
> +                       !__bpf_core_types_match(reg_btf, reg_ref_id, meta->btf, ref_id, false, 10)) {
>                 verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
>                         meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
>                         btf_type_str(reg_ref_t), reg_ref_tname);

Ah, cool! This is a lot of code to important into the kernel, though; do
we really want to do that?

I guess an alternative could be to make sure that every data type used
by a kfunc is always referenced in a built-in module somewhere, so
modules will use the definition from vmlinux.

Andrii, Alexei, any opinions on which way to go with this?

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-11-29 20:12       ` Toke Høiland-Jørgensen
@ 2022-12-01  1:09         ` Andrii Nakryiko
  2022-12-01 13:51           ` Artem Savkov
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2022-12-01  1:09 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Artem Savkov, Lorenzo Bianconi, Kumar Kartikeya Dwivedi,
	Andrii Nakryiko, bpf, Jiri Benc, Lorenzo Bianconi,
	Alexei Starovoitov

On Tue, Nov 29, 2022 at 12:12 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Artem Savkov <asavkov@redhat.com> writes:
>
> > On Sun, Nov 13, 2022 at 07:04:43PM +0100, Toke Høiland-Jørgensen wrote:
> >> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
> >>
> >> >>
> >> >> Hi everyone
> >> >>
> >> >> There seems to be some issue with BTF mismatch when trying to run the
> >> >> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
> >> >> that this is supposed to work, so is there some kind of BTF dedup issue
> >> >> here or something?
> >> >>
> >> >> Steps to reproduce:
> >> >>
> >> >> 1. Compile kernel with nf_conntrack built-in and run selftests;
> >> >>    './test_progs -a bpf_nf' works
> >> >>
> >> >> 2. Change the kernel config so nf_conntrack is build as a module
> >> >>
> >> >> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
> >> >>
> >> >> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
> >> >>
> >> >> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
> >> >
> >> > This week Kumar and I took a look at this issue and we ended up
> >> > identifying a duplication of nf_conn___init structure. In particular:
> >> >
> >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> >> > net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
> >> > [110941] STRUCT 'nf_conn___init' size=248 vlen=1
> >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> >> > net/netfilter/nf_nat.ko format raw | grep nf_conn__
> >> > [107488] STRUCT 'nf_conn___init' size=248 vlen=1
> >> >
> >> > Is it the root cause of the problem?
> >>
> >> It certainly seems to be related to it, at least. Amending the log
> >> message to include the BTF object IDs of the two versions shows that the
> >> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> >> expects it to point to nf_nat.ko.
> >>
> >> Not sure what's the right fix for this? Should libbpf be smart enough to
> >> pull the kfunc arg ID from the same BTF ID as the function itself? Or

Libbpf is doing just that. Or rather this just happens automatically.
Libbpf finds the FUNC type corresponding to a kfunc, and then all the
types of all the arguments are consistent with that FUNC definition.

I think the problem is that test is getting `struct nf_conn` from
bpf_xdp_ct_alloc() kfunc, which is defined in nf_conntrack module (and
so specifies that it returns `struct nf_conn` coming from
nf_conntrack's module BTF), while bpf_ct_set_nat_info() kfunc is
defined in nf_nat module and specifies that it expects `struct
nf_conn` defined in nf_nat's module BTF.

And those two types are two completely different types, with different
BTF object ID and BTF type ID, as far as all the BTF stuff is
concerned.

I don't know what the solution here is, but it's not on the libbpf
side at all for sure. As Toke said, bringing BTF dedup into the kernel
seems like an overkill. So some hacky "let's compare struct name and
size" approach perhaps?

> >
> > Verifier chose the ID based on where the variable was allocated, which
> > is bpf_(skb|xdp)_ct_alloc() from nf_conntrack.ko. Assuming btf id based
> > on usage location might be error prone.
>
> Yeah, I think we need something more robust.
>
> >> should the kernel compare structs and allow things if they're identical?
> >> Andrii, WDYT?
> >
> > This works but might make verifier slower.
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 195d24316750..562d2c15906d 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/bpf_lsm.h>
> >  #include <linux/btf_ids.h>
> >  #include <linux/poison.h>
> > +#include "../tools/lib/bpf/relo_core.h"
> >
> >  #include "disasm.h"
> >
> > @@ -8236,7 +8237,8 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> >
> >         reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
> >         reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
> > -       if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
> > +       if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match) && \
> > +                       !__bpf_core_types_match(reg_btf, reg_ref_id, meta->btf, ref_id, false, 10)) {
> >                 verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
> >                         meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
> >                         btf_type_str(reg_ref_t), reg_ref_tname);
>
> Ah, cool! This is a lot of code to important into the kernel, though; do
> we really want to do that?
>
> I guess an alternative could be to make sure that every data type used
> by a kfunc is always referenced in a built-in module somewhere, so
> modules will use the definition from vmlinux.
>
> Andrii, Alexei, any opinions on which way to go with this?
>
> -Toke
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Calling kfuncs in modules - BTF mismatch?
  2022-12-01  1:09         ` Andrii Nakryiko
@ 2022-12-01 13:51           ` Artem Savkov
  0 siblings, 0 replies; 10+ messages in thread
From: Artem Savkov @ 2022-12-01 13:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, Lorenzo Bianconi,
	Kumar Kartikeya Dwivedi, Andrii Nakryiko, bpf, Jiri Benc,
	Lorenzo Bianconi, Alexei Starovoitov

On Wed, Nov 30, 2022 at 05:09:03PM -0800, Andrii Nakryiko wrote:
> On Tue, Nov 29, 2022 at 12:12 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >> > This week Kumar and I took a look at this issue and we ended up
> > >> > identifying a duplication of nf_conn___init structure. In particular:
> > >> >
> > >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> > >> > net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
> > >> > [110941] STRUCT 'nf_conn___init' size=248 vlen=1
> > >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> > >> > net/netfilter/nf_nat.ko format raw | grep nf_conn__
> > >> > [107488] STRUCT 'nf_conn___init' size=248 vlen=1
> > >> >
> > >> > Is it the root cause of the problem?
> > >>
> > >> It certainly seems to be related to it, at least. Amending the log
> > >> message to include the BTF object IDs of the two versions shows that the
> > >> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> > >> expects it to point to nf_nat.ko.
> > >>
> > >> Not sure what's the right fix for this? Should libbpf be smart enough to
> > >> pull the kfunc arg ID from the same BTF ID as the function itself? Or
> 
> Libbpf is doing just that. Or rather this just happens automatically.
> Libbpf finds the FUNC type corresponding to a kfunc, and then all the
> types of all the arguments are consistent with that FUNC definition.
> 
> I think the problem is that test is getting `struct nf_conn` from
> bpf_xdp_ct_alloc() kfunc, which is defined in nf_conntrack module (and
> so specifies that it returns `struct nf_conn` coming from
> nf_conntrack's module BTF), while bpf_ct_set_nat_info() kfunc is
> defined in nf_nat module and specifies that it expects `struct
> nf_conn` defined in nf_nat's module BTF.
> 
> And those two types are two completely different types, with different
> BTF object ID and BTF type ID, as far as all the BTF stuff is
> concerned.
> 
> I don't know what the solution here is, but it's not on the libbpf
> side at all for sure. As Toke said, bringing BTF dedup into the kernel
> seems like an overkill. So some hacky "let's compare struct name and
> size" approach perhaps?

Wouldn't that be a bit too relaxed for a general case? I wonder how
often can this issue come up. If this is relatively rare maybe known
kfuncs that need this can be flagged with a new flag
(KF_RELAXED_ARG_CHECK or similar) to allow this shortcut?

-- 
Regards,
  Artem


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-12-01 13:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-11 14:21 Calling kfuncs in modules - BTF mismatch? Toke Høiland-Jørgensen
2022-11-11 18:08 ` Lorenzo Bianconi
2022-11-13 18:04   ` Toke Høiland-Jørgensen
2022-11-29 15:00     ` Artem Savkov
2022-11-29 20:12       ` Toke Høiland-Jørgensen
2022-12-01  1:09         ` Andrii Nakryiko
2022-12-01 13:51           ` Artem Savkov
2022-11-29 16:21     ` Alan Maguire
2022-11-29 19:41       ` Arnaldo Carvalho de Melo
2022-11-29 20:12       ` Toke Høiland-Jørgensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.