bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BTF compatibility issue across builds
@ 2022-01-27 15:10 Shung-Hsi Yu
  2022-01-31 17:36 ` Yonghong Song
  0 siblings, 1 reply; 22+ messages in thread
From: Shung-Hsi Yu @ 2022-01-27 15:10 UTC (permalink / raw)
  To: bpf, netdev, Andrii Nakryiko; +Cc: Daniel Borkmann, Alexei Starovoitov

Hi,

We recently run into module load failure related to split BTF on openSUSE
Tumbleweed[1], which I believe is something that may also happen on other
rolling distros.

The error looks like the follow (though failure is not limited to ipheth)

    BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:

    failed to validate module [ipheth] BTF: -22

The error comes down to trying to load BTF of *kernel modules from a
different build* than the runtime kernel (but the source is the same), where
the base BTF of the two build is different.

While it may be too far stretched to call this a bug, solving this might
make BTF adoption easier. I'd natively think that we could further split
base BTF into two part to avoid this issue, where .BTF only contain exported
types, and the other (still residing in vmlinux) holds the unexported types.

Does that sound like something reasonable to work on?


## Root case (in case anyone is interested in a verbose version)

On openSUSE Tumbleweed there can be several builds of the same source. Since
the source is the same, the binaries are simply replaced when a package with
a larger build number is installed during upgrade.

In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
are previously missing in base BTF of 5.15.12-1.1.

The addition of entries in BTF type and string table caused extra offset of
type IDs and string position in the base BTF, and as such the same type ID
may refers to a totally different type, and as does name_off of types.

When users on build#1 (ie 5.15.12-1.1) installs build#3 (ie 5.15.12-1.3),
and then tries to load kernel module, they will be loading build#3 module on
build#1 kernel; and with base BTF of the two builds different, name_off of
some types will end up pointing at invalid string, and the kernel bails out.


Best,
Shung-Hsi Yu

1: https://bugzilla.opensuse.org/show_bug.cgi?id=1194501
2: my guess is rebuild is trigger due to compiler toolchain update, but I
   wasn't able to pin down exactly what changed


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-01-27 15:10 BTF compatibility issue across builds Shung-Hsi Yu
@ 2022-01-31 17:36 ` Yonghong Song
  2022-02-10 10:01   ` Michal Suchánek
  0 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2022-01-31 17:36 UTC (permalink / raw)
  To: Shung-Hsi Yu, bpf, netdev, Andrii Nakryiko
  Cc: Daniel Borkmann, Alexei Starovoitov



On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> Hi,
> 
> We recently run into module load failure related to split BTF on openSUSE
> Tumbleweed[1], which I believe is something that may also happen on other
> rolling distros.
> 
> The error looks like the follow (though failure is not limited to ipheth)
> 
>      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> 
>      failed to validate module [ipheth] BTF: -22
> 
> The error comes down to trying to load BTF of *kernel modules from a
> different build* than the runtime kernel (but the source is the same), where
> the base BTF of the two build is different.
> 
> While it may be too far stretched to call this a bug, solving this might
> make BTF adoption easier. I'd natively think that we could further split
> base BTF into two part to avoid this issue, where .BTF only contain exported
> types, and the other (still residing in vmlinux) holds the unexported types.

What is the exported types? The types used by export symbols?
This for sure will increase btf handling complexity.

> 
> Does that sound like something reasonable to work on?
> 
> 
> ## Root case (in case anyone is interested in a verbose version)
> 
> On openSUSE Tumbleweed there can be several builds of the same source. Since
> the source is the same, the binaries are simply replaced when a package with
> a larger build number is installed during upgrade.
> 
> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> are previously missing in base BTF of 5.15.12-1.1.

As stated in [2] below, I think we should understand why rebuild is 
triggered. If the rebuild for vmlinux is triggered, why the modules 
cannot be rebuild at the same time?

> 
> The addition of entries in BTF type and string table caused extra offset of
> type IDs and string position in the base BTF, and as such the same type ID
> may refers to a totally different type, and as does name_off of types.
> 
> When users on build#1 (ie 5.15.12-1.1) installs build#3 (ie 5.15.12-1.3),
> and then tries to load kernel module, they will be loading build#3 module on
> build#1 kernel; and with base BTF of the two builds different, name_off of
> some types will end up pointing at invalid string, and the kernel bails out.
> 
> 
> Best,
> Shung-Hsi Yu
> 
> 1: https://bugzilla.opensuse.org/show_bug.cgi?id=1194501
> 2: my guess is rebuild is trigger due to compiler toolchain update, but I
>     wasn't able to pin down exactly what changed
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-01-31 17:36 ` Yonghong Song
@ 2022-02-10 10:01   ` Michal Suchánek
  2022-02-10 18:17     ` Yonghong Song
  2022-02-11  6:01     ` Andrii Nakryiko
  0 siblings, 2 replies; 22+ messages in thread
From: Michal Suchánek @ 2022-02-10 10:01 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Shung-Hsi Yu, bpf, netdev, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

Hello,

On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> 
> 
> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> > Hi,
> > 
> > We recently run into module load failure related to split BTF on openSUSE
> > Tumbleweed[1], which I believe is something that may also happen on other
> > rolling distros.
> > 
> > The error looks like the follow (though failure is not limited to ipheth)
> > 
> >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> > 
> >      failed to validate module [ipheth] BTF: -22
> > 
> > The error comes down to trying to load BTF of *kernel modules from a
> > different build* than the runtime kernel (but the source is the same), where
> > the base BTF of the two build is different.
> > 
> > While it may be too far stretched to call this a bug, solving this might
> > make BTF adoption easier. I'd natively think that we could further split
> > base BTF into two part to avoid this issue, where .BTF only contain exported
> > types, and the other (still residing in vmlinux) holds the unexported types.
> 
> What is the exported types? The types used by export symbols?
> This for sure will increase btf handling complexity.

And it will not actually help.

We have modversion ABI which checks the checksum of the symbols that the
module imports and fails the load if the checksum for these symbols does
not match. It's not concerned with symbols not exported, it's not
concerned with symbols not used by the module. This is something that is
sustainable across kernel rebuilds with minor fixes/features and what
distributions watch for.

Now with BTF the situation is vastly different. There are at least three
bugs:

 - The BTF check is global for all symbols, not for the symbols the
   module uses. This is not sustainable. Given the BTF is supposed to
   allow linking BPF programs that were built in completely different
   environment with the kernel it is completely within the scope of BTF
   to solve this problem, it's just neglected.
 - It is possible to load modules with no BTF but not modules with
   non-matching BTF. Surely the non-matching BTF could be discarded.
 - BTF is part of vermagic. This is completely pointless since modules
   without BTF can be loaded on BTF kernel. Surely it would not be too
   difficult to do the reverse as well. Given BTF must pass extra check
   to be used having it in vermagic is just useless moise.

> > Does that sound like something reasonable to work on?
> > 
> > 
> > ## Root case (in case anyone is interested in a verbose version)
> > 
> > On openSUSE Tumbleweed there can be several builds of the same source. Since
> > the source is the same, the binaries are simply replaced when a package with
> > a larger build number is installed during upgrade.
> > 
> > In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> > More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> > struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> > struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> > are previously missing in base BTF of 5.15.12-1.1.
> 
> As stated in [2] below, I think we should understand why rebuild is
> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> be rebuild at the same time?

They do get rebuilt. However, if you are running the kernel and install
the update you get the new modules with the old kernel. If the install
script fails to copy the kernel to your EFI partition based on the fact
a kernel with the same filename is alreasy there you get the same.

If you have 'stable' distribution adding new symbols is normal and it
does not break module loading without BTF but it breaks BTF.

Thanks

Michal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-10 10:01   ` Michal Suchánek
@ 2022-02-10 18:17     ` Yonghong Song
  2022-02-10 22:34       ` Alexei Starovoitov
  2022-02-11  6:01     ` Andrii Nakryiko
  1 sibling, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2022-02-10 18:17 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Shung-Hsi Yu, bpf, netdev, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov



On 2/10/22 2:01 AM, Michal Suchánek wrote:
> Hello,
> 
> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>>
>>
>> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>>> Hi,
>>>
>>> We recently run into module load failure related to split BTF on openSUSE
>>> Tumbleweed[1], which I believe is something that may also happen on other
>>> rolling distros.
>>>
>>> The error looks like the follow (though failure is not limited to ipheth)
>>>
>>>       BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>>>
>>>       failed to validate module [ipheth] BTF: -22
>>>
>>> The error comes down to trying to load BTF of *kernel modules from a
>>> different build* than the runtime kernel (but the source is the same), where
>>> the base BTF of the two build is different.
>>>
>>> While it may be too far stretched to call this a bug, solving this might
>>> make BTF adoption easier. I'd natively think that we could further split
>>> base BTF into two part to avoid this issue, where .BTF only contain exported
>>> types, and the other (still residing in vmlinux) holds the unexported types.
>>
>> What is the exported types? The types used by export symbols?
>> This for sure will increase btf handling complexity.
> 
> And it will not actually help.
> 
> We have modversion ABI which checks the checksum of the symbols that the
> module imports and fails the load if the checksum for these symbols does
> not match. It's not concerned with symbols not exported, it's not
> concerned with symbols not used by the module. This is something that is
> sustainable across kernel rebuilds with minor fixes/features and what
> distributions watch for.
> 
> Now with BTF the situation is vastly different. There are at least three
> bugs:
> 
>   - The BTF check is global for all symbols, not for the symbols the
>     module uses. This is not sustainable. Given the BTF is supposed to
>     allow linking BPF programs that were built in completely different
>     environment with the kernel it is completely within the scope of BTF
>     to solve this problem, it's just neglected.
>   - It is possible to load modules with no BTF but not modules with
>     non-matching BTF. Surely the non-matching BTF could be discarded.
>   - BTF is part of vermagic. This is completely pointless since modules
>     without BTF can be loaded on BTF kernel. Surely it would not be too
>     difficult to do the reverse as well. Given BTF must pass extra check
>     to be used having it in vermagic is just useless moise.
> 
>>> Does that sound like something reasonable to work on?
>>>
>>>
>>> ## Root case (in case anyone is interested in a verbose version)
>>>
>>> On openSUSE Tumbleweed there can be several builds of the same source. Since
>>> the source is the same, the binaries are simply replaced when a package with
>>> a larger build number is installed during upgrade.
>>>
>>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
>>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
>>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
>>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
>>> are previously missing in base BTF of 5.15.12-1.1.
>>
>> As stated in [2] below, I think we should understand why rebuild is
>> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
>> be rebuild at the same time?
> 
> They do get rebuilt. However, if you are running the kernel and install
> the update you get the new modules with the old kernel. If the install
> script fails to copy the kernel to your EFI partition based on the fact
> a kernel with the same filename is alreasy there you get the same.
> 
> If you have 'stable' distribution adding new symbols is normal and it
> does not break module loading without BTF but it breaks BTF.

Okay, I see. One possible solution is that if kernel module btf
does not match vmlinux btf, the kernel module btf will be ignored
with a dmesg warning but kernel module load will proceed as normal.
I think this might be also useful for bpf lskel kernel modules as
well which tries to be portable (with CO-RE) for different kernels.

Alexei, what do you think?

> 
> Thanks
> 
> Michal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-10 18:17     ` Yonghong Song
@ 2022-02-10 22:34       ` Alexei Starovoitov
  2022-02-10 22:59         ` Yonghong Song
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2022-02-10 22:34 UTC (permalink / raw)
  To: Yonghong Song, Connor O'Brien
  Cc: Michal Suchánek, Shung-Hsi Yu, bpf, Network Development,
	Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov

On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 2/10/22 2:01 AM, Michal Suchánek wrote:
> > Hello,
> >
> > On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> >>
> >>
> >> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> >>> Hi,
> >>>
> >>> We recently run into module load failure related to split BTF on openSUSE
> >>> Tumbleweed[1], which I believe is something that may also happen on other
> >>> rolling distros.
> >>>
> >>> The error looks like the follow (though failure is not limited to ipheth)
> >>>
> >>>       BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> >>>
> >>>       failed to validate module [ipheth] BTF: -22
> >>>
> >>> The error comes down to trying to load BTF of *kernel modules from a
> >>> different build* than the runtime kernel (but the source is the same), where
> >>> the base BTF of the two build is different.
> >>>
> >>> While it may be too far stretched to call this a bug, solving this might
> >>> make BTF adoption easier. I'd natively think that we could further split
> >>> base BTF into two part to avoid this issue, where .BTF only contain exported
> >>> types, and the other (still residing in vmlinux) holds the unexported types.
> >>
> >> What is the exported types? The types used by export symbols?
> >> This for sure will increase btf handling complexity.
> >
> > And it will not actually help.
> >
> > We have modversion ABI which checks the checksum of the symbols that the
> > module imports and fails the load if the checksum for these symbols does
> > not match. It's not concerned with symbols not exported, it's not
> > concerned with symbols not used by the module. This is something that is
> > sustainable across kernel rebuilds with minor fixes/features and what
> > distributions watch for.
> >
> > Now with BTF the situation is vastly different. There are at least three
> > bugs:
> >
> >   - The BTF check is global for all symbols, not for the symbols the
> >     module uses. This is not sustainable. Given the BTF is supposed to
> >     allow linking BPF programs that were built in completely different
> >     environment with the kernel it is completely within the scope of BTF
> >     to solve this problem, it's just neglected.
> >   - It is possible to load modules with no BTF but not modules with
> >     non-matching BTF. Surely the non-matching BTF could be discarded.
> >   - BTF is part of vermagic. This is completely pointless since modules
> >     without BTF can be loaded on BTF kernel. Surely it would not be too
> >     difficult to do the reverse as well. Given BTF must pass extra check
> >     to be used having it in vermagic is just useless moise.
> >
> >>> Does that sound like something reasonable to work on?
> >>>
> >>>
> >>> ## Root case (in case anyone is interested in a verbose version)
> >>>
> >>> On openSUSE Tumbleweed there can be several builds of the same source. Since
> >>> the source is the same, the binaries are simply replaced when a package with
> >>> a larger build number is installed during upgrade.
> >>>
> >>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> >>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> >>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> >>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> >>> are previously missing in base BTF of 5.15.12-1.1.
> >>
> >> As stated in [2] below, I think we should understand why rebuild is
> >> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> >> be rebuild at the same time?
> >
> > They do get rebuilt. However, if you are running the kernel and install
> > the update you get the new modules with the old kernel. If the install
> > script fails to copy the kernel to your EFI partition based on the fact
> > a kernel with the same filename is alreasy there you get the same.
> >
> > If you have 'stable' distribution adding new symbols is normal and it
> > does not break module loading without BTF but it breaks BTF.
>
> Okay, I see. One possible solution is that if kernel module btf
> does not match vmlinux btf, the kernel module btf will be ignored
> with a dmesg warning but kernel module load will proceed as normal.
> I think this might be also useful for bpf lskel kernel modules as
> well which tries to be portable (with CO-RE) for different kernels.

That sounds like #2 that Michal is proposing:
"It is possible to load modules with no BTF but not modules with
 non-matching BTF. Surely the non-matching BTF could be discarded."

That's probably the simplest way forward.

The patch
https://patchwork.kernel.org/project/netdevbpf/patch/20220209052141.140063-1-connoro@google.com/
shouldn't be necessary too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-10 22:34       ` Alexei Starovoitov
@ 2022-02-10 22:59         ` Yonghong Song
  2022-02-12  5:40           ` Shung-Hsi Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2022-02-10 22:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Connor O'Brien
  Cc: Michal Suchánek, Shung-Hsi Yu, bpf, Network Development,
	Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov



On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
> On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 2/10/22 2:01 AM, Michal Suchánek wrote:
>>> Hello,
>>>
>>> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>>>>
>>>>
>>>> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>>>>> Hi,
>>>>>
>>>>> We recently run into module load failure related to split BTF on openSUSE
>>>>> Tumbleweed[1], which I believe is something that may also happen on other
>>>>> rolling distros.
>>>>>
>>>>> The error looks like the follow (though failure is not limited to ipheth)
>>>>>
>>>>>        BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>>>>>
>>>>>        failed to validate module [ipheth] BTF: -22
>>>>>
>>>>> The error comes down to trying to load BTF of *kernel modules from a
>>>>> different build* than the runtime kernel (but the source is the same), where
>>>>> the base BTF of the two build is different.
>>>>>
>>>>> While it may be too far stretched to call this a bug, solving this might
>>>>> make BTF adoption easier. I'd natively think that we could further split
>>>>> base BTF into two part to avoid this issue, where .BTF only contain exported
>>>>> types, and the other (still residing in vmlinux) holds the unexported types.
>>>>
>>>> What is the exported types? The types used by export symbols?
>>>> This for sure will increase btf handling complexity.
>>>
>>> And it will not actually help.
>>>
>>> We have modversion ABI which checks the checksum of the symbols that the
>>> module imports and fails the load if the checksum for these symbols does
>>> not match. It's not concerned with symbols not exported, it's not
>>> concerned with symbols not used by the module. This is something that is
>>> sustainable across kernel rebuilds with minor fixes/features and what
>>> distributions watch for.
>>>
>>> Now with BTF the situation is vastly different. There are at least three
>>> bugs:
>>>
>>>    - The BTF check is global for all symbols, not for the symbols the
>>>      module uses. This is not sustainable. Given the BTF is supposed to
>>>      allow linking BPF programs that were built in completely different
>>>      environment with the kernel it is completely within the scope of BTF
>>>      to solve this problem, it's just neglected.
>>>    - It is possible to load modules with no BTF but not modules with
>>>      non-matching BTF. Surely the non-matching BTF could be discarded.
>>>    - BTF is part of vermagic. This is completely pointless since modules
>>>      without BTF can be loaded on BTF kernel. Surely it would not be too
>>>      difficult to do the reverse as well. Given BTF must pass extra check
>>>      to be used having it in vermagic is just useless moise.
>>>
>>>>> Does that sound like something reasonable to work on?
>>>>>
>>>>>
>>>>> ## Root case (in case anyone is interested in a verbose version)
>>>>>
>>>>> On openSUSE Tumbleweed there can be several builds of the same source. Since
>>>>> the source is the same, the binaries are simply replaced when a package with
>>>>> a larger build number is installed during upgrade.
>>>>>
>>>>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
>>>>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
>>>>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
>>>>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
>>>>> are previously missing in base BTF of 5.15.12-1.1.
>>>>
>>>> As stated in [2] below, I think we should understand why rebuild is
>>>> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
>>>> be rebuild at the same time?
>>>
>>> They do get rebuilt. However, if you are running the kernel and install
>>> the update you get the new modules with the old kernel. If the install
>>> script fails to copy the kernel to your EFI partition based on the fact
>>> a kernel with the same filename is alreasy there you get the same.
>>>
>>> If you have 'stable' distribution adding new symbols is normal and it
>>> does not break module loading without BTF but it breaks BTF.
>>
>> Okay, I see. One possible solution is that if kernel module btf
>> does not match vmlinux btf, the kernel module btf will be ignored
>> with a dmesg warning but kernel module load will proceed as normal.
>> I think this might be also useful for bpf lskel kernel modules as
>> well which tries to be portable (with CO-RE) for different kernels.
> 
> That sounds like #2 that Michal is proposing:
> "It is possible to load modules with no BTF but not modules with
>   non-matching BTF. Surely the non-matching BTF could be discarded."
> 
> That's probably the simplest way forward.
> 
> The patch
> https://patchwork.kernel.org/project/netdevbpf/patch/20220209052141.140063-1-connoro@google.com/
> shouldn't be necessary too.

Right the patch tried to address this issue and if we allow
non-matching BTF is ignored and then treaking DEBUG_INFO_BTF_MODULES
is not necessary.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-10 10:01   ` Michal Suchánek
  2022-02-10 18:17     ` Yonghong Song
@ 2022-02-11  6:01     ` Andrii Nakryiko
  2022-02-11 17:20       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2022-02-11  6:01 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Yonghong Song, Shung-Hsi Yu, bpf, Networking, Andrii Nakryiko,
	Daniel Borkmann, Alexei Starovoitov

On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
>
> Hello,
>
> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> >
> >
> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> > > Hi,
> > >
> > > We recently run into module load failure related to split BTF on openSUSE
> > > Tumbleweed[1], which I believe is something that may also happen on other
> > > rolling distros.
> > >
> > > The error looks like the follow (though failure is not limited to ipheth)
> > >
> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> > >
> > >      failed to validate module [ipheth] BTF: -22
> > >
> > > The error comes down to trying to load BTF of *kernel modules from a
> > > different build* than the runtime kernel (but the source is the same), where
> > > the base BTF of the two build is different.
> > >
> > > While it may be too far stretched to call this a bug, solving this might
> > > make BTF adoption easier. I'd natively think that we could further split
> > > base BTF into two part to avoid this issue, where .BTF only contain exported
> > > types, and the other (still residing in vmlinux) holds the unexported types.
> >
> > What is the exported types? The types used by export symbols?
> > This for sure will increase btf handling complexity.
>
> And it will not actually help.
>
> We have modversion ABI which checks the checksum of the symbols that the
> module imports and fails the load if the checksum for these symbols does
> not match. It's not concerned with symbols not exported, it's not
> concerned with symbols not used by the module. This is something that is
> sustainable across kernel rebuilds with minor fixes/features and what
> distributions watch for.
>
> Now with BTF the situation is vastly different. There are at least three
> bugs:
>
>  - The BTF check is global for all symbols, not for the symbols the
>    module uses. This is not sustainable. Given the BTF is supposed to
>    allow linking BPF programs that were built in completely different
>    environment with the kernel it is completely within the scope of BTF
>    to solve this problem, it's just neglected.

You refer to BTF use in CO-RE with the latter. It's just one
application of BTF and it doesn't follow that you can do the same with
module BTF. It's not a neglect, it's a very big technical difficulty.

Each module's BTFs are designed as logical extensions of vmlinux BTF.
And each module BTF is independent and isolated from other modules
extension of the same vmlinux BTF. The way that BTF format is
designed, any tiny difference in vmlinux BTF effectively invalidates
all modules' BTFs and they have to be rebuilt.

Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
every module's BTF type ID started with 1001, now they all have to
start with 1002 and be shifted by 1.

Now let's say that the order of two BTF types in vmlinux BTF is
changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
because of slight difference in DWARF, for instance). Any type
reference to 10 or 20 in any module BTF has to be renumbered now.

Another one, let's say we add a new string to vmlinux BTF string
section somewhere at the beginning, say "abc" at offset 100. Any
string offset after 100 now has to be shifted *both* in vmlinux BTF
and all module BTFs. And also any string reference in module BTFs have
to be adjusted as well because now each module's BTF's logical string
offset is starting at 4 logical bytes higher (due to "abc\0" being
added and shifting everything right).

As you can see, any tiny change in vmlinux BTF, no matter where,
beginning, middle, or end, causes massive changes in type IDs and
offsets everywhere. It's impractical to do any local adjustments, it's
much simpler and more reliable to completely regenerate BTF
completely.

If it was reasonable to support what you are asking for, I'd probably
already have done that a long time ago.

>  - It is possible to load modules with no BTF but not modules with
>    non-matching BTF. Surely the non-matching BTF could be discarded.

We started out with strict behavior like this to be able to detect any
problems with module BTFs instead of silently ignoring them. It seems
like that worked, we do know about this problem acutely. But as Alexei
said, relaxing this is the simplest way forward. It could be easily
controlled by new Kconfig value, so that default strict behavior can
be preserved as well.


>  - BTF is part of vermagic. This is completely pointless since modules
>    without BTF can be loaded on BTF kernel. Surely it would not be too
>    difficult to do the reverse as well. Given BTF must pass extra check
>    to be used having it in vermagic is just useless moise.
>
> > > Does that sound like something reasonable to work on?

No, at least not to me. Splitting vmlinux BTF into two parts (internal
and external) doesn't help with anything, see above explanation.

> > >
> > >
> > > ## Root case (in case anyone is interested in a verbose version)
> > >
> > > On openSUSE Tumbleweed there can be several builds of the same source. Since
> > > the source is the same, the binaries are simply replaced when a package with
> > > a larger build number is installed during upgrade.
> > >
> > > In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> > > More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> > > struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> > > struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> > > are previously missing in base BTF of 5.15.12-1.1.
> >
> > As stated in [2] below, I think we should understand why rebuild is
> > triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> > be rebuild at the same time?
>
> They do get rebuilt. However, if you are running the kernel and install
> the update you get the new modules with the old kernel. If the install
> script fails to copy the kernel to your EFI partition based on the fact
> a kernel with the same filename is alreasy there you get the same.

Isn't this failure to copy a new kernel a failure in itself? I'd blame
module BTF the last in this scenario, but maybe I don't understand the
case completely.

>
> If you have 'stable' distribution adding new symbols is normal and it
> does not break module loading without BTF but it breaks BTF.

You don't even need to add a new symbol. Just change the order of
types in DWARF information and you get a different (though equivalent)
BTF type ID numbering which invalidates module BTFs just as much. I'm
not saying it's a feature, but it isn't a bug either, IMO. Technical
decisions and tradeoffs.

>
> Thanks
>
> Michal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-11  6:01     ` Andrii Nakryiko
@ 2022-02-11 17:20       ` Toke Høiland-Jørgensen
  2022-02-11 22:20         ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-11 17:20 UTC (permalink / raw)
  To: Andrii Nakryiko, Michal Suchánek
  Cc: Yonghong Song, Shung-Hsi Yu, bpf, Networking, Andrii Nakryiko,
	Daniel Borkmann, Alexei Starovoitov

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
>>
>> Hello,
>>
>> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>> >
>> >
>> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>> > > Hi,
>> > >
>> > > We recently run into module load failure related to split BTF on openSUSE
>> > > Tumbleweed[1], which I believe is something that may also happen on other
>> > > rolling distros.
>> > >
>> > > The error looks like the follow (though failure is not limited to ipheth)
>> > >
>> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>> > >
>> > >      failed to validate module [ipheth] BTF: -22
>> > >
>> > > The error comes down to trying to load BTF of *kernel modules from a
>> > > different build* than the runtime kernel (but the source is the same), where
>> > > the base BTF of the two build is different.
>> > >
>> > > While it may be too far stretched to call this a bug, solving this might
>> > > make BTF adoption easier. I'd natively think that we could further split
>> > > base BTF into two part to avoid this issue, where .BTF only contain exported
>> > > types, and the other (still residing in vmlinux) holds the unexported types.
>> >
>> > What is the exported types? The types used by export symbols?
>> > This for sure will increase btf handling complexity.
>>
>> And it will not actually help.
>>
>> We have modversion ABI which checks the checksum of the symbols that the
>> module imports and fails the load if the checksum for these symbols does
>> not match. It's not concerned with symbols not exported, it's not
>> concerned with symbols not used by the module. This is something that is
>> sustainable across kernel rebuilds with minor fixes/features and what
>> distributions watch for.
>>
>> Now with BTF the situation is vastly different. There are at least three
>> bugs:
>>
>>  - The BTF check is global for all symbols, not for the symbols the
>>    module uses. This is not sustainable. Given the BTF is supposed to
>>    allow linking BPF programs that were built in completely different
>>    environment with the kernel it is completely within the scope of BTF
>>    to solve this problem, it's just neglected.
>
> You refer to BTF use in CO-RE with the latter. It's just one
> application of BTF and it doesn't follow that you can do the same with
> module BTF. It's not a neglect, it's a very big technical difficulty.
>
> Each module's BTFs are designed as logical extensions of vmlinux BTF.
> And each module BTF is independent and isolated from other modules
> extension of the same vmlinux BTF. The way that BTF format is
> designed, any tiny difference in vmlinux BTF effectively invalidates
> all modules' BTFs and they have to be rebuilt.
>
> Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
> ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
> every module's BTF type ID started with 1001, now they all have to
> start with 1002 and be shifted by 1.
>
> Now let's say that the order of two BTF types in vmlinux BTF is
> changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
> because of slight difference in DWARF, for instance). Any type
> reference to 10 or 20 in any module BTF has to be renumbered now.
>
> Another one, let's say we add a new string to vmlinux BTF string
> section somewhere at the beginning, say "abc" at offset 100. Any
> string offset after 100 now has to be shifted *both* in vmlinux BTF
> and all module BTFs. And also any string reference in module BTFs have
> to be adjusted as well because now each module's BTF's logical string
> offset is starting at 4 logical bytes higher (due to "abc\0" being
> added and shifting everything right).
>
> As you can see, any tiny change in vmlinux BTF, no matter where,
> beginning, middle, or end, causes massive changes in type IDs and
> offsets everywhere. It's impractical to do any local adjustments, it's
> much simpler and more reliable to completely regenerate BTF
> completely.

This seems incredibly brittle, though? IIUC this means that if you want
BTF in your modules you *must* have not only the kernel headers of the
kernel it's going to run on, but the full BTF information for the exact
kernel image you're going to load that module on? How is that supposed
to work for any kind of environment where everything is not built
together? Third-party modules for distribution kernels is the obvious
example that comes to mind here, but as this thread shows, they don't
necessarily even have to be third party...

How would you go about "completely regenerating BTF" in practice for a
third-party module, say?

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-11 17:20       ` Toke Høiland-Jørgensen
@ 2022-02-11 22:20         ` Andrii Nakryiko
  2022-02-11 23:58           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2022-02-11 22:20 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Michal Suchánek, Yonghong Song, Shung-Hsi Yu, bpf,
	Networking, Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov

On Fri, Feb 11, 2022 at 9:20 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
> >>
> >> Hello,
> >>
> >> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> >> >
> >> >
> >> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> >> > > Hi,
> >> > >
> >> > > We recently run into module load failure related to split BTF on openSUSE
> >> > > Tumbleweed[1], which I believe is something that may also happen on other
> >> > > rolling distros.
> >> > >
> >> > > The error looks like the follow (though failure is not limited to ipheth)
> >> > >
> >> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> >> > >
> >> > >      failed to validate module [ipheth] BTF: -22
> >> > >
> >> > > The error comes down to trying to load BTF of *kernel modules from a
> >> > > different build* than the runtime kernel (but the source is the same), where
> >> > > the base BTF of the two build is different.
> >> > >
> >> > > While it may be too far stretched to call this a bug, solving this might
> >> > > make BTF adoption easier. I'd natively think that we could further split
> >> > > base BTF into two part to avoid this issue, where .BTF only contain exported
> >> > > types, and the other (still residing in vmlinux) holds the unexported types.
> >> >
> >> > What is the exported types? The types used by export symbols?
> >> > This for sure will increase btf handling complexity.
> >>
> >> And it will not actually help.
> >>
> >> We have modversion ABI which checks the checksum of the symbols that the
> >> module imports and fails the load if the checksum for these symbols does
> >> not match. It's not concerned with symbols not exported, it's not
> >> concerned with symbols not used by the module. This is something that is
> >> sustainable across kernel rebuilds with minor fixes/features and what
> >> distributions watch for.
> >>
> >> Now with BTF the situation is vastly different. There are at least three
> >> bugs:
> >>
> >>  - The BTF check is global for all symbols, not for the symbols the
> >>    module uses. This is not sustainable. Given the BTF is supposed to
> >>    allow linking BPF programs that were built in completely different
> >>    environment with the kernel it is completely within the scope of BTF
> >>    to solve this problem, it's just neglected.
> >
> > You refer to BTF use in CO-RE with the latter. It's just one
> > application of BTF and it doesn't follow that you can do the same with
> > module BTF. It's not a neglect, it's a very big technical difficulty.
> >
> > Each module's BTFs are designed as logical extensions of vmlinux BTF.
> > And each module BTF is independent and isolated from other modules
> > extension of the same vmlinux BTF. The way that BTF format is
> > designed, any tiny difference in vmlinux BTF effectively invalidates
> > all modules' BTFs and they have to be rebuilt.
> >
> > Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
> > ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
> > every module's BTF type ID started with 1001, now they all have to
> > start with 1002 and be shifted by 1.
> >
> > Now let's say that the order of two BTF types in vmlinux BTF is
> > changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
> > because of slight difference in DWARF, for instance). Any type
> > reference to 10 or 20 in any module BTF has to be renumbered now.
> >
> > Another one, let's say we add a new string to vmlinux BTF string
> > section somewhere at the beginning, say "abc" at offset 100. Any
> > string offset after 100 now has to be shifted *both* in vmlinux BTF
> > and all module BTFs. And also any string reference in module BTFs have
> > to be adjusted as well because now each module's BTF's logical string
> > offset is starting at 4 logical bytes higher (due to "abc\0" being
> > added and shifting everything right).
> >
> > As you can see, any tiny change in vmlinux BTF, no matter where,
> > beginning, middle, or end, causes massive changes in type IDs and
> > offsets everywhere. It's impractical to do any local adjustments, it's
> > much simpler and more reliable to completely regenerate BTF
> > completely.
>
> This seems incredibly brittle, though? IIUC this means that if you want
> BTF in your modules you *must* have not only the kernel headers of the
> kernel it's going to run on, but the full BTF information for the exact

From BTF perspective, only vmlinux BTF. Having exact kernel headers
would minimize type information duplication.

> kernel image you're going to load that module on? How is that supposed
> to work for any kind of environment where everything is not built
> together? Third-party modules for distribution kernels is the obvious
> example that comes to mind here, but as this thread shows, they don't
> necessarily even have to be third party...
>
> How would you go about "completely regenerating BTF" in practice for a
> third-party module, say?

Great questions. I was kind of hoping you'll have some suggestions as
well, though. Not just complaints.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-11 22:20         ` Andrii Nakryiko
@ 2022-02-11 23:58           ` Toke Høiland-Jørgensen
  2022-02-12  7:37             ` Shung-Hsi Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-11 23:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Michal Suchánek, Yonghong Song, Shung-Hsi Yu, bpf,
	Networking, Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Feb 11, 2022 at 9:20 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
>> >>
>> >> Hello,
>> >>
>> >> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>> >> >
>> >> >
>> >> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>> >> > > Hi,
>> >> > >
>> >> > > We recently run into module load failure related to split BTF on openSUSE
>> >> > > Tumbleweed[1], which I believe is something that may also happen on other
>> >> > > rolling distros.
>> >> > >
>> >> > > The error looks like the follow (though failure is not limited to ipheth)
>> >> > >
>> >> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>> >> > >
>> >> > >      failed to validate module [ipheth] BTF: -22
>> >> > >
>> >> > > The error comes down to trying to load BTF of *kernel modules from a
>> >> > > different build* than the runtime kernel (but the source is the same), where
>> >> > > the base BTF of the two build is different.
>> >> > >
>> >> > > While it may be too far stretched to call this a bug, solving this might
>> >> > > make BTF adoption easier. I'd natively think that we could further split
>> >> > > base BTF into two part to avoid this issue, where .BTF only contain exported
>> >> > > types, and the other (still residing in vmlinux) holds the unexported types.
>> >> >
>> >> > What is the exported types? The types used by export symbols?
>> >> > This for sure will increase btf handling complexity.
>> >>
>> >> And it will not actually help.
>> >>
>> >> We have modversion ABI which checks the checksum of the symbols that the
>> >> module imports and fails the load if the checksum for these symbols does
>> >> not match. It's not concerned with symbols not exported, it's not
>> >> concerned with symbols not used by the module. This is something that is
>> >> sustainable across kernel rebuilds with minor fixes/features and what
>> >> distributions watch for.
>> >>
>> >> Now with BTF the situation is vastly different. There are at least three
>> >> bugs:
>> >>
>> >>  - The BTF check is global for all symbols, not for the symbols the
>> >>    module uses. This is not sustainable. Given the BTF is supposed to
>> >>    allow linking BPF programs that were built in completely different
>> >>    environment with the kernel it is completely within the scope of BTF
>> >>    to solve this problem, it's just neglected.
>> >
>> > You refer to BTF use in CO-RE with the latter. It's just one
>> > application of BTF and it doesn't follow that you can do the same with
>> > module BTF. It's not a neglect, it's a very big technical difficulty.
>> >
>> > Each module's BTFs are designed as logical extensions of vmlinux BTF.
>> > And each module BTF is independent and isolated from other modules
>> > extension of the same vmlinux BTF. The way that BTF format is
>> > designed, any tiny difference in vmlinux BTF effectively invalidates
>> > all modules' BTFs and they have to be rebuilt.
>> >
>> > Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
>> > ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
>> > every module's BTF type ID started with 1001, now they all have to
>> > start with 1002 and be shifted by 1.
>> >
>> > Now let's say that the order of two BTF types in vmlinux BTF is
>> > changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
>> > because of slight difference in DWARF, for instance). Any type
>> > reference to 10 or 20 in any module BTF has to be renumbered now.
>> >
>> > Another one, let's say we add a new string to vmlinux BTF string
>> > section somewhere at the beginning, say "abc" at offset 100. Any
>> > string offset after 100 now has to be shifted *both* in vmlinux BTF
>> > and all module BTFs. And also any string reference in module BTFs have
>> > to be adjusted as well because now each module's BTF's logical string
>> > offset is starting at 4 logical bytes higher (due to "abc\0" being
>> > added and shifting everything right).
>> >
>> > As you can see, any tiny change in vmlinux BTF, no matter where,
>> > beginning, middle, or end, causes massive changes in type IDs and
>> > offsets everywhere. It's impractical to do any local adjustments, it's
>> > much simpler and more reliable to completely regenerate BTF
>> > completely.
>>
>> This seems incredibly brittle, though? IIUC this means that if you want
>> BTF in your modules you *must* have not only the kernel headers of the
>> kernel it's going to run on, but the full BTF information for the exact
>
> From BTF perspective, only vmlinux BTF. Having exact kernel headers
> would minimize type information duplication.

Right, I meant you'd need the kernel headers to compile the module, and
the vmlinux BTF to build the module BTF info.

>> kernel image you're going to load that module on? How is that supposed
>> to work for any kind of environment where everything is not built
>> together? Third-party modules for distribution kernels is the obvious
>> example that comes to mind here, but as this thread shows, they don't
>> necessarily even have to be third party...
>>
>> How would you go about "completely regenerating BTF" in practice for a
>> third-party module, say?
>
> Great questions. I was kind of hoping you'll have some suggestions as
> well, though. Not just complaints.

Well, I kinda took your "not really a bug either" comment to mean you
weren't really open to changing the current behaviour. But if that was a
misunderstanding on my part, I do have one thought:

The "partial BTF" thing in the modules is done to save space, right?
I.e., in principle there would be nothing preventing a module from
including a full (self-contained) set of BTF in its .ko when it is
compiled? Because if so, we could allow that as an optional mode that
can be enabled if you don't mind taking the size hit (any idea how large
that usually is, BTW?). And then we could teach 'modprobe' to do a fresh
deduplication of this full BTF set against the vmlinux BTF before
loading such a module into the kernel.

Or am I missing some reason why that wouldn't work?

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-10 22:59         ` Yonghong Song
@ 2022-02-12  5:40           ` Shung-Hsi Yu
  2022-02-12  6:36             ` Yonghong Song
  0 siblings, 1 reply; 22+ messages in thread
From: Shung-Hsi Yu @ 2022-02-12  5:40 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Connor O'Brien, Michal Suchánek,
	bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
> On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
> > On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > On 2/10/22 2:01 AM, Michal Suchánek wrote:
> > > > Hello,
> > > > 
> > > > On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> > > > > 
> > > > > 
> > > > > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > We recently run into module load failure related to split BTF on openSUSE
> > > > > > Tumbleweed[1], which I believe is something that may also happen on other
> > > > > > rolling distros.
> > > > > > 
> > > > > > The error looks like the follow (though failure is not limited to ipheth)
> > > > > > 
> > > > > >        BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> > > > > > 
> > > > > >        failed to validate module [ipheth] BTF: -22
> > > > > > 
> > > > > > The error comes down to trying to load BTF of *kernel modules from a
> > > > > > different build* than the runtime kernel (but the source is the same), where
> > > > > > the base BTF of the two build is different.
> > > > > > 
> > > > > > While it may be too far stretched to call this a bug, solving this might
> > > > > > make BTF adoption easier. I'd natively think that we could further split
> > > > > > base BTF into two part to avoid this issue, where .BTF only contain exported
> > > > > > types, and the other (still residing in vmlinux) holds the unexported types.
> > > > > 
> > > > > What is the exported types? The types used by export symbols?
> > > > > This for sure will increase btf handling complexity.
> > > > 
> > > > And it will not actually help.
> > > > 
> > > > We have modversion ABI which checks the checksum of the symbols that the
> > > > module imports and fails the load if the checksum for these symbols does
> > > > not match. It's not concerned with symbols not exported, it's not
> > > > concerned with symbols not used by the module. This is something that is
> > > > sustainable across kernel rebuilds with minor fixes/features and what
> > > > distributions watch for.
> > > > 
> > > > Now with BTF the situation is vastly different. There are at least three
> > > > bugs:
> > > > 
> > > >    - The BTF check is global for all symbols, not for the symbols the
> > > >      module uses. This is not sustainable. Given the BTF is supposed to
> > > >      allow linking BPF programs that were built in completely different
> > > >      environment with the kernel it is completely within the scope of BTF
> > > >      to solve this problem, it's just neglected.
> > > >    - It is possible to load modules with no BTF but not modules with
> > > >      non-matching BTF. Surely the non-matching BTF could be discarded.
> > > >    - BTF is part of vermagic. This is completely pointless since modules
> > > >      without BTF can be loaded on BTF kernel. Surely it would not be too
> > > >      difficult to do the reverse as well. Given BTF must pass extra check
> > > >      to be used having it in vermagic is just useless moise.
> > > > 
> > > > > > Does that sound like something reasonable to work on?
> > > > > > 
> > > > > > 
> > > > > > ## Root case (in case anyone is interested in a verbose version)
> > > > > > 
> > > > > > On openSUSE Tumbleweed there can be several builds of the same source. Since
> > > > > > the source is the same, the binaries are simply replaced when a package with
> > > > > > a larger build number is installed during upgrade.
> > > > > > 
> > > > > > In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> > > > > > More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> > > > > > struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> > > > > > struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> > > > > > are previously missing in base BTF of 5.15.12-1.1.
> > > > > 
> > > > > As stated in [2] below, I think we should understand why rebuild is
> > > > > triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> > > > > be rebuild at the same time?
> > > > 
> > > > They do get rebuilt. However, if you are running the kernel and install
> > > > the update you get the new modules with the old kernel. If the install
> > > > script fails to copy the kernel to your EFI partition based on the fact
> > > > a kernel with the same filename is alreasy there you get the same.
> > > > 
> > > > If you have 'stable' distribution adding new symbols is normal and it
> > > > does not break module loading without BTF but it breaks BTF.
> > > 
> > > Okay, I see. One possible solution is that if kernel module btf
> > > does not match vmlinux btf, the kernel module btf will be ignored
> > > with a dmesg warning but kernel module load will proceed as normal.
> > > I think this might be also useful for bpf lskel kernel modules as
> > > well which tries to be portable (with CO-RE) for different kernels.
> > 
> > That sounds like #2 that Michal is proposing:
> > "It is possible to load modules with no BTF but not modules with
> >   non-matching BTF. Surely the non-matching BTF could be discarded."

Since we're talking about matching check, I'd like bring up another issue.

AFAICT with current form of BTF, checking whether BTF on kernel module
matches cannot be made entirely robust without a new version of btf_header
that contain info about the base BTF.

As effective as the checks are in this case, by detecting a type name being
an empty string and thus conclude it's non-matching, with some (bad) luck a
non-matching BTF could pass these checks a gets loaded.

> > That's probably the simplest way forward.
> > 
> > The patch
> > https://patchwork.kernel.org/project/netdevbpf/patch/20220209052141.140063-1-connoro@google.com/
> > shouldn't be necessary too.
> 
> Right the patch tried to address this issue and if we allow
> non-matching BTF is ignored and then treaking DEBUG_INFO_BTF_MODULES
> is not necessary.

Not being able to load kernel module with non-matching BTF and the absence
of robust matching check are the two reasons that lead us to the same path
of disabling DEBUG_INFO_BTF_MODULES a while back.

Ignoring non-matching BTF will solve the former, but not the latter, so I'd
hope that the above patch get's taken (though I'm obviously biased).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-12  5:40           ` Shung-Hsi Yu
@ 2022-02-12  6:36             ` Yonghong Song
  2022-02-15 19:38               ` Shung-Hsi Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2022-02-12  6:36 UTC (permalink / raw)
  To: Shung-Hsi Yu
  Cc: Alexei Starovoitov, Connor O'Brien, Michal Suchánek,
	bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov



On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
> On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
>> On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
>>> On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
>>>> On 2/10/22 2:01 AM, Michal Suchánek wrote:
>>>>> Hello,
>>>>>
>>>>> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We recently run into module load failure related to split BTF on openSUSE
>>>>>>> Tumbleweed[1], which I believe is something that may also happen on other
>>>>>>> rolling distros.
>>>>>>>
>>>>>>> The error looks like the follow (though failure is not limited to ipheth)
>>>>>>>
>>>>>>>         BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>>>>>>>
>>>>>>>         failed to validate module [ipheth] BTF: -22
>>>>>>>
>>>>>>> The error comes down to trying to load BTF of *kernel modules from a
>>>>>>> different build* than the runtime kernel (but the source is the same), where
>>>>>>> the base BTF of the two build is different.
>>>>>>>
>>>>>>> While it may be too far stretched to call this a bug, solving this might
>>>>>>> make BTF adoption easier. I'd natively think that we could further split
>>>>>>> base BTF into two part to avoid this issue, where .BTF only contain exported
>>>>>>> types, and the other (still residing in vmlinux) holds the unexported types.
>>>>>>
>>>>>> What is the exported types? The types used by export symbols?
>>>>>> This for sure will increase btf handling complexity.
>>>>>
>>>>> And it will not actually help.
>>>>>
>>>>> We have modversion ABI which checks the checksum of the symbols that the
>>>>> module imports and fails the load if the checksum for these symbols does
>>>>> not match. It's not concerned with symbols not exported, it's not
>>>>> concerned with symbols not used by the module. This is something that is
>>>>> sustainable across kernel rebuilds with minor fixes/features and what
>>>>> distributions watch for.
>>>>>
>>>>> Now with BTF the situation is vastly different. There are at least three
>>>>> bugs:
>>>>>
>>>>>     - The BTF check is global for all symbols, not for the symbols the
>>>>>       module uses. This is not sustainable. Given the BTF is supposed to
>>>>>       allow linking BPF programs that were built in completely different
>>>>>       environment with the kernel it is completely within the scope of BTF
>>>>>       to solve this problem, it's just neglected.
>>>>>     - It is possible to load modules with no BTF but not modules with
>>>>>       non-matching BTF. Surely the non-matching BTF could be discarded.
>>>>>     - BTF is part of vermagic. This is completely pointless since modules
>>>>>       without BTF can be loaded on BTF kernel. Surely it would not be too
>>>>>       difficult to do the reverse as well. Given BTF must pass extra check
>>>>>       to be used having it in vermagic is just useless moise.
>>>>>
>>>>>>> Does that sound like something reasonable to work on?
>>>>>>>
>>>>>>>
>>>>>>> ## Root case (in case anyone is interested in a verbose version)
>>>>>>>
>>>>>>> On openSUSE Tumbleweed there can be several builds of the same source. Since
>>>>>>> the source is the same, the binaries are simply replaced when a package with
>>>>>>> a larger build number is installed during upgrade.
>>>>>>>
>>>>>>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
>>>>>>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
>>>>>>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
>>>>>>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
>>>>>>> are previously missing in base BTF of 5.15.12-1.1.
>>>>>>
>>>>>> As stated in [2] below, I think we should understand why rebuild is
>>>>>> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
>>>>>> be rebuild at the same time?
>>>>>
>>>>> They do get rebuilt. However, if you are running the kernel and install
>>>>> the update you get the new modules with the old kernel. If the install
>>>>> script fails to copy the kernel to your EFI partition based on the fact
>>>>> a kernel with the same filename is alreasy there you get the same.
>>>>>
>>>>> If you have 'stable' distribution adding new symbols is normal and it
>>>>> does not break module loading without BTF but it breaks BTF.
>>>>
>>>> Okay, I see. One possible solution is that if kernel module btf
>>>> does not match vmlinux btf, the kernel module btf will be ignored
>>>> with a dmesg warning but kernel module load will proceed as normal.
>>>> I think this might be also useful for bpf lskel kernel modules as
>>>> well which tries to be portable (with CO-RE) for different kernels.
>>>
>>> That sounds like #2 that Michal is proposing:
>>> "It is possible to load modules with no BTF but not modules with
>>>    non-matching BTF. Surely the non-matching BTF could be discarded."
> 
> Since we're talking about matching check, I'd like bring up another issue.
> 
> AFAICT with current form of BTF, checking whether BTF on kernel module
> matches cannot be made entirely robust without a new version of btf_header
> that contain info about the base BTF.

The base BTF is always the one associated with running kernel and 
typically the BTF is under /sys/kernel/btf/vmlinux. Did I miss
anything here?

> 
> As effective as the checks are in this case, by detecting a type name being
> an empty string and thus conclude it's non-matching, with some (bad) luck a
> non-matching BTF could pass these checks a gets loaded.

Could you be a little bit more specific about the 'bad luck' a
non-matching BTF could get loaded? An example will be great.


> 
>>> That's probably the simplest way forward.
>>>
>>> The patch
>>> https://patchwork.kernel.org/project/netdevbpf/patch/20220209052141.140063-1-connoro@google.com/
>>> shouldn't be necessary too.
>>
>> Right the patch tried to address this issue and if we allow
>> non-matching BTF is ignored and then treaking DEBUG_INFO_BTF_MODULES
>> is not necessary.
> 
> Not being able to load kernel module with non-matching BTF and the absence
> of robust matching check are the two reasons that lead us to the same path
> of disabling DEBUG_INFO_BTF_MODULES a while back.
> 
> Ignoring non-matching BTF will solve the former, but not the latter, so I'd
> hope that the above patch get's taken (though I'm obviously biased).
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-11 23:58           ` Toke Høiland-Jørgensen
@ 2022-02-12  7:37             ` Shung-Hsi Yu
  2022-02-13 15:40               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Shung-Hsi Yu @ 2022-02-12  7:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, Michal Suchánek, Yonghong Song, bpf,
	Networking, Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov

On Sat, Feb 12, 2022 at 12:58:51AM +0100, Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> 
> > On Fri, Feb 11, 2022 at 9:20 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> >> >> >
> >> >> >
> >> >> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> >> >> > > Hi,
> >> >> > >
> >> >> > > We recently run into module load failure related to split BTF on openSUSE
> >> >> > > Tumbleweed[1], which I believe is something that may also happen on other
> >> >> > > rolling distros.
> >> >> > >
> >> >> > > The error looks like the follow (though failure is not limited to ipheth)
> >> >> > >
> >> >> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> >> >> > >
> >> >> > >      failed to validate module [ipheth] BTF: -22
> >> >> > >
> >> >> > > The error comes down to trying to load BTF of *kernel modules from a
> >> >> > > different build* than the runtime kernel (but the source is the same), where
> >> >> > > the base BTF of the two build is different.
> >> >> > >
> >> >> > > While it may be too far stretched to call this a bug, solving this might
> >> >> > > make BTF adoption easier. I'd natively think that we could further split
> >> >> > > base BTF into two part to avoid this issue, where .BTF only contain exported
> >> >> > > types, and the other (still residing in vmlinux) holds the unexported types.
> >> >> >
> >> >> > What is the exported types? The types used by export symbols?
> >> >> > This for sure will increase btf handling complexity.
> >> >>
> >> >> And it will not actually help.
> >> >>
> >> >> We have modversion ABI which checks the checksum of the symbols that the
> >> >> module imports and fails the load if the checksum for these symbols does
> >> >> not match. It's not concerned with symbols not exported, it's not
> >> >> concerned with symbols not used by the module. This is something that is
> >> >> sustainable across kernel rebuilds with minor fixes/features and what
> >> >> distributions watch for.
> >> >>
> >> >> Now with BTF the situation is vastly different. There are at least three
> >> >> bugs:
> >> >>
> >> >>  - The BTF check is global for all symbols, not for the symbols the
> >> >>    module uses. This is not sustainable. Given the BTF is supposed to
> >> >>    allow linking BPF programs that were built in completely different
> >> >>    environment with the kernel it is completely within the scope of BTF
> >> >>    to solve this problem, it's just neglected.
> >> >
> >> > You refer to BTF use in CO-RE with the latter. It's just one
> >> > application of BTF and it doesn't follow that you can do the same with
> >> > module BTF. It's not a neglect, it's a very big technical difficulty.
> >> >
> >> > Each module's BTFs are designed as logical extensions of vmlinux BTF.
> >> > And each module BTF is independent and isolated from other modules
> >> > extension of the same vmlinux BTF. The way that BTF format is
> >> > designed, any tiny difference in vmlinux BTF effectively invalidates
> >> > all modules' BTFs and they have to be rebuilt.
> >> >
> >> > Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
> >> > ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
> >> > every module's BTF type ID started with 1001, now they all have to
> >> > start with 1002 and be shifted by 1.
> >> >
> >> > Now let's say that the order of two BTF types in vmlinux BTF is
> >> > changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
> >> > because of slight difference in DWARF, for instance). Any type
> >> > reference to 10 or 20 in any module BTF has to be renumbered now.
> >> >
> >> > Another one, let's say we add a new string to vmlinux BTF string
> >> > section somewhere at the beginning, say "abc" at offset 100. Any
> >> > string offset after 100 now has to be shifted *both* in vmlinux BTF
> >> > and all module BTFs. And also any string reference in module BTFs have
> >> > to be adjusted as well because now each module's BTF's logical string
> >> > offset is starting at 4 logical bytes higher (due to "abc\0" being
> >> > added and shifting everything right).
> >> >
> >> > As you can see, any tiny change in vmlinux BTF, no matter where,
> >> > beginning, middle, or end, causes massive changes in type IDs and
> >> > offsets everywhere. It's impractical to do any local adjustments, it's
> >> > much simpler and more reliable to completely regenerate BTF
> >> > completely.
> >>
> >> This seems incredibly brittle, though? IIUC this means that if you want
> >> BTF in your modules you *must* have not only the kernel headers of the
> >> kernel it's going to run on, but the full BTF information for the exact
> >
> > From BTF perspective, only vmlinux BTF. Having exact kernel headers
> > would minimize type information duplication.
> 
> Right, I meant you'd need the kernel headers to compile the module, and
> the vmlinux BTF to build the module BTF info.
> 
> >> kernel image you're going to load that module on? How is that supposed
> >> to work for any kind of environment where everything is not built
> >> together? Third-party modules for distribution kernels is the obvious
> >> example that comes to mind here, but as this thread shows, they don't
> >> necessarily even have to be third party...
> >>
> >> How would you go about "completely regenerating BTF" in practice for a
> >> third-party module, say?
> >
> > Great questions. I was kind of hoping you'll have some suggestions as
> > well, though. Not just complaints.
> 
> Well, I kinda took your "not really a bug either" comment to mean you
> weren't really open to changing the current behaviour. But if that was a
> misunderstanding on my part, I do have one thought:
> 
> The "partial BTF" thing in the modules is done to save space, right?
> I.e., in principle there would be nothing preventing a module from
> including a full (self-contained) set of BTF in its .ko when it is
> compiled? Because if so, we could allow that as an optional mode that
> can be enabled if you don't mind taking the size hit (any idea how large
> that usually is, BTW?).

This seems quite nice IMO as no change need to be made on the generation
side of existing BTF tooling. I test it out on openSUSE Tumbleweed 5.16.5
kernel modules, and for the sake of completeness, includes both the case
where BTF is stripped and using a pre-trained zstd dictionary as well.

Uncompressed, no BTF                             362MiB -27%
Uncompressed, parital BTF                        499MiB +0%
Uncompressed, self-contained BTF                1026MiB +105%

Zstd compressed, no BTF                           95MiB -35%
Zstd compressed, partial BTF                     147MiB +0%
Zstd compressed, self-contained BTF              361MiB +145%
Zstd compressed (trained), self-contained BTF    299MiB +103%

So we'd expect quite a bit of hit as the size of kernel module would double.

For servers and workstation environment an additional ~200MiB of disk space
seems like tolerable trade-off if it can get third-party kernel module to
work. But I cannot speak for other kind of use cases.

> And then we could teach 'modprobe' to do a fresh deduplication of this
> full BTF set against the vmlinux BTF before loading such a module into the
> kernel.
> 
> Or am I missing some reason why that wouldn't work?

One minor problem would be this is essentially introducing a new kernel
module BTF format that uses exactly the same header.

Ever since the introduction of split BTF, we're reusing btf_header but
acting as if there's an extra hidden flag indicating whether the BTF is
self-contained or partial. So far we could implicitly guess the value of the
flag since BTF in vmlinux is always self-contained and BTF in kernel module
is always partial; but if self-contained BTF on kernel module is introduced
this will no longer be the case.

Not sure if it'd be a issue in practice though, as we could go through the
type info and see whether there's any type ID that is too large and cannot
be found.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-12  7:37             ` Shung-Hsi Yu
@ 2022-02-13 15:40               ` Toke Høiland-Jørgensen
  2022-02-14 20:19                 ` Michal Suchánek
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-13 15:40 UTC (permalink / raw)
  To: Shung-Hsi Yu
  Cc: Andrii Nakryiko, Michal Suchánek, Yonghong Song, bpf,
	Networking, Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov

Shung-Hsi Yu <shung-hsi.yu@suse.com> writes:

> On Sat, Feb 12, 2022 at 12:58:51AM +0100, Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> 
>> > On Fri, Feb 11, 2022 at 9:20 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
>> >> >>
>> >> >> Hello,
>> >> >>
>> >> >> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>> >> >> >
>> >> >> >
>> >> >> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>> >> >> > > Hi,
>> >> >> > >
>> >> >> > > We recently run into module load failure related to split BTF on openSUSE
>> >> >> > > Tumbleweed[1], which I believe is something that may also happen on other
>> >> >> > > rolling distros.
>> >> >> > >
>> >> >> > > The error looks like the follow (though failure is not limited to ipheth)
>> >> >> > >
>> >> >> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>> >> >> > >
>> >> >> > >      failed to validate module [ipheth] BTF: -22
>> >> >> > >
>> >> >> > > The error comes down to trying to load BTF of *kernel modules from a
>> >> >> > > different build* than the runtime kernel (but the source is the same), where
>> >> >> > > the base BTF of the two build is different.
>> >> >> > >
>> >> >> > > While it may be too far stretched to call this a bug, solving this might
>> >> >> > > make BTF adoption easier. I'd natively think that we could further split
>> >> >> > > base BTF into two part to avoid this issue, where .BTF only contain exported
>> >> >> > > types, and the other (still residing in vmlinux) holds the unexported types.
>> >> >> >
>> >> >> > What is the exported types? The types used by export symbols?
>> >> >> > This for sure will increase btf handling complexity.
>> >> >>
>> >> >> And it will not actually help.
>> >> >>
>> >> >> We have modversion ABI which checks the checksum of the symbols that the
>> >> >> module imports and fails the load if the checksum for these symbols does
>> >> >> not match. It's not concerned with symbols not exported, it's not
>> >> >> concerned with symbols not used by the module. This is something that is
>> >> >> sustainable across kernel rebuilds with minor fixes/features and what
>> >> >> distributions watch for.
>> >> >>
>> >> >> Now with BTF the situation is vastly different. There are at least three
>> >> >> bugs:
>> >> >>
>> >> >>  - The BTF check is global for all symbols, not for the symbols the
>> >> >>    module uses. This is not sustainable. Given the BTF is supposed to
>> >> >>    allow linking BPF programs that were built in completely different
>> >> >>    environment with the kernel it is completely within the scope of BTF
>> >> >>    to solve this problem, it's just neglected.
>> >> >
>> >> > You refer to BTF use in CO-RE with the latter. It's just one
>> >> > application of BTF and it doesn't follow that you can do the same with
>> >> > module BTF. It's not a neglect, it's a very big technical difficulty.
>> >> >
>> >> > Each module's BTFs are designed as logical extensions of vmlinux BTF.
>> >> > And each module BTF is independent and isolated from other modules
>> >> > extension of the same vmlinux BTF. The way that BTF format is
>> >> > designed, any tiny difference in vmlinux BTF effectively invalidates
>> >> > all modules' BTFs and they have to be rebuilt.
>> >> >
>> >> > Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
>> >> > ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
>> >> > every module's BTF type ID started with 1001, now they all have to
>> >> > start with 1002 and be shifted by 1.
>> >> >
>> >> > Now let's say that the order of two BTF types in vmlinux BTF is
>> >> > changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
>> >> > because of slight difference in DWARF, for instance). Any type
>> >> > reference to 10 or 20 in any module BTF has to be renumbered now.
>> >> >
>> >> > Another one, let's say we add a new string to vmlinux BTF string
>> >> > section somewhere at the beginning, say "abc" at offset 100. Any
>> >> > string offset after 100 now has to be shifted *both* in vmlinux BTF
>> >> > and all module BTFs. And also any string reference in module BTFs have
>> >> > to be adjusted as well because now each module's BTF's logical string
>> >> > offset is starting at 4 logical bytes higher (due to "abc\0" being
>> >> > added and shifting everything right).
>> >> >
>> >> > As you can see, any tiny change in vmlinux BTF, no matter where,
>> >> > beginning, middle, or end, causes massive changes in type IDs and
>> >> > offsets everywhere. It's impractical to do any local adjustments, it's
>> >> > much simpler and more reliable to completely regenerate BTF
>> >> > completely.
>> >>
>> >> This seems incredibly brittle, though? IIUC this means that if you want
>> >> BTF in your modules you *must* have not only the kernel headers of the
>> >> kernel it's going to run on, but the full BTF information for the exact
>> >
>> > From BTF perspective, only vmlinux BTF. Having exact kernel headers
>> > would minimize type information duplication.
>> 
>> Right, I meant you'd need the kernel headers to compile the module, and
>> the vmlinux BTF to build the module BTF info.
>> 
>> >> kernel image you're going to load that module on? How is that supposed
>> >> to work for any kind of environment where everything is not built
>> >> together? Third-party modules for distribution kernels is the obvious
>> >> example that comes to mind here, but as this thread shows, they don't
>> >> necessarily even have to be third party...
>> >>
>> >> How would you go about "completely regenerating BTF" in practice for a
>> >> third-party module, say?
>> >
>> > Great questions. I was kind of hoping you'll have some suggestions as
>> > well, though. Not just complaints.
>> 
>> Well, I kinda took your "not really a bug either" comment to mean you
>> weren't really open to changing the current behaviour. But if that was a
>> misunderstanding on my part, I do have one thought:
>> 
>> The "partial BTF" thing in the modules is done to save space, right?
>> I.e., in principle there would be nothing preventing a module from
>> including a full (self-contained) set of BTF in its .ko when it is
>> compiled? Because if so, we could allow that as an optional mode that
>> can be enabled if you don't mind taking the size hit (any idea how large
>> that usually is, BTW?).
>
> This seems quite nice IMO as no change need to be made on the generation
> side of existing BTF tooling. I test it out on openSUSE Tumbleweed 5.16.5
> kernel modules, and for the sake of completeness, includes both the case
> where BTF is stripped and using a pre-trained zstd dictionary as well.
>
> Uncompressed, no BTF                             362MiB -27%
> Uncompressed, parital BTF                        499MiB +0%
> Uncompressed, self-contained BTF                1026MiB +105%
>
> Zstd compressed, no BTF                           95MiB -35%
> Zstd compressed, partial BTF                     147MiB +0%
> Zstd compressed, self-contained BTF              361MiB +145%
> Zstd compressed (trained), self-contained BTF    299MiB +103%
>
> So we'd expect quite a bit of hit as the size of kernel module would double.
>
> For servers and workstation environment an additional ~200MiB of disk space
> seems like tolerable trade-off if it can get third-party kernel module to
> work. But I cannot speak for other kind of use cases.

Well, there are also in-between tradeoffs (i.e., you can build a subset
of the modules with self-contained BTF and a subset with partial BTF
depending on what fits your build environment).

We could also come up with more optimisations later if needed; one thing
that comes to mind is adding the option to build a set of modules
together and have them deduplicate BTF between them, but still be
self-contained as a group. E.g., all netfilter modules could share a
common BTF set if you can be sure they will always be rebuilt together.
Once we have the deduplication logic in 'modprobe' this could be made
infinitely complex (recursive groups of deduplicated chunks of BTF!),
but that's probably overdoing it ;)

>> And then we could teach 'modprobe' to do a fresh deduplication of this
>> full BTF set against the vmlinux BTF before loading such a module into the
>> kernel.
>> 
>> Or am I missing some reason why that wouldn't work?
>
> One minor problem would be this is essentially introducing a new kernel
> module BTF format that uses exactly the same header.
>
> Ever since the introduction of split BTF, we're reusing btf_header but
> acting as if there's an extra hidden flag indicating whether the BTF is
> self-contained or partial. So far we could implicitly guess the value of the
> flag since BTF in vmlinux is always self-contained and BTF in kernel module
> is always partial; but if self-contained BTF on kernel module is introduced
> this will no longer be the case.
>
> Not sure if it'd be a issue in practice though, as we could go through the
> type info and see whether there's any type ID that is too large and cannot
> be found.

Yeah, one option could be to just discover it when parsing the BTF: if
it's not self-contained, assume it's referring to the vmlinux BTF and
act accordingly. As long as there is no risk of "false positives" where
the loader detects the wrong thing, but I don't think this detection
would add any failure cases that are not present already today?

Another option would be to but self-contained BTF in a different
section. Or we could amend the header as you say, but with what?

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-13 15:40               ` Toke Høiland-Jørgensen
@ 2022-02-14 20:19                 ` Michal Suchánek
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Suchánek @ 2022-02-14 20:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Shung-Hsi Yu, Andrii Nakryiko, Yonghong Song, bpf, Networking,
	Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov

On Sun, Feb 13, 2022 at 04:40:44PM +0100, Toke Høiland-Jørgensen wrote:
> Shung-Hsi Yu <shung-hsi.yu@suse.com> writes:
> 
> > On Sat, Feb 12, 2022 at 12:58:51AM +0100, Toke Høiland-Jørgensen wrote:
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> 
> >> > On Fri, Feb 11, 2022 at 9:20 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> >>
> >> >> > On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek <msuchanek@suse.de> wrote:
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> >> >> >> > > Hi,
> >> >> >> > >
> >> >> >> > > We recently run into module load failure related to split BTF on openSUSE
> >> >> >> > > Tumbleweed[1], which I believe is something that may also happen on other
> >> >> >> > > rolling distros.
> >> >> >> > >
> >> >> >> > > The error looks like the follow (though failure is not limited to ipheth)
> >> >> >> > >
> >> >> >> > >      BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> >> >> >> > >
> >> >> >> > >      failed to validate module [ipheth] BTF: -22
> >> >> >> > >
> >> >> >> > > The error comes down to trying to load BTF of *kernel modules from a
> >> >> >> > > different build* than the runtime kernel (but the source is the same), where
> >> >> >> > > the base BTF of the two build is different.
> >> >> >> > >
> >> >> >> > > While it may be too far stretched to call this a bug, solving this might
> >> >> >> > > make BTF adoption easier. I'd natively think that we could further split
> >> >> >> > > base BTF into two part to avoid this issue, where .BTF only contain exported
> >> >> >> > > types, and the other (still residing in vmlinux) holds the unexported types.
> >> >> >> >
> >> >> >> > What is the exported types? The types used by export symbols?
> >> >> >> > This for sure will increase btf handling complexity.
> >> >> >>
> >> >> >> And it will not actually help.
> >> >> >>
> >> >> >> We have modversion ABI which checks the checksum of the symbols that the
> >> >> >> module imports and fails the load if the checksum for these symbols does
> >> >> >> not match. It's not concerned with symbols not exported, it's not
> >> >> >> concerned with symbols not used by the module. This is something that is
> >> >> >> sustainable across kernel rebuilds with minor fixes/features and what
> >> >> >> distributions watch for.
> >> >> >>
> >> >> >> Now with BTF the situation is vastly different. There are at least three
> >> >> >> bugs:
> >> >> >>
> >> >> >>  - The BTF check is global for all symbols, not for the symbols the
> >> >> >>    module uses. This is not sustainable. Given the BTF is supposed to
> >> >> >>    allow linking BPF programs that were built in completely different
> >> >> >>    environment with the kernel it is completely within the scope of BTF
> >> >> >>    to solve this problem, it's just neglected.
> >> >> >
> >> >> > You refer to BTF use in CO-RE with the latter. It's just one
> >> >> > application of BTF and it doesn't follow that you can do the same with
> >> >> > module BTF. It's not a neglect, it's a very big technical difficulty.
> >> >> >
> >> >> > Each module's BTFs are designed as logical extensions of vmlinux BTF.
> >> >> > And each module BTF is independent and isolated from other modules
> >> >> > extension of the same vmlinux BTF. The way that BTF format is
> >> >> > designed, any tiny difference in vmlinux BTF effectively invalidates
> >> >> > all modules' BTFs and they have to be rebuilt.
> >> >> >
> >> >> > Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
> >> >> > ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
> >> >> > every module's BTF type ID started with 1001, now they all have to
> >> >> > start with 1002 and be shifted by 1.
> >> >> >
> >> >> > Now let's say that the order of two BTF types in vmlinux BTF is
> >> >> > changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
> >> >> > because of slight difference in DWARF, for instance). Any type
> >> >> > reference to 10 or 20 in any module BTF has to be renumbered now.
> >> >> >
> >> >> > Another one, let's say we add a new string to vmlinux BTF string
> >> >> > section somewhere at the beginning, say "abc" at offset 100. Any
> >> >> > string offset after 100 now has to be shifted *both* in vmlinux BTF
> >> >> > and all module BTFs. And also any string reference in module BTFs have
> >> >> > to be adjusted as well because now each module's BTF's logical string
> >> >> > offset is starting at 4 logical bytes higher (due to "abc\0" being
> >> >> > added and shifting everything right).
> >> >> >
> >> >> > As you can see, any tiny change in vmlinux BTF, no matter where,
> >> >> > beginning, middle, or end, causes massive changes in type IDs and
> >> >> > offsets everywhere. It's impractical to do any local adjustments, it's
> >> >> > much simpler and more reliable to completely regenerate BTF
> >> >> > completely.
> >> >>
> >> >> This seems incredibly brittle, though? IIUC this means that if you want
> >> >> BTF in your modules you *must* have not only the kernel headers of the
> >> >> kernel it's going to run on, but the full BTF information for the exact
> >> >
> >> > From BTF perspective, only vmlinux BTF. Having exact kernel headers
> >> > would minimize type information duplication.
> >> 
> >> Right, I meant you'd need the kernel headers to compile the module, and
> >> the vmlinux BTF to build the module BTF info.
> >> 
> >> >> kernel image you're going to load that module on? How is that supposed
> >> >> to work for any kind of environment where everything is not built
> >> >> together? Third-party modules for distribution kernels is the obvious
> >> >> example that comes to mind here, but as this thread shows, they don't
> >> >> necessarily even have to be third party...
> >> >>
> >> >> How would you go about "completely regenerating BTF" in practice for a
> >> >> third-party module, say?
> >> >
> >> > Great questions. I was kind of hoping you'll have some suggestions as
> >> > well, though. Not just complaints.
> >> 
> >> Well, I kinda took your "not really a bug either" comment to mean you
> >> weren't really open to changing the current behaviour. But if that was a
> >> misunderstanding on my part, I do have one thought:
> >> 
> >> The "partial BTF" thing in the modules is done to save space, right?
> >> I.e., in principle there would be nothing preventing a module from
> >> including a full (self-contained) set of BTF in its .ko when it is
> >> compiled? Because if so, we could allow that as an optional mode that
> >> can be enabled if you don't mind taking the size hit (any idea how large
> >> that usually is, BTW?).
> >
> > This seems quite nice IMO as no change need to be made on the generation
> > side of existing BTF tooling. I test it out on openSUSE Tumbleweed 5.16.5
> > kernel modules, and for the sake of completeness, includes both the case
> > where BTF is stripped and using a pre-trained zstd dictionary as well.
> >
> > Uncompressed, no BTF                             362MiB -27%
> > Uncompressed, parital BTF                        499MiB +0%
> > Uncompressed, self-contained BTF                1026MiB +105%
> >
> > Zstd compressed, no BTF                           95MiB -35%
> > Zstd compressed, partial BTF                     147MiB +0%
> > Zstd compressed, self-contained BTF              361MiB +145%
> > Zstd compressed (trained), self-contained BTF    299MiB +103%
> >
> > So we'd expect quite a bit of hit as the size of kernel module would double.
> >
> > For servers and workstation environment an additional ~200MiB of disk space
> > seems like tolerable trade-off if it can get third-party kernel module to
> > work. But I cannot speak for other kind of use cases.
> 
> Well, there are also in-between tradeoffs (i.e., you can build a subset
> of the modules with self-contained BTF and a subset with partial BTF
> depending on what fits your build environment).

As for that you would typically want in-tree modules with partial BTF.
It's a bug if they don't match, and if you can ignore the non-matching
BTF you should bee able to boot a system that is functional enough to
re-install the kernel. Today nothing critical depends on CO-RE.

On the othere hand if you build something out-of-tree be it virtualbox
or some module updated with cutting edge experimental changes you will
likely want full BTF.

Thanks

Michal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-15 19:38               ` Shung-Hsi Yu
@ 2022-02-15 17:47                 ` Yonghong Song
  2022-02-15 18:57                   ` Toke Høiland-Jørgensen
  2022-02-16  8:48                   ` David Laight
  2022-03-02 17:46                 ` Michal Suchánek
  1 sibling, 2 replies; 22+ messages in thread
From: Yonghong Song @ 2022-02-15 17:47 UTC (permalink / raw)
  To: Shung-Hsi Yu
  Cc: Alexei Starovoitov, Connor O'Brien, Michal Suchánek,
	bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov



On 2/15/22 11:38 AM, Shung-Hsi Yu wrote:
> On Fri, Feb 11, 2022 at 10:36:28PM -0800, Yonghong Song wrote:
>> On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
>>> On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
>>>> On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
>>>>> On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
>>>>>> On 2/10/22 2:01 AM, Michal Suchánek wrote:
>>>>>>> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>>>>>>>> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We recently run into module load failure related to split BTF on openSUSE
>>>>>>>>> Tumbleweed[1], which I believe is something that may also happen on other
>>>>>>>>> rolling distros.
>>>>>>>>>
>>>>>>>>> The error looks like the follow (though failure is not limited to ipheth)
>>>>>>>>>
>>>>>>>>>          BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>>>>>>>>>
>>>>>>>>>          failed to validate module [ipheth] BTF: -22
>>>>>>>>>
>>>>>>>>> The error comes down to trying to load BTF of *kernel modules from a
>>>>>>>>> different build* than the runtime kernel (but the source is the same), where
>>>>>>>>> the base BTF of the two build is different.
>>>>>>>>>
>>>>>>>>> While it may be too far stretched to call this a bug, solving this might
>>>>>>>>> make BTF adoption easier. I'd natively think that we could further split
>>>>>>>>> base BTF into two part to avoid this issue, where .BTF only contain exported
>>>>>>>>> types, and the other (still residing in vmlinux) holds the unexported types.
>>>>>>>>
>>>>>>>> What is the exported types? The types used by export symbols?
>>>>>>>> This for sure will increase btf handling complexity.
>>>>>>>
>>>>>>> And it will not actually help.
>>>>>>>
>>>>>>> We have modversion ABI which checks the checksum of the symbols that the
>>>>>>> module imports and fails the load if the checksum for these symbols does
>>>>>>> not match. It's not concerned with symbols not exported, it's not
>>>>>>> concerned with symbols not used by the module. This is something that is
>>>>>>> sustainable across kernel rebuilds with minor fixes/features and what
>>>>>>> distributions watch for.
>>>>>>>
>>>>>>> Now with BTF the situation is vastly different. There are at least three
>>>>>>> bugs:
>>>>>>>
>>>>>>>      - The BTF check is global for all symbols, not for the symbols the
>>>>>>>        module uses. This is not sustainable. Given the BTF is supposed to
>>>>>>>        allow linking BPF programs that were built in completely different
>>>>>>>        environment with the kernel it is completely within the scope of BTF
>>>>>>>        to solve this problem, it's just neglected.
>>>>>>>      - It is possible to load modules with no BTF but not modules with
>>>>>>>        non-matching BTF. Surely the non-matching BTF could be discarded.
>>>>>>>      - BTF is part of vermagic. This is completely pointless since modules
>>>>>>>        without BTF can be loaded on BTF kernel. Surely it would not be too
>>>>>>>        difficult to do the reverse as well. Given BTF must pass extra check
>>>>>>>        to be used having it in vermagic is just useless moise.
>>>>>>>
>>>>>>>>> Does that sound like something reasonable to work on?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ## Root case (in case anyone is interested in a verbose version)
>>>>>>>>>
>>>>>>>>> On openSUSE Tumbleweed there can be several builds of the same source. Since
>>>>>>>>> the source is the same, the binaries are simply replaced when a package with
>>>>>>>>> a larger build number is installed during upgrade.
>>>>>>>>>
>>>>>>>>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
>>>>>>>>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
>>>>>>>>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
>>>>>>>>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
>>>>>>>>> are previously missing in base BTF of 5.15.12-1.1.
>>>>>>>>
>>>>>>>> As stated in [2] below, I think we should understand why rebuild is
>>>>>>>> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
>>>>>>>> be rebuild at the same time?
>>>>>>>
>>>>>>> They do get rebuilt. However, if you are running the kernel and install
>>>>>>> the update you get the new modules with the old kernel. If the install
>>>>>>> script fails to copy the kernel to your EFI partition based on the fact
>>>>>>> a kernel with the same filename is alreasy there you get the same.
>>>>>>>
>>>>>>> If you have 'stable' distribution adding new symbols is normal and it
>>>>>>> does not break module loading without BTF but it breaks BTF.
>>>>>>
>>>>>> Okay, I see. One possible solution is that if kernel module btf
>>>>>> does not match vmlinux btf, the kernel module btf will be ignored
>>>>>> with a dmesg warning but kernel module load will proceed as normal.
>>>>>> I think this might be also useful for bpf lskel kernel modules as
>>>>>> well which tries to be portable (with CO-RE) for different kernels.
>>>>>
>>>>> That sounds like #2 that Michal is proposing:
>>>>> "It is possible to load modules with no BTF but not modules with
>>>>>     non-matching BTF. Surely the non-matching BTF could be discarded."
>>>
>>> Since we're talking about matching check, I'd like bring up another issue.
>>>
>>> AFAICT with current form of BTF, checking whether BTF on kernel module
>>> matches cannot be made entirely robust without a new version of btf_header
>>> that contain info about the base BTF.
>>
>> The base BTF is always the one associated with running kernel and typically
>> the BTF is under /sys/kernel/btf/vmlinux. Did I miss
>> anything here?
>>
>>> As effective as the checks are in this case, by detecting a type name being
>>> an empty string and thus conclude it's non-matching, with some (bad) luck a
>>> non-matching BTF could pass these checks a gets loaded.
>>
>> Could you be a little bit more specific about the 'bad luck' a
>> non-matching BTF could get loaded? An example will be great.
> 
> Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
> module which only type information for `struct something *`:
> 
>    [5] PTR '(anon)' type_id=4
> 
> Which is built upon the follow base BTF:
> 
>    [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>    [2] PTR '(anon)' type_id=3
>    [3] STRUCT 'list_head' size=16 vlen=2
>          'next' type_id=2 bits_offset=0
>          'prev' type_id=2 bits_offset=64
>    [4] STRUCT 'something' size=2 vlen=2
>          'locked' type_id=1 bits_offset=0
>          'pending' type_id=1 bits_offset=8
> 
> Due to the situation mentioned in the beginning of the thread, the *runtime*
> kernel have a different base BTF, in this case type IDs are offset by 1 due
> to an additional typedef entry:
> 
>    [1] TYPEDEF 'u8' type_id=1
>    [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>    [3] PTR '(anon)' type_id=3
>    [4] STRUCT 'list_head' size=16 vlen=2
>          'next' type_id=2 bits_offset=0
>          'prev' type_id=2 bits_offset=64
>    [5] STRUCT 'something' size=2 vlen=2
>          'locked' type_id=1 bits_offset=0
>          'pending' type_id=1 bits_offset=8
> 
> Then when loading the BTF on kernel module on the runtime, the kernel will
> mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
> rather than `struct something *`.
> 
> Does this should possible? (at least theoretically)

Thanks for explanation. Yes, from BTF type resolution point of view,
yes it is possible.

> 
>>>>> That's probably the simplest way forward.
>>>>>
>>>>> The patch
>>>>> https://patchwork.kernel.org/project/netdevbpf/patch/20220209052141.140063-1-connoro@google.com/
>>>>> shouldn't be necessary too.
>>>>
>>>> Right the patch tried to address this issue and if we allow
>>>> non-matching BTF is ignored and then treaking DEBUG_INFO_BTF_MODULES
>>>> is not necessary.
>>>
>>> Not being able to load kernel module with non-matching BTF and the absence
>>> of robust matching check are the two reasons that lead us to the same path
>>> of disabling DEBUG_INFO_BTF_MODULES a while back.
>>>
>>> Ignoring non-matching BTF will solve the former, but not the latter, so I'd
>>> hope that the above patch get's taken (though I'm obviously biased).
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-15 17:47                 ` Yonghong Song
@ 2022-02-15 18:57                   ` Toke Høiland-Jørgensen
  2022-02-20  0:28                     ` Andrii Nakryiko
  2022-02-16  8:48                   ` David Laight
  1 sibling, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-15 18:57 UTC (permalink / raw)
  To: Yonghong Song, Shung-Hsi Yu
  Cc: Alexei Starovoitov, Connor O'Brien, Michal Suchánek,
	bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

Yonghong Song <yhs@fb.com> writes:

> On 2/15/22 11:38 AM, Shung-Hsi Yu wrote:
>> On Fri, Feb 11, 2022 at 10:36:28PM -0800, Yonghong Song wrote:
>>> On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
>>>> On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
>>>>> On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
>>>>>> On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
>>>>>>> On 2/10/22 2:01 AM, Michal Suchánek wrote:
>>>>>>>> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
>>>>>>>>> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We recently run into module load failure related to split BTF on openSUSE
>>>>>>>>>> Tumbleweed[1], which I believe is something that may also happen on other
>>>>>>>>>> rolling distros.
>>>>>>>>>>
>>>>>>>>>> The error looks like the follow (though failure is not limited to ipheth)
>>>>>>>>>>
>>>>>>>>>>          BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
>>>>>>>>>>
>>>>>>>>>>          failed to validate module [ipheth] BTF: -22
>>>>>>>>>>
>>>>>>>>>> The error comes down to trying to load BTF of *kernel modules from a
>>>>>>>>>> different build* than the runtime kernel (but the source is the same), where
>>>>>>>>>> the base BTF of the two build is different.
>>>>>>>>>>
>>>>>>>>>> While it may be too far stretched to call this a bug, solving this might
>>>>>>>>>> make BTF adoption easier. I'd natively think that we could further split
>>>>>>>>>> base BTF into two part to avoid this issue, where .BTF only contain exported
>>>>>>>>>> types, and the other (still residing in vmlinux) holds the unexported types.
>>>>>>>>>
>>>>>>>>> What is the exported types? The types used by export symbols?
>>>>>>>>> This for sure will increase btf handling complexity.
>>>>>>>>
>>>>>>>> And it will not actually help.
>>>>>>>>
>>>>>>>> We have modversion ABI which checks the checksum of the symbols that the
>>>>>>>> module imports and fails the load if the checksum for these symbols does
>>>>>>>> not match. It's not concerned with symbols not exported, it's not
>>>>>>>> concerned with symbols not used by the module. This is something that is
>>>>>>>> sustainable across kernel rebuilds with minor fixes/features and what
>>>>>>>> distributions watch for.
>>>>>>>>
>>>>>>>> Now with BTF the situation is vastly different. There are at least three
>>>>>>>> bugs:
>>>>>>>>
>>>>>>>>      - The BTF check is global for all symbols, not for the symbols the
>>>>>>>>        module uses. This is not sustainable. Given the BTF is supposed to
>>>>>>>>        allow linking BPF programs that were built in completely different
>>>>>>>>        environment with the kernel it is completely within the scope of BTF
>>>>>>>>        to solve this problem, it's just neglected.
>>>>>>>>      - It is possible to load modules with no BTF but not modules with
>>>>>>>>        non-matching BTF. Surely the non-matching BTF could be discarded.
>>>>>>>>      - BTF is part of vermagic. This is completely pointless since modules
>>>>>>>>        without BTF can be loaded on BTF kernel. Surely it would not be too
>>>>>>>>        difficult to do the reverse as well. Given BTF must pass extra check
>>>>>>>>        to be used having it in vermagic is just useless moise.
>>>>>>>>
>>>>>>>>>> Does that sound like something reasonable to work on?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ## Root case (in case anyone is interested in a verbose version)
>>>>>>>>>>
>>>>>>>>>> On openSUSE Tumbleweed there can be several builds of the same source. Since
>>>>>>>>>> the source is the same, the binaries are simply replaced when a package with
>>>>>>>>>> a larger build number is installed during upgrade.
>>>>>>>>>>
>>>>>>>>>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
>>>>>>>>>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
>>>>>>>>>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
>>>>>>>>>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
>>>>>>>>>> are previously missing in base BTF of 5.15.12-1.1.
>>>>>>>>>
>>>>>>>>> As stated in [2] below, I think we should understand why rebuild is
>>>>>>>>> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
>>>>>>>>> be rebuild at the same time?
>>>>>>>>
>>>>>>>> They do get rebuilt. However, if you are running the kernel and install
>>>>>>>> the update you get the new modules with the old kernel. If the install
>>>>>>>> script fails to copy the kernel to your EFI partition based on the fact
>>>>>>>> a kernel with the same filename is alreasy there you get the same.
>>>>>>>>
>>>>>>>> If you have 'stable' distribution adding new symbols is normal and it
>>>>>>>> does not break module loading without BTF but it breaks BTF.
>>>>>>>
>>>>>>> Okay, I see. One possible solution is that if kernel module btf
>>>>>>> does not match vmlinux btf, the kernel module btf will be ignored
>>>>>>> with a dmesg warning but kernel module load will proceed as normal.
>>>>>>> I think this might be also useful for bpf lskel kernel modules as
>>>>>>> well which tries to be portable (with CO-RE) for different kernels.
>>>>>>
>>>>>> That sounds like #2 that Michal is proposing:
>>>>>> "It is possible to load modules with no BTF but not modules with
>>>>>>     non-matching BTF. Surely the non-matching BTF could be discarded."
>>>>
>>>> Since we're talking about matching check, I'd like bring up another issue.
>>>>
>>>> AFAICT with current form of BTF, checking whether BTF on kernel module
>>>> matches cannot be made entirely robust without a new version of btf_header
>>>> that contain info about the base BTF.
>>>
>>> The base BTF is always the one associated with running kernel and typically
>>> the BTF is under /sys/kernel/btf/vmlinux. Did I miss
>>> anything here?
>>>
>>>> As effective as the checks are in this case, by detecting a type name being
>>>> an empty string and thus conclude it's non-matching, with some (bad) luck a
>>>> non-matching BTF could pass these checks a gets loaded.
>>>
>>> Could you be a little bit more specific about the 'bad luck' a
>>> non-matching BTF could get loaded? An example will be great.
>> 
>> Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
>> module which only type information for `struct something *`:
>> 
>>    [5] PTR '(anon)' type_id=4
>> 
>> Which is built upon the follow base BTF:
>> 
>>    [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>>    [2] PTR '(anon)' type_id=3
>>    [3] STRUCT 'list_head' size=16 vlen=2
>>          'next' type_id=2 bits_offset=0
>>          'prev' type_id=2 bits_offset=64
>>    [4] STRUCT 'something' size=2 vlen=2
>>          'locked' type_id=1 bits_offset=0
>>          'pending' type_id=1 bits_offset=8
>> 
>> Due to the situation mentioned in the beginning of the thread, the *runtime*
>> kernel have a different base BTF, in this case type IDs are offset by 1 due
>> to an additional typedef entry:
>> 
>>    [1] TYPEDEF 'u8' type_id=1
>>    [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>>    [3] PTR '(anon)' type_id=3
>>    [4] STRUCT 'list_head' size=16 vlen=2
>>          'next' type_id=2 bits_offset=0
>>          'prev' type_id=2 bits_offset=64
>>    [5] STRUCT 'something' size=2 vlen=2
>>          'locked' type_id=1 bits_offset=0
>>          'pending' type_id=1 bits_offset=8
>> 
>> Then when loading the BTF on kernel module on the runtime, the kernel will
>> mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
>> rather than `struct something *`.
>> 
>> Does this should possible? (at least theoretically)
>
> Thanks for explanation. Yes, from BTF type resolution point of view,
> yes it is possible.

Could we add a marker or something to prevent this from happening?
Something like putting the hash of the entire BTF structure into the
header and referring to that from the "child"; or really any other way
of detecting that the combined BTF you're constructing is going to be
wrong?

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-12  6:36             ` Yonghong Song
@ 2022-02-15 19:38               ` Shung-Hsi Yu
  2022-02-15 17:47                 ` Yonghong Song
  2022-03-02 17:46                 ` Michal Suchánek
  0 siblings, 2 replies; 22+ messages in thread
From: Shung-Hsi Yu @ 2022-02-15 19:38 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Connor O'Brien, Michal Suchánek,
	bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

On Fri, Feb 11, 2022 at 10:36:28PM -0800, Yonghong Song wrote:
> On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
> > On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
> > > On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
> > > > On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > > > On 2/10/22 2:01 AM, Michal Suchánek wrote:
> > > > > > On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> > > > > > > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > We recently run into module load failure related to split BTF on openSUSE
> > > > > > > > Tumbleweed[1], which I believe is something that may also happen on other
> > > > > > > > rolling distros.
> > > > > > > > 
> > > > > > > > The error looks like the follow (though failure is not limited to ipheth)
> > > > > > > > 
> > > > > > > >         BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> > > > > > > > 
> > > > > > > >         failed to validate module [ipheth] BTF: -22
> > > > > > > > 
> > > > > > > > The error comes down to trying to load BTF of *kernel modules from a
> > > > > > > > different build* than the runtime kernel (but the source is the same), where
> > > > > > > > the base BTF of the two build is different.
> > > > > > > > 
> > > > > > > > While it may be too far stretched to call this a bug, solving this might
> > > > > > > > make BTF adoption easier. I'd natively think that we could further split
> > > > > > > > base BTF into two part to avoid this issue, where .BTF only contain exported
> > > > > > > > types, and the other (still residing in vmlinux) holds the unexported types.
> > > > > > > 
> > > > > > > What is the exported types? The types used by export symbols?
> > > > > > > This for sure will increase btf handling complexity.
> > > > > > 
> > > > > > And it will not actually help.
> > > > > > 
> > > > > > We have modversion ABI which checks the checksum of the symbols that the
> > > > > > module imports and fails the load if the checksum for these symbols does
> > > > > > not match. It's not concerned with symbols not exported, it's not
> > > > > > concerned with symbols not used by the module. This is something that is
> > > > > > sustainable across kernel rebuilds with minor fixes/features and what
> > > > > > distributions watch for.
> > > > > > 
> > > > > > Now with BTF the situation is vastly different. There are at least three
> > > > > > bugs:
> > > > > > 
> > > > > >     - The BTF check is global for all symbols, not for the symbols the
> > > > > >       module uses. This is not sustainable. Given the BTF is supposed to
> > > > > >       allow linking BPF programs that were built in completely different
> > > > > >       environment with the kernel it is completely within the scope of BTF
> > > > > >       to solve this problem, it's just neglected.
> > > > > >     - It is possible to load modules with no BTF but not modules with
> > > > > >       non-matching BTF. Surely the non-matching BTF could be discarded.
> > > > > >     - BTF is part of vermagic. This is completely pointless since modules
> > > > > >       without BTF can be loaded on BTF kernel. Surely it would not be too
> > > > > >       difficult to do the reverse as well. Given BTF must pass extra check
> > > > > >       to be used having it in vermagic is just useless moise.
> > > > > > 
> > > > > > > > Does that sound like something reasonable to work on?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ## Root case (in case anyone is interested in a verbose version)
> > > > > > > > 
> > > > > > > > On openSUSE Tumbleweed there can be several builds of the same source. Since
> > > > > > > > the source is the same, the binaries are simply replaced when a package with
> > > > > > > > a larger build number is installed during upgrade.
> > > > > > > > 
> > > > > > > > In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> > > > > > > > More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> > > > > > > > struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> > > > > > > > struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> > > > > > > > are previously missing in base BTF of 5.15.12-1.1.
> > > > > > > 
> > > > > > > As stated in [2] below, I think we should understand why rebuild is
> > > > > > > triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> > > > > > > be rebuild at the same time?
> > > > > > 
> > > > > > They do get rebuilt. However, if you are running the kernel and install
> > > > > > the update you get the new modules with the old kernel. If the install
> > > > > > script fails to copy the kernel to your EFI partition based on the fact
> > > > > > a kernel with the same filename is alreasy there you get the same.
> > > > > > 
> > > > > > If you have 'stable' distribution adding new symbols is normal and it
> > > > > > does not break module loading without BTF but it breaks BTF.
> > > > > 
> > > > > Okay, I see. One possible solution is that if kernel module btf
> > > > > does not match vmlinux btf, the kernel module btf will be ignored
> > > > > with a dmesg warning but kernel module load will proceed as normal.
> > > > > I think this might be also useful for bpf lskel kernel modules as
> > > > > well which tries to be portable (with CO-RE) for different kernels.
> > > > 
> > > > That sounds like #2 that Michal is proposing:
> > > > "It is possible to load modules with no BTF but not modules with
> > > >    non-matching BTF. Surely the non-matching BTF could be discarded."
> > 
> > Since we're talking about matching check, I'd like bring up another issue.
> > 
> > AFAICT with current form of BTF, checking whether BTF on kernel module
> > matches cannot be made entirely robust without a new version of btf_header
> > that contain info about the base BTF.
> 
> The base BTF is always the one associated with running kernel and typically
> the BTF is under /sys/kernel/btf/vmlinux. Did I miss
> anything here?
> 
> > As effective as the checks are in this case, by detecting a type name being
> > an empty string and thus conclude it's non-matching, with some (bad) luck a
> > non-matching BTF could pass these checks a gets loaded.
> 
> Could you be a little bit more specific about the 'bad luck' a
> non-matching BTF could get loaded? An example will be great.

Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
module which only type information for `struct something *`:

  [5] PTR '(anon)' type_id=4

Which is built upon the follow base BTF:

  [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  [2] PTR '(anon)' type_id=3
  [3] STRUCT 'list_head' size=16 vlen=2
        'next' type_id=2 bits_offset=0
        'prev' type_id=2 bits_offset=64
  [4] STRUCT 'something' size=2 vlen=2
        'locked' type_id=1 bits_offset=0
        'pending' type_id=1 bits_offset=8

Due to the situation mentioned in the beginning of the thread, the *runtime*
kernel have a different base BTF, in this case type IDs are offset by 1 due
to an additional typedef entry:

  [1] TYPEDEF 'u8' type_id=1
  [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  [3] PTR '(anon)' type_id=3
  [4] STRUCT 'list_head' size=16 vlen=2
        'next' type_id=2 bits_offset=0
        'prev' type_id=2 bits_offset=64
  [5] STRUCT 'something' size=2 vlen=2
        'locked' type_id=1 bits_offset=0
        'pending' type_id=1 bits_offset=8

Then when loading the BTF on kernel module on the runtime, the kernel will
mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
rather than `struct something *`.

Does this should possible? (at least theoretically)

> > > > That's probably the simplest way forward.
> > > > 
> > > > The patch
> > > > https://patchwork.kernel.org/project/netdevbpf/patch/20220209052141.140063-1-connoro@google.com/
> > > > shouldn't be necessary too.
> > > 
> > > Right the patch tried to address this issue and if we allow
> > > non-matching BTF is ignored and then treaking DEBUG_INFO_BTF_MODULES
> > > is not necessary.
> > 
> > Not being able to load kernel module with non-matching BTF and the absence
> > of robust matching check are the two reasons that lead us to the same path
> > of disabling DEBUG_INFO_BTF_MODULES a while back.
> > 
> > Ignoring non-matching BTF will solve the former, but not the latter, so I'd
> > hope that the above patch get's taken (though I'm obviously biased).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: BTF compatibility issue across builds
  2022-02-15 17:47                 ` Yonghong Song
  2022-02-15 18:57                   ` Toke Høiland-Jørgensen
@ 2022-02-16  8:48                   ` David Laight
  1 sibling, 0 replies; 22+ messages in thread
From: David Laight @ 2022-02-16  8:48 UTC (permalink / raw)
  To: 'Yonghong Song', Shung-Hsi Yu
  Cc: Alexei Starovoitov, Connor O'Brien, Michal Suchánek,
	bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

From: Yonghong Song
> Sent: 15 February 2022 17:47
...
> > Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
> > module which only type information for `struct something *`:
> >
> >    [5] PTR '(anon)' type_id=4
> >
> > Which is built upon the follow base BTF:
> >
> >    [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >    [2] PTR '(anon)' type_id=3
> >    [3] STRUCT 'list_head' size=16 vlen=2
> >          'next' type_id=2 bits_offset=0
> >          'prev' type_id=2 bits_offset=64
> >    [4] STRUCT 'something' size=2 vlen=2
> >          'locked' type_id=1 bits_offset=0
> >          'pending' type_id=1 bits_offset=8
> >
> > Due to the situation mentioned in the beginning of the thread, the *runtime*
> > kernel have a different base BTF, in this case type IDs are offset by 1 due
> > to an additional typedef entry:
> >
> >    [1] TYPEDEF 'u8' type_id=1
> >    [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >    [3] PTR '(anon)' type_id=3
> >    [4] STRUCT 'list_head' size=16 vlen=2
> >          'next' type_id=2 bits_offset=0
> >          'prev' type_id=2 bits_offset=64
> >    [5] STRUCT 'something' size=2 vlen=2
> >          'locked' type_id=1 bits_offset=0
> >          'pending' type_id=1 bits_offset=8
> >
> > Then when loading the BTF on kernel module on the runtime, the kernel will
> > mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
> > rather than `struct something *`.
> >
> > Does this should possible? (at least theoretically)
> 
> Thanks for explanation. Yes, from BTF type resolution point of view,
> yes it is possible.

This looks so much like the old 'shared library function number'
ordinals from pre-SYSV and early windows shared libraries.
There is a good reason why it isn't done that way any more.

Has someone re-invented the square wheel??

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-15 18:57                   ` Toke Høiland-Jørgensen
@ 2022-02-20  0:28                     ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2022-02-20  0:28 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Yonghong Song, Shung-Hsi Yu, Alexei Starovoitov,
	Connor O'Brien, Michal Suchánek, bpf,
	Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

On Tue, Feb 15, 2022 at 10:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Yonghong Song <yhs@fb.com> writes:
>
> > On 2/15/22 11:38 AM, Shung-Hsi Yu wrote:
> >> On Fri, Feb 11, 2022 at 10:36:28PM -0800, Yonghong Song wrote:
> >>> On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
> >>>> On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
> >>>>> On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
> >>>>>> On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
> >>>>>>> On 2/10/22 2:01 AM, Michal Suchánek wrote:
> >>>>>>>> On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> >>>>>>>>> On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> We recently run into module load failure related to split BTF on openSUSE
> >>>>>>>>>> Tumbleweed[1], which I believe is something that may also happen on other
> >>>>>>>>>> rolling distros.
> >>>>>>>>>>
> >>>>>>>>>> The error looks like the follow (though failure is not limited to ipheth)
> >>>>>>>>>>
> >>>>>>>>>>          BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> >>>>>>>>>>
> >>>>>>>>>>          failed to validate module [ipheth] BTF: -22
> >>>>>>>>>>
> >>>>>>>>>> The error comes down to trying to load BTF of *kernel modules from a
> >>>>>>>>>> different build* than the runtime kernel (but the source is the same), where
> >>>>>>>>>> the base BTF of the two build is different.
> >>>>>>>>>>
> >>>>>>>>>> While it may be too far stretched to call this a bug, solving this might
> >>>>>>>>>> make BTF adoption easier. I'd natively think that we could further split
> >>>>>>>>>> base BTF into two part to avoid this issue, where .BTF only contain exported
> >>>>>>>>>> types, and the other (still residing in vmlinux) holds the unexported types.
> >>>>>>>>>
> >>>>>>>>> What is the exported types? The types used by export symbols?
> >>>>>>>>> This for sure will increase btf handling complexity.
> >>>>>>>>
> >>>>>>>> And it will not actually help.
> >>>>>>>>
> >>>>>>>> We have modversion ABI which checks the checksum of the symbols that the
> >>>>>>>> module imports and fails the load if the checksum for these symbols does
> >>>>>>>> not match. It's not concerned with symbols not exported, it's not
> >>>>>>>> concerned with symbols not used by the module. This is something that is
> >>>>>>>> sustainable across kernel rebuilds with minor fixes/features and what
> >>>>>>>> distributions watch for.
> >>>>>>>>
> >>>>>>>> Now with BTF the situation is vastly different. There are at least three
> >>>>>>>> bugs:
> >>>>>>>>
> >>>>>>>>      - The BTF check is global for all symbols, not for the symbols the
> >>>>>>>>        module uses. This is not sustainable. Given the BTF is supposed to
> >>>>>>>>        allow linking BPF programs that were built in completely different
> >>>>>>>>        environment with the kernel it is completely within the scope of BTF
> >>>>>>>>        to solve this problem, it's just neglected.
> >>>>>>>>      - It is possible to load modules with no BTF but not modules with
> >>>>>>>>        non-matching BTF. Surely the non-matching BTF could be discarded.
> >>>>>>>>      - BTF is part of vermagic. This is completely pointless since modules
> >>>>>>>>        without BTF can be loaded on BTF kernel. Surely it would not be too
> >>>>>>>>        difficult to do the reverse as well. Given BTF must pass extra check
> >>>>>>>>        to be used having it in vermagic is just useless moise.
> >>>>>>>>
> >>>>>>>>>> Does that sound like something reasonable to work on?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> ## Root case (in case anyone is interested in a verbose version)
> >>>>>>>>>>
> >>>>>>>>>> On openSUSE Tumbleweed there can be several builds of the same source. Since
> >>>>>>>>>> the source is the same, the binaries are simply replaced when a package with
> >>>>>>>>>> a larger build number is installed during upgrade.
> >>>>>>>>>>
> >>>>>>>>>> In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> >>>>>>>>>> More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> >>>>>>>>>> struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> >>>>>>>>>> struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> >>>>>>>>>> are previously missing in base BTF of 5.15.12-1.1.
> >>>>>>>>>
> >>>>>>>>> As stated in [2] below, I think we should understand why rebuild is
> >>>>>>>>> triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> >>>>>>>>> be rebuild at the same time?
> >>>>>>>>
> >>>>>>>> They do get rebuilt. However, if you are running the kernel and install
> >>>>>>>> the update you get the new modules with the old kernel. If the install
> >>>>>>>> script fails to copy the kernel to your EFI partition based on the fact
> >>>>>>>> a kernel with the same filename is alreasy there you get the same.
> >>>>>>>>
> >>>>>>>> If you have 'stable' distribution adding new symbols is normal and it
> >>>>>>>> does not break module loading without BTF but it breaks BTF.
> >>>>>>>
> >>>>>>> Okay, I see. One possible solution is that if kernel module btf
> >>>>>>> does not match vmlinux btf, the kernel module btf will be ignored
> >>>>>>> with a dmesg warning but kernel module load will proceed as normal.
> >>>>>>> I think this might be also useful for bpf lskel kernel modules as
> >>>>>>> well which tries to be portable (with CO-RE) for different kernels.
> >>>>>>
> >>>>>> That sounds like #2 that Michal is proposing:
> >>>>>> "It is possible to load modules with no BTF but not modules with
> >>>>>>     non-matching BTF. Surely the non-matching BTF could be discarded."
> >>>>
> >>>> Since we're talking about matching check, I'd like bring up another issue.
> >>>>
> >>>> AFAICT with current form of BTF, checking whether BTF on kernel module
> >>>> matches cannot be made entirely robust without a new version of btf_header
> >>>> that contain info about the base BTF.
> >>>
> >>> The base BTF is always the one associated with running kernel and typically
> >>> the BTF is under /sys/kernel/btf/vmlinux. Did I miss
> >>> anything here?
> >>>
> >>>> As effective as the checks are in this case, by detecting a type name being
> >>>> an empty string and thus conclude it's non-matching, with some (bad) luck a
> >>>> non-matching BTF could pass these checks a gets loaded.
> >>>
> >>> Could you be a little bit more specific about the 'bad luck' a
> >>> non-matching BTF could get loaded? An example will be great.
> >>
> >> Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
> >> module which only type information for `struct something *`:
> >>
> >>    [5] PTR '(anon)' type_id=4
> >>
> >> Which is built upon the follow base BTF:
> >>
> >>    [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >>    [2] PTR '(anon)' type_id=3
> >>    [3] STRUCT 'list_head' size=16 vlen=2
> >>          'next' type_id=2 bits_offset=0
> >>          'prev' type_id=2 bits_offset=64
> >>    [4] STRUCT 'something' size=2 vlen=2
> >>          'locked' type_id=1 bits_offset=0
> >>          'pending' type_id=1 bits_offset=8
> >>
> >> Due to the situation mentioned in the beginning of the thread, the *runtime*
> >> kernel have a different base BTF, in this case type IDs are offset by 1 due
> >> to an additional typedef entry:
> >>
> >>    [1] TYPEDEF 'u8' type_id=1
> >>    [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >>    [3] PTR '(anon)' type_id=3
> >>    [4] STRUCT 'list_head' size=16 vlen=2
> >>          'next' type_id=2 bits_offset=0
> >>          'prev' type_id=2 bits_offset=64
> >>    [5] STRUCT 'something' size=2 vlen=2
> >>          'locked' type_id=1 bits_offset=0
> >>          'pending' type_id=1 bits_offset=8
> >>
> >> Then when loading the BTF on kernel module on the runtime, the kernel will
> >> mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
> >> rather than `struct something *`.
> >>
> >> Does this should possible? (at least theoretically)
> >
> > Thanks for explanation. Yes, from BTF type resolution point of view,
> > yes it is possible.
>
> Could we add a marker or something to prevent this from happening?
> Something like putting the hash of the entire BTF structure into the
> header and referring to that from the "child"; or really any other way
> of detecting that the combined BTF you're constructing is going to be
> wrong?
>

Extending BTF format (including its header) is quite disrupting to the
entire ecosystem around BTF. Given split BTF is only used for kernel
modules, I think it's a better approach to add checksum to module's
ELF itself (as an extra BTF-related section, .BTF.base_checksum or
whatever) and check it during kernel module loading time.

As for having full BTF. You can do that, and it will work for generic
CO-RE approach, but it might not work for kfunc and other things that
expect that, say, struct task_struct has one specific ID that
corresponds to task_struct BTF ID in vmlinux BTF. If kernel module is
loaded against vmlinux BTF that has just slightly different definition
of task_struct (e.g., one field was added at the end), dedup algorithm
will detect those differences and will preserve module's definition of
task_struct as a separate type, which won't be recognized by kernel as
task_struct.

But again, given it's all module-specific, we can utilize custom
.BTF.* sections to record any such information without disrupting any
other user of BTF, including all the BPF applications out there.

> -Toke
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-02-15 19:38               ` Shung-Hsi Yu
  2022-02-15 17:47                 ` Yonghong Song
@ 2022-03-02 17:46                 ` Michal Suchánek
  2022-03-03  4:27                   ` Shung-Hsi Yu
  1 sibling, 1 reply; 22+ messages in thread
From: Michal Suchánek @ 2022-03-02 17:46 UTC (permalink / raw)
  To: Shung-Hsi Yu
  Cc: Yonghong Song, Alexei Starovoitov, Connor O'Brien, bpf,
	Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

On Wed, Feb 16, 2022 at 03:38:31AM +0800, Shung-Hsi Yu wrote:
> On Fri, Feb 11, 2022 at 10:36:28PM -0800, Yonghong Song wrote:
> > On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
> > > On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
> > > > On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
> > > > > On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > > > > On 2/10/22 2:01 AM, Michal Suchánek wrote:
> > > > > > > On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> > > > > > > > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > We recently run into module load failure related to split BTF on openSUSE
> > > > > > > > > Tumbleweed[1], which I believe is something that may also happen on other
> > > > > > > > > rolling distros.
> > > > > > > > > 
> > > > > > > > > The error looks like the follow (though failure is not limited to ipheth)
> > > > > > > > > 
> > > > > > > > >         BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> > > > > > > > > 
> > > > > > > > >         failed to validate module [ipheth] BTF: -22
> > > > > > > > > 
> > > > > > > > > The error comes down to trying to load BTF of *kernel modules from a
> > > > > > > > > different build* than the runtime kernel (but the source is the same), where
> > > > > > > > > the base BTF of the two build is different.
> > > > > > > > > 
> > > > > > > > > While it may be too far stretched to call this a bug, solving this might
> > > > > > > > > make BTF adoption easier. I'd natively think that we could further split
> > > > > > > > > base BTF into two part to avoid this issue, where .BTF only contain exported
> > > > > > > > > types, and the other (still residing in vmlinux) holds the unexported types.
> > > > > > > > 
> > > > > > > > What is the exported types? The types used by export symbols?
> > > > > > > > This for sure will increase btf handling complexity.
> > > > > > > 
> > > > > > > And it will not actually help.
> > > > > > > 
> > > > > > > We have modversion ABI which checks the checksum of the symbols that the
> > > > > > > module imports and fails the load if the checksum for these symbols does
> > > > > > > not match. It's not concerned with symbols not exported, it's not
> > > > > > > concerned with symbols not used by the module. This is something that is
> > > > > > > sustainable across kernel rebuilds with minor fixes/features and what
> > > > > > > distributions watch for.
> > > > > > > 
> > > > > > > Now with BTF the situation is vastly different. There are at least three
> > > > > > > bugs:
> > > > > > > 
> > > > > > >     - The BTF check is global for all symbols, not for the symbols the
> > > > > > >       module uses. This is not sustainable. Given the BTF is supposed to
> > > > > > >       allow linking BPF programs that were built in completely different
> > > > > > >       environment with the kernel it is completely within the scope of BTF
> > > > > > >       to solve this problem, it's just neglected.
> > > > > > >     - It is possible to load modules with no BTF but not modules with
> > > > > > >       non-matching BTF. Surely the non-matching BTF could be discarded.
> > > > > > >     - BTF is part of vermagic. This is completely pointless since modules
> > > > > > >       without BTF can be loaded on BTF kernel. Surely it would not be too
> > > > > > >       difficult to do the reverse as well. Given BTF must pass extra check
> > > > > > >       to be used having it in vermagic is just useless moise.
> > > > > > > 
> > > > > > > > > Does that sound like something reasonable to work on?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > ## Root case (in case anyone is interested in a verbose version)
> > > > > > > > > 
> > > > > > > > > On openSUSE Tumbleweed there can be several builds of the same source. Since
> > > > > > > > > the source is the same, the binaries are simply replaced when a package with
> > > > > > > > > a larger build number is installed during upgrade.
> > > > > > > > > 
> > > > > > > > > In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> > > > > > > > > More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> > > > > > > > > struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> > > > > > > > > struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> > > > > > > > > are previously missing in base BTF of 5.15.12-1.1.
> > > > > > > > 
> > > > > > > > As stated in [2] below, I think we should understand why rebuild is
> > > > > > > > triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> > > > > > > > be rebuild at the same time?
> > > > > > > 
> > > > > > > They do get rebuilt. However, if you are running the kernel and install
> > > > > > > the update you get the new modules with the old kernel. If the install
> > > > > > > script fails to copy the kernel to your EFI partition based on the fact
> > > > > > > a kernel with the same filename is alreasy there you get the same.
> > > > > > > 
> > > > > > > If you have 'stable' distribution adding new symbols is normal and it
> > > > > > > does not break module loading without BTF but it breaks BTF.
> > > > > > 
> > > > > > Okay, I see. One possible solution is that if kernel module btf
> > > > > > does not match vmlinux btf, the kernel module btf will be ignored
> > > > > > with a dmesg warning but kernel module load will proceed as normal.
> > > > > > I think this might be also useful for bpf lskel kernel modules as
> > > > > > well which tries to be portable (with CO-RE) for different kernels.
> > > > > 
> > > > > That sounds like #2 that Michal is proposing:
> > > > > "It is possible to load modules with no BTF but not modules with
> > > > >    non-matching BTF. Surely the non-matching BTF could be discarded."
> > > 
> > > Since we're talking about matching check, I'd like bring up another issue.
> > > 
> > > AFAICT with current form of BTF, checking whether BTF on kernel module
> > > matches cannot be made entirely robust without a new version of btf_header
> > > that contain info about the base BTF.
> > 
> > The base BTF is always the one associated with running kernel and typically
> > the BTF is under /sys/kernel/btf/vmlinux. Did I miss
> > anything here?
> > 
> > > As effective as the checks are in this case, by detecting a type name being
> > > an empty string and thus conclude it's non-matching, with some (bad) luck a
> > > non-matching BTF could pass these checks a gets loaded.
> > 
> > Could you be a little bit more specific about the 'bad luck' a
> > non-matching BTF could get loaded? An example will be great.
> 
> Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
> module which only type information for `struct something *`:
> 
>   [5] PTR '(anon)' type_id=4
> 
> Which is built upon the follow base BTF:
> 
>   [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>   [2] PTR '(anon)' type_id=3
>   [3] STRUCT 'list_head' size=16 vlen=2
>         'next' type_id=2 bits_offset=0
>         'prev' type_id=2 bits_offset=64
>   [4] STRUCT 'something' size=2 vlen=2
>         'locked' type_id=1 bits_offset=0
>         'pending' type_id=1 bits_offset=8
> 
> Due to the situation mentioned in the beginning of the thread, the *runtime*
> kernel have a different base BTF, in this case type IDs are offset by 1 due
> to an additional typedef entry:
> 
>   [1] TYPEDEF 'u8' type_id=1
>   [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>   [3] PTR '(anon)' type_id=3
>   [4] STRUCT 'list_head' size=16 vlen=2
>         'next' type_id=2 bits_offset=0
>         'prev' type_id=2 bits_offset=64
>   [5] STRUCT 'something' size=2 vlen=2
>         'locked' type_id=1 bits_offset=0
>         'pending' type_id=1 bits_offset=8
> 
> Then when loading the BTF on kernel module on the runtime, the kernel will
> mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
> rather than `struct something *`.
> 
> Does this should possible? (at least theoretically)

At least not this way because you have different number of entries which
was the original issue.

So is this possible at least theoretically, and how likely will that
happen in practice?

What else other than the number of entries has to match?

Thanks

Michal

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BTF compatibility issue across builds
  2022-03-02 17:46                 ` Michal Suchánek
@ 2022-03-03  4:27                   ` Shung-Hsi Yu
  0 siblings, 0 replies; 22+ messages in thread
From: Shung-Hsi Yu @ 2022-03-03  4:27 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Yonghong Song, Alexei Starovoitov, Connor O'Brien, bpf,
	Network Development, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov

On Wed, Mar 02, 2022 at 06:46:45PM +0100, Michal Suchánek wrote:
> On Wed, Feb 16, 2022 at 03:38:31AM +0800, Shung-Hsi Yu wrote:
> > On Fri, Feb 11, 2022 at 10:36:28PM -0800, Yonghong Song wrote:
> > > On 2/11/22 9:40 PM, Shung-Hsi Yu wrote:
> > > > On Thu, Feb 10, 2022 at 02:59:03PM -0800, Yonghong Song wrote:
> > > > > On 2/10/22 2:34 PM, Alexei Starovoitov wrote:
> > > > > > On Thu, Feb 10, 2022 at 10:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > > > > > On 2/10/22 2:01 AM, Michal Suchánek wrote:
> > > > > > > > On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
> > > > > > > > > On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > We recently run into module load failure related to split BTF on openSUSE
> > > > > > > > > > Tumbleweed[1], which I believe is something that may also happen on other
> > > > > > > > > > rolling distros.
> > > > > > > > > > 
> > > > > > > > > > The error looks like the follow (though failure is not limited to ipheth)
> > > > > > > > > > 
> > > > > > > > > >         BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
> > > > > > > > > > 
> > > > > > > > > >         failed to validate module [ipheth] BTF: -22
> > > > > > > > > > 
> > > > > > > > > > The error comes down to trying to load BTF of *kernel modules from a
> > > > > > > > > > different build* than the runtime kernel (but the source is the same), where
> > > > > > > > > > the base BTF of the two build is different.
> > > > > > > > > > 
> > > > > > > > > > While it may be too far stretched to call this a bug, solving this might
> > > > > > > > > > make BTF adoption easier. I'd natively think that we could further split
> > > > > > > > > > base BTF into two part to avoid this issue, where .BTF only contain exported
> > > > > > > > > > types, and the other (still residing in vmlinux) holds the unexported types.
> > > > > > > > > 
> > > > > > > > > What is the exported types? The types used by export symbols?
> > > > > > > > > This for sure will increase btf handling complexity.
> > > > > > > > 
> > > > > > > > And it will not actually help.
> > > > > > > > 
> > > > > > > > We have modversion ABI which checks the checksum of the symbols that the
> > > > > > > > module imports and fails the load if the checksum for these symbols does
> > > > > > > > not match. It's not concerned with symbols not exported, it's not
> > > > > > > > concerned with symbols not used by the module. This is something that is
> > > > > > > > sustainable across kernel rebuilds with minor fixes/features and what
> > > > > > > > distributions watch for.
> > > > > > > > 
> > > > > > > > Now with BTF the situation is vastly different. There are at least three
> > > > > > > > bugs:
> > > > > > > > 
> > > > > > > >     - The BTF check is global for all symbols, not for the symbols the
> > > > > > > >       module uses. This is not sustainable. Given the BTF is supposed to
> > > > > > > >       allow linking BPF programs that were built in completely different
> > > > > > > >       environment with the kernel it is completely within the scope of BTF
> > > > > > > >       to solve this problem, it's just neglected.
> > > > > > > >     - It is possible to load modules with no BTF but not modules with
> > > > > > > >       non-matching BTF. Surely the non-matching BTF could be discarded.
> > > > > > > >     - BTF is part of vermagic. This is completely pointless since modules
> > > > > > > >       without BTF can be loaded on BTF kernel. Surely it would not be too
> > > > > > > >       difficult to do the reverse as well. Given BTF must pass extra check
> > > > > > > >       to be used having it in vermagic is just useless moise.
> > > > > > > > 
> > > > > > > > > > Does that sound like something reasonable to work on?
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > ## Root case (in case anyone is interested in a verbose version)
> > > > > > > > > > 
> > > > > > > > > > On openSUSE Tumbleweed there can be several builds of the same source. Since
> > > > > > > > > > the source is the same, the binaries are simply replaced when a package with
> > > > > > > > > > a larger build number is installed during upgrade.
> > > > > > > > > > 
> > > > > > > > > > In our case, a rebuild is triggered[2], and resulted in changes in base BTF.
> > > > > > > > > > More precisely, the BTF_KIND_FUNC{,_PROTO} of i2c_smbus_check_pec(u8 cpec,
> > > > > > > > > > struct i2c_msg *msg) and inet_lhash2_bucket_sk(struct inet_hashinfo *h,
> > > > > > > > > > struct sock *sk) was added to the base BTF of 5.15.12-1.3. Those functions
> > > > > > > > > > are previously missing in base BTF of 5.15.12-1.1.
> > > > > > > > > 
> > > > > > > > > As stated in [2] below, I think we should understand why rebuild is
> > > > > > > > > triggered. If the rebuild for vmlinux is triggered, why the modules cannot
> > > > > > > > > be rebuild at the same time?
> > > > > > > > 
> > > > > > > > They do get rebuilt. However, if you are running the kernel and install
> > > > > > > > the update you get the new modules with the old kernel. If the install
> > > > > > > > script fails to copy the kernel to your EFI partition based on the fact
> > > > > > > > a kernel with the same filename is alreasy there you get the same.
> > > > > > > > 
> > > > > > > > If you have 'stable' distribution adding new symbols is normal and it
> > > > > > > > does not break module loading without BTF but it breaks BTF.
> > > > > > > 
> > > > > > > Okay, I see. One possible solution is that if kernel module btf
> > > > > > > does not match vmlinux btf, the kernel module btf will be ignored
> > > > > > > with a dmesg warning but kernel module load will proceed as normal.
> > > > > > > I think this might be also useful for bpf lskel kernel modules as
> > > > > > > well which tries to be portable (with CO-RE) for different kernels.
> > > > > > 
> > > > > > That sounds like #2 that Michal is proposing:
> > > > > > "It is possible to load modules with no BTF but not modules with
> > > > > >    non-matching BTF. Surely the non-matching BTF could be discarded."
> > > > 
> > > > Since we're talking about matching check, I'd like bring up another issue.
> > > > 
> > > > AFAICT with current form of BTF, checking whether BTF on kernel module
> > > > matches cannot be made entirely robust without a new version of btf_header
> > > > that contain info about the base BTF.
> > > 
> > > The base BTF is always the one associated with running kernel and typically
> > > the BTF is under /sys/kernel/btf/vmlinux. Did I miss
> > > anything here?
> > > 
> > > > As effective as the checks are in this case, by detecting a type name being
> > > > an empty string and thus conclude it's non-matching, with some (bad) luck a
> > > > non-matching BTF could pass these checks a gets loaded.
> > > 
> > > Could you be a little bit more specific about the 'bad luck' a
> > > non-matching BTF could get loaded? An example will be great.
> > 
> > Let me try take a jab at it. Say here's a hypothetical BTF for a kernel
> > module which only type information for `struct something *`:
> > 
> >   [5] PTR '(anon)' type_id=4
> > 
> > Which is built upon the follow base BTF:
> > 
> >   [1] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >   [2] PTR '(anon)' type_id=3
> >   [3] STRUCT 'list_head' size=16 vlen=2
> >         'next' type_id=2 bits_offset=0
> >         'prev' type_id=2 bits_offset=64
> >   [4] STRUCT 'something' size=2 vlen=2
> >         'locked' type_id=1 bits_offset=0
> >         'pending' type_id=1 bits_offset=8
> > 
> > Due to the situation mentioned in the beginning of the thread, the *runtime*
> > kernel have a different base BTF, in this case type IDs are offset by 1 due
> > to an additional typedef entry:
> > 
> >   [1] TYPEDEF 'u8' type_id=1
> >   [2] INT 'unsigned char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >   [3] PTR '(anon)' type_id=3
> >   [4] STRUCT 'list_head' size=16 vlen=2
> >         'next' type_id=2 bits_offset=0
> >         'prev' type_id=2 bits_offset=64
> >   [5] STRUCT 'something' size=2 vlen=2
> >         'locked' type_id=1 bits_offset=0
> >         'pending' type_id=1 bits_offset=8
> > 
> > Then when loading the BTF on kernel module on the runtime, the kernel will
> > mistakenly interprets "PTR '(anon)' type_id=4" as `struct list_head *`
> > rather than `struct something *`.
> > 
> > Does this should possible? (at least theoretically)
> 
> At least not this way because you have different number of entries which
> was the original issue.

Indeed the number of type entries differs, but type entry number of base BTF
isn't part of the check (and it can't be checked, again, because kernel
module doesn't store details of it's base BTF).

So this made-up kernel module BTF _can_ be loaded into the kernel if the
type names in string section are resolved correctly as well. It is the type
name check is that catches the original issue (i.e. the "Invalid name" error
we see), not the difference in the number of type entries. 

AFAIK type name check verifies that all types that are anonymous (e.g. PTR),
have struct btf_type.name_off of 0 and that types that can't be anonymous
(e.g. INT) has name_off that points to a valid string that isn't an empty
one.

> So is this possible at least theoretically, and how likely will that
> happen in practice?
> 
> What else other than the number of entries has to match?

To the best of my knowledge only the type name check needs to pass in this
case.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-03-03  4:27 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-27 15:10 BTF compatibility issue across builds Shung-Hsi Yu
2022-01-31 17:36 ` Yonghong Song
2022-02-10 10:01   ` Michal Suchánek
2022-02-10 18:17     ` Yonghong Song
2022-02-10 22:34       ` Alexei Starovoitov
2022-02-10 22:59         ` Yonghong Song
2022-02-12  5:40           ` Shung-Hsi Yu
2022-02-12  6:36             ` Yonghong Song
2022-02-15 19:38               ` Shung-Hsi Yu
2022-02-15 17:47                 ` Yonghong Song
2022-02-15 18:57                   ` Toke Høiland-Jørgensen
2022-02-20  0:28                     ` Andrii Nakryiko
2022-02-16  8:48                   ` David Laight
2022-03-02 17:46                 ` Michal Suchánek
2022-03-03  4:27                   ` Shung-Hsi Yu
2022-02-11  6:01     ` Andrii Nakryiko
2022-02-11 17:20       ` Toke Høiland-Jørgensen
2022-02-11 22:20         ` Andrii Nakryiko
2022-02-11 23:58           ` Toke Høiland-Jørgensen
2022-02-12  7:37             ` Shung-Hsi Yu
2022-02-13 15:40               ` Toke Høiland-Jørgensen
2022-02-14 20:19                 ` Michal Suchánek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).