All of lore.kernel.org
 help / color / mirror / Atom feed
* Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")
@ 2024-03-15 16:59 Kui-Feng Lee
  2024-03-18 20:38 ` Ilya Leoshkevich
  0 siblings, 1 reply; 4+ messages in thread
From: Kui-Feng Lee @ 2024-03-15 16:59 UTC (permalink / raw)
  To: Josh Poimboeuf, Sumanth Korikkar, Heiko Carstens
  Cc: bpf, Sumanth Korikkar, Vasily Gorbik, Ilya Leoshkevich

Hi Josh, Sumath, and Heiko,

Recently, all patches submitted to bpf have been failing the selftest
"dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
investigation, it was determined that a specific patch, 778666df60f0
("s390: compile relocatable kernel without -fPIE"), is causing the 
problem. It is the first bad after bisecting. Please take a look.
Thank you!

You can reproduce it by running the command

     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg

in the tools/testing/selftests/bpf directory in the Linux source tree.
This command causes a kernel crash with the following messages. It
appears to be related to attaching a trace program at the fentry of a
struct_ops operator. The same operator works fine without the trace
program. For more details about the test case, please check
dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.

illegal operation: 0001 ilc:1 [#1] SMP
Modules linked in:
CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE 
6.8.0-rc3-g778666df60f0 #34
Hardware name: QEMU 3906 QEMU (KVM/Linux)
Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20 
0000000082a08b58
            0000000000000000 000000007083ede2 0000000000000000 
0000000000000001
            00000000009d580e 000000008286bc00 0000037f000000d8 
000003ff800b2000
            0000000081a43c00 00000000011bfe20 000003ff800b208e 
0000038000a73c68
Krnl Code:#0000000000000000: 0000               illegal
           >0000000000000002: 0000               illegal
            0000000000000004: 0000               illegal
            0000000000000006: 0000               illegal
            0000000000000008: 0000               illegal
            000000000000000a: 0000               illegal
            000000000000000c: 0000               illegal
            000000000000000e: 0000               illegal
Call Trace:
  [<0000000000000002>] 0x2
  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
  [<0000000000c4382c>] __do_syscall+0x244/0x300
  [<0000000000c59a40>] system_call+0x70/0x98
INFO: lockdep is turned off.
Last Breaking-Event-Address:
  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
Kernel panic - not syncing: Fatal exception: panic_on_oops


[1] 
https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")
  2024-03-15 16:59 Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE") Kui-Feng Lee
@ 2024-03-18 20:38 ` Ilya Leoshkevich
  2024-03-18 23:31   ` Alexei Starovoitov
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Leoshkevich @ 2024-03-18 20:38 UTC (permalink / raw)
  To: Kui-Feng Lee, Josh Poimboeuf, Sumanth Korikkar, Heiko Carstens
  Cc: bpf, Vasily Gorbik

On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote:
> Hi Josh, Sumath, and Heiko,
> 
> Recently, all patches submitted to bpf have been failing the selftest
> "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
> investigation, it was determined that a specific patch, 778666df60f0
> ("s390: compile relocatable kernel without -fPIE"), is causing the problem.
> It is the first bad after bisecting. Please take a look.
> Thank you!
> 
> You can reproduce it by running the command
> 
>     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg
> 
> in the tools/testing/selftests/bpf directory in the Linux source tree.
> This command causes a kernel crash with the following messages. It
> appears to be related to attaching a trace program at the fentry of a
> struct_ops operator. The same operator works fine without the trace
> program. For more details about the test case, please check
> dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.
> 
> illegal operation: 0001 ilc:1 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE
> 6.8.0-rc3-g778666df60f0 #34
> Hardware name: QEMU 3906 QEMU (KVM/Linux)
> Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20
> 0000000082a08b58
>            0000000000000000 000000007083ede2 0000000000000000
> 0000000000000001
>            00000000009d580e 000000008286bc00 0000037f000000d8
> 000003ff800b2000
>            0000000081a43c00 00000000011bfe20 000003ff800b208e
> 0000038000a73c68
> Krnl Code:#0000000000000000: 0000               illegal
>           >0000000000000002: 0000               illegal
>            0000000000000004: 0000               illegal
>            0000000000000006: 0000               illegal
>            0000000000000008: 0000               illegal
>            000000000000000a: 0000               illegal
>            000000000000000c: 0000               illegal
>            000000000000000e: 0000               illegal
> Call Trace:
>  [<0000000000000002>] 0x2
>  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
>  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
>  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
>  [<0000000000c4382c>] __do_syscall+0x244/0x300
>  [<0000000000c59a40>] system_call+0x70/0x98
> INFO: lockdep is turned off.
> Last Breaking-Event-Address:
>  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
> Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> 
> [1]
> https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571

Hi,

thanks for the report, I'm currently looking into it.

The problem is that the following:

static void bpf_jit_plt(void *plt, void *ret, void *target)
{
	memcpy(plt, bpf_plt, BPF_PLT_SIZE);
	*(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
	*(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
}

is (mis)compiled to the following:

   0x000000000017a26e <+238>:   stg     %r4,232(%r3,%r15)
   0x000000000017a274 <+244>:   ltgr    %r11,%r11
   0x000000000017a278 <+248>:   locgrne %r4,%r11
   0x000000000017a27c <+252>:   stg     %r4,232(%r1,%r15)
   0x000000000017a282 <+258>:   la      %r3,232(%r1,%r15)
   0x000000000017a286 <+262>:   lghi    %r4,8
   0x000000000017a28a <+266>:   mvc     232(16,%r15),0(%r2)
   0x000000000017a290 <+272>:   mvc     248(16,%r15),16(%r2)

Note that memcpy() and assignments are reordered, and this is causing
the crash. The C code violates the strict aliasing rules, but this
should be irrelevant for the kernel.

The problem does not occur with the latest GCC. Bisect says that the
following commit fixed it:

commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc                                                          
Author: Aldy Hernandez <aldyh@redhat.com>                                                                
Date:   Tue Jun 15 12:32:51 2021 +0200                                                                   
                                                                                                         
    Backwards jump threader rewrite with ranger.

but I think that's accidental. The reordering is done by the sched1
pass. I plan to find out why it thinks it's okay to reorder these
memory accesses.

Best regards,
Ilya

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")
  2024-03-18 20:38 ` Ilya Leoshkevich
@ 2024-03-18 23:31   ` Alexei Starovoitov
  2024-03-20  1:59     ` Ilya Leoshkevich
  0 siblings, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2024-03-18 23:31 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Kui-Feng Lee, Josh Poimboeuf, Sumanth Korikkar, Heiko Carstens,
	bpf, Vasily Gorbik

On Mon, Mar 18, 2024 at 1:38 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote:
> > Hi Josh, Sumath, and Heiko,
> >
> > Recently, all patches submitted to bpf have been failing the selftest
> > "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
> > investigation, it was determined that a specific patch, 778666df60f0
> > ("s390: compile relocatable kernel without -fPIE"), is causing the problem.
> > It is the first bad after bisecting. Please take a look.
> > Thank you!
> >
> > You can reproduce it by running the command
> >
> >     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg
> >
> > in the tools/testing/selftests/bpf directory in the Linux source tree.
> > This command causes a kernel crash with the following messages. It
> > appears to be related to attaching a trace program at the fentry of a
> > struct_ops operator. The same operator works fine without the trace
> > program. For more details about the test case, please check
> > dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.
> >
> > illegal operation: 0001 ilc:1 [#1] SMP
> > Modules linked in:
> > CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE
> > 6.8.0-rc3-g778666df60f0 #34
> > Hardware name: QEMU 3906 QEMU (KVM/Linux)
> > Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
> >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> > Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20
> > 0000000082a08b58
> >            0000000000000000 000000007083ede2 0000000000000000
> > 0000000000000001
> >            00000000009d580e 000000008286bc00 0000037f000000d8
> > 000003ff800b2000
> >            0000000081a43c00 00000000011bfe20 000003ff800b208e
> > 0000038000a73c68
> > Krnl Code:#0000000000000000: 0000               illegal
> >           >0000000000000002: 0000               illegal
> >            0000000000000004: 0000               illegal
> >            0000000000000006: 0000               illegal
> >            0000000000000008: 0000               illegal
> >            000000000000000a: 0000               illegal
> >            000000000000000c: 0000               illegal
> >            000000000000000e: 0000               illegal
> > Call Trace:
> >  [<0000000000000002>] 0x2
> >  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
> >  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
> >  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
> >  [<0000000000c4382c>] __do_syscall+0x244/0x300
> >  [<0000000000c59a40>] system_call+0x70/0x98
> > INFO: lockdep is turned off.
> > Last Breaking-Event-Address:
> >  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
> > Kernel panic - not syncing: Fatal exception: panic_on_oops
> >
> >
> > [1]
> > https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571
>
> Hi,
>
> thanks for the report, I'm currently looking into it.
>
> The problem is that the following:
>
> static void bpf_jit_plt(void *plt, void *ret, void *target)
> {
>         memcpy(plt, bpf_plt, BPF_PLT_SIZE);
>         *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
>         *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
> }
>
> is (mis)compiled to the following:
>
>    0x000000000017a26e <+238>:   stg     %r4,232(%r3,%r15)
>    0x000000000017a274 <+244>:   ltgr    %r11,%r11
>    0x000000000017a278 <+248>:   locgrne %r4,%r11
>    0x000000000017a27c <+252>:   stg     %r4,232(%r1,%r15)
>    0x000000000017a282 <+258>:   la      %r3,232(%r1,%r15)
>    0x000000000017a286 <+262>:   lghi    %r4,8
>    0x000000000017a28a <+266>:   mvc     232(16,%r15),0(%r2)
>    0x000000000017a290 <+272>:   mvc     248(16,%r15),16(%r2)
>
> Note that memcpy() and assignments are reordered, and this is causing
> the crash. The C code violates the strict aliasing rules, but this
> should be irrelevant for the kernel.
>
> The problem does not occur with the latest GCC. Bisect says that the
> following commit fixed it:
>
> commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc
> Author: Aldy Hernandez <aldyh@redhat.com>
> Date:   Tue Jun 15 12:32:51 2021 +0200
>
>     Backwards jump threader rewrite with ranger.
>
> but I think that's accidental. The reordering is done by the sched1
> pass. I plan to find out why it thinks it's okay to reorder these
> memory accesses.

Wow. What a discovery.
Does this bug affect other architectures ?
Probably yes, since it's sched1 pass ?
Should we apply some workaround for now?

Also it's probably not strict aliasing related:
extern const char bpf_plt[];
extern const char bpf_plt_ret[];
extern const char bpf_plt_target[];
and bpf_jit_plt() casts the pointer to 'char *'.
The 'char *' was always special from GCC strict aliasing point of view.
iirc gcc used to treat such pointers as pointing anywhere.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")
  2024-03-18 23:31   ` Alexei Starovoitov
@ 2024-03-20  1:59     ` Ilya Leoshkevich
  0 siblings, 0 replies; 4+ messages in thread
From: Ilya Leoshkevich @ 2024-03-20  1:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, Josh Poimboeuf, Sumanth Korikkar, Heiko Carstens,
	bpf, Vasily Gorbik

On Mon, Mar 18, 2024 at 04:31:00PM -0700, Alexei Starovoitov wrote:
> On Mon, Mar 18, 2024 at 1:38 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote:
> > > Hi Josh, Sumath, and Heiko,
> > >
> > > Recently, all patches submitted to bpf have been failing the selftest
> > > "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
> > > investigation, it was determined that a specific patch, 778666df60f0
> > > ("s390: compile relocatable kernel without -fPIE"), is causing the problem.
> > > It is the first bad after bisecting. Please take a look.
> > > Thank you!
> > >
> > > You can reproduce it by running the command
> > >
> > >     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg
> > >
> > > in the tools/testing/selftests/bpf directory in the Linux source tree.
> > > This command causes a kernel crash with the following messages. It
> > > appears to be related to attaching a trace program at the fentry of a
> > > struct_ops operator. The same operator works fine without the trace
> > > program. For more details about the test case, please check
> > > dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.
> > >
> > > illegal operation: 0001 ilc:1 [#1] SMP
> > > Modules linked in:
> > > CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE
> > > 6.8.0-rc3-g778666df60f0 #34
> > > Hardware name: QEMU 3906 QEMU (KVM/Linux)
> > > Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
> > >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> > > Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20
> > > 0000000082a08b58
> > >            0000000000000000 000000007083ede2 0000000000000000
> > > 0000000000000001
> > >            00000000009d580e 000000008286bc00 0000037f000000d8
> > > 000003ff800b2000
> > >            0000000081a43c00 00000000011bfe20 000003ff800b208e
> > > 0000038000a73c68
> > > Krnl Code:#0000000000000000: 0000               illegal
> > >           >0000000000000002: 0000               illegal
> > >            0000000000000004: 0000               illegal
> > >            0000000000000006: 0000               illegal
> > >            0000000000000008: 0000               illegal
> > >            000000000000000a: 0000               illegal
> > >            000000000000000c: 0000               illegal
> > >            000000000000000e: 0000               illegal
> > > Call Trace:
> > >  [<0000000000000002>] 0x2
> > >  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
> > >  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
> > >  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
> > >  [<0000000000c4382c>] __do_syscall+0x244/0x300
> > >  [<0000000000c59a40>] system_call+0x70/0x98
> > > INFO: lockdep is turned off.
> > > Last Breaking-Event-Address:
> > >  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
> > > Kernel panic - not syncing: Fatal exception: panic_on_oops
> > >
> > >
> > > [1]
> > > https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571
> >
> > Hi,
> >
> > thanks for the report, I'm currently looking into it.
> >
> > The problem is that the following:
> >
> > static void bpf_jit_plt(void *plt, void *ret, void *target)
> > {
> >         memcpy(plt, bpf_plt, BPF_PLT_SIZE);
> >         *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
> >         *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
> > }
> >
> > is (mis)compiled to the following:
> >
> >    0x000000000017a26e <+238>:   stg     %r4,232(%r3,%r15)
> >    0x000000000017a274 <+244>:   ltgr    %r11,%r11
> >    0x000000000017a278 <+248>:   locgrne %r4,%r11
> >    0x000000000017a27c <+252>:   stg     %r4,232(%r1,%r15)
> >    0x000000000017a282 <+258>:   la      %r3,232(%r1,%r15)
> >    0x000000000017a286 <+262>:   lghi    %r4,8
> >    0x000000000017a28a <+266>:   mvc     232(16,%r15),0(%r2)
> >    0x000000000017a290 <+272>:   mvc     248(16,%r15),16(%r2)
> >
> > Note that memcpy() and assignments are reordered, and this is causing
> > the crash. The C code violates the strict aliasing rules, but this
> > should be irrelevant for the kernel.
> >
> > The problem does not occur with the latest GCC. Bisect says that the
> > following commit fixed it:
> >
> > commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc
> > Author: Aldy Hernandez <aldyh@redhat.com>
> > Date:   Tue Jun 15 12:32:51 2021 +0200
> >
> >     Backwards jump threader rewrite with ranger.
> >
> > but I think that's accidental. The reordering is done by the sched1
> > pass. I plan to find out why it thinks it's okay to reorder these
> > memory accesses.
> 
> Wow. What a discovery.
> Does this bug affect other architectures ?
> Probably yes, since it's sched1 pass ?
> Should we apply some workaround for now?

I think it's because bpf_jit_plt() violates the C pointer subtraction
rules (bpf_plt_ret and bpf_plt are different objects from the C
perspective), so it should not affect the other architectures, unless
they use the same pattern elsewhere.

So I posted a patch fixing this in bpf_jit_plt().

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-03-20  1:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-15 16:59 Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE") Kui-Feng Lee
2024-03-18 20:38 ` Ilya Leoshkevich
2024-03-18 23:31   ` Alexei Starovoitov
2024-03-20  1:59     ` Ilya Leoshkevich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.