* Kernel panic from malloc() on SUSE 15.1?
@ 2020-11-02 20:14 Carl Jacobsen
  2020-11-03  2:26 ` Michael Ellerman
  2020-11-06 12:51 ` Michal Suchánek
  0 siblings, 2 replies; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-02 20:14 UTC (permalink / raw)
  To: linuxppc-dev


I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
simple test program, built in a slightly unusual way.

I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
copy of libcrypto.a (from openssl-1.1.1g), built without threads.
I have a 10 line C test program that compiles and runs fine on the
SUSE 12 system. If I compile the same program on SUSE 15.1 (with
gcc 7.4.1), it runs fine on SUSE 15.1.

But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
system, the call to RAND_status() gets to a malloc() and then panics.
(And, of course, if I just compile a call to malloc(), that runs fine
on both systems.) Here's the test program, it's really just a call to
RAND_status():

    #include <stdio.h>
    #include <openssl/rand.h>

    int main(int argc, char **argv)
    {
        int has_enough_data = RAND_status();
        printf("The PRNG %s been seeded with enough data\n",
               has_enough_data ? "HAS" : "has NOT");
        return 0;
    }

openssl is configured/built with:
    ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
    make

and the test program is compiled with:
    gcc -ggdb3 -o rand_test rand_test.c libcrypto.a

The kernel on SUSE 12 is: 3.12.28-4-default
And glibc is: 2.19

The kernel on SUSE 15.1 is: 4.12.14-197.18-default
And glibc is: 2.26

In a previous iteration it was panicking in pthread_once(), so
I compiled openssl without pthreads support, and now it panics
calling malloc().

If I link to the system-supplied libcrypto.so, it works fine, and
running the same tests on x86_64 works fine, it's only ppc64le
that panics, and only running code from the old system on the
new one.

I'm trying to dig further down into this to come up with a standalone
test case, but I'm wondering if anything here stands out as a known
problem, or if someone can point me in the right direction.

Thanks,
Carl Jacobsen


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-02 20:14 Kernel panic from malloc() on SUSE 15.1? Carl Jacobsen
@ 2020-11-03  2:26 ` Michael Ellerman
  2020-11-03 22:09   ` Carl Jacobsen
  2020-11-06 12:51 ` Michal Suchánek
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-11-03  2:26 UTC (permalink / raw)
  To: Carl Jacobsen, linuxppc-dev

Carl Jacobsen <cjacobsen@storix.com> writes:
> I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
> simple test program, built in a slightly unusual way.
>
> I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
> copy of libcrypto.a (from openssl-1.1.1g), built without threads.
> I have a 10 line C test program that compiles and runs fine on the
> SUSE 12 system. If I compile the same program on SUSE 15.1 (with
> gcc 7.4.1), it runs fine on SUSE 15.1.
>
> But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
> system, the call to RAND_status() gets to a malloc() and then panics.
> (And, of course, if I just compile a call to malloc(), that runs fine
> on both systems.) Here's the test program, it's really just a call to
> RAND_status():
>
>     #include <stdio.h>
>     #include <openssl/rand.h>
>
>     int main(int argc, char **argv)
>     {
>         int has_enough_data = RAND_status();
>         printf("The PRNG %s been seeded with enough data\n",
>                has_enough_data ? "HAS" : "has NOT");
>         return 0;
>     }
>
> openssl is configured/built with:
>     ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>     make
>
> and the test program is compiled with:
>     gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
>
> The kernel on SUSE 12 is: 3.12.28-4-default
> And glibc is: 2.19
>
> The kernel on SUSE 15.1 is: 4.12.14-197.18-default
> And glibc is: 2.26
>
> In a previous iteration it was panicking in pthread_once(), so
> I compiled openssl without pthreads support, and now it panics
> calling malloc().

What's the panic look like?

cheers


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-03  2:26 ` Michael Ellerman
@ 2020-11-03 22:09   ` Carl Jacobsen
  2020-11-05 10:19     ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-03 22:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev


The panic (on a call to malloc from static linked libcrypto) looks like
this:

Bad kernel stack pointer 7fffffffeac0 at 700
Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048
NUMA
pSeries
Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp ip6t_rpfilter
ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat
ebtable_broute br_netfilter bridge stp llc ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External
CPU: 0 PID: 14144 Comm: rand_test_no_pt Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
task: c00000002fa23b80 task.stack: c000000032824000
NIP: 0000000000000700 LR: 0000000010004ad0 CTR: 0000000000000000
REGS: c00000001ec2fd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
MSR: 8000000000001000 <SF,ME>
  CR: 44000844  XER: 20000000
CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
GPR00: 0000000020000000 00007fffffffeac0 00000000102af788 fffffffffffffffd
GPR04: 0000000000000020 0000000000000030 00000000102b0550 0000000000000001
GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0520 800000010280f033
GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffeac0
NIP [0000000000000700] 0x700
LR [0000000010004ad0] 0x10004ad0
Call Trace:
Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
---[ end trace cc04515f274cfbf6 ]---

Sending IPI to other CPUs
IPI complete
kexec: Starting switchover sequence.
I'm in purgatory
 -> smp_release_cpus()
spinning_secondaries = 0
 <- smp_release_cpus()
Kernel panic - not syncing: Out of memory and no killable processes...

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
Call Trace:
[c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
[c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ../drivers/tty/vt/vt.c:3887 do_unblank_screen+0x1d0/0x270
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
task: c000000012449680 task.stack: c000000012454000
NIP: c0000000086d1ac0 LR: c0000000086d1918 CTR: c0000000085fc390
REGS: c000000012456f60 TRAP: 0700   Not tainted  (4.12.14-197.18-default)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>
  CR: 28242222  XER: 20000008
CFAR: c0000000086d1934 SOFTE: 0
GPR00: c0000000086d1918 c0000000124571e0 c000000009240000 0000000000000000
GPR04: 0000000000000000 c00000001237e00e 00000000000010b9 c000000012457170
GPR08: 000000000a610000 0000000000000000 c0000000090f38f0 c0000000122bc3d7
GPR12: 0000000028242428 c00000000f6c0000 00000000014200c2 00000000014200c2
GPR16: 00000000014200c2 0000000000000001 0000000000000000 0000000000000240
GPR20: 0000000000000001 0000000000000240 0000000000000000 c0000000140e1d10
GPR24: 0000000000000000 0000000000000000 0000000000000115 c000000009282374
GPR28: c000000009403508 c0000000094034d8 0000000000000000 0000000000000000
NIP [c0000000086d1ac0] do_unblank_screen+0x1d0/0x270
LR [c0000000086d1918] do_unblank_screen+0x28/0x270
Call Trace:
[c0000000124571e0] [c000000012457250] 0xc000000012457250 (unreliable)
[c000000012457250] [c000000008a1cd44] panic+0x1b4/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
3d22001c 39293920 81290000 2f890000 409cff00 ebe10068 38210070 e8010010
ebc1fff0 7c0803a6 4e800020 60000000 <0fe00000> 4bfffe74 60000000 60000000
---[ end trace ad1803c957b45442 ]---
---[ end Kernel panic - not syncing: Out of memory and no killable processes...



-- 
Carl Jacobsen
Storix, Inc.


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-03 22:09   ` Carl Jacobsen
@ 2020-11-05 10:19     ` Michael Ellerman
  2020-11-05 11:00       ` Segher Boessenkool
  2020-11-05 19:44       ` Carl Jacobsen
  0 siblings, 2 replies; 9+ messages in thread
From: Michael Ellerman @ 2020-11-05 10:19 UTC (permalink / raw)
  To: Carl Jacobsen; +Cc: linuxppc-dev

Carl Jacobsen <cjacobsen@storix.com> writes:
> The panic (on a call to malloc from static linked libcrypto) looks like
> this:

Thanks.

This doesn't make a lot of sense.

> Bad kernel stack pointer 7fffffffeac0 at 700

"at 700" is the regs->nip value, and suggests we're trying to handle a
program check, which is either a trap or BUG or WARN, or illegal
instruction or several other things.

> Oops: Bad kernel stack pointer, sig: 6 [#1]
> SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp ip6t_rpfilter
> ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat
> ebtable_broute br_netfilter bridge stp llc ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
> x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
> xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
> ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> Supported: Yes, External
> CPU: 0 PID: 14144 Comm: rand_test_no_pt Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
> task: c00000002fa23b80 task.stack: c000000032824000
> NIP: 0000000000000700 LR: 0000000010004ad0 CTR: 0000000000000000
> REGS: c00000001ec2fd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)

But then here it says TRAP = 0x300, which is != 0x700.

The trap number is hardcoded in the bad stack handling code, and I don't
see how we can end up with nip == 0x700 but the trap value == 0x300.

> MSR: 8000000000001000 <SF,ME> CR: 44000844  XER: 20000000

And here the MSR says you were in big endian mode, but you said before
your machine was ppc64le.

> CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
> GPR00: 0000000020000000 00007fffffffeac0 00000000102af788 fffffffffffffffd
> GPR04: 0000000000000020 0000000000000030 00000000102b0550 0000000000000001
> GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0520 800000010280f033
> GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
> GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffeac0

The rest of the regs look like user space values, not kernel.

> NIP [0000000000000700] 0x700
> LR [0000000010004ad0] 0x10004ad0
> Call Trace:
> Instruction dump:
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
> ---[ end trace cc04515f274cfbf6 ]---


What hardware is this on?

Can you try booting with ppc_tm=off on the kernel command line, and see
if that changes anything?

Can you put your compiled test program up somewhere we can download it
and look at? Or post the disassembly?

cheers


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-05 10:19     ` Michael Ellerman
@ 2020-11-05 11:00       ` Segher Boessenkool
  2020-11-05 19:44       ` Carl Jacobsen
  1 sibling, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2020-11-05 11:00 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Carl Jacobsen, linuxppc-dev

On Thu, Nov 05, 2020 at 09:19:22PM +1100, Michael Ellerman wrote:
> Carl Jacobsen <cjacobsen@storix.com> writes:
> This doesn't make a lot of sense.
> 
> > Bad kernel stack pointer 7fffffffeac0 at 700
> 
> "at 700" is the regs->nip value, and suggests we're trying to handle a
> program check, which is either a trap or BUG or WARN, or illegal
> instruction or several other things.

> > REGS: c00000001ec2fd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
> 
> But then here it says TRAP = 0x300, which is != 0x700.
> 
> The trap number is hardcoded in the bad stack handling code, and I don't
> see how we can end up with nip == 0x700 but the trap value == 0x300.
> 
> > MSR: 8000000000001000 <SF,ME> CR: 44000844  XER: 20000000
> 
> And here the MSR says you were in big endian mode, but you said before
> your machine was ppc64le.

It looks like you got a DSI (the 300), but for some reason that
interrupt was not taken in LE mode, so the instruction at 300 was read
as a lot of gobbledygook, not a valid insn, and the processor took a
program interrupt (the 700).

(MSR[RI]=0, but there can be other causes for that of course.)


Segher


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-05 10:19     ` Michael Ellerman
  2020-11-05 11:00       ` Segher Boessenkool
@ 2020-11-05 19:44       ` Carl Jacobsen
  2020-11-06 12:25         ` Michael Ellerman
  1 sibling, 1 reply; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-05 19:44 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev


On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe@ellerman.id.au> wrote:

> Carl Jacobsen <cjacobsen@storix.com> writes:
> > The panic (on a call to malloc from static linked libcrypto) looks like
> > this:
>
> What hardware is this on?
>

Thank you for looking into this.

The system that's panicking identifies like this:
    # uname -a
    Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) ppc64le ppc64le ppc64le GNU/Linux
    #
    # cat /etc/os-release
    NAME="SLES"
    VERSION="15-SP1"
    VERSION_ID="15.1"
    PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
    ID="sles"
    ID_LIKE="suse"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:suse:sles:15:sp1"

The system is an LPAR running under PowerVM vios version 2.2.3.4.
The underlying hardware is machine type-model 8284-22A.


> Can you try booting with ppc_tm=off on the kernel command line, and see
> if that changes anything?
>

Yes. Output is down below. Doesn't appear to change much, but I don't have
the background to interpret the registers.


> Can you put your compiled test program up somewhere we can download it
> and look at? Or post the disassembly?
>

Here's the source file:
    https://www.storix.com/download/support/misc/rand_test.c

Here's the resulting executable:
    https://www.storix.com/download/support/misc/rand_test

Executable is linked to libcrypto from openssl-1.1.1g, configured with:
    ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static

Executable is built (on SUSE 12) with:
    gcc -ggdb3 -o rand_test rand_test.c libcrypto.a


And running the executable (on SUSE 15.1) through gdb goes like this:

    # gdb --args ./rand_test
    GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
    << snip intro text >>
    Reading symbols from ./rand_test...
    (gdb) b main
    Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
    (gdb) r
    Starting program: /tmp/ossl/rand_test

    Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
    6           int has_enough_data = RAND_status();
    (gdb) s
    RAND_status () at crypto/rand/rand_lib.c:958
    958         const RAND_METHOD *meth = RAND_get_rand_method();
    (gdb)
    RAND_get_rand_method () at crypto/rand/rand_lib.c:844
    844         const RAND_METHOD *tmp_meth = NULL;
    (gdb)
    846         if (!RUN_ONCE(&rand_init, do_rand_init))
    (gdb)
    CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>, init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
    67          if (*once != 0)
    (gdb)
    70          init();
    (gdb)
    do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
    306     DEFINE_RUN_ONCE_STATIC(do_rand_init)
    (gdb)
    do_rand_init () at crypto/rand/rand_lib.c:309
    309         rand_engine_lock = CRYPTO_THREAD_lock_new();
    (gdb)
    CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
    24          if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
    (gdb)
    CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24) at crypto/mem.c:230
    230         void *ret = CRYPTO_malloc(num, file, line);
    (gdb)
    CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24) at crypto/mem.c:194
    194         void *ret = NULL;
    (gdb)
    197         if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
    (gdb)
    200         if (num == 0)
    (gdb)
    204         if (allow_customize) {
    (gdb)
    210             allow_customize = 0;
    (gdb)
    222         ret = malloc(num);
    (gdb)
    Bad kernel stack pointer 7fffffffef20 at 700
    Oops: Bad kernel stack pointer, sig: 6 [#1]
    SMP NR_CPUS=2048
    NUMA
    pSeries
    Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp
ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink
ebtable_nat ebtable_broute br_netfilter bridge stp llc ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    Supported: Yes, External
    CPU: 4 PID: 3082 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    task: c00000002e226100 task.stack: c0000000387c8000
    NIP: 0000000000000700 LR: 0000000010004acc CTR: 0000000000000000
    REGS: c00000001ebffd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    MSR: 8000000000001000 <SF,ME>
      CR: 44000844  XER: 20000000
    CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
    GPR00: 0000000020000000 00007fffffffef20 00000000102af788 fffffffffffffffd
    GPR04: 0000000000000020 0000000000000030 00000000102b0760 0000000000000001
    GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000010280f033
    GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
    GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffef20
    NIP [0000000000000700] 0x700
    LR [0000000010004acc] 0x10004acc
    Call Trace:
    Instruction dump:
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
    ---[ end trace 167d5d3b2e8a06e9 ]---

    Sending IPI to other CPUs
    IPI complete
    kexec: Starting switchover sequence.
    I'm in purgatory
     -> smp_release_cpus()
    spinning_secondaries = 0
     <- smp_release_cpus()
    Kernel panic - not syncing: Out of memory and no killable processes...

    CPU: 4 PID: 1 Comm: swapper/4 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
    Call Trace:
    [c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
    [c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
    [c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
    [c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
    [c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
    [c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
    [c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
    [c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
    [c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
    [c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
    [c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
    [c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
    [c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
    [c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
    [c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
    [c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
    [c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
    [c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
    [c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
    [c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
    [c000000012457a70] [c000000008d62268] unxz+0x210/0x398
    [c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
    [c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
    [c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
    [c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
    [c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
    [c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
    ------------[ cut here ]------------


Doing the same thing but with ppc_tm=off...
    # cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinux-4.12.14-197.18-default root=UUID=0e795e37-3692-465a-a037-c2935a9fde7a mitigations=auto quiet crashkernel=197M ppc_tm=off


Results in a panic at the same point, with a few registers changed:

    << snip down to panic at malloc >>
    (gdb)
    Bad kernel stack pointer 7fffffffef20 at 700
    Oops: Bad kernel stack pointer, sig: 6 [#1]
    SMP NR_CPUS=2048
    NUMA
    pSeries
    Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp
ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink
ebtable_nat ebtable_broute br_netfilter bridge stp llc ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    Supported: Yes, External
    CPU: 2 PID: 3079 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    task: c00000002f6bcc00 task.stack: c0000000321fc000
    NIP: 0000000000000700 LR: 0000000010004acc CTR: 0000000000000000
    REGS: c00000001ec17d40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    MSR: 8000000000001000 <SF,ME>
      CR: 44000844  XER: 20000000
    CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
    GPR00: 0000000020000000 00007fffffffef20 00000000102af788 fffffffffffffffd
    GPR04: 0000000000000020 0000000000000030 00000000102b0760 0000000000000001
    GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000000280f033
    GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
    GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffef20
    NIP [0000000000000700] 0x700
    LR [0000000010004acc] 0x10004acc
    Call Trace:
    Instruction dump:
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
    ---[ end trace 436f626dd098548c ]---

    Sending IPI to other CPUs
    IPI complete
    kexec: Starting switchover sequence.
    I'm in purgatory
     -> smp_release_cpus()
    spinning_secondaries = 0
     <- smp_release_cpus()
    Kernel panic - not syncing: Out of memory and no killable processes...

    CPU: 2 PID: 1 Comm: swapper/2 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
    Call Trace:
    [c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
    [c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
    [c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
    [c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
    [c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
    [c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
    [c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
    [c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
    [c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
    [c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
    [c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
    [c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
    [c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
    [c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
    [c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
    [c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
    [c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
    [c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
    [c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
    [c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
    [c000000012457a70] [c000000008d62268] unxz+0x210/0x398
    [c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
    [c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
    [c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
    [c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
    [c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
    [c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
    ------------[ cut here ]------------


Diffing the panic output looks like this (highlighting register changes?):

    74,75c79,80
    < CPU: 4 PID: 3082 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    < task: c00000002e226100 task.stack: c0000000387c8000
    ---
    > CPU: 2 PID: 3079 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    > task: c00000002f6bcc00 task.stack: c0000000321fc000
    77c82
    < REGS: c00000001ebffd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    ---
    > REGS: c00000001ec17d40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    83c88
    < GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000010280f033
    ---
    > GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000000280f033
    95c100
    < ---[ end trace 167d5d3b2e8a06e9 ]---
    ---
    > ---[ end trace 436f626dd098548c ]---
    106c111
    < CPU: 4 PID: 1 Comm: swapper/4 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
    ---
    > CPU: 2 PID: 1 Comm: swapper/2 Not tainted 4.12.14-197.18-default #1 SLE15-SP1

-- 
Carl Jacobsen
Storix, Inc.


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-05 19:44       ` Carl Jacobsen
@ 2020-11-06 12:25         ` Michael Ellerman
  2020-11-07  7:44           ` Carl Jacobsen
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-11-06 12:25 UTC (permalink / raw)
  To: Carl Jacobsen; +Cc: linuxppc-dev

Carl Jacobsen <cjacobsen@storix.com> writes:
> On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
>> Carl Jacobsen <cjacobsen@storix.com> writes:
>> > The panic (on a call to malloc from static linked libcrypto) looks like
>> > this:
>>
>> What hardware is this on?
>>
>
> Thank you for looking into this.
>
> The system that's panicking identifies like this:
>     # uname -a
>     Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) ppc64le ppc64le ppc64le GNU/Linux
>     #
>     # cat /etc/os-release
>     NAME="SLES"
>     VERSION="15-SP1"
>     VERSION_ID="15.1"
>     PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
>     ID="sles"
>     ID_LIKE="suse"
>     ANSI_COLOR="0;32"
>     CPE_NAME="cpe:/o:suse:sles:15:sp1"
>
> The system is an LPAR running under PowerVM vios version 2.2.3.4.
> The underlying hardware is machine type-model 8284-22A.

OK thanks. That's a Power8.

>> Can you try booting with ppc_tm=off on the kernel command line, and see
>> if that changes anything?
>
> Yes. Output is down below. Doesn't appear to change much, but I don't have
> the background to interpret the registers.

Yeah looks like that's not the problem.

>> Can you put your compiled test program up somewhere we can download it
>> and look at? Or post the disassembly?
>>
>
> Here's the source file:
>     https://www.storix.com/download/support/misc/rand_test.c
>
> Here's the resulting executable:
>     https://www.storix.com/download/support/misc/rand_test

Thanks.

So something seems to have gone wrong when linking this. I see, e.g.:

0000000010004a8c <syscall_random>:
    10004a8c:   2b 10 40 3c     lis     r2,4139
    10004a90:   88 f7 42 38     addi    r2,r2,-2168
    10004a94:   a6 02 08 7c     mflr    r0
    10004a98:   10 00 01 f8     std     r0,16(r1)
    10004a9c:   f8 ff e1 fb     std     r31,-8(r1)
    10004aa0:   81 ff 21 f8     stdu    r1,-128(r1)
    10004aa4:   78 0b 3f 7c     mr      r31,r1
    10004aa8:   60 00 7f f8     std     r3,96(r31)
    10004aac:   68 00 9f f8     std     r4,104(r31)
    10004ab0:   00 00 00 60     nop
    10004ab4:   30 80 22 e9     ld      r9,-32720(r2)
    10004ab8:   00 00 a9 2f     cmpdi   cr7,r9,0
    10004abc:   30 00 9e 41     beq     cr7,10004aec <syscall_random+0x60>
    10004ac0:   60 00 7f e8     ld      r3,96(r31)
    10004ac4:   68 00 9f e8     ld      r4,104(r31)
    10004ac8:   39 b5 ff 4b     bl      10000000 <_init-0x1f00>

Notice that the last bl (branch and link) goes to 0x10000000. But there's no
text at 0x10000000; that's the start of the page, which holds the ELF header
and so begins with the ELF magic.

I've seen something like this before, but I can't remember when/where so
I haven't been able to track down what the problem was.

Anyway hopefully someone on the list will know.

That still doesn't explain the kernel crash though.


> Executable is linked to libcrypto from openssl-1.1.1g, configured with:
>     ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>
> Executable is built (on SUSE 12) with:
>     gcc -ggdb3 -o rand_test rand_test.c libcrypto.a


> And running the executable (on SUSE 15.1) through gdb goes like this:
>
>     # gdb --args ./rand_test
>     GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
>     << snip intro text >>
>     Reading symbols from ./rand_test...
>     (gdb) b main
>     Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
>     (gdb) r
>     Starting program: /tmp/ossl/rand_test
>
>     Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
>     6           int has_enough_data = RAND_status();
>     (gdb) s
>     RAND_status () at crypto/rand/rand_lib.c:958
>     958         const RAND_METHOD *meth = RAND_get_rand_method();
>     (gdb)
>     RAND_get_rand_method () at crypto/rand/rand_lib.c:844
>     844         const RAND_METHOD *tmp_meth = NULL;
>     (gdb)
>     846         if (!RUN_ONCE(&rand_init, do_rand_init))
>     (gdb)
>     CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>,
>     init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
>     67          if (*once != 0)
>     (gdb)
>     70          init();
>     (gdb)
>     do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
>     306     DEFINE_RUN_ONCE_STATIC(do_rand_init)
>     (gdb)
>     do_rand_init () at crypto/rand/rand_lib.c:309
>     309         rand_engine_lock = CRYPTO_THREAD_lock_new();
>     (gdb)
>     CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
>     24          if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
>     (gdb)
>     CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
>     at crypto/mem.c:230
>     230         void *ret = CRYPTO_malloc(num, file, line);
>     (gdb)
>     CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
>     at crypto/mem.c:194
>     194         void *ret = NULL;
>     (gdb)
>     197         if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
>     (gdb)
>     200         if (num == 0)
>     (gdb)
>     204         if (allow_customize) {
>     (gdb)
>     210             allow_customize = 0;
>     (gdb)
>     222         ret = malloc(num);
>     (gdb)
>     Bad kernel stack pointer 7fffffffef20 at 700


On my machine it doesn't crash the kernel, so I can catch it later. For
me it's here:

Program received signal SIGILL, Illegal instruction.
0x0000000010000004 in ?? ()
(gdb) bt
#0  0x0000000010000004 in ?? ()
#1  0x0000000010004acc in syscall_random (buf=0x102b0730, buflen=32)
    at crypto/rand/rand_unix.c:371
#2  0x00000000100053fc in rand_pool_acquire_entropy (pool=0x102b06e0)
    at crypto/rand/rand_unix.c:636
#3  0x0000000010002b58 in rand_drbg_get_entropy (drbg=0x102b02e0, 
    pout=0x7ffffffff3f0, entropy=256, min_len=32, max_len=2147483647, 
    prediction_resistance=0) at crypto/rand/rand_lib.c:198
#4  0x000000001001ed9c in RAND_DRBG_instantiate (drbg=0x102b02e0, 
    pers=0x10248d00 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG", 
    perslen=28) at crypto/rand/drbg_lib.c:338
#5  0x0000000010020300 in drbg_setup (parent=0x0) at crypto/rand/drbg_lib.c:895
#6  0x0000000010020414 in do_rand_drbg_init () at crypto/rand/drbg_lib.c:924
#7  0x000000001002034c in do_rand_drbg_init_ossl_ ()
    at crypto/rand/drbg_lib.c:909
#8  0x0000000010005d1c in CRYPTO_THREAD_run_once (
    once=0x102ab4d8 <rand_drbg_init>, 
    init=0x1002032c <do_rand_drbg_init_ossl_>) at crypto/threads_none.c:70
#9  0x00000000100209c4 in RAND_DRBG_get0_master ()
    at crypto/rand/drbg_lib.c:1102
#10 0x0000000010020914 in drbg_status () at crypto/rand/drbg_lib.c:1084
#11 0x0000000010004a58 in RAND_status () at crypto/rand/rand_lib.c:961
#12 0x0000000010002890 in main (argc=1, argv=0x7ffffffffa68) at rand_test.c:6
(gdb) 


i.e. in the syscall_random() that I mentioned above.

You should be able to catch it there too if you do:

(gdb) b *0x10000000
(gdb) r

Hopefully it will stop without crashing the kernel, and then a `bt` will
show that you're in the same place as me.

If you can get that to work, when you're stopped there, can you do an
`info registers` and send us the output?

cheers


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-02 20:14 Kernel panic from malloc() on SUSE 15.1? Carl Jacobsen
  2020-11-03  2:26 ` Michael Ellerman
@ 2020-11-06 12:51 ` Michal Suchánek
  1 sibling, 0 replies; 9+ messages in thread
From: Michal Suchánek @ 2020-11-06 12:51 UTC (permalink / raw)
  To: Carl Jacobsen; +Cc: linuxppc-dev

On Mon, Nov 02, 2020 at 12:14:27PM -0800, Carl Jacobsen wrote:
> I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
> simple
> test program, built in a slightly unusual way.
> 
> I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
> copy of libcrypto.a (from openssl-1.1.1g), built without threads.
> I have a 10 line C test program that compiles and runs fine on the
> SUSE 12 system. If I compile the same program on SUSE 15.1 (with
> gcc 7.4.1), it runs fine on SUSE 15.1.
> 
> But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
> system, the call to RAND_status() gets to a malloc() and then panics.
> (And, of course, if I just compile a call to malloc(), that runs fine
> on both systems.) Here's the test program, it's really just a call to
> RAND_status():
> 
>     #include <stdio.h>
>     #include <openssl/rand.h>
> 
>     int main(int argc, char **argv)
>     {
>         int has_enough_data = RAND_status();
>         printf("The PRNG %s been seeded with enough data\n",
>                has_enough_data ? "HAS" : "has NOT");
>         return 0;
>     }
> 
> openssl is configured/built with:
>     ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>     make
> 
> and the test program is compiled with:
>     gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
> 
> The kernel on SUSE 12 is: 3.12.28-4-default
> And glibc is: 2.19
> 
> The kernel on SUSE 15.1 is: 4.12.14-197.18-default
> And glibc is: 2.26

SLE 12 SP5 has pretty much the same kernel as SLE 15 SP1 and pretty much
the same compiler as SLE 12, so it might be an interesting data point to
try there.

Also, I saw you are using a very old VIOS (which should not make much of a
difference) but did not see what firmware version the machine has.

There have been cases of mysterious crashes solved by updating the
firmware.

Thanks

Michal


* Re: Kernel panic from malloc() on SUSE 15.1?
  2020-11-06 12:25         ` Michael Ellerman
@ 2020-11-07  7:44           ` Carl Jacobsen
  0 siblings, 0 replies; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-07  7:44 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 7593 bytes --]

On Fri, Nov 6, 2020 at 4:25 AM Michael Ellerman <mpe@ellerman.id.au> wrote:

> So something seems to have gone wrong linking this, I see eg:
>
> 0000000010004a8c <syscall_random>:
>     10004a8c:   2b 10 40 3c     lis     r2,4139
>     10004a90:   88 f7 42 38     addi    r2,r2,-2168
>     10004a94:   a6 02 08 7c     mflr    r0
>     10004a98:   10 00 01 f8     std     r0,16(r1)
>     10004a9c:   f8 ff e1 fb     std     r31,-8(r1)
>     10004aa0:   81 ff 21 f8     stdu    r1,-128(r1)
>     10004aa4:   78 0b 3f 7c     mr      r31,r1
>     10004aa8:   60 00 7f f8     std     r3,96(r31)
>     10004aac:   68 00 9f f8     std     r4,104(r31)
>     10004ab0:   00 00 00 60     nop
>     10004ab4:   30 80 22 e9     ld      r9,-32720(r2)
>     10004ab8:   00 00 a9 2f     cmpdi   cr7,r9,0
>     10004abc:   30 00 9e 41     beq     cr7,10004aec <syscall_random+0x60>
>     10004ac0:   60 00 7f e8     ld      r3,96(r31)
>     10004ac4:   68 00 9f e8     ld      r4,104(r31)
>     10004ac8:   39 b5 ff 4b     bl      10000000 <_init-0x1f00>
>
> Notice that last bl (branch and link) to 0x10000000. But there's no text
> at 0x10000000, that's the start of the page which happens to be the ELF
> magic.
>
> I've seen something like this before, but I can't remember when/where so
> I haven't been able to track down what the problem was.
>
> Anyway hopefully someone on the list will know.
>
> That still doesn't explain the kernel crash though.
>

Interesting. Sounds highly unlikely that the linker would have picked
that address at random, but it makes no sense. And, agreed, jumping
into junk should crash the program, not the kernel.


> On my machine it doesn't crash the kernel, so I can catch it later. For
> me it's here:
> ....
>
> ie. in the syscall_random() that I mentioned above.
>
> You should be able to catch it there too if you do:
>
> (gdb) b *0x10000000
> (gdb) r
>
> Hopefully it will stop without crashing the kernel, and then a `bt` will
> show that you're in the same place as me.
>
> If you can get that to work, when you're stopped there, can you do an
> `info registers` and send us the output.
>

Indeed, setting the breakpoint you suggested works, and the stack looks
almost the same - only differences are a few bits off in main's argv
pointer, rand_drbg_get_entropy's pout pointer, and the final address - you
get 0x0000000010000004, I get 0x0000000010000000. Output, including "info
registers", below. Hoping they provide some useful clues. Thanks again for
looking into this.

# gdb --args /tmp/ossl/rand_test
...
(gdb) b *0x10000000
Breakpoint 1 at 0x10000000
(gdb) r
Starting program: /tmp/ossl/rand_test

Breakpoint 1, 0x0000000010000000 in ?? ()
(gdb) bt
#0  0x0000000010000000 in ?? ()
#1  0x0000000010004acc in syscall_random (buf=0x102b0730, buflen=32)
    at crypto/rand/rand_unix.c:371
#2  0x00000000100053fc in rand_pool_acquire_entropy (pool=0x102b06e0)
    at crypto/rand/rand_unix.c:636
#3  0x0000000010002b58 in rand_drbg_get_entropy (drbg=0x102b02e0,
    pout=0x7fffffffecf0, entropy=256, min_len=32, max_len=2147483647,
    prediction_resistance=0) at crypto/rand/rand_lib.c:198
#4  0x000000001001ed9c in RAND_DRBG_instantiate (drbg=0x102b02e0,
    pers=0x10248d00 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG",
    perslen=28) at crypto/rand/drbg_lib.c:338
#5  0x0000000010020300 in drbg_setup (parent=0x0) at crypto/rand/drbg_lib.c:895
#6  0x0000000010020414 in do_rand_drbg_init () at crypto/rand/drbg_lib.c:924
#7  0x000000001002034c in do_rand_drbg_init_ossl_ ()
    at crypto/rand/drbg_lib.c:909
#8  0x0000000010005d1c in CRYPTO_THREAD_run_once (
    once=0x102ab4d8 <rand_drbg_init>,
    init=0x1002032c <do_rand_drbg_init_ossl_>) at crypto/threads_none.c:70
#9  0x00000000100209c4 in RAND_DRBG_get0_master ()
    at crypto/rand/drbg_lib.c:1102
#10 0x0000000010020914 in drbg_status () at crypto/rand/drbg_lib.c:1084
#11 0x0000000010004a58 in RAND_status () at crypto/rand/rand_lib.c:961
#12 0x0000000010002890 in main (argc=1, argv=0x7ffffffff368) at rand_test.c:6
(gdb) info registers
r0             0x100053fc          268456956
r1             0x7fffffffeaf0      140737488349936
r2             0x102af788          271251336
r3             0x102b0730          271255344
r4             0x20                32
r5             0x30                48
r6             0x102b0760          271255392
r7             0x1                 1
r8             0x0                 0
r9             0x7fffb7dacc00      140736277957632
r10            0x102b0730          271255344
r11            0x10                16
r12            0x7fffb7e19280      140736278401664
r13            0x7fffb7ffa100      140736280371456
r14            0x0                 0
r15            0x0                 0
r16            0x0                 0
r17            0x0                 0
r18            0x0                 0
r19            0x0                 0
r20            0x0                 0
r21            0x0                 0
r22            0x0                 0
r23            0x0                 0
r24            0x0                 0
r25            0x0                 0
r26            0x0                 0
r27            0x7fffb7fef4b8      140736280327352
r28            0x7fffb7ff0000      140736280330240
r29            0x0                 0
r30            0x0                 0
r31            0x7fffffffeaf0      140737488349936
pc             0x10000000          0x10000000
msr            0x800000010002d033  9223372041149927475
cr             0x44000844          1140852804
lr             0x10004acc          0x10004acc <syscall_random+64>
ctr            0x0                 0
xer            0x20000000          536870912
fpscr          0x0                 0
vscr           0x0                 0
vrsave         0xffffffff          -1
ppr            0xc000000000000     3377699720527872
dscr           0x0                 0
tar            0x0                 0
bescr          <unavailable>
ebbhr          <unavailable>
ebbrr          <unavailable>
mmcr0          0x0                 0
mmcr2          0x0                 0
siar           0x0                 0
sdar           0x0                 0
sier           0x0                 0
tfhar          0x0                 0
texasr         0x0                 0
tfiar          0x0                 0
cr0            <unavailable>
cr1            <unavailable>
cr2            <unavailable>
cr3            <unavailable>
cr4            <unavailable>
cr5            <unavailable>
cr6            <unavailable>
cr7            <unavailable>
cr8            <unavailable>
cr9            <unavailable>
cr10           <unavailable>
cr11           <unavailable>
cr12           <unavailable>
cr13           <unavailable>
cr14           <unavailable>
cr15           <unavailable>
cr16           <unavailable>
cr17           <unavailable>
cr18           <unavailable>
cr19           <unavailable>
cr20           <unavailable>
cr21           <unavailable>
cr22           <unavailable>
cr23           <unavailable>
cr24           <unavailable>
cr25           <unavailable>
cr26           <unavailable>
cr27           <unavailable>
cr28           <unavailable>
cr29           <unavailable>
cr30           <unavailable>
cr31           <unavailable>
ccr            <unavailable>
cxer           <unavailable>
clr            <unavailable>
cctr           <unavailable>
cfpscr         <unavailable>
cvscr          <unavailable>
cvrsave        <unavailable>
cppr           <unavailable>
cdscr          <unavailable>
ctar           <unavailable>
orig_r3        0x10004ac8          268454600
trap           0x700               1792
(gdb)

-- 
Carl Jacobsen
Storix, Inc.


Thread overview: 9+ messages
2020-11-02 20:14 Kernel panic from malloc() on SUSE 15.1? Carl Jacobsen
2020-11-03  2:26 ` Michael Ellerman
2020-11-03 22:09   ` Carl Jacobsen
2020-11-05 10:19     ` Michael Ellerman
2020-11-05 11:00       ` Segher Boessenkool
2020-11-05 19:44       ` Carl Jacobsen
2020-11-06 12:25         ` Michael Ellerman
2020-11-07  7:44           ` Carl Jacobsen
2020-11-06 12:51 ` Michal Suchánek
