* Kernel panic from malloc() on SUSE 15.1?
@ 2020-11-02 20:14 Carl Jacobsen
2020-11-03 2:26 ` Michael Ellerman
2020-11-06 12:51 ` Michal Suchánek
0 siblings, 2 replies; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-02 20:14 UTC
To: linuxppc-dev
I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
simple test program, built in a slightly unusual way.
I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
copy of libcrypto.a (from openssl-1.1.1g), built without threads.
I have a 10 line C test program that compiles and runs fine on the
SUSE 12 system. If I compile the same program on SUSE 15.1 (with
gcc 7.4.1), it runs fine on SUSE 15.1.
But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
system, the call to RAND_status() gets to a malloc() and then panics.
(And, of course, if I just compile a call to malloc(), that runs fine
on both systems.) Here's the test program, it's really just a call to
RAND_status():
    #include <stdio.h>
    #include <openssl/rand.h>

    int main(int argc, char **argv)
    {
        int has_enough_data = RAND_status();

        printf("The PRNG %s been seeded with enough data\n",
               has_enough_data ? "HAS" : "has NOT");
        return 0;
    }
openssl is configured/built with:
    ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
    make
and the test program is compiled with:
    gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
The kernel on SUSE 12 is: 3.12.28-4-default
And glibc is: 2.19
The kernel on SUSE 15.1 is: 4.12.14-197.18-default
And glibc is: 2.26
In a previous iteration it was panicking in pthread_once(), so
I compiled openssl without pthreads support, and now it panics
calling malloc().
If I link to the system-supplied libcrypto.so, it works fine, and
running the same tests on x86_64 works fine, it's only ppc64le
that panics, and only running code from the old system on the
new one.
I'm trying to dig further down into this to come up with a standalone
test case, but I'm wondering if anything here stands out as a known
problem, or if someone can point me in the right direction.
Thanks,
Carl Jacobsen
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-02 20:14 Kernel panic from malloc() on SUSE 15.1? Carl Jacobsen
@ 2020-11-03 2:26 ` Michael Ellerman
2020-11-03 22:09 ` Carl Jacobsen
2020-11-06 12:51 ` Michal Suchánek
1 sibling, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-11-03 2:26 UTC
To: Carl Jacobsen, linuxppc-dev
Carl Jacobsen <cjacobsen@storix.com> writes:
> I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
> simple test program, built in a slightly unusual way.
>
> I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
> copy of libcrypto.a (from openssl-1.1.1g), built without threads.
> I have a 10 line C test program that compiles and runs fine on the
> SUSE 12 system. If I compile the same program on SUSE 15.1 (with
> gcc 7.4.1), it runs fine on SUSE 15.1.
>
> But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
> system, the call to RAND_status() gets to a malloc() and then panics.
> (And, of course, if I just compile a call to malloc(), that runs fine
> on both systems.) Here's the test program, it's really just a call to
> RAND_status():
>
>     #include <stdio.h>
>     #include <openssl/rand.h>
>
>     int main(int argc, char **argv)
>     {
>         int has_enough_data = RAND_status();
>         printf("The PRNG %s been seeded with enough data\n",
>                has_enough_data ? "HAS" : "has NOT");
>         return 0;
>     }
>
> openssl is configured/built with:
> ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
> make
>
> and the test program is compiled with:
> gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
>
> The kernel on SUSE 12 is: 3.12.28-4-default
> And glibc is: 2.19
>
> The kernel on SUSE 15.1 is: 4.12.14-197.18-default
> And glibc is: 2.26
>
> In a previous iteration it was panicking in pthread_once(), so
> I compiled openssl without pthreads support, and now it panics
> calling malloc().
What's the panic look like?
cheers
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-03 2:26 ` Michael Ellerman
@ 2020-11-03 22:09 ` Carl Jacobsen
2020-11-05 10:19 ` Michael Ellerman
0 siblings, 1 reply; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-03 22:09 UTC
To: Michael Ellerman; +Cc: linuxppc-dev
The panic (on a call to malloc() from the statically linked libcrypto)
looks like this:
Bad kernel stack pointer 7fffffffeac0 at 700
Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp ip6t_rpfilter
ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat
ebtable_broute br_netfilter bridge stp llc ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External
CPU: 0 PID: 14144 Comm: rand_test_no_pt Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
task: c00000002fa23b80 task.stack: c000000032824000
NIP: 0000000000000700 LR: 0000000010004ad0 CTR: 0000000000000000
REGS: c00000001ec2fd40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
MSR: 8000000000001000 <SF,ME> CR: 44000844 XER: 20000000
CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
GPR00: 0000000020000000 00007fffffffeac0 00000000102af788 fffffffffffffffd
GPR04: 0000000000000020 0000000000000030 00000000102b0550 0000000000000001
GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0520 800000010280f033
GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffeac0
NIP [0000000000000700] 0x700
LR [0000000010004ad0] 0x10004ad0
Call Trace:
Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
---[ end trace cc04515f274cfbf6 ]---
Sending IPI to other CPUs
IPI complete
kexec: Starting switchover sequence.
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 0
<- smp_release_cpus()
Kernel panic - not syncing: Out of memory and no killable processes...
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
Call Trace:
[c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
[c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ../drivers/tty/vt/vt.c:3887 do_unblank_screen+0x1d0/0x270
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
task: c000000012449680 task.stack: c000000012454000
NIP: c0000000086d1ac0 LR: c0000000086d1918 CTR: c0000000085fc390
REGS: c000000012456f60 TRAP: 0700 Not tainted (4.12.14-197.18-default)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>
CR: 28242222 XER: 20000008
CFAR: c0000000086d1934 SOFTE: 0
GPR00: c0000000086d1918 c0000000124571e0 c000000009240000 0000000000000000
GPR04: 0000000000000000 c00000001237e00e 00000000000010b9 c000000012457170
GPR08: 000000000a610000 0000000000000000 c0000000090f38f0 c0000000122bc3d7
GPR12: 0000000028242428 c00000000f6c0000 00000000014200c2 00000000014200c2
GPR16: 00000000014200c2 0000000000000001 0000000000000000 0000000000000240
GPR20: 0000000000000001 0000000000000240 0000000000000000 c0000000140e1d10
GPR24: 0000000000000000 0000000000000000 0000000000000115 c000000009282374
GPR28: c000000009403508 c0000000094034d8 0000000000000000 0000000000000000
NIP [c0000000086d1ac0] do_unblank_screen+0x1d0/0x270
LR [c0000000086d1918] do_unblank_screen+0x28/0x270
Call Trace:
[c0000000124571e0] [c000000012457250] 0xc000000012457250 (unreliable)
[c000000012457250] [c000000008a1cd44] panic+0x1b4/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
3d22001c 39293920 81290000 2f890000 409cff00 ebe10068 38210070 e8010010
ebc1fff0 7c0803a6 4e800020 60000000 <0fe00000> 4bfffe74 60000000 60000000
---[ end trace ad1803c957b45442 ]---
---[ end Kernel panic - not syncing: Out of memory and no killable processes...
--
Carl Jacobsen
Storix, Inc.
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-03 22:09 ` Carl Jacobsen
@ 2020-11-05 10:19 ` Michael Ellerman
2020-11-05 11:00 ` Segher Boessenkool
2020-11-05 19:44 ` Carl Jacobsen
0 siblings, 2 replies; 9+ messages in thread
From: Michael Ellerman @ 2020-11-05 10:19 UTC
To: Carl Jacobsen; +Cc: linuxppc-dev
Carl Jacobsen <cjacobsen@storix.com> writes:
> The panic (on a call to malloc from static linked libcrypto) looks like
> this:
Thanks.
This doesn't make a lot of sense.
> Bad kernel stack pointer 7fffffffeac0 at 700
"at 700" is the regs->nip value, and suggests we're trying to handle a
program check, which is either a trap or BUG or WARN, or illegal
instruction or several other things.
> Oops: Bad kernel stack pointer, sig: 6 [#1]
> SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp ip6t_rpfilter
> ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat
> ebtable_broute br_netfilter bridge stp llc ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
> x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
> xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
> ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> Supported: Yes, External
> CPU: 0 PID: 14144 Comm: rand_test_no_pt Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
> task: c00000002fa23b80 task.stack: c000000032824000
> NIP: 0000000000000700 LR: 0000000010004ad0 CTR: 0000000000000000
> REGS: c00000001ec2fd40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
But then here it says TRAP = 0x300, which is != 0x700.
The trap number is hardcoded in the bad stack handling code, and I don't
see how we can end up with nip == 0x700 but the trap value == 0x300.
> MSR: 8000000000001000 <SF,ME> CR: 44000844 XER: 20000000
And here the MSR says you were in big endian mode, but you said before
your machine was ppc64le.
> CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
> GPR00: 0000000020000000 00007fffffffeac0 00000000102af788 fffffffffffffffd
> GPR04: 0000000000000020 0000000000000030 00000000102b0550 0000000000000001
> GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0520 800000010280f033
> GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
> GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffeac0
The rest of the regs look like user space values, not kernel.
> NIP [0000000000000700] 0x700
> LR [0000000010004ad0] 0x10004ad0
> Call Trace:
> Instruction dump:
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
> ---[ end trace cc04515f274cfbf6 ]---
What hardware is this on?
Can you try booting with ppc_tm=off on the kernel command line, and see
if that changes anything?
Can you put your compiled test program up somewhere we can download it
and look at? Or post the disassembly?
cheers
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-05 10:19 ` Michael Ellerman
@ 2020-11-05 11:00 ` Segher Boessenkool
2020-11-05 19:44 ` Carl Jacobsen
1 sibling, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2020-11-05 11:00 UTC
To: Michael Ellerman; +Cc: Carl Jacobsen, linuxppc-dev
On Thu, Nov 05, 2020 at 09:19:22PM +1100, Michael Ellerman wrote:
> Carl Jacobsen <cjacobsen@storix.com> writes:
> This doesn't make a lot of sense.
>
> > Bad kernel stack pointer 7fffffffeac0 at 700
>
> "at 700" is the regs->nip value, and suggests we're trying to handle a
> program check, which is either a trap or BUG or WARN, or illegal
> instruction or several other things.
> > REGS: c00000001ec2fd40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
>
> But then here it says TRAP = 0x300, which is != 0x700.
>
> The trap number is hardcoded in the bad stack handling code, and I don't
> see how we can end up with nip == 0x700 but the trap value == 0x300.
>
> > MSR: 8000000000001000 <SF,ME> CR: 44000844 XER: 20000000
>
> And here the MSR says you were in big endian mode, but you said before
> your machine was ppc64le.
It looks like you got a DSI (the 300), but for some reason that
interrupt was not taken in LE mode, so the instruction at 300 was read
as a lot of gobbledygook, not a valid insn, and the processor took a
program interrupt (the 700).
(MSR[RI]=0, but there can be other causes for that of course.)
Segher
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-05 10:19 ` Michael Ellerman
2020-11-05 11:00 ` Segher Boessenkool
@ 2020-11-05 19:44 ` Carl Jacobsen
2020-11-06 12:25 ` Michael Ellerman
1 sibling, 1 reply; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-05 19:44 UTC
To: Michael Ellerman; +Cc: linuxppc-dev
On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> Carl Jacobsen <cjacobsen@storix.com> writes:
> > The panic (on a call to malloc from static linked libcrypto) looks like
> > this:
>
> What hardware is this on?
>
Thank you for looking into this.
The system that's panicking identifies like this:
# uname -a
Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) ppc64le ppc64le ppc64le GNU/Linux
#
# cat /etc/os-release
NAME="SLES"
VERSION="15-SP1"
VERSION_ID="15.1"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp1"
The system is an LPAR running under PowerVM vios version 2.2.3.4.
The underlying hardware is machine type-model 8284-22A.
> Can you try booting with ppc_tm=off on the kernel command line, and see
> if that changes anything?
>
Yes. Output is down below. Doesn't appear to change much, but I don't have
the background to interpret the registers.
> Can you put your compiled test program up somewhere we can download it
> and look at? Or post the disassembly?
>
Here's the source file:
https://www.storix.com/download/support/misc/rand_test.c
Here's the resulting executable:
https://www.storix.com/download/support/misc/rand_test
Executable is linked to libcrypto from openssl-1.1.1g, configured with:
./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
Executable is built (on SUSE 12) with:
gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
And running the executable (on SUSE 15.1) through gdb goes like this:
# gdb --args ./rand_test
GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
<< snip intro text >>
Reading symbols from ./rand_test...
(gdb) b main
Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
(gdb) r
Starting program: /tmp/ossl/rand_test
Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
6 int has_enough_data = RAND_status();
(gdb) s
RAND_status () at crypto/rand/rand_lib.c:958
958 const RAND_METHOD *meth = RAND_get_rand_method();
(gdb)
RAND_get_rand_method () at crypto/rand/rand_lib.c:844
844 const RAND_METHOD *tmp_meth = NULL;
(gdb)
846 if (!RUN_ONCE(&rand_init, do_rand_init))
(gdb)
CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>,
init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
67 if (*once != 0)
(gdb)
70 init();
(gdb)
do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
306 DEFINE_RUN_ONCE_STATIC(do_rand_init)
(gdb)
do_rand_init () at crypto/rand/rand_lib.c:309
309 rand_engine_lock = CRYPTO_THREAD_lock_new();
(gdb)
CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
24 if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
(gdb)
CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
at crypto/mem.c:230
230 void *ret = CRYPTO_malloc(num, file, line);
(gdb)
CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
at crypto/mem.c:194
194 void *ret = NULL;
(gdb)
197 if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
(gdb)
200 if (num == 0)
(gdb)
204 if (allow_customize) {
(gdb)
210 allow_customize = 0;
(gdb)
222 ret = malloc(num);
(gdb)
Bad kernel stack pointer 7fffffffef20 at 700
Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp
ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink
ebtable_nat ebtable_broute br_netfilter bridge stp llc ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External
CPU: 4 PID: 3082 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
task: c00000002e226100 task.stack: c0000000387c8000
NIP: 0000000000000700 LR: 0000000010004acc CTR: 0000000000000000
REGS: c00000001ebffd40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
MSR: 8000000000001000 <SF,ME> CR: 44000844 XER: 20000000
CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
GPR00: 0000000020000000 00007fffffffef20 00000000102af788 fffffffffffffffd
GPR04: 0000000000000020 0000000000000030 00000000102b0760 0000000000000001
GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000010280f033
GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffef20
NIP [0000000000000700] 0x700
LR [0000000010004acc] 0x10004acc
Call Trace:
Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
---[ end trace 167d5d3b2e8a06e9 ]---
Sending IPI to other CPUs
IPI complete
kexec: Starting switchover sequence.
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 0
<- smp_release_cpus()
Kernel panic - not syncing: Out of memory and no killable processes...
CPU: 4 PID: 1 Comm: swapper/4 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
Call Trace:
[c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
[c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
------------[ cut here ]------------
Doing the same thing but with ppc_tm=off...
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.12.14-197.18-default
root=UUID=0e795e37-3692-465a-a037-c2935a9fde7a mitigations=auto quiet
crashkernel=197M ppc_tm=off
Results in a panic at the same point, with a few registers changed:
<< snip down to panic at malloc >>
(gdb)
Bad kernel stack pointer 7fffffffef20 at 700
Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp
ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink
ebtable_nat ebtable_broute br_netfilter bridge stp llc ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External
CPU: 2 PID: 3079 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
task: c00000002f6bcc00 task.stack: c0000000321fc000
NIP: 0000000000000700 LR: 0000000010004acc CTR: 0000000000000000
REGS: c00000001ec17d40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
MSR: 8000000000001000 <SF,ME> CR: 44000844 XER: 20000000
CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
GPR00: 0000000020000000 00007fffffffef20 00000000102af788 fffffffffffffffd
GPR04: 0000000000000020 0000000000000030 00000000102b0760 0000000000000001
GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000000280f033
GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffef20
NIP [0000000000000700] 0x700
LR [0000000010004acc] 0x10004acc
Call Trace:
Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
---[ end trace 436f626dd098548c ]---
Sending IPI to other CPUs
IPI complete
kexec: Starting switchover sequence.
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 0
<- smp_release_cpus()
Kernel panic - not syncing: Out of memory and no killable processes...
CPU: 2 PID: 1 Comm: swapper/2 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
Call Trace:
[c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
[c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
------------[ cut here ]------------
Diffing the panic output looks like this (highlighting register changes?):
74,75c79,80
< CPU: 4 PID: 3082 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
< task: c00000002e226100 task.stack: c0000000387c8000
---
> CPU: 2 PID: 3079 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
> task: c00000002f6bcc00 task.stack: c0000000321fc000
77c82
< REGS: c00000001ebffd40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
---
> REGS: c00000001ec17d40 TRAP: 0300 Tainted: G (4.12.14-197.18-default)
83c88
< GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000010280f033
---
> GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000000280f033
95c100
< ---[ end trace 167d5d3b2e8a06e9 ]---
---
> ---[ end trace 436f626dd098548c ]---
106c111
< CPU: 4 PID: 1 Comm: swapper/4 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
---
> CPU: 2 PID: 1 Comm: swapper/2 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
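The only register difference in that diff is the fourth word on the GPR08 row. A minimal sketch of isolating the changed bit is below; the variable names are illustrative. Two readings here are assumptions rather than anything the dump states: that this word is an MSR image, and that the ">" side is the ppc_tm=off boot. In the Linux powerpc headers bit 32 is named MSR_TM (transactional memory available), which would fit both.

```python
# Fourth word of the GPR08 row from each side of the diff.
left = 0x800000010280F033   # "<" side
right = 0x800000000280F033  # ">" side

# XOR leaves a 1 only in the bit positions where the two values differ.
changed = left ^ right
changed_bits = [bit for bit in range(64) if (changed >> bit) & 1]

print(hex(changed), changed_bits)
```

Bit numbers here count from the least significant bit (bit 0), which is the convention the Linux MSR_*_LG constants use.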
--
Carl Jacobsen
Storix, Inc.
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-05 19:44 ` Carl Jacobsen
@ 2020-11-06 12:25 ` Michael Ellerman
2020-11-07 7:44 ` Carl Jacobsen
0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-11-06 12:25 UTC (permalink / raw)
To: Carl Jacobsen; +Cc: linuxppc-dev
Carl Jacobsen <cjacobsen@storix.com> writes:
> On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
>> Carl Jacobsen <cjacobsen@storix.com> writes:
>> > The panic (on a call to malloc from static linked libcrypto) looks like
>> > this:
>>
>> What hardware is this on?
>>
>
> Thank you for looking into this.
>
> The system that's panicking identifies like this:
> # uname -a
> Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) ppc64le ppc64le ppc64le GNU/Linux
> #
> # cat /etc/os-release
> NAME="SLES"
> VERSION="15-SP1"
> VERSION_ID="15.1"
> PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
> ID="sles"
> ID_LIKE="suse"
> ANSI_COLOR="0;32"
> CPE_NAME="cpe:/o:suse:sles:15:sp1"
>
> The system is an LPAR running under PowerVM vios version 2.2.3.4.
> The underlying hardware is machine type-model 8284-22A.
OK thanks. That's a Power8.
>> Can you try booting with ppc_tm=off on the kernel command line, and see
>> if that changes anything?
>
> Yes. Output is down below. Doesn't appear to change much, but I don't have
> the background to interpret the registers.
Yeah looks like that's not the problem.
>> Can you put your compiled test program up somewhere we can download it
>> and look at? Or post the disassembly?
>>
>
> Here's the source file:
> https://www.storix.com/download/support/misc/rand_test.c
>
> Here's the resulting executable:
> https://www.storix.com/download/support/misc/rand_test
Thanks.
So something seems to have gone wrong linking this, I see eg:
0000000010004a8c <syscall_random>:
10004a8c: 2b 10 40 3c lis r2,4139
10004a90: 88 f7 42 38 addi r2,r2,-2168
10004a94: a6 02 08 7c mflr r0
10004a98: 10 00 01 f8 std r0,16(r1)
10004a9c: f8 ff e1 fb std r31,-8(r1)
10004aa0: 81 ff 21 f8 stdu r1,-128(r1)
10004aa4: 78 0b 3f 7c mr r31,r1
10004aa8: 60 00 7f f8 std r3,96(r31)
10004aac: 68 00 9f f8 std r4,104(r31)
10004ab0: 00 00 00 60 nop
10004ab4: 30 80 22 e9 ld r9,-32720(r2)
10004ab8: 00 00 a9 2f cmpdi cr7,r9,0
10004abc: 30 00 9e 41 beq cr7,10004aec <syscall_random+0x60>
10004ac0: 60 00 7f e8 ld r3,96(r31)
10004ac4: 68 00 9f e8 ld r4,104(r31)
10004ac8: 39 b5 ff 4b bl 10000000 <_init-0x1f00>
Notice that last bl (branch and link) to 0x10000000. But there's no text
at 0x10000000; that's the start of the page, which happens to hold the ELF
magic.
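The branch target can be checked by hand from the raw bytes in the objdump line. The sketch below decodes the four bytes `39 b5 ff 4b` as a little-endian PowerPC I-form branch (primary opcode 18); `decode_branch` is an illustrative helper, not part of any tool used in this thread.

```python
def decode_branch(raw: bytes, address: int):
    """Decode a little-endian PowerPC I-form branch (b, bl, ba, bla).

    Returns (target_address, link_bit_set).
    """
    word = int.from_bytes(raw, "little")
    if word >> 26 != 18:                 # primary opcode 18 = I-form branch
        raise ValueError("not an I-form branch")
    offset = word & 0x03FFFFFC           # LI field with two zero bits appended
    if offset & 0x02000000:              # sign bit of the 26-bit displacement
        offset -= 0x04000000
    absolute = bool(word & 0x2)          # AA: target is absolute, not relative
    link = bool(word & 0x1)              # LK: save return address in LR
    target = offset if absolute else address + offset
    return target & 0xFFFFFFFFFFFFFFFF, link


# Bytes and address copied from the objdump line at 10004ac8.
target, link = decode_branch(bytes.fromhex("39b5ff4b"), 0x10004AC8)
print(hex(target), link)
```

The displacement works out to -0x4ac8, so the relative branch lands exactly on 0x10000000, matching what objdump prints.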
I've seen something like this before, but I can't remember when/where so
I haven't been able to track down what the problem was.
Anyway hopefully someone on the list will know.
That still doesn't explain the kernel crash though.
> Executable is linked to libcrypto from openssl-1.1.1g, configured with:
> ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>
> Executable is built (on SUSE 12) with:
> gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
> And running the executable (on SUSE 15.1) through gdb goes like this:
>
> # gdb --args ./rand_test
> GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
> << snip intro text >>
> Reading symbols from ./rand_test...
> (gdb) b main
> Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
> (gdb) r
> Starting program: /tmp/ossl/rand_test
>
> Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
> 6 int has_enough_data = RAND_status();
> (gdb) s
> RAND_status () at crypto/rand/rand_lib.c:958
> 958 const RAND_METHOD *meth = RAND_get_rand_method();
> (gdb)
> RAND_get_rand_method () at crypto/rand/rand_lib.c:844
> 844 const RAND_METHOD *tmp_meth = NULL;
> (gdb)
> 846 if (!RUN_ONCE(&rand_init, do_rand_init))
> (gdb)
> CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>,
>     init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
> 67 if (*once != 0)
> (gdb)
> 70 init();
> (gdb)
> do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
> 306 DEFINE_RUN_ONCE_STATIC(do_rand_init)
> (gdb)
> do_rand_init () at crypto/rand/rand_lib.c:309
> 309 rand_engine_lock = CRYPTO_THREAD_lock_new();
> (gdb)
> CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
> 24 if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
> (gdb)
> CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
>     at crypto/mem.c:230
> 230 void *ret = CRYPTO_malloc(num, file, line);
> (gdb)
> CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
>     at crypto/mem.c:194
> 194 void *ret = NULL;
> (gdb)
> 197 if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
> (gdb)
> 200 if (num == 0)
> (gdb)
> 204 if (allow_customize) {
> (gdb)
> 210 allow_customize = 0;
> (gdb)
> 222 ret = malloc(num);
> (gdb)
> Bad kernel stack pointer 7fffffffef20 at 700
On my machine it doesn't crash the kernel, so I can catch it later. For
me it's here:
Program received signal SIGILL, Illegal instruction.
0x0000000010000004 in ?? ()
(gdb) bt
#0 0x0000000010000004 in ?? ()
#1 0x0000000010004acc in syscall_random (buf=0x102b0730, buflen=32)
at crypto/rand/rand_unix.c:371
#2 0x00000000100053fc in rand_pool_acquire_entropy (pool=0x102b06e0)
at crypto/rand/rand_unix.c:636
#3 0x0000000010002b58 in rand_drbg_get_entropy (drbg=0x102b02e0,
pout=0x7ffffffff3f0, entropy=256, min_len=32, max_len=2147483647,
prediction_resistance=0) at crypto/rand/rand_lib.c:198
#4 0x000000001001ed9c in RAND_DRBG_instantiate (drbg=0x102b02e0,
pers=0x10248d00 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG",
perslen=28) at crypto/rand/drbg_lib.c:338
#5 0x0000000010020300 in drbg_setup (parent=0x0) at crypto/rand/drbg_lib.c:895
#6 0x0000000010020414 in do_rand_drbg_init () at crypto/rand/drbg_lib.c:924
#7 0x000000001002034c in do_rand_drbg_init_ossl_ ()
at crypto/rand/drbg_lib.c:909
#8 0x0000000010005d1c in CRYPTO_THREAD_run_once (
once=0x102ab4d8 <rand_drbg_init>,
init=0x1002032c <do_rand_drbg_init_ossl_>) at crypto/threads_none.c:70
#9 0x00000000100209c4 in RAND_DRBG_get0_master ()
at crypto/rand/drbg_lib.c:1102
#10 0x0000000010020914 in drbg_status () at crypto/rand/drbg_lib.c:1084
#11 0x0000000010004a58 in RAND_status () at crypto/rand/rand_lib.c:961
#12 0x0000000010002890 in main (argc=1, argv=0x7ffffffffa68) at rand_test.c:6
(gdb)
ie. in the syscall_random() that I mentioned above.
You should be able to catch it there too if you do:
(gdb) b *0x10000000
(gdb) r
Hopefully it will stop without crashing the kernel, and then a `bt` will
show that you're in the same place as me.
If you can get that to work, when you're stopped there, can you do an
`info registers` and send us the output.
cheers
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-02 20:14 Kernel panic from malloc() on SUSE 15.1? Carl Jacobsen
2020-11-03 2:26 ` Michael Ellerman
@ 2020-11-06 12:51 ` Michal Suchánek
1 sibling, 0 replies; 9+ messages in thread
From: Michal Suchánek @ 2020-11-06 12:51 UTC (permalink / raw)
To: Carl Jacobsen; +Cc: linuxppc-dev
On Mon, Nov 02, 2020 at 12:14:27PM -0800, Carl Jacobsen wrote:
> I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
> simple test program, built in a slightly unusual way.
>
> I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
> copy of libcrypto.a (from openssl-1.1.1g), built without threads.
> I have a 10 line C test program that compiles and runs fine on the
> SUSE 12 system. If I compile the same program on SUSE 15.1 (with
> gcc 7.4.1), it runs fine on SUSE 15.1.
>
> But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
> system, the call to RAND_status() gets to a malloc() and then panics.
> (And, of course, if I just compile a call to malloc(), that runs fine
> on both systems.) Here's the test program, it's really just a call to
> RAND_status():
>
> #include <stdio.h>
> #include <openssl/rand.h>
>
> int main(int argc, char **argv)
> {
> int has_enough_data = RAND_status();
> printf("The PRNG %s been seeded with enough data\n",
> has_enough_data ? "HAS" : "has NOT");
> return 0;
> }
>
> openssl is configured/built with:
> ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
> make
>
> and the test program is compiled with:
> gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
>
> The kernel on SUSE 12 is: 3.12.28-4-default
> And glibc is: 2.19
>
> The kernel on SUSE 15.1 is: 4.12.14-197.18-default
> And glibc is: 2.26
SLE 12 SP5 has pretty much the same kernel as SLE 15 SP1 and pretty much
the same compiler as SLE 12, so it might be an interesting data point to
try there.
Also, I saw you are using a very old VIOS (which should not make much of a
difference) but did not see what firmware version the machine has.
There have been cases of mysterious crashes solved by updating the
firmware.
Thanks
Michal
* Re: Kernel panic from malloc() on SUSE 15.1?
2020-11-06 12:25 ` Michael Ellerman
@ 2020-11-07 7:44 ` Carl Jacobsen
0 siblings, 0 replies; 9+ messages in thread
From: Carl Jacobsen @ 2020-11-07 7:44 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
On Fri, Nov 6, 2020 at 4:25 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> So something seems to have gone wrong linking this, I see eg:
>
> 0000000010004a8c <syscall_random>:
> 10004a8c: 2b 10 40 3c lis r2,4139
> 10004a90: 88 f7 42 38 addi r2,r2,-2168
> 10004a94: a6 02 08 7c mflr r0
> 10004a98: 10 00 01 f8 std r0,16(r1)
> 10004a9c: f8 ff e1 fb std r31,-8(r1)
> 10004aa0: 81 ff 21 f8 stdu r1,-128(r1)
> 10004aa4: 78 0b 3f 7c mr r31,r1
> 10004aa8: 60 00 7f f8 std r3,96(r31)
> 10004aac: 68 00 9f f8 std r4,104(r31)
> 10004ab0: 00 00 00 60 nop
> 10004ab4: 30 80 22 e9 ld r9,-32720(r2)
> 10004ab8: 00 00 a9 2f cmpdi cr7,r9,0
> 10004abc: 30 00 9e 41 beq cr7,10004aec <syscall_random+0x60>
> 10004ac0: 60 00 7f e8 ld r3,96(r31)
> 10004ac4: 68 00 9f e8 ld r4,104(r31)
> 10004ac8: 39 b5 ff 4b bl 10000000 <_init-0x1f00>
>
> Notice that last bl (branch and link) to 0x10000000. But there's no text
> at 0x10000000, that's the start of the page which happens to be the ELF
> magic.
>
> I've seen something like this before, but I can't remember when/where so
> I haven't been able to track down what the problem was.
>
> Anyway hopefully someone on the list will know.
>
> That still doesn't explain the kernel crash though.
>
Interesting. It seems highly unlikely that the linker picked that address
at random, yet branching there makes no sense. And, agreed, jumping
into junk should crash the program, not the kernel.
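For reference, the "junk" at 0x10000000 is well defined: it is the ELF magic. The sketch below shows the 32-bit word a little-endian CPU would fetch there and its primary opcode field. Primary opcode 17 is the one the Power ISA assigns to the system call instruction family (`sc`); whether that has anything to do with the kernel crash is not established in this thread, so treat it as an observation only.

```python
# First four bytes of any ELF file, and therefore of the page at 0x10000000.
ELF_MAGIC = b"\x7fELF"

# ppc64le fetches instructions as little-endian 32-bit words.
word = int.from_bytes(ELF_MAGIC, "little")

# The primary opcode is the top six bits of the instruction word.
primary_opcode = word >> 26

print(hex(word), primary_opcode)
```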
> On my machine it doesn't crash the kernel, so I can catch it later. For
> me it's here:
> ....
> ie. in the syscall_random() that I mentioned above.
>
> You should be able to catch it there too if you do:
>
> (gdb) b *0x10000000
> (gdb) r
>
> Hopefully it will stop without crashing the kernel, and then a `bt` will
> show that you're in the same place as me.
>
> If you can get that to work, when you're stopped there, can you do an
> `info registers` and send us the output.
>
Indeed, setting the breakpoint you suggested works, and the stack looks
almost the same - only differences are a few bits off in main's argv
pointer, rand_drbg_get_entropy's pout pointer, and the final address - you
get 0x0000000010000004, I get 0x0000000010000000. Output, including "info
registers", below. Hoping they provide some useful clues. Thanks again for
looking into this.
# gdb --args /tmp/ossl/rand_test
...
(gdb) b *0x10000000
Breakpoint 1 at 0x10000000
(gdb) r
Starting program: /tmp/ossl/rand_test
Breakpoint 1, 0x0000000010000000 in ?? ()
(gdb) bt
#0 0x0000000010000000 in ?? ()
#1 0x0000000010004acc in syscall_random (buf=0x102b0730, buflen=32)
    at crypto/rand/rand_unix.c:371
#2 0x00000000100053fc in rand_pool_acquire_entropy (pool=0x102b06e0)
    at crypto/rand/rand_unix.c:636
#3 0x0000000010002b58 in rand_drbg_get_entropy (drbg=0x102b02e0,
    pout=0x7fffffffecf0, entropy=256, min_len=32, max_len=2147483647,
    prediction_resistance=0) at crypto/rand/rand_lib.c:198
#4 0x000000001001ed9c in RAND_DRBG_instantiate (drbg=0x102b02e0,
    pers=0x10248d00 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG",
    perslen=28) at crypto/rand/drbg_lib.c:338
#5 0x0000000010020300 in drbg_setup (parent=0x0) at crypto/rand/drbg_lib.c:895
#6 0x0000000010020414 in do_rand_drbg_init () at crypto/rand/drbg_lib.c:924
#7 0x000000001002034c in do_rand_drbg_init_ossl_ ()
    at crypto/rand/drbg_lib.c:909
#8 0x0000000010005d1c in CRYPTO_THREAD_run_once (
    once=0x102ab4d8 <rand_drbg_init>,
    init=0x1002032c <do_rand_drbg_init_ossl_>) at crypto/threads_none.c:70
#9 0x00000000100209c4 in RAND_DRBG_get0_master ()
    at crypto/rand/drbg_lib.c:1102
#10 0x0000000010020914 in drbg_status () at crypto/rand/drbg_lib.c:1084
#11 0x0000000010004a58 in RAND_status () at crypto/rand/rand_lib.c:961
#12 0x0000000010002890 in main (argc=1, argv=0x7ffffffff368) at rand_test.c:6
(gdb) info registers
r0 0x100053fc 268456956
r1 0x7fffffffeaf0 140737488349936
r2 0x102af788 271251336
r3 0x102b0730 271255344
r4 0x20 32
r5 0x30 48
r6 0x102b0760 271255392
r7 0x1 1
r8 0x0 0
r9 0x7fffb7dacc00 140736277957632
r10 0x102b0730 271255344
r11 0x10 16
r12 0x7fffb7e19280 140736278401664
r13 0x7fffb7ffa100 140736280371456
r14 0x0 0
r15 0x0 0
r16 0x0 0
r17 0x0 0
r18 0x0 0
r19 0x0 0
r20 0x0 0
r21 0x0 0
r22 0x0 0
r23 0x0 0
r24 0x0 0
r25 0x0 0
r26 0x0 0
r27 0x7fffb7fef4b8 140736280327352
r28 0x7fffb7ff0000 140736280330240
r29 0x0 0
r30 0x0 0
r31 0x7fffffffeaf0 140737488349936
pc 0x10000000 0x10000000
msr 0x800000010002d033 9223372041149927475
cr 0x44000844 1140852804
lr 0x10004acc 0x10004acc <syscall_random+64>
ctr 0x0 0
xer 0x20000000 536870912
fpscr 0x0 0
vscr 0x0 0
vrsave 0xffffffff -1
ppr 0xc000000000000 3377699720527872
dscr 0x0 0
tar 0x0 0
bescr <unavailable>
ebbhr <unavailable>
ebbrr <unavailable>
mmcr0 0x0 0
mmcr2 0x0 0
siar 0x0 0
sdar 0x0 0
sier 0x0 0
tfhar 0x0 0
texasr 0x0 0
tfiar 0x0 0
cr0 <unavailable>
cr1 <unavailable>
cr2 <unavailable>
cr3 <unavailable>
cr4 <unavailable>
cr5 <unavailable>
cr6 <unavailable>
cr7 <unavailable>
cr8 <unavailable>
cr9 <unavailable>
cr10 <unavailable>
cr11 <unavailable>
cr12 <unavailable>
cr13 <unavailable>
cr14 <unavailable>
cr15 <unavailable>
cr16 <unavailable>
cr17 <unavailable>
cr18 <unavailable>
cr19 <unavailable>
cr20 <unavailable>
cr21 <unavailable>
cr22 <unavailable>
cr23 <unavailable>
cr24 <unavailable>
cr25 <unavailable>
cr26 <unavailable>
cr27 <unavailable>
cr28 <unavailable>
cr29 <unavailable>
cr30 <unavailable>
cr31 <unavailable>
ccr <unavailable>
cxer <unavailable>
clr <unavailable>
cctr <unavailable>
cfpscr <unavailable>
cvscr <unavailable>
cvrsave <unavailable>
cppr <unavailable>
cdscr <unavailable>
ctar <unavailable>
orig_r3 0x10004ac8 268454600
trap 0x700 1792
(gdb)
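A few of these registers can be cross-checked against the syscall_random disassembly quoted earlier in the thread. The sketch below recomputes r2 from the `lis`/`addi` pair at the function entry, the address read by `ld r9,-32720(r2)`, and the return address left in lr by the `bl`. The helper and constant names are illustrative, and calling r2 the TOC pointer assumes the usual ppc64le convention.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def toc_from_prologue(hi: int, lo: int) -> int:
    """lis r2,hi ; addi r2,r2,lo  ->  (hi << 16) + lo, truncated to 64 bits."""
    return ((hi << 16) + lo) & MASK64

FUNC_START = 0x10004A8C   # address of syscall_random in the disassembly
BL_ADDRESS = 0x10004AC8   # address of the bl to 0x10000000

# Values copied from the "info registers" output above.
observed = {"r2": 0x102AF788, "r9": 0x7FFFB7DACC00, "lr": 0x10004ACC}

toc = toc_from_prologue(4139, -2168)      # lis r2,4139 ; addi r2,r2,-2168
slot = toc - 32720                        # address read by ld r9,-32720(r2)
return_address = BL_ADDRESS + 4           # bl stores the next instruction in lr
offset_in_function = return_address - FUNC_START

print(hex(toc), toc == observed["r2"])
print(hex(slot))
print(hex(return_address), return_address == observed["lr"], offset_in_function)
print("r9 nonzero:", observed["r9"] != 0)
```

Nothing between the `ld r9` and the `bl` writes r9, so the nonzero r9 seen at the breakpoint is the value that was compared against zero, which is consistent with the `beq` not being taken and execution reaching the `bl`.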
--
Carl Jacobsen
Storix, Inc.
Thread overview: 9+ messages
2020-11-02 20:14 Kernel panic from malloc() on SUSE 15.1? Carl Jacobsen
2020-11-03 2:26 ` Michael Ellerman
2020-11-03 22:09 ` Carl Jacobsen
2020-11-05 10:19 ` Michael Ellerman
2020-11-05 11:00 ` Segher Boessenkool
2020-11-05 19:44 ` Carl Jacobsen
2020-11-06 12:25 ` Michael Ellerman
2020-11-07 7:44 ` Carl Jacobsen
2020-11-06 12:51 ` Michal Suchánek