* strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels @ 2017-07-27 19:45 Mikael Pettersson 2017-07-28 5:10 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: Mikael Pettersson @ 2017-07-27 19:45 UTC (permalink / raw) To: sparclinux; +Cc: linux-kernel, David S. Miller Attempting to build strace-4.18 as sparcv9 code and run its test suite on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails reliably in three test cases (sched.gen, sched_xetattr.gen, and poll) because two test binaries (sched_xetattr and poll) OOPS the kernel and get killed. Sample dmesg from 4.13-rc2: [42912.270398] Unable to handle kernel NULL pointer dereference [42912.327717] tsk->{mm,active_mm}->context = 000000000000136a [42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000 [42912.435247] \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ [42912.559982] sched_xetattr(21866): Oops [#1] [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1 [42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000 [42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted [42912.845079] TPC: <__bzero+0x20/0xc0> [42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100 [42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030 [42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030 [42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4 [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150> [42913.311046] l0: 00000000f7b6caa8 l1: 00000000cccccccd l2: 00000000ffc2e7d4 l3: 00000000f7b6c888 [42913.408293] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7ba2000 [42913.505627] i0: 0000000000000000 i1: 00000000f79f1ffc i2: 0000000000000000 i3: 0000000000000000 [42913.602940] i4: fff0000227defe70 i5: 
fff0000227defe70 i6: fff0000227def6a1 i7: 00000000004061b4 [42913.700268] I7: <linux_sparc_syscall32+0x34/0x60> [42913.744966] Call Trace: [42913.759938] [00000000004061b4] linux_sparc_syscall32+0x34/0x60 [42913.820656] Disabling lock debugging due to kernel taint [42913.873374] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60 [42913.940953] Caller[0000000000010ed0]: 0x10ed0 [42913.981081] Instruction DUMP: [42913.981085] c56a2000 [42914.002817] 808a2003 [42914.016643] 02480006 [42914.030363] <d42a2000> [42914.044094] 90022001 [42914.057816] 808a2003 [42914.071539] 1247fffd [42914.085269] 92226001 [42914.098992] 808a2007 [42914.471525] Unable to handle kernel NULL pointer dereference [42914.528830] tsk->{mm,active_mm}->context = 00000000000017cd [42914.584862] tsk->{mm,active_mm}->pgd = fff0000227b78000 [42914.636319] \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ [42914.761013] sched_xetattr(22483): Oops [#2] [42914.798837] CPU: 0 PID: 22483 Comm: sched_xetattr Tainted: G D 4.13.0-rc2 #1 [42914.889222] task: fff000123c73bc00 task.stack: fff0001238998000 [42914.949915] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Tainted: G D [42915.078076] TPC: <__bzero+0x20/0xc0> [42915.107875] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100 [42915.205205] g4: fff000123c73bc00 g5: fff000023ef44000 g6: fff0001238998000 g7: 0000000000000030 [42915.302532] o0: 0000000000000030 o1: fff000123899be70 o2: 0000000000000000 o3: 0000000000000030 [42915.399851] o4: fff000123899be70 o5: 0000000000000000 sp: fff000123899b5c1 ret_pc: 0000000000474fa4 [42915.501731] RPC: <SyS_sched_setattr+0xb0/0x150> [42915.544033] l0: 00000000f784caa8 l1: 00000000cccccccd l2: 00000000ff91c7d4 l3: 00000000f784c888 [42915.641289] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7882000 [42915.738582] i0: 0000000000000000 i1: 00000000f76d1ffc i2: 0000000000000000 i3: 0000000000000000 [42915.835827] i4: 
fff000123899be70 i5: fff000123899be70 i6: fff000123899b6a1 i7: 00000000004061b4 [42915.933160] I7: <linux_sparc_syscall32+0x34/0x60> [42915.977822] Call Trace: [42915.992698] [00000000004061b4] linux_sparc_syscall32+0x34/0x60 [42916.053335] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60 [42916.120934] Caller[0000000000010ed0]: 0x10ed0 [42916.161052] Instruction DUMP: [42916.161056] c56a2000 [42916.182878] 808a2003 [42916.196607] 02480006 [42916.210330] <d42a2000> [42916.224052] 90022001 [42916.237781] 808a2003 [42916.251502] 1247fffd [42916.265224] 92226001 [42916.278955] 808a2007 [42918.071476] ------------[ cut here ]------------ [42918.115146] WARNING: CPU: 0 PID: 23177 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x34/0x4c [42918.234167] Modules linked in: af_packet ipv6 hid_generic tg3 hwmon i2c_ali1535 ohci_pci ptp ohci_hcd evdev i2c_core pps_core flash sr_mod cdrom pata_ali libata [42918.405845] CPU: 0 PID: 23177 Comm: sigaction Tainted: G D 4.13.0-rc2 #1 [42918.491645] Call Trace: [42918.506518] [0000000000455b18] __warn+0xb4/0xc4 [42918.549976] [00000000004449e4] compat_SyS_sparc_sigaction+0x34/0x4c [42918.616319] [00000000004061b4] linux_sparc_syscall32+0x34/0x60 [42918.677014] ---[ end trace 4800f70b0fef934e ]--- [42947.617187] Unable to handle kernel NULL pointer dereference [42947.674440] tsk->{mm,active_mm}->context = 00000000000018d3 [42947.730560] tsk->{mm,active_mm}->pgd = fff0000202a04000 [42947.782020] \|/ ____ \|/ "@'/ .. 
\`@" /_| \__/ |_\ \__U_/ [42947.906726] poll(31644): Oops [#3] [42947.934244] CPU: 0 PID: 31644 Comm: poll Tainted: G D W 4.13.0-rc2 #1 [42948.014399] task: fff000023c68cb00 task.stack: fff0000227adc000 [42948.075064] TSTATE: 0000004411001601 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Tainted: G D W [42948.203275] TPC: <__bzero+0x20/0xc0> [42948.233069] g0: fff000123c5a8828 g1: 0000000000000000 g2: 0000000000000000 g3: 00000000008ca100 [42948.330322] g4: fff000023c68cb00 g5: fff000023ef44000 g6: fff0000227adc000 g7: 0000000000000008 [42948.427651] o0: 000000000000000c o1: fff0000227adfa80 o2: 0000000000000000 o3: 000000000000000c [42948.524959] o4: fff0000227adfa7c o5: 00000000000000fb sp: fff0000227adf181 ret_pc: 0000000000516ee0 [42948.626876] RPC: <do_sys_poll+0x80/0x3c0> [42948.662408] l0: 0000000000000002 l1: 00000000014000c0 l2: 00000000000003fe l3: fff0000227adfa7c [42948.759738] l4: 0000000000000000 l5: 0000000000000000 l6: 000000000000006d l7: ffffffffffffffea [42948.857064] i0: 00000000f7dbdff8 i1: 0000000000000002 i2: fff0000227adfe90 i3: fff0000227adfa70 [42948.954389] i4: 000ffffdd8520590 i5: fff0000227adfa70 i6: fff0000227adf5e1 i7: 00000000005177f8 [42949.051703] I7: <SyS_poll+0x74/0xd0> [42949.081507] Call Trace: [42949.096407] [00000000005177f8] SyS_poll+0x74/0xd0 [42949.142242] [00000000004061b4] linux_sparc_syscall32+0x34/0x60 [42949.202876] Caller[00000000005177f8]: SyS_poll+0x74/0xd0 [42949.255596] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60 [42949.323177] Caller[0000000000010a20]: 0x10a20 [42949.363284] Instruction DUMP: [42949.363288] c56a2000 [42949.385037] 808a2003 [42949.398841] 02480006 [42949.412564] <d42a2000> [42949.426287] 90022001 [42949.440034] 808a2003 [42949.453739] 1247fffd [42949.467465] 92226001 [42949.481188] 808a2007 [42965.393520] pc[534]: segfault at 1085c ip 000000000001085c (rpc 0000000000010854) sp 00000000ffba8da8 error 30001 in pc[20000+2000] This occurs reliably with the 4.13-rc2, 4.13-rc1, 
and 4.12.0 kernels. With 4.11.0 and older kernels the binaries get some EFAULTs
but they survive that, and there are also no OOPSes.

/Mikael
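For triage, the oops headers in a dmesg capture like the one above can be summarized mechanically. The following is only a minimal sketch: the function name `summarize_oopses` and the two regular expressions are illustrative assumptions based on the line shapes visible in this report, not part of any kernel tooling.

```python
import re

# Matches headers such as "[42912.559982] sched_xetattr(21866): Oops [#1]".
OOPS_RE = re.compile(r"(?P<comm>\S+)\((?P<pid>\d+)\): Oops \[#(?P<seq>\d+)\]")
# Matches the symbolized faulting PC, e.g. "TPC: <__bzero+0x20/0xc0>".
# The raw "TPC: 00000000007570fc" field on the TSTATE line does not match.
TPC_RE = re.compile(r"TPC: <(?P<sym>[^>]+)>")


def summarize_oopses(log_text):
    """Return one (seq, comm, pid, tpc_symbol) tuple per oops header found."""
    results = []
    current = None
    for line in log_text.splitlines():
        header = OOPS_RE.search(line)
        if header:
            current = [int(header["seq"]), header["comm"], int(header["pid"]), None]
            results.append(current)
            continue
        tpc = TPC_RE.search(line)
        # Attach the first symbolized TPC line that follows each header.
        if tpc and current is not None and current[3] is None:
            current[3] = tpc["sym"]
    return [tuple(r) for r in results]


# Lines copied from the report above (abridged).
SAMPLE = """\
[42912.559982] sched_xetattr(21866): Oops [#1]
[42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000    Not tainted
[42912.845079] TPC: <__bzero+0x20/0xc0>
[42947.906726] poll(31644): Oops [#3]
[42948.203275] TPC: <__bzero+0x20/0xc0>
"""

for row in summarize_oopses(SAMPLE):
    print(row)
```

Run on the abridged sample, this lists both the sched_xetattr and the poll oops with the same faulting symbol, __bzero+0x20/0xc0.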
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-27 19:45 strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels Mikael Pettersson @ 2017-07-28 5:10 ` David Miller 2017-07-28 8:45 ` Mikael Pettersson 0 siblings, 1 reply; 21+ messages in thread From: David Miller @ 2017-07-28 5:10 UTC (permalink / raw) To: mikpelinux; +Cc: sparclinux, linux-kernel From: Mikael Pettersson <mikpelinux@gmail.com> Date: Thu, 27 Jul 2017 21:45:25 +0200 > Attempting to build strace-4.18 as sparcv9 code and run its test suite > on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails > reliably in three test cases (sched.gen, sched_xetattr.gen, and poll) > because two test binaries (sched_xetattr and poll) OOPS the kernel and > get killed. Sample dmesg from 4.13-rc2: > > [42912.270398] Unable to handle kernel NULL pointer dereference > [42912.327717] tsk->{mm,active_mm}->context = 000000000000136a > [42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000 > [42912.435247] \|/ ____ \|/ > "@'/ .. \`@" > /_| \__/ |_\ > \__U_/ > [42912.559982] sched_xetattr(21866): Oops [#1] > [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1 > [42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000 > [42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted > [42912.845079] TPC: <__bzero+0x20/0xc0> > [42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100 > [42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030 > [42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030 > [42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4 > [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150> This looks really strange. It is a memset() call with the buffer pointer and length arguments reversed. 

What exact command did you give to configure and build strace-4.18 so that
I can try to reproduce this?

Thanks.
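The "arguments reversed" observation can be checked against the register dump by hand. Here is a rough sketch of that reasoning, using the %o0, %o1 and task.stack values from the first oops. The 16 KiB kernel stack size and the 0x1000 "small value" threshold are assumptions chosen for illustration, and the registers at the faulting instruction need not be identical to the values at function entry.

```python
def points_into_stack(value, stack_base, stack_size=0x4000):
    """True if value lies inside [stack_base, stack_base + stack_size).

    stack_size defaults to 16 KiB, which is an assumption for this sketch.
    """
    return stack_base <= value < stack_base + stack_size


def args_look_swapped(ptr_arg, len_arg, stack_base, stack_size=0x4000):
    """Heuristic: the 'pointer' is a tiny integer while the 'length'
    is an address inside the current kernel stack."""
    small_value_limit = 0x1000  # assumed threshold for "looks like a size"
    return ptr_arg < small_value_limit and points_into_stack(
        len_arg, stack_base, stack_size
    )


# Values copied from the first oops in the report.
o0 = 0x0000000000000030          # first argument of __bzero (buffer)
o1 = 0xFFF0000227DEFE70          # second argument of __bzero (length)
task_stack = 0xFFF0000227DEC000  # "task.stack" from the oops header

print(args_look_swapped(o0, o1, task_stack))
```

With these values %o0 is 0x30, which is also what %g7 and %o3 hold, while %o1 lies 0x3e70 bytes into the task's kernel stack, so the heuristic reports the pair as swapped.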
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-28 5:10 ` David Miller @ 2017-07-28 8:45 ` Mikael Pettersson 2017-07-28 18:27 ` David Miller 2017-07-29 10:52 ` Anatoly Pugachev 0 siblings, 2 replies; 21+ messages in thread From: Mikael Pettersson @ 2017-07-28 8:45 UTC (permalink / raw) To: David Miller; +Cc: mikpelinux, sparclinux, linux-kernel David Miller writes: > From: Mikael Pettersson <mikpelinux@gmail.com> > Date: Thu, 27 Jul 2017 21:45:25 +0200 > > > Attempting to build strace-4.18 as sparcv9 code and run its test suite > > on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails > > reliably in three test cases (sched.gen, sched_xetattr.gen, and poll) > > because two test binaries (sched_xetattr and poll) OOPS the kernel and > > get killed. Sample dmesg from 4.13-rc2: > > > > [42912.270398] Unable to handle kernel NULL pointer dereference > > [42912.327717] tsk->{mm,active_mm}->context = 000000000000136a > > [42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000 > > [42912.435247] \|/ ____ \|/ > > "@'/ .. \`@" > > /_| \__/ |_\ > > \__U_/ > > [42912.559982] sched_xetattr(21866): Oops [#1] > > [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1 > > [42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000 > > [42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted > > [42912.845079] TPC: <__bzero+0x20/0xc0> > > [42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100 > > [42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030 > > [42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030 > > [42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4 > > [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150> > > This looks really strange. 
It is a memset() call with the buffer pointer
> and length arguments reversed.
>
> What exact command did you give to configure and build strace-4.18 so that
> I can try to reproduce this?

It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the
build log the following should do it:

export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc'
./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info
make -j2
make -j2 -k check VERBOSE=1

/Mikael
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-28 8:45 ` Mikael Pettersson @ 2017-07-28 18:27 ` David Miller 2017-07-28 18:37 ` David Miller 2017-07-29 10:52 ` Anatoly Pugachev 1 sibling, 1 reply; 21+ messages in thread From: David Miller @ 2017-07-28 18:27 UTC (permalink / raw) To: mikpelinux; +Cc: sparclinux, linux-kernel From: Mikael Pettersson <mikpelinux@gmail.com> Date: Fri, 28 Jul 2017 10:45:15 +0200 > David Miller writes: > > From: Mikael Pettersson <mikpelinux@gmail.com> > > Date: Thu, 27 Jul 2017 21:45:25 +0200 > > > > > Attempting to build strace-4.18 as sparcv9 code and run its test suite > > > on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails > > > reliably in three test cases (sched.gen, sched_xetattr.gen, and poll) > > > because two test binaries (sched_xetattr and poll) OOPS the kernel and > > > get killed. Sample dmesg from 4.13-rc2: > > > > > > [42912.270398] Unable to handle kernel NULL pointer dereference > > > [42912.327717] tsk->{mm,active_mm}->context = 000000000000136a > > > [42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000 > > > [42912.435247] \|/ ____ \|/ > > > "@'/ .. 
\`@" > > > /_| \__/ |_\ > > > \__U_/ > > > [42912.559982] sched_xetattr(21866): Oops [#1] > > > [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1 > > > [42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000 > > > [42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted > > > [42912.845079] TPC: <__bzero+0x20/0xc0> > > > [42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100 > > > [42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030 > > > [42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030 > > > [42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4 > > > [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150> > > > > This looks really strange. It is a memset() call with the buffer pointer > > and length arguments reversed. > > > > What exact command did you give to configure and build strace-4.18 so that > > I can try to reproduce this? > > It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the > build log the following should do it: > > export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc' > ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/u > sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --local > statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info > make -j2 > make -j2 -k check VERBOSE=1 I guess your gcc is emitting 64-bit code by default? 

Because simply using that configure line doesn't cause any problems for me and
I get 32-bit binaries from the build.
.
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-28 18:27 ` David Miller @ 2017-07-28 18:37 ` David Miller 0 siblings, 0 replies; 21+ messages in thread From: David Miller @ 2017-07-28 18:37 UTC (permalink / raw) To: mikpelinux; +Cc: sparclinux, linux-kernel From: David Miller <davem@davemloft.net> Date: Fri, 28 Jul 2017 11:27:41 -0700 (PDT) > From: Mikael Pettersson <mikpelinux@gmail.com> > Date: Fri, 28 Jul 2017 10:45:15 +0200 > >> David Miller writes: >> > From: Mikael Pettersson <mikpelinux@gmail.com> >> > Date: Thu, 27 Jul 2017 21:45:25 +0200 >> > >> > > Attempting to build strace-4.18 as sparcv9 code and run its test suite >> > > on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails >> > > reliably in three test cases (sched.gen, sched_xetattr.gen, and poll) >> > > because two test binaries (sched_xetattr and poll) OOPS the kernel and >> > > get killed. Sample dmesg from 4.13-rc2: >> > > >> > > [42912.270398] Unable to handle kernel NULL pointer dereference >> > > [42912.327717] tsk->{mm,active_mm}->context = 000000000000136a >> > > [42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000 >> > > [42912.435247] \|/ ____ \|/ >> > > "@'/ .. 
\`@" >> > > /_| \__/ |_\ >> > > \__U_/ >> > > [42912.559982] sched_xetattr(21866): Oops [#1] >> > > [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1 >> > > [42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000 >> > > [42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted >> > > [42912.845079] TPC: <__bzero+0x20/0xc0> >> > > [42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100 >> > > [42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030 >> > > [42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030 >> > > [42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4 >> > > [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150> >> > >> > This looks really strange. It is a memset() call with the buffer pointer >> > and length arguments reversed. >> > >> > What exact command did you give to configure and build strace-4.18 so that >> > I can try to reproduce this? >> >> It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the >> build log the following should do it: >> >> export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc' >> ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/u >> sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --local >> statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info >> make -j2 >> make -j2 -k check VERBOSE=1 > > I guess your gcc is emitting 64-bit code by default? 
>
> Because simply using that configure line doesn't cause any problems for me and
> I get 32-bit binaries from the build.
> .

I've just also done a forced 64-bit build with "CC="gcc -m64 ./configure
--build=sparc64-unknown-linux-gnu ..." and it built just fine and the
testsuite ran without incident.

I cannot reproduce your crashes at all.

Please provide me with the binaries you have which trigger the OOPS and
tell me exactly how to run them.
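Whether a given test binary is a 32-bit or 64-bit build can be read straight from its ELF header: byte 4 of e_ident (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64. A minimal sketch follows; the helper names are made up for illustration, and any path passed to `elf_class_of_file` (for example a test binary inside a local strace build tree) is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"
# EI_CLASS values defined by the ELF specification.
ELF_CLASS_NAMES = {1: "ELF32", 2: "ELF64"}


def elf_class(header_bytes):
    """Classify an ELF image from its leading bytes (the e_ident array)."""
    if len(header_bytes) < 5 or header_bytes[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return ELF_CLASS_NAMES.get(header_bytes[4], "unknown")


def elf_class_of_file(path):
    """Read the 16-byte e_ident of the file at path and classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(16))


# Synthetic e_ident prefixes, so the sketch runs without any real binary.
print(elf_class(ELF_MAGIC + b"\x01" + bytes(11)))
print(elf_class(ELF_MAGIC + b"\x02" + bytes(11)))
```

The two synthetic headers are reported as ELF32 and ELF64 respectively; `file(1)` or `readelf -h` give the same information on a real binary.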
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-07-28  8:45   ` Mikael Pettersson
  2017-07-28 18:27     ` David Miller
@ 2017-07-29 10:52     ` Anatoly Pugachev
  2017-07-29 12:02       ` Mikael Pettersson
  1 sibling, 1 reply; 21+ messages in thread
From: Anatoly Pugachev @ 2017-07-29 10:52 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: David Miller, Sparc kernel list, Linux Kernel list

On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson
<mikpelinux@gmail.com> wrote:
> It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the
> build log the following should do it:
>
> export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc'
> ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/u
> sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --local
> statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info
> make -j2
> make -j2 -k check VERBOSE=1

can't reproduce it here on debian sparc64 LDOM:

used git version of strace ( https://github.com/strace/strace )

strace$ make -j24 check VERBOSE=1
...
============================================================================
Testsuite summary for strace 4.18.0.134.805d
============================================================================
# TOTAL: 443
# PASS:  389
# SKIP:  40
# XFAIL: 0
# FAIL:  14
# XPASS: 0
# ERROR: 0

while in kernel logs (journalctl -k -f):

Jul 29 12:49:22 ttip kernel: mmap: remap_file_page (77341) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.txt.
Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses deprecated v2 capabilities in a way that may be insecure Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses 32-bit capabilities (legacy support in use) Jul 29 12:49:22 ttip kernel: ------------[ cut here ]------------ Jul 29 12:49:22 ttip kernel: WARNING: CPU: 18 PID: 78388 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x3c/0x60 Jul 29 12:49:22 ttip kernel: Modules linked in: tcp_diag inet_diag xfrm_user xfrm_algo nfnetlink netlink_diag xt_tcpudp xt_multiport xt_conntrack tun iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xfs camellia_sparc64 des_sparc64 des_generic aes_sparc64 n2_rng md5_sparc64 rng_core flash sha512_sparc64 sha256_sparc64 sha1_sparc64 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat nf_conntrack libcrc32c crc32c_generic ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_sparc64 Jul 29 12:49:22 ttip kernel: CPU: 18 PID: 78388 Comm: sigaction Not tainted 4.13.0-rc2-00220-g0a07b238e5f4 #376 Jul 29 12:49:22 ttip kernel: Call Trace: Jul 29 12:49:22 ttip kernel: [000000000046c074] __warn+0xb4/0xe0 Jul 29 12:49:22 ttip kernel: [000000000046c120] warn_slowpath_null+0x20/0x40 Jul 29 12:49:22 ttip kernel: [000000000044b7bc] compat_SyS_sparc_sigaction+0x3c/0x60 Jul 29 12:49:22 ttip kernel: [00000000004061d4] linux_sparc_syscall32+0x34/0x60 Jul 29 12:49:22 ttip kernel: ---[ end trace 1ad5184278304e6d ]--- Jul 29 12:49:25 ttip kernel: pc[83378]: segfault at 70000974 ip 0000000070000974 (rpc 000000007000096c) sp 00000000ffdd9438 error 30001 in pc[70010000+2000] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-29 10:52 ` Anatoly Pugachev @ 2017-07-29 12:02 ` Mikael Pettersson 2017-07-31 17:14 ` Mikael Pettersson 0 siblings, 1 reply; 21+ messages in thread From: Mikael Pettersson @ 2017-07-29 12:02 UTC (permalink / raw) To: Anatoly Pugachev Cc: Mikael Pettersson, David Miller, Sparc kernel list, Linux Kernel list Anatoly Pugachev writes: > On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson > <mikpelinux@gmail.com> wrote: > > It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the > > build log the following should do it: > > > > export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc' > > ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/u > > sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --local > > statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info > > make -j2 > > make -j2 -k check VERBOSE=1 > > cant' reproduce it here on debian sparc64 LDOM: DaveM was also unable to reproduce it. I'll be investigating a possible kernel miscompile next. > > used git version of strace ( https://github.com/strace/strace ) > > strace$ make -j24 check VERBOSE=1 > ... > ============================================================================ > Testsuite summary for strace 4.18.0.134.805d > ============================================================================ > # TOTAL: 443 > # PASS: 389 > # SKIP: 40 > # XFAIL: 0 > # FAIL: 14 > # XPASS: 0 > # ERROR: 0 > > > while in kernel logs (journalctl -k -f): > > Jul 29 12:49:22 ttip kernel: mmap: remap_file_page (77341) uses > deprecated remap_file_pages() syscall. 
See > Documentation/vm/remap_file_pages.txt. > Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses > deprecated v2 capabilities in a way that may be insecure > Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses 32-bit > capabilities (legacy support in use) > Jul 29 12:49:22 ttip kernel: ------------[ cut here ]------------ > Jul 29 12:49:22 ttip kernel: WARNING: CPU: 18 PID: 78388 at > arch/sparc/kernel/sys_sparc32.c:150 > compat_SyS_sparc_sigaction+0x3c/0x60 > Jul 29 12:49:22 ttip kernel: Modules linked in: tcp_diag inet_diag > xfrm_user xfrm_algo nfnetlink netlink_diag xt_tcpudp xt_multiport > xt_conntrack tun iptable_filter iptable_nat nf_conntrack_ipv4 > nf_defrag_ipv4 nf_nat_ipv4 xfs camellia_sparc64 des_sparc64 > des_generic aes_sparc64 n2_rng md5_sparc64 rng_core flash > sha512_sparc64 sha256_sparc64 sha1_sparc64 nf_nat_pptp > nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat > nf_conntrack libcrc32c crc32c_generic ip_tables x_tables autofs4 ext4 > crc16 mbcache jbd2 crc32c_sparc64 > Jul 29 12:49:22 ttip kernel: CPU: 18 PID: 78388 Comm: sigaction Not > tainted 4.13.0-rc2-00220-g0a07b238e5f4 #376 > Jul 29 12:49:22 ttip kernel: Call Trace: > Jul 29 12:49:22 ttip kernel: [000000000046c074] __warn+0xb4/0xe0 > Jul 29 12:49:22 ttip kernel: [000000000046c120] warn_slowpath_null+0x20/0x40 > Jul 29 12:49:22 ttip kernel: [000000000044b7bc] > compat_SyS_sparc_sigaction+0x3c/0x60 > Jul 29 12:49:22 ttip kernel: [00000000004061d4] linux_sparc_syscall32+0x34/0x60 > Jul 29 12:49:22 ttip kernel: ---[ end trace 1ad5184278304e6d ]--- > Jul 29 12:49:25 ttip kernel: pc[83378]: segfault at 70000974 ip > 0000000070000974 (rpc 000000007000096c) sp 00000000ffdd9438 error > 30001 in pc[70010000+2000] -- ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-29 12:02 ` Mikael Pettersson @ 2017-07-31 17:14 ` Mikael Pettersson 2017-07-31 21:48 ` Anatoly Pugachev 0 siblings, 1 reply; 21+ messages in thread From: Mikael Pettersson @ 2017-07-31 17:14 UTC (permalink / raw) To: Mikael Pettersson Cc: Anatoly Pugachev, David Miller, Sparc kernel list, Linux Kernel list Mikael Pettersson writes: > Anatoly Pugachev writes: > > On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson > > <mikpelinux@gmail.com> wrote: > > > It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the > > > build log the following should do it: > > > > > > export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc' > > > ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/u > > > sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --local > > > statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info > > > make -j2 > > > make -j2 -k check VERBOSE=1 > > > > cant' reproduce it here on debian sparc64 LDOM: > > DaveM was also unable to reproduce it. > > I'll be investigating a possible kernel miscompile next. I don't think it's a miscompile. First I recompiled 4.13-rc2 with each of gcc-7, gcc-6, and gcc-5, each bootstrapped and regtested from the head of the respective FSF GCC branch: no change, kernel 4.11 works while kernels >= 4.12 OOPS. So a miscompile seems unlikely. Then I ran a git bisect between v4.11 (good) and v4.12 (bad), booting each kernel and trying the problematic strace test binaries. 
That identified the following as the first bad commit:

commit 31af2f36d50e3b9b2fb7f17aa430c11c91f946c4
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Mar 21 17:04:45 2017 -0400

    sparc: switch to RAW_COPY_USER

    ... and drop zeroing in sparc32.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

That touches the CPU model specific assembly code in arch/sparc/lib/ for
copy_{to,from}_user and changes how it's wired into the rest of the kernel.
There's different code for different UltraSPARC and Niagara generations,
so if there is a bug in e.g. the USIII code, you won't see it on Niagara.

Unfortunately I don't see anything obviously wrong in Al's patch...

/Mikael
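The bisection described above is, at its core, a binary search for the first failing commit between a known-good and a known-bad endpoint. The sketch below illustrates only that idea on a simplified linear history; `first_bad` and the integer "commits" are stand-ins invented for the example, whereas the real `git bisect` walks the actual commit graph and each test step here meant building and booting a kernel.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in an ordered, linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi]


# Toy history: commit numbers 0..99, with the regression introduced at 37.
history = list(range(100))
tested = []


def oopses(commit):
    """Stand-in for 'boot this kernel and run the failing strace tests'."""
    tested.append(commit)
    return commit >= 37


print(first_bad(history, oopses), len(tested))
```

The number of kernels that have to be built and booted grows only logarithmically with the number of commits in the range, which is what makes bisecting a whole release cycle practical.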
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-31 17:14 ` Mikael Pettersson @ 2017-07-31 21:48 ` Anatoly Pugachev 2017-07-31 21:51 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: Anatoly Pugachev @ 2017-07-31 21:48 UTC (permalink / raw) To: Mikael Pettersson; +Cc: David Miller, Sparc kernel list, Linux Kernel list On Mon, Jul 31, 2017 at 8:14 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote: > Mikael Pettersson writes: > > Anatoly Pugachev writes: > > > On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson > > > <mikpelinux@gmail.com> wrote: > > > > It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the > > > > build log the following should do it: > > > > > > > > export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc' > > > > ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/u > > > > sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --local > > > > statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info > > > > make -j2 > > > > make -j2 -k check VERBOSE=1 > > > > > > cant' reproduce it here on debian sparc64 LDOM: > > > > DaveM was also unable to reproduce it. > > > > I'll be investigating a possible kernel miscompile next. > > I don't think it's a miscompile. > > First I recompiled 4.13-rc2 with each of gcc-7, gcc-6, and gcc-5, each > bootstrapped and regtested from the head of the respective FSF GCC branch: > no change, kernel 4.11 works while kernels >= 4.12 OOPS. So a miscompile > seems unlikely. 
> > Then I ran a git bisect between v4.11 (good) and v4.12 (bad), booting > each kernel and trying the problematic strace test binaries. That > identified the following as the first bad commit: > > commit 31af2f36d50e3b9b2fb7f17aa430c11c91f946c4 > Author: Al Viro <viro@zeniv.linux.org.uk> > Date: Tue Mar 21 17:04:45 2017 -0400 > > sparc: switch to RAW_COPY_USER > > ... and drop zeroing in sparc32. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > > That touches the CPU model specific assembly code in arch/sparc/lib/ for > copy_{to,from}_user and changes how it's wired into the rest of the kernel. > There's different code for different UltraSPARC and Niagara generations, > so if there is a bug in e.g. the USIII code, you won't see it on Niagara. Just to let you know, just reproduced this OOPS on my v215 debian sparc64: Aug 01 00:34:56 v215 kernel: capability: warning: `caps' uses deprecated v2 capabilities in a way that may be insecure Aug 01 00:34:56 v215 kernel: capability: warning: `caps' uses 32-bit capabilities (legacy support in use) Aug 01 00:35:00 v215 kernel: Netfilter messages via NETLINK v0.30. Aug 01 00:35:00 v215 kernel: Initializing XFRM netlink socket Aug 01 00:35:09 v215 kernel: mmap: remap_file_page (1155) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.txt. Aug 01 00:35:10 v215 kernel: Unable to handle kernel NULL pointer dereference Aug 01 00:35:10 v215 kernel: tsk->{mm,active_mm}->context = 0000000000000de6 Aug 01 00:35:10 v215 kernel: tsk->{mm,active_mm}->pgd = fff000123d478000 Aug 01 00:35:10 v215 kernel: \|/ ____ \|/ "@'/ .. 
\`@" /_| \__/ |_\ \__U_/ Aug 01 00:35:11 v215 kernel: sched_xetattr(1527): Oops [#1] Aug 01 00:35:11 v215 kernel: CPU: 1 PID: 1527 Comm: sched_xetattr Not tainted 4.12.0 #365 Aug 01 00:35:11 v215 kernel: task: fff0001231d41340 task.stack: fff000123dfc4000 Aug 01 00:35:11 v215 kernel: TSTATE: 0000004411001604 TPC: 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Not tainted Aug 01 00:35:11 v215 kernel: TPC: <__bzero+0x20/0xc0> Aug 01 00:35:11 v215 kernel: g0: fff000123dfc7d20 g1: 0000000000000000 g2: 0000003000000000 g3: 0000000000000000 Aug 01 00:35:11 v215 kernel: g4: fff0001231d41340 g5: fff000123ed08000 g6: fff000123dfc4000 g7: 0000000000000030 Aug 01 00:35:11 v215 kernel: o0: 0000000000000030 o1: fff000123dfc7e70 o2: 0000000000000000 o3: 0000000000000030 Aug 01 00:35:11 v215 kernel: o4: fff000123dfc7e70 o5: 000000000000000a sp: fff000123dfc75c1 ret_pc: 000000000049b294 Aug 01 00:35:11 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0> Aug 01 00:35:11 v215 kernel: l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000000000 l3: 0000000000000000 Aug 01 00:35:11 v215 kernel: l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7d58000 Aug 01 00:35:12 v215 kernel: i0: 0000000000000000 i1: 00000000f7bc5ffc i2: 0000000000000000 i3: fff000123dfc7e70 Aug 01 00:35:12 v215 kernel: i4: 0000000000000000 i5: fff000123dfc7e70 i6: fff000123dfc76a1 i7: 00000000004061b4 Aug 01 00:35:12 v215 kernel: I7: <linux_sparc_syscall32+0x34/0x60> Aug 01 00:35:12 v215 kernel: Call Trace: Aug 01 00:35:12 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:12 v215 kernel: Disabling lock debugging due to kernel taint Aug 01 00:35:12 v215 kernel: Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:12 v215 kernel: Caller[000000007000117c]: 0x7000117c Aug 01 00:35:12 v215 kernel: Instruction DUMP: Aug 01 00:35:12 v215 kernel: c56a2000 Aug 01 00:35:12 v215 kernel: 808a2003 Aug 01 00:35:12 v215 kernel: 02480006 Aug 01 00:35:12 
v215 kernel: <d42a2000> Aug 01 00:35:12 v215 kernel: 90022001 Aug 01 00:35:12 v215 kernel: 808a2003 Aug 01 00:35:12 v215 kernel: 1247fffd Aug 01 00:35:12 v215 kernel: 92226001 Aug 01 00:35:12 v215 kernel: 808a2007 Aug 01 00:35:12 v215 kernel: Aug 01 00:35:13 v215 kernel: Unable to handle kernel NULL pointer dereference Aug 01 00:35:13 v215 kernel: tsk->{mm,active_mm}->context = 00000000000012cb Aug 01 00:35:14 v215 kernel: tsk->{mm,active_mm}->pgd = fff0001230a12000 Aug 01 00:35:14 v215 kernel: \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ Aug 01 00:35:14 v215 kernel: sched_xetattr(2216): Oops [#2] Aug 01 00:35:14 v215 kernel: CPU: 0 PID: 2216 Comm: sched_xetattr Tainted: G D 4.12.0 #365 Aug 01 00:35:14 v215 kernel: task: fff0001231d41340 task.stack: fff0001232754000 Aug 01 00:35:14 v215 kernel: TSTATE: 0000004411001601 TPC: 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Tainted: G D Aug 01 00:35:14 v215 kernel: TPC: <__bzero+0x20/0xc0> Aug 01 00:35:14 v215 kernel: g0: fff0001232757d20 g1: 0000000000000000 g2: 0000003000000000 g3: 0000000000000000 Aug 01 00:35:14 v215 kernel: g4: fff0001231d41340 g5: fff000123eb08000 g6: fff0001232754000 g7: 0000000000000030 Aug 01 00:35:14 v215 kernel: o0: 0000000000000030 o1: fff0001232757e70 o2: 0000000000000000 o3: 0000000000000030 Aug 01 00:35:14 v215 kernel: o4: fff0001232757e70 o5: 000000000000000a sp: fff00012327575c1 ret_pc: 000000000049b294 Aug 01 00:35:14 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0> Aug 01 00:35:14 v215 kernel: l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000000000 l3: 0000000000000000 Aug 01 00:35:14 v215 kernel: l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7cdc000 Aug 01 00:35:14 v215 kernel: i0: 0000000000000000 i1: 00000000f7b49ffc i2: 0000000000000000 i3: fff0001232757e70 Aug 01 00:35:15 v215 kernel: i4: 0000000000000000 i5: fff0001232757e70 i6: fff00012327576a1 i7: 00000000004061b4 Aug 01 00:35:15 v215 kernel: I7: <linux_sparc_syscall32+0x34/0x60> 
Aug 01 00:35:15 v215 kernel: Call Trace: Aug 01 00:35:15 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:15 v215 kernel: Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:15 v215 kernel: Caller[000000007000117c]: 0x7000117c Aug 01 00:35:15 v215 kernel: Instruction DUMP: Aug 01 00:35:15 v215 kernel: c56a2000 Aug 01 00:35:15 v215 kernel: 808a2003 Aug 01 00:35:15 v215 kernel: 02480006 Aug 01 00:35:15 v215 kernel: <d42a2000> Aug 01 00:35:15 v215 kernel: 90022001 Aug 01 00:35:15 v215 kernel: 808a2003 Aug 01 00:35:15 v215 kernel: 1247fffd Aug 01 00:35:15 v215 kernel: 92226001 Aug 01 00:35:15 v215 kernel: 808a2007 Aug 01 00:35:15 v215 kernel: Aug 01 00:35:16 v215 kernel: ------------[ cut here ]------------ Aug 01 00:35:16 v215 kernel: WARNING: CPU: 0 PID: 2900 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x54/0x80 Aug 01 00:35:16 v215 kernel: Modules linked in: xfrm_user xfrm_algo tcp_diag inet_diag af_packet_diag netlink_diag unix_diag nfnetlink ohci_pci ata_generic tg3 ohci_hcd ehci_pci ptp ehci_hcd pps_core libphy usbcore pata_ali libata sg flash jitterentropy_rng ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto raid10 raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod dm_mod dax sd_mod mptsas scsi_transport_sas mptscsih scsi_mod mptbase Aug 01 00:35:17 v215 kernel: CPU: 0 PID: 2900 Comm: sigaction Tainted: G D 4.12.0 #365 Aug 01 00:35:17 v215 kernel: Call Trace: Aug 01 00:35:17 v215 kernel: [000000000046b900] __warn+0xc0/0xe0 Aug 01 00:35:17 v215 kernel: [000000000046b9e0] warn_slowpath_null+0x20/0x40 Aug 01 00:35:17 v215 kernel: [000000000044b614] compat_SyS_sparc_sigaction+0x54/0x80 Aug 01 00:35:17 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:17 v215 kernel: ---[ end trace 0413ef9096de5564 ]--- Aug 01 00:35:41 v215 kernel: Unable to handle kernel NULL pointer 
dereference Aug 01 00:35:41 v215 kernel: tsk->{mm,active_mm}->context = 00000000000015f9 Aug 01 00:35:41 v215 kernel: tsk->{mm,active_mm}->pgd = fff0001230ab6000 Aug 01 00:35:41 v215 kernel: \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ Aug 01 00:35:42 v215 kernel: poll(11551): Oops [#3] Aug 01 00:35:42 v215 kernel: CPU: 1 PID: 11551 Comm: poll Tainted: G D W 4.12.0 #365 Aug 01 00:35:42 v215 kernel: task: fff000123c9113a0 task.stack: fff0001232e7c000 Aug 01 00:35:42 v215 kernel: TSTATE: 0000004411001603 TPC: 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Tainted: G D W Aug 01 00:35:42 v215 kernel: TPC: <__bzero+0x20/0xc0> Aug 01 00:35:42 v215 kernel: g0: fff000123cfce548 g1: 0000000000000000 g2: 0000000000000000 g3: 0000000000000000 Aug 01 00:35:42 v215 kernel: g4: fff000123c9113a0 g5: fff000123ed08000 g6: fff0001232e7c000 g7: 0000000000000008 Aug 01 00:35:42 v215 kernel: o0: 000000000000000c o1: fff0001232e7fa80 o2: 0000000000000000 o3: 000000000000000c Aug 01 00:35:42 v215 kernel: o4: fff0001232e7fa7c o5: 00000000000000fb sp: fff0001232e7f1a1 ret_pc: 0000000000630ad4 Aug 01 00:35:42 v215 kernel: RPC: <do_sys_poll+0xd4/0x460> Aug 01 00:35:42 v215 kernel: l0: 0000000000000002 l1: 00000000014000c0 l2: 00000000000003fe l3: 000fffedcd180590 Aug 01 00:35:43 v215 kernel: l4: fff0001232e7fa7c l5: 00000000f78346f4 l6: 0000000000000002 l7: 00000000f7968000 Aug 01 00:35:43 v215 kernel: i0: 00000000f77e1ff8 i1: 0000000000000002 i2: fff0001232e7fe90 i3: fff0001232e7fa70 Aug 01 00:35:43 v215 kernel: i4: 0000000000000002 i5: 00000000f77e1ff8 i6: fff0001232e7f5e1 i7: 00000000006315d8 Aug 01 00:35:43 v215 kernel: I7: <SyS_poll+0x78/0x100> Aug 01 00:35:43 v215 kernel: Call Trace: Aug 01 00:35:43 v215 kernel: [00000000006315d8] SyS_poll+0x78/0x100 Aug 01 00:35:43 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:43 v215 kernel: Caller[00000000006315d8]: SyS_poll+0x78/0x100 Aug 01 00:35:43 v215 kernel: Caller[00000000004061b4]: 
linux_sparc_syscall32+0x34/0x60 Aug 01 00:35:43 v215 kernel: Caller[0000000070000ba8]: 0x70000ba8 Aug 01 00:35:43 v215 kernel: Instruction DUMP: Aug 01 00:35:43 v215 kernel: c56a2000 Aug 01 00:35:43 v215 kernel: 808a2003 Aug 01 00:35:44 v215 kernel: 02480006 Aug 01 00:35:44 v215 kernel: <d42a2000> Aug 01 00:35:44 v215 kernel: 90022001 Aug 01 00:35:44 v215 kernel: 808a2003 Aug 01 00:35:44 v215 kernel: 1247fffd Aug 01 00:35:44 v215 kernel: 92226001 Aug 01 00:35:44 v215 kernel: 808a2007 Aug 01 00:35:44 v215 kernel: Aug 01 00:35:55 v215 kernel: pc[12811]: segfault at 70000974 ip 0000000070000974 (rpc 000000007000096c) sp 00000000ffa69488 error 30001 in pc[70010000+2000] ... ============================================================================ Testsuite summary for strace 4.18.0.134.805d ============================================================================ # TOTAL: 443 # PASS: 387 # SKIP: 39 # XFAIL: 0 # FAIL: 17 # XPASS: 0 # ERROR: 0 ============================================================================ See tests/test-suite.log ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-31 21:48 ` Anatoly Pugachev @ 2017-07-31 21:51 ` David Miller 2017-07-31 22:01 ` Anatoly Pugachev 0 siblings, 1 reply; 21+ messages in thread From: David Miller @ 2017-07-31 21:51 UTC (permalink / raw) To: matorola; +Cc: mikpelinux, sparclinux, linux-kernel From: Anatoly Pugachev <matorola@gmail.com> Date: Tue, 1 Aug 2017 00:48:07 +0300 > Aug 01 00:35:11 v215 kernel: sched_xetattr(1527): Oops [#1] > Aug 01 00:35:11 v215 kernel: CPU: 1 PID: 1527 Comm: sched_xetattr Not > tainted 4.12.0 #365 > Aug 01 00:35:11 v215 kernel: task: fff0001231d41340 task.stack: fff000123dfc4000 > Aug 01 00:35:11 v215 kernel: TSTATE: 0000004411001604 TPC: > 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Not tainted > Aug 01 00:35:11 v215 kernel: TPC: <__bzero+0x20/0xc0> > Aug 01 00:35:11 v215 kernel: g0: fff000123dfc7d20 g1: 0000000000000000 > g2: 0000003000000000 g3: 0000000000000000 > Aug 01 00:35:11 v215 kernel: g4: fff0001231d41340 g5: fff000123ed08000 > g6: fff000123dfc4000 g7: 0000000000000030 > Aug 01 00:35:11 v215 kernel: o0: 0000000000000030 o1: fff000123dfc7e70 > o2: 0000000000000000 o3: 0000000000000030 > Aug 01 00:35:11 v215 kernel: o4: fff000123dfc7e70 o5: 000000000000000a > sp: fff000123dfc75c1 ret_pc: 000000000049b294 > Aug 01 00:35:11 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0> Please run gdb on this kernel image and tell it: (gdb) x/20i 0x49b294 - 16 Thanks. I think perhaps one of Al Viro's changes in the bisected commit causes a branch to either have an overflowed offset field, or get mispatched to the wrong destination. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-31 21:51 ` David Miller @ 2017-07-31 22:01 ` Anatoly Pugachev 2017-07-31 22:06 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: Anatoly Pugachev @ 2017-07-31 22:01 UTC (permalink / raw) To: David Miller; +Cc: Mikael Pettersson, Sparc kernel list, Linux Kernel list On Tue, Aug 1, 2017 at 12:51 AM, David Miller <davem@davemloft.net> wrote: > From: Anatoly Pugachev <matorola@gmail.com> > Date: Tue, 1 Aug 2017 00:48:07 +0300 > >> Aug 01 00:35:11 v215 kernel: sched_xetattr(1527): Oops [#1] >> Aug 01 00:35:11 v215 kernel: CPU: 1 PID: 1527 Comm: sched_xetattr Not >> tainted 4.12.0 #365 >> Aug 01 00:35:11 v215 kernel: task: fff0001231d41340 task.stack: fff000123dfc4000 >> Aug 01 00:35:11 v215 kernel: TSTATE: 0000004411001604 TPC: >> 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Not tainted >> Aug 01 00:35:11 v215 kernel: TPC: <__bzero+0x20/0xc0> >> Aug 01 00:35:11 v215 kernel: g0: fff000123dfc7d20 g1: 0000000000000000 >> g2: 0000003000000000 g3: 0000000000000000 >> Aug 01 00:35:11 v215 kernel: g4: fff0001231d41340 g5: fff000123ed08000 >> g6: fff000123dfc4000 g7: 0000000000000030 >> Aug 01 00:35:11 v215 kernel: o0: 0000000000000030 o1: fff000123dfc7e70 >> o2: 0000000000000000 o3: 0000000000000030 >> Aug 01 00:35:11 v215 kernel: o4: fff000123dfc7e70 o5: 000000000000000a >> sp: fff000123dfc75c1 ret_pc: 000000000049b294 >> Aug 01 00:35:11 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0> > > Please run gdb on this kernel image and tell it: > > (gdb) x/20i 0x49b294 - 16 > > Thanks. > > I think perhaps one of Al Viro's changes in the bisected commit causes > a branch to either have an overflowed offset field, or get mispatched > to the wrong destination. 
David, I don't know how to run on a running kernel , but as I understood: root@v215:strace# gzip -dc /boot/vmlinuz-4.12.0 > vmlinux root@v215:strace# gdb -q vmlinux Reading symbols from vmlinux...(no debugging symbols found)...done. (gdb) x/20i 0x49b294 - 16 0x49b284 <_start+619140>: mov -22, %o0 0x49b288 <_start+619144>: sub %i5, %o0, %o0 0x49b28c <_start+619148>: mov %i3, %o2 0x49b290 <_start+619152>: clr %o1 0x49b294 <_start+619156>: call 0xa121b8 <_start+6349240> 0x49b298 <_start+619160>: add %o0, 0x30, %o0 0x49b29c <_start+619164>: cmp %i3, 0 0x49b2a0 <_start+619168>: be %icc, 0x49b20c <_start+619020> 0x49b2a4 <_start+619172>: mov -14, %i0 0x49b2a8 <_start+619176>: rett %i7 + 8 0x49b2ac <_start+619180>: nop 0x49b2b0 <_start+619184>: b,a %xcc, 0x49b2c0 <_start+619200> 0x49b2b4 <_start+619188>: nop 0x49b2b8 <_start+619192>: nop 0x49b2bc <_start+619196>: nop 0x49b2c0 <_start+619200>: save %sp, -176, %sp 0x49b2c4 <_start+619204>: call 0xa136c0 <_start+6354624> 0x49b2c8 <_start+619208>: nop 0x49b2cc <_start+619212>: cmp %i0, 0 0x49b2d0 <_start+619216>: bl,pn %icc, 0x49b318 <_start+619288> 0x49b2d4 <_start+619220>: mov -22, %o0 (gdb) ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-07-31 22:01 ` Anatoly Pugachev
@ 2017-07-31 22:06 ` David Miller
  2017-08-01 7:29 ` Mikael Pettersson
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2017-07-31 22:06 UTC (permalink / raw)
  To: matorola; +Cc: mikpelinux, sparclinux, linux-kernel

From: Anatoly Pugachev <matorola@gmail.com>
Date: Tue, 1 Aug 2017 01:01:47 +0300

> I don't know how to run on a running kernel , but as I understood:
>
> root@v215:strace# gzip -dc /boot/vmlinuz-4.12.0 > vmlinux
> root@v215:strace# gdb -q vmlinux
> Reading symbols from vmlinux...(no debugging symbols found)...done.
> (gdb) x/20i 0x49b294 - 16

Unfortunately you need to do this on the built kernel image before it
has been stripped of all of its symbols.

Mikael, you built your kernels, right?

Go into one of your OOPSes and extract the "RPC: " hex value, and run
the gdb command:

bash$ cd src/linux
bash$ gdb ./vmlinux
(gdb) x/10i 0x${RPC_HEX_VALUE} - 16

Thanks.
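A minimal sketch of the lookup step described above, assuming the hex value wanted is the one printed in the `ret_pc:` field of the oops register dump (the address used in the earlier `x/20i 0x49b294 - 16` request). The helper name and the regular expression are hypothetical and not part of any kernel or gdb tooling.

```python
import re

# Hypothetical helper: find "ret_pc: <hex>" in oops text and build the
# gdb examine command used in the thread (N instructions, starting
# 16 bytes before the return PC).
RET_PC_RE = re.compile(r"ret_pc:\s*([0-9a-fA-F]+)")


def gdb_command_for_oops(oops_text, count=10, back=16):
    match = RET_PC_RE.search(oops_text)
    if match is None:
        raise ValueError("no ret_pc field found in oops text")
    ret_pc = int(match.group(1), 16)
    return "x/%di 0x%x - %d" % (count, ret_pc, back)


# Register line copied from the first v215 oops quoted in the thread.
sample = ("o4: fff000123dfc7e70 o5: 000000000000000a "
          "sp: fff000123dfc75c1 ret_pc: 000000000049b294")
print(gdb_command_for_oops(sample))
```

The command only gives symbolic output when gdb is pointed at the unstripped vmlinux from the build tree, as noted in the message.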
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-07-31 22:06 ` David Miller @ 2017-08-01 7:29 ` Mikael Pettersson 2017-08-01 20:58 ` Sam Ravnborg 0 siblings, 1 reply; 21+ messages in thread From: Mikael Pettersson @ 2017-08-01 7:29 UTC (permalink / raw) To: David Miller; +Cc: matorola, mikpelinux, sparclinux, linux-kernel David Miller writes: > From: Anatoly Pugachev <matorola@gmail.com> > Date: Tue, 1 Aug 2017 01:01:47 +0300 > > > I don't know how to run on a running kernel , but as I understood: > > > > root@v215:strace# gzip -dc /boot/vmlinuz-4.12.0 > vmlinux > > root@v215:strace# gdb -q vmlinux > > Reading symbols from vmlinux...(no debugging symbols found)...done. > > (gdb) x/20i 0x49b294 - 16 > > Unfortunately you need to do this on the build kernel image before it > has been stripped of all of it's symbols. > > Mikael, you built your kernels right? > > Go into one of your OOPS's and extract the "RPC: " hex value, and run > the gdb command: > > bash$ cd src/linux > bash$ gdb ./vmlinux > (gdb) x/10i 0x${RPC_HEX_VALUE} - 16 > > Thanks. Ok, with 4.13-rc3 I got [ 240.085153] Unable to handle kernel NULL pointer dereference [ 240.142397] tsk->{mm,active_mm}->context = 000000000000044a [ 240.198531] tsk->{mm,active_mm}->pgd = fff000023c784000 [ 240.250112] \|/ ____ \|/ "@'/ .. 
\`@" /_| \__/ |_\ \__U_/ [ 240.374879] poll(724): Oops [#1] [ 240.400132] CPU: 0 PID: 724 Comm: poll Not tainted 4.13.0-rc3 #1 [ 240.462002] task: fff000123cc71e00 task.stack: fff000123c894000 [ 240.522717] TSTATE: 0000004411001605 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted [ 240.634921] TPC: <__bzero+0x20/0xc0> [ 240.664747] g0: fff000123c897081 g1: 0000000000000000 g2: 0000000000000000 g3: 00000000008ca100 [ 240.762068] g4: fff000123cc71e00 g5: fff000023ef44000 g6: fff000123c894000 g7: 0000000000000008 [ 240.859389] o0: 000000000000000c o1: fff000123c897a80 o2: 0000000000000000 o3: 000000000000000c [ 240.956718] o4: fff000123c897a7c o5: 00000000000000fb sp: fff000123c897181 ret_pc: 0000000000516ee0 [ 241.058627] RPC: <do_sys_poll+0x80/0x3c0> [ 241.094166] l0: 0000000000000002 l1: 00000000014000c0 l2: 00000000000003fe l3: fff000123c897a7c [ 241.191506] l4: 0000000000000000 l5: 0000000000000000 l6: 000000000000006d l7: ffffffffffffffea [ 241.288822] i0: 00000000f7d93ff8 i1: 0000000000000002 i2: fff000123c897e90 i3: fff000123c897a70 [ 241.386141] i4: 000fffedc3768590 i5: fff000123c897a70 i6: fff000123c8975e1 i7: 00000000005177f8 [ 241.483468] I7: <SyS_poll+0x74/0xd0> [ 241.513292] Call Trace: [ 241.528265] [00000000005177f8] SyS_poll+0x74/0xd0 [ 241.574140] [00000000004061b4] linux_sparc_syscall32+0x34/0x60 [ 241.634847] Disabling lock debugging due to kernel taint [ 241.687555] Caller[00000000005177f8]: SyS_poll+0x74/0xd0 [ 241.740276] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60 [ 241.807855] Caller[0000000000010a20]: 0x10a20 [ 241.847983] Instruction DUMP: [ 241.847987] c56a2000 [ 241.869824] 808a2003 [ 241.883651] 02480006 [ 241.897475] <d42a2000> [ 241.911207] 90022001 [ 241.925032] 808a2003 [ 241.938755] 1247fffd [ 241.952484] 92226001 [ 241.966310] 808a2007 so the RPC should be do_sys_poll+0x80 right? 
Then gdb on the original vmlinux said:

(gdb) x/10i do_sys_poll+0x80-16
   0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668>
   0x516ed4 <do_sys_poll+116>: mov %o0, %o2
   0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4
   0x516edc <do_sys_poll+124>: clr %o1
   0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset>
   0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0
   0x516ee8 <do_sys_poll+136>: b %xcc, 0x5170b0 <do_sys_poll+592>
   0x516eec <do_sys_poll+140>: mov -14, %l7
   0x516ef0 <do_sys_poll+144>: mov %l2, %o0
   0x516ef4 <do_sys_poll+148>: movleu %xcc, %l0, %o0
(gdb)

/Mikael
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-08-01 7:29 ` Mikael Pettersson
@ 2017-08-01 20:58 ` Sam Ravnborg
  2017-08-02 21:36 ` Sam Ravnborg
  0 siblings, 1 reply; 21+ messages in thread
From: Sam Ravnborg @ 2017-08-01 20:58 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: David Miller, matorola, sparclinux, linux-kernel

Hi Mikael.

I think this translates to the following code
from linux/uaccess.h

The first part is the inlined _copy_from_user():

>
> (gdb) x/10i do_sys_poll+0x80-16
> 0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668>
	if (unlikely(res))

> 0x516ed4 <do_sys_poll+116>: mov %o0, %o2
> 0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4
> 0x516edc <do_sys_poll+124>: clr %o1
> 0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset>
> 0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0
		memset(to + (n - res), 0, res);

and this part is from the inlined copy_from_user():

> 0x516ee8 <do_sys_poll+136>: b %xcc, 0x5170b0 <do_sys_poll+592>
	jump to end of function

> 0x516eec <do_sys_poll+140>: mov -14, %l7
> 0x516ef0 <do_sys_poll+144>: mov %l2, %o0
> 0x516ef4 <do_sys_poll+148>: movleu %xcc, %l0, %o0
	} else if (!__builtin_constant_p(n))
		copy_user_overflow(sz, n);

Here the generic implementation now uses the return value of
raw_copy_from_user(), which we did not do before said patch.

So I suspect that what we see here is that:
1) With the patch from Al we start to use the return value of
   raw_copy_from_user.
2) The return value is wrong in the sparc implementation, so boom.
3) We only trigger this on old HW because the return value is correct
   in some, but not all, of the implementations of raw_copy_from_user.

Davem fixed this in a series of patches that requires some sparc
assembler knowledge to decipher.
The return value was fixed in ee841d0aff649164080e445e84885015958d8ff4
for the Ultra III as used by SUN Blade 2500.

And if I am right then this fix fails with the parameters used in our
case with strace.

Mikael - could you try to edit U3patch.S like this:

Change the following lines:
cheetah_patch_copyops:
	ULTRA3_DO_PATCH(memcpy, U3memcpy)
	ULTRA3_DO_PATCH(___copy_from_user, U3copy_from_user)
	ULTRA3_DO_PATCH(___copy_to_user, U3copy_to_user)
	retl

To:
cheetah_patch_copyops:
	ULTRA3_DO_PATCH(memcpy, GENmemcpy)
	ULTRA3_DO_PATCH(raw_copy_from_user, GENcopy_from_user)
	ULTRA3_DO_PATCH(raw_copy_to_user, GENcopy_to_user)
	retl

In other words, use the generic versions, which I assume are OK
on Ultra III, but slower.

	Sam
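A minimal sketch of the contract discussed in this message, assuming the usual copy_from_user() convention that the return value is the number of bytes NOT copied, and that the generic code zeroes that uncopied tail with memset(to + (n - res), 0, res). This is a userspace model in Python, not kernel code; the function name and the `readable` parameter are hypothetical stand-ins for where a fault would occur.

```python
def copy_from_user_model(dst, src, n, readable):
    """Copy up to n bytes from src into dst (a bytearray).

    `readable` models how many bytes of src can be read before a fault.
    Returns the number of bytes NOT copied and zeroes that uncopied
    tail, mirroring memset(to + (n - res), 0, res).
    """
    copied = min(n, readable)
    dst[:copied] = src[:copied]
    res = n - copied                  # bytes left uncopied
    if res:
        dst[n - res:n] = bytes(res)   # zero the tail
    return res


buf = bytearray(b"\xff" * 16)
left = copy_from_user_model(buf, bytes(range(1, 17)), 16, readable=8)
print(left, buf.hex())
```

The zeroing step is only safe when `res` really is a count between 0 and n, which is why a wrong return value from the architecture-specific copy routine matters once the generic code starts relying on it.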
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-08-01 20:58 ` Sam Ravnborg
@ 2017-08-02 21:36 ` Sam Ravnborg
  2017-08-02 23:27 ` David Miller
  2017-08-03 20:02 ` Mikael Pettersson
  0 siblings, 2 replies; 21+ messages in thread
From: Sam Ravnborg @ 2017-08-02 21:36 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: David Miller, matorola, sparclinux, linux-kernel

On Tue, Aug 01, 2017 at 10:58:29PM +0200, Sam Ravnborg wrote:
> Hi Mikael.
>
> I think this translates to the following code
> from linux/uaccess.h
>
> first part is the inlined _copy_from_user()
>
> >
> > (gdb) x/10i do_sys_poll+0x80-16
> > 0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668>
> 	if (unlikely(res))
>
> > 0x516ed4 <do_sys_poll+116>: mov %o0, %o2
> > 0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4
> > 0x516edc <do_sys_poll+124>: clr %o1
> > 0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset>
> > 0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0
> 		memset(to + (n - res), 0, res);

And memset calls down to bzero, where %o0=buf, %o1=len

%o0 = 0xc
%o1 = 0xfff000123c897a80
%o2 = 0x0
%o3 = 0xc

So from this we know that:
res = 0xfff000123c897a80
to + (n - 0xfff000123c897a80) = 0xc

The value "fff000123c897a80" really looks like a constructed address
from somewhere in the strace code, and where this constructed address
is used to provoke some unusual behaviour.
The "fff0" part may be a sparc thing.

So far the analysis seems to match the initial conclusion that
we in this special case try to zero out the remaining memory
based on the return value of raw_copy_from_user.
And therefore we use the return value (res) which triggers the oops.

So rather than manipulating the assembler code as suggested
in the previous mail, this simpler patch could be tested:

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index acdd6f915a8d..13d299ff1f21 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -115,7 +115,7 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
 		res = raw_copy_from_user(to, from, n);
 	}
 	if (unlikely(res))
-		memset(to + (n - res), 0, res);
+		void: /*memset(to + (n - res), 0, res);*/
 	return res;
 }
 #else


It would be good to know if this makes the oops go away.

And maybe you could try to print the parameters
supplied to _copy_from_user in case memset would be called,
so we have an idea what error path is taken.

I have tried to decipher U3memcpy.S - but that is non-trivial.
So it would be good to have a bit more data to verify the theory.

	Sam
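The register arithmetic above can be checked with a short sketch. It assumes 64-bit wraparound, takes `to` from %l3 in the 4.13-rc3 poll oops (the base register in `add %l3, %i4, %o0`), and infers n = 16 from two 8-byte pollfd entries (nfds is 2 in %i1), which is also consistent with %i4 in the same dump. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def memset_args(to, n, res):
    # Models memset(to + (n - res), 0, res) with 64-bit wraparound.
    dest = (to + (n - res)) & MASK64
    length = res & MASK64
    return dest, length


to = 0xfff000123c897a7c    # %l3 in the 4.13-rc3 poll oops
res = 0xfff000123c897a80   # bogus return value, seen as %o1 in __bzero
n = 16                     # assumption: 2 pollfd entries of 8 bytes each

dest, length = memset_args(to, n, res)
print(hex(dest), hex(length))
print(hex((n - res) & MASK64))   # compare with %i4 in the same oops
```

With these inputs the destination comes out as 0xc, matching %o0 in the dump, so the NULL pointer dereference is a store to a near-zero address computed from the bogus `res`.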
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-08-02 21:36 ` Sam Ravnborg
@ 2017-08-02 23:27 ` David Miller
  2017-08-03 20:02 ` Mikael Pettersson
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2017-08-02 23:27 UTC (permalink / raw)
  To: sam; +Cc: mikpelinux, matorola, sparclinux, linux-kernel

From: Sam Ravnborg <sam@ravnborg.org>
Date: Wed, 2 Aug 2017 23:36:47 +0200

> And memset calls down to bzero, where %o0=buf, %o1=len
>
> %o0 = 0xc
> %o1 = 0xfff000123c897a80
> %o2 = 0x0
> %o3 = 0xc
>
> So from this we know that:
> res = 0xfff000123c897a80
> to + (n - 0xfff000123c897a80) = 0xc
>
> The value "fff000123c897a80" really looks like a constructed address
> from somewhere in the strace code, and where this constructed address
> is used to provoke some unusual behaviour.
> The "fff0" part may be a sparc thing.
>
> So far the analysis seems to match the initial conclusion that
> we in this special case try to zero out the remaining memory
> based on the return value of raw_copy_from_user.
> And therefore we use the return value (res) which triggers the oops.

Yes, the return value is bogus.

> So rather than manipulating with the assembler code as suggested
> in the previous mail this simpler patch could be tested:
 ...
> -		memset(to + (n - res), 0, res);
> +		void: /*memset(to + (n - res), 0, res);*/

Need a semicolon rather than a colon there :-)
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-08-02 21:36 ` Sam Ravnborg 2017-08-02 23:27 ` David Miller @ 2017-08-03 20:02 ` Mikael Pettersson 2017-08-03 21:57 ` David Miller 1 sibling, 1 reply; 21+ messages in thread From: Mikael Pettersson @ 2017-08-03 20:02 UTC (permalink / raw) To: Sam Ravnborg Cc: Mikael Pettersson, David Miller, matorola, sparclinux, linux-kernel Sam Ravnborg writes: > On Tue, Aug 01, 2017 at 10:58:29PM +0200, Sam Ravnborg wrote: > > Hi Mikael. > > > > I think this translates to the following code > > from linux/uaccess.h > > > > first part is the inlined _copy_from_user() > > > > > > > > (gdb) x/10i do_sys_poll+0x80-16 > > > 0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668> > > if (unlikely(res)) > > > > > 0x516ed4 <do_sys_poll+116>: mov %o0, %o2 > > > 0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4 > > > 0x516edc <do_sys_poll+124>: clr %o1 > > > 0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset> > > > 0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0 > > memset(to + (n - res), 0, res); > > And memset calls down to bzero, where %o0=buf, %o1=len > > %o0 = 0xc > %o1 = 0xfff000123c897a80 > %o2 = 0x0 > %o3 = 0xc > > So from this we know that: > res = 0xfff000123c897a80 > to + (n - 0xfff000123c897a80)) = 0xc > > The value "fff000123c897a80" really looks like a constructed address > from somewhere in the strace code, and where this constructed address > is used to provoke some unusual behaviour. > The "fff0" part may be a sparc thing. > > So far the analysis seems to match the intial conclusion that > we in this special case try to zero out the remaining memory > based on the return value of raw_copy_from_user. > And therefore we use the return value (res) which triggers the oops. 
 >
 > So rather than manipulating with the assembler code as suggested
 > in the previous mail this simpler patch could be tested:
 >
 > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
 > index acdd6f915a8d..13d299ff1f21 100644
 > --- a/include/linux/uaccess.h
 > +++ b/include/linux/uaccess.h
 > @@ -115,7 +115,7 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
 >  		res = raw_copy_from_user(to, from, n);
 >  	}
 >  	if (unlikely(res))
 > -		memset(to + (n - res), 0, res);
 > +		void: /*memset(to + (n - res), 0, res);*/
 >  	return res;
 >  }
 >  #else
 >
 >
 > It would be good to know if this makes the opps go away.
 >
 > And maybe you could try to print the parameters
 > supplied to _copy_from_user in case memset would be called,
 > so we have an idea what error path is taken.
 >
 > I have tried to dechiper U3memcpy.S - but that is non-trivial.
 > So it would be good with a bit more data to verify the theory.

I applied the following:

--- linux-4.13-rc3/include/linux/uaccess.h.~1~ 2017-08-01 08:49:48.397819726 +0200
+++ linux-4.13-rc3/include/linux/uaccess.h 2017-08-03 21:33:11.009634421 +0200
@@ -4,6 +4,8 @@
 #include <linux/sched.h>
 #include <linux/thread_info.h>
 #include <linux/kasan-checks.h>
+#include <linux/ratelimit.h>
+#include <linux/printk.h>
 
 #define VERIFY_READ 0
 #define VERIFY_WRITE 1
@@ -115,7 +117,9 @@ _copy_from_user(void *to, const void __u
 		res = raw_copy_from_user(to, from, n);
 	}
 	if (unlikely(res))
-		memset(to + (n - res), 0, res);
+	{
+		printk_ratelimited("_copy_from_user(%p, %p, %lu) res %lu\n", to, from, n, res);
+	}
 	return res;
 }
 #else

With that in place the kernel booted fine.
When I then ran the `poll' strace test binary, the OOPS was replaced by:

[ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240
[ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8
[ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112

That last `res' doesn't look good.

/Mikael
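The last `res` value is easier to read in hex. A minimal sketch, using only the numbers printed by the debug patch above; the function name and the "plausible" check (a residue can never exceed the requested length n) are illustrative assumptions.

```python
def classify_res(res, to, n):
    # A valid residue is a byte count in the range 0..n.
    return {
        "hex": hex(res),
        "offset_from_to": res - to,
        "plausible": 0 <= res <= n,
    }


to = 0xfff000123c8dfa7c            # destination printed by the debug patch
res = 18442240552407530112         # third printed return value
info = classify_res(res, to, 16)
print(info["hex"], info["offset_from_to"], info["plausible"])
```

In hex the value is a kernel address a few bytes past `to`, not a byte count, which suggests the copy routine returned with a pointer still sitting in the return register.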
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-08-03 20:02 ` Mikael Pettersson
@ 2017-08-03 21:57 ` David Miller
  2017-08-04 5:44 ` Sam Ravnborg
  2017-08-04 8:02 ` Mikael Pettersson
  0 siblings, 2 replies; 21+ messages in thread
From: David Miller @ 2017-08-03 21:57 UTC (permalink / raw)
  To: mikpelinux; +Cc: sam, matorola, sparclinux, linux-kernel

From: Mikael Pettersson <mikpelinux@gmail.com>
Date: Thu, 3 Aug 2017 22:02:57 +0200

> With that in place the kernel booted fine.
> When I then ran the `poll' strace test binary, the OOPS was replaced by:
>
> [ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240
> [ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8
> [ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112
>
> That last `res' doesn't look good.

Please test this patch:

diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
index 54f9870..5a8cb37 100644
--- a/arch/sparc/lib/U3memcpy.S
+++ b/arch/sparc/lib/U3memcpy.S
@@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
 ENTRY(U3_retl_o2_and_7_plus_GS)
 	and %o2, 7, %o2
 	retl
-	add %o2, GLOBAL_SPARE, %o2
+	add %o2, GLOBAL_SPARE, %o0
 ENDPROC(U3_retl_o2_and_7_plus_GS)
 ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
 	add GLOBAL_SPARE, 8, GLOBAL_SPARE
 	and %o2, 7, %o2
 	retl
-	add %o2, GLOBAL_SPARE, %o2
+	add %o2, GLOBAL_SPARE, %o0
 ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
 #endif
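A rough model of what the one-register change does, assuming the SPARC convention that a leaf routine returns its result in %o0. The stub bodies follow the U3_retl_o2_and_7_plus_GS lines in the patch; the register values for %o2 and GLOBAL_SPARE are made up for illustration, and only the stale %o0 value is taken from the debug output earlier in the thread.

```python
def fixup_buggy(regs):
    # and %o2, 7, %o2 ; retl ; add %o2, GLOBAL_SPARE, %o2
    regs = dict(regs)
    regs["o2"] = regs["o2"] & 7
    regs["o2"] = regs["o2"] + regs["gs"]   # sum lands in %o2
    return regs["o0"]                      # caller reads %o0, left stale


def fixup_fixed(regs):
    # and %o2, 7, %o2 ; retl ; add %o2, GLOBAL_SPARE, %o0
    regs = dict(regs)
    regs["o2"] = regs["o2"] & 7
    regs["o0"] = regs["o2"] + regs["gs"]   # sum lands in %o0
    return regs["o0"]


regs = {
    "o0": 0xfff000123c8dfa80,  # stale value, as seen in the printed res
    "o2": 0,                   # hypothetical remaining-length register
    "gs": 8,                   # hypothetical GLOBAL_SPARE contents
}
print(hex(fixup_buggy(regs)), fixup_fixed(regs))
```

Under these assumed inputs the old stub hands back whatever was already in %o0, while the patched stub hands back the computed residue.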
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels 2017-08-03 21:57 ` David Miller @ 2017-08-04 5:44 ` Sam Ravnborg 2017-08-04 8:02 ` Mikael Pettersson 1 sibling, 0 replies; 21+ messages in thread From: Sam Ravnborg @ 2017-08-04 5:44 UTC (permalink / raw) To: David Miller; +Cc: mikpelinux, matorola, sparclinux, linux-kernel Hi Davem. On Thu, Aug 03, 2017 at 02:57:48PM -0700, David Miller wrote: > From: Mikael Pettersson <mikpelinux@gmail.com> > Date: Thu, 3 Aug 2017 22:02:57 +0200 > > > With that in place the kernel booted fine. > > When I then ran the `poll' strace test binary, the OOPS was replaced by: > > > > [ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240 > > [ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8 > > [ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112 > > > > That last `res' doesn't look good. > > Please test this patch: > > diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S > index 54f9870..5a8cb37 100644 > --- a/arch/sparc/lib/U3memcpy.S > +++ b/arch/sparc/lib/U3memcpy.S > @@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08) > ENTRY(U3_retl_o2_and_7_plus_GS) > and %o2, 7, %o2 > retl > - add %o2, GLOBAL_SPARE, %o2 > + add %o2, GLOBAL_SPARE, %o0 > ENDPROC(U3_retl_o2_and_7_plus_GS) > ENTRY(U3_retl_o2_and_7_plus_GS_plus_8) > add GLOBAL_SPARE, 8, GLOBAL_SPARE > and %o2, 7, %o2 > retl > - add %o2, GLOBAL_SPARE, %o2 > + add %o2, GLOBAL_SPARE, %o0 > ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8) > #endif > Patch looks obviously correct, and I am a bit irritated that I did not see this myself. Reviewed-by: Sam Ravnborg <sam@ravnborg.org> I will send another patch that fixes/adds a few comments to the same file. Sam ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-08-03 21:57 ` David Miller
  2017-08-04  5:44 ` Sam Ravnborg
@ 2017-08-04  8:02 ` Mikael Pettersson
  2017-08-04 16:48 ` David Miller
  1 sibling, 1 reply; 21+ messages in thread
From: Mikael Pettersson @ 2017-08-04 8:02 UTC (permalink / raw)
  To: David Miller; +Cc: mikpelinux, sam, matorola, sparclinux, linux-kernel

David Miller writes:
 > From: Mikael Pettersson <mikpelinux@gmail.com>
 > Date: Thu, 3 Aug 2017 22:02:57 +0200
 >
 > > With that in place the kernel booted fine.
 > > When I then ran the `poll' strace test binary, the OOPS was replaced by:
 > >
 > > [ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240
 > > [ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8
 > > [ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112
 > >
 > > That last `res' doesn't look good.
 >
 > Please test this patch:
 >
 > diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
 > index 54f9870..5a8cb37 100644
 > --- a/arch/sparc/lib/U3memcpy.S
 > +++ b/arch/sparc/lib/U3memcpy.S
 > @@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
 >  ENTRY(U3_retl_o2_and_7_plus_GS)
 >  	and	%o2, 7, %o2
 >  	retl
 > -	 add	%o2, GLOBAL_SPARE, %o2
 > +	 add	%o2, GLOBAL_SPARE, %o0
 >  ENDPROC(U3_retl_o2_and_7_plus_GS)
 >  ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
 >  	add	GLOBAL_SPARE, 8, GLOBAL_SPARE
 >  	and	%o2, 7, %o2
 >  	retl
 > -	 add	%o2, GLOBAL_SPARE, %o2
 > +	 add	%o2, GLOBAL_SPARE, %o0
 >  ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
 >  #endif
 >

Backing out my debugging patch and adding this one instead
gave me a working kernel that doesn't OOPS. Thanks.

Tested-by: Mikael Pettersson <mikpelinux@gmail.com>

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
  2017-08-04  8:02 ` Mikael Pettersson
@ 2017-08-04 16:48 ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2017-08-04 16:48 UTC (permalink / raw)
  To: mikpelinux; +Cc: sam, matorola, sparclinux, linux-kernel

From: Mikael Pettersson <mikpelinux@gmail.com>
Date: Fri, 4 Aug 2017 10:02:25 +0200

> David Miller writes:
>  > From: Mikael Pettersson <mikpelinux@gmail.com>
>  > Date: Thu, 3 Aug 2017 22:02:57 +0200
>  >
>  > > With that in place the kernel booted fine.
>  > > When I then ran the `poll' strace test binary, the OOPS was replaced by:
>  > >
>  > > [ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240
>  > > [ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8
>  > > [ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112
>  > >
>  > > That last `res' doesn't look good.
>  >
>  > Please test this patch:
>  >
>  > diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
>  > index 54f9870..5a8cb37 100644
>  > --- a/arch/sparc/lib/U3memcpy.S
>  > +++ b/arch/sparc/lib/U3memcpy.S
>  > @@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
>  >  ENTRY(U3_retl_o2_and_7_plus_GS)
>  >  	and	%o2, 7, %o2
>  >  	retl
>  > -	 add	%o2, GLOBAL_SPARE, %o2
>  > +	 add	%o2, GLOBAL_SPARE, %o0
>  >  ENDPROC(U3_retl_o2_and_7_plus_GS)
>  >  ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
>  >  	add	GLOBAL_SPARE, 8, GLOBAL_SPARE
>  >  	and	%o2, 7, %o2
>  >  	retl
>  > -	 add	%o2, GLOBAL_SPARE, %o2
>  > +	 add	%o2, GLOBAL_SPARE, %o0
>  >  ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
>  >  #endif
>  >
>
> Backing out my debugging patch and adding this one instead
> gave me a working kernel that doesn't OOPS. Thanks.
>
> Tested-by: Mikael Pettersson <mikpelinux@gmail.com>

Great, thanks for testing.

This is the final patch I committed:

====================
From 0ede1c401332173ab0693121dc6cde04a4dbf131 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Fri, 4 Aug 2017 09:47:52 -0700
Subject: [PATCH] sparc64: Fix exception handling in UltraSPARC-III memcpy.

Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.

After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.

The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.

Fixes: ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/lib/U3memcpy.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
index 54f98706b03b..5a8cb37f0a3b 100644
--- a/arch/sparc/lib/U3memcpy.S
+++ b/arch/sparc/lib/U3memcpy.S
@@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
 ENTRY(U3_retl_o2_and_7_plus_GS)
 	and	%o2, 7, %o2
 	retl
-	 add	%o2, GLOBAL_SPARE, %o2
+	 add	%o2, GLOBAL_SPARE, %o0
 ENDPROC(U3_retl_o2_and_7_plus_GS)
 ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
 	add	GLOBAL_SPARE, 8, GLOBAL_SPARE
 	and	%o2, 7, %o2
 	retl
-	 add	%o2, GLOBAL_SPARE, %o2
+	 add	%o2, GLOBAL_SPARE, %o0
 ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
 #endif
-- 
2.13.3

^ permalink raw reply related [flat|nested] 21+ messages in thread
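The commit message says the garbage return value causes the fixup memset() to run with bogus arguments. The Python sketch below (not part of the thread) models that step under a loud assumption: that the generic fixup has the shape memset(to + (n - res), 0, res). That shape is recalled from the kernel's generic usercopy code of that era and is not quoted anywhere in this thread. The helper name fixup_memset_args and the constant MASK64 are hypothetical.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def fixup_memset_args(to, n, res):
    """Return (start, length) for a fixup of the ASSUMED form
    memset(to + (n - res), 0, res), using 64-bit unsigned arithmetic."""
    start = (to + ((n - res) & MASK64)) & MASK64
    return start, res


# Values taken from the `poll' debug output earlier in the thread.
to = 0xfff000123c8dfa7c
n = 16

# Sane case: 8 of the 16 requested bytes were not copied.
sane = fixup_memset_args(to, n, 8)

# Buggy case: the stub left a garbage value in the return register.
bogus = fixup_memset_args(to, n, 18442240552407530112)

print(hex(sane[0]), sane[1])        # 0xfff000123c8dfa84 8
print(hex(bogus[0]), hex(bogus[1])) # 0xc 0xfff000123c8dfa80
```

Under this assumption the sane case zeroes the 8-byte tail of the buffer, while the bogus case starts zeroing at address 0xc with an enormous length. That would be consistent with the "Unable to handle kernel NULL pointer dereference" reports in __bzero at the top of the thread.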
end of thread, other threads:[~2017-08-04 16:48 UTC | newest]

Thread overview: 21+ messages
2017-07-27 19:45 strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels Mikael Pettersson
2017-07-28  5:10 ` David Miller
2017-07-28  8:45 ` Mikael Pettersson
2017-07-28 18:27 ` David Miller
2017-07-28 18:37 ` David Miller
2017-07-29 10:52 ` Anatoly Pugachev
2017-07-29 12:02 ` Mikael Pettersson
2017-07-31 17:14 ` Mikael Pettersson
2017-07-31 21:48 ` Anatoly Pugachev
2017-07-31 21:51 ` David Miller
2017-07-31 22:01 ` Anatoly Pugachev
2017-07-31 22:06 ` David Miller
2017-08-01  7:29 ` Mikael Pettersson
2017-08-01 20:58 ` Sam Ravnborg
2017-08-02 21:36 ` Sam Ravnborg
2017-08-02 23:27 ` David Miller
2017-08-03 20:02 ` Mikael Pettersson
2017-08-03 21:57 ` David Miller
2017-08-04  5:44 ` Sam Ravnborg
2017-08-04  8:02 ` Mikael Pettersson
2017-08-04 16:48 ` David Miller