linux-mips.vger.kernel.org archive mirror
* gcc-6.3.x miscompiling code for IP27?
@ 2017-01-22 23:28 Joshua Kinard
  2017-01-23  1:03 ` Joshua Kinard
  0 siblings, 1 reply; 9+ messages in thread
From: Joshua Kinard @ 2017-01-22 23:28 UTC (permalink / raw)
  To: Linux/MIPS

I think I've run into a really odd gcc-6.3.x miscompile bug here on IP27.
But I'm not sure.  I've reproduced the issue on 4.9.5, 4.8.17, and now
4.7.10 (which I KNOW should boot).  If I recompile the same 4.7.10 kernel
with gcc-5.4.0, though, it boots as expected.  The fault appears to be in
the assembly for _raw_spin_lock_irq.

This is the Oops message that I get:

[    0.918286] Checking for the daddi bug... no.
[    0.973207] CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, epc == a8000000006b4eb4, ra == a8000000006b4eac
[    1.124206] Oops[#1]:
[    1.151460] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.7.10-mipsgit-20160806 #1
[    1.242577] task: a8000000fe694a80 ti: a8000000fe6b8000 task.ti: a8000000fe6b8000
[    1.332616] $ 0   : 0000000000000000 ffffffff94005ce0 0000000000010000 0000000000000001
[    1.428969] $ 4   : 0000000000000000 a8000000fe694a80 ffffffff94005ce0 ffffffffffff00fe
[    1.525323] $ 8   : 0000000000000000 000000003a00da40 0000000000000000 00000000002f4a24
[    1.621677] $12   : a8000000fe6bbfe0 0000000000005c00 a8000000006c5cc8 0000000000000001
[    1.718032] $16   : 0000000000000000 a80000000201c300 0000000000000000 a80000000201c320
[    1.814388] $20   : 0000000000000000 a800000000841690 0000000000000000 a8000000fe010100
[    1.910742] $24   : 00000000fa83b2da a8000000000895c0
[    2.007098] $28   : a8000000fe6b8000 a8000000fe6bbda0 0000000000000000 a8000000006b4eac
[    2.103455] Hi    : 0000000000000000
[    2.146395] Lo    : 00103a8f6265ed12
[    2.189417] epc   : a8000000006b4eb4 _raw_spin_lock_irq+0x24/0x58
[    2.262660] ra    : a8000000006b4eac _raw_spin_lock_irq+0x1c/0x58
[    2.335962] Status: 94005ce2 KX SX UX KERNEL EXL
[    2.392521] Cause : 00008008 (ExcCode 02)
[    2.440698] BadVA : 0000000000000000
[    2.483640] PrId  : 00000f14 (R14000)
[    2.527636] Process kworker/0:0 (pid: 4, threadinfo=a8000000fe6b8000, task=a8000000fe694a80, tls=0000000000000000)
[    2.652258] Stack : 0000000000000000 a8000000000676b0 a800000000870000 a800000000870000
          0000000000000000 a8000000007de7f0 a8000001fc669e00 a8000000007e0000
          a8000000008e0000 a8000000fe047300 0000000000000000 a800000000841690
          0000000000000000 a8000000fe010100 0000000000000000 a80000000006ff00
          0000000000000000 0000000000000000 a8000000fe047300 0000000000000000
          0000000000000000 a8000000fe6bbe48 a8000000fe6bbe48 0000000000000000
          0000000000000000 a8000000fe6bbe68 a8000000fe6bbe68 a8000000fe696880
          a80000000006fdf0 a8000001fc669e00 a8000000fe696880 a8000000fe695480
          0000000000000000 a800000000024c08 0000000000000000 0000000000000000
          0000000000000000 0000000000000000 0000000000000000 0000000000000000
          ...
[    3.435669] Call Trace:
[    3.465003] [<a8000000006b4eb4>] _raw_spin_lock_irq+0x24/0x58
[    3.534172] [<a8000000000676b0>] worker_thread+0x2e8/0x760
[    3.600120] [<a80000000006ff00>] kthread+0x110/0x128
[    3.659827] [<a800000000024c08>] ret_from_kernel_thread+0x14/0x1c
[    3.733110]
[    3.750915]
Code: 00808025  bfb40000  3c020001 <c2030000> 00622021  e2040000  1080fffc  00032402  3063ffff
[    3.870390] ---[ end trace e2bb2e115aef4a1e ]---
[    3.925864] Kernel panic - not syncing: Fatal exception
[    3.988706] Reboot started from CPU 0


Here's the disassembled code for a 4.7.10 kernel built with gcc-6.3.0 for
IP27:
    a8000000006b4e90 <_raw_spin_lock_irq>:
    a8000000006b4e90:	bfb40000 	cache	0x14,0(sp)
    a8000000006b4e94:	ffa0bff0 	sd	zero,-16400(sp)
    a8000000006b4e98:	67bdfff0 	daddiu	sp,sp,-16
    a8000000006b4e9c:	ffb00000 	sd	s0,0(sp)
    a8000000006b4ea0:	ffbf0008 	sd	ra,8(sp)
    a8000000006b4ea4:	0c0eca58 	jal	a8000000003b2960 <arch_local_irq_disable>
    a8000000006b4ea8:	00808025 	move	s0,a0
    a8000000006b4eac:	bfb40000 	cache	0x14,0(sp)
    a8000000006b4eb0:	3c020001 	lui	v0,0x1
    a8000000006b4eb4:	c2030000 	ll	v1,0(s0)
    a8000000006b4eb8:	00622021 	addu	a0,v1,v0
    a8000000006b4ebc:	e2040000 	sc	a0,0(s0)
    a8000000006b4ec0:	1080fffc 	beqz	a0,a8000000006b4eb4 <_raw_spin_lock_irq+0x24>
    a8000000006b4ec4:	00032402 	srl	a0,v1,0x10
    a8000000006b4ec8:	3063ffff 	andi	v1,v1,0xffff
    a8000000006b4ecc:	14640161 	bne	v1,a0,a8000000006b5454 <_raw_write_unlock_irqrestore+0x7c>
    a8000000006b4ed0:	00831823 	subu	v1,a0,v1
    a8000000006b4ed4:	dfbf0008 	ld	ra,8(sp)
    a8000000006b4ed8:	dfb00000 	ld	s0,0(sp)
    a8000000006b4edc:	03e00008 	jr	ra
    a8000000006b4ee0:	67bd0010 	daddiu	sp,sp,16
    a8000000006b4ee4:	00000000 	nop


And here's the same kernel tree rebuilt with gcc-5.4.0 for IP27:
    a8000000006a7198 <_raw_spin_lock_irq>:
    a8000000006a7198:	67bdfff0 	daddiu	sp,sp,-16
    a8000000006a719c:	ffb00000 	sd	s0,0(sp)
    a8000000006a71a0:	ffbf0008 	sd	ra,8(sp)
    a8000000006a71a4:	0c0eac00 	jal	a8000000003ab000 <arch_local_irq_disable>
    a8000000006a71a8:	00808025 	move	s0,a0
    a8000000006a71ac:	bfb40000 	cache	0x14,0(sp)
    a8000000006a71b0:	3c020001 	lui	v0,0x1
    a8000000006a71b4:	c2030000 	ll	v1,0(s0)
    a8000000006a71b8:	00622021 	addu	a0,v1,v0
    a8000000006a71bc:	e2040000 	sc	a0,0(s0)
    a8000000006a71c0:	1080fffc 	beqz	a0,a8000000006a71b4 <_raw_spin_lock_irq+0x1c>
    a8000000006a71c4:	00032402 	srl	a0,v1,0x10
    a8000000006a71c8:	3063ffff 	andi	v1,v1,0xffff
    a8000000006a71cc:	14640156 	bne	v1,a0,a8000000006a7728 <_raw_write_unlock_irqrestore+0x78>
    a8000000006a71d0:	00831823 	subu	v1,a0,v1
    a8000000006a71d4:	dfbf0008 	ld	ra,8(sp)
    a8000000006a71d8:	dfb00000 	ld	s0,0(sp)
    a8000000006a71dc:	03e00008 	jr	ra
    a8000000006a71e0:	67bd0010 	daddiu	sp,sp,16
    a8000000006a71e4:	00000000 	nop


The failing instruction is the "ll v1,0(s0)" bit, so it looks like register
s0 is getting clobbered.  The only visible difference between these two
assembly fragments is the addition of these two instructions at the very
top:
    cache  0x14,0(sp)
    sd     zero,-16400(sp)
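
As a cross-check of the raw words in the listings above, here is a minimal
Python sketch that decodes the three instruction words of interest as MIPS
I-type encodings.  The helper names (decode_itype, render) and the small
register/mnemonic tables are made up for illustration and only cover what
appears in this dump; this is not a general disassembler.

```python
# Only the opcodes and registers seen in the listings above are named here.
MNEMONICS = {0x2F: "cache", 0x30: "ll", 0x3F: "sd"}
REG_NAMES = {0: "zero", 2: "v0", 3: "v1", 4: "a0", 16: "s0", 29: "sp", 31: "ra"}


def reg(number):
    """Return a symbolic register name, or $N when it is not in the table."""
    return REG_NAMES.get(number, f"${number}")


def decode_itype(word):
    """Split a 32-bit MIPS I-type word into (opcode, rs, rt, signed imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


def render(word):
    """Format a word roughly the way objdump prints it."""
    opcode, rs, rt, imm = decode_itype(word)
    name = MNEMONICS.get(opcode, f"op{opcode:#x}")
    # For "cache" the rt field carries the cache operation, not a register.
    target = hex(rt) if name == "cache" else reg(rt)
    return f"{name} {target},{imm}({reg(rs)})"


for word in (0xBFB40000, 0xFFA0BFF0, 0xC2030000):
    print(f"{word:08x}  {render(word)}")
```

The second word decodes to a store of $zero at 16400 bytes below the
incoming stack pointer, which matches what objdump reports.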

Looking at the git source for arch/mips/include/asm/spinlock.h, nothing has
changed for this region of code recently, so this looks very much like an
issue in gcc-6.3 itself.  I cannot reproduce this problem on my SGI Octane.
It's been running a 4.8-series kernel for over a month now compiling new
Gentoo stages.  So I can't rule out a bug in the new IP27 code I'm using,
but I am kinda doubting that.

Does anyone have any ideas I can look into?  I'll start digging through
gcc-6.3's source to see if I can figure out why these two extra
instructions are being generated for hard-coded assembly.  I suspect that's
the fault here.  Although why it only seems to affect IP27 is unknown to me
at the moment.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: gcc-6.3.x miscompiling code for IP27?
  2017-01-22 23:28 gcc-6.3.x miscompiling code for IP27? Joshua Kinard
@ 2017-01-23  1:03 ` Joshua Kinard
  2017-01-23  2:24   ` Joshua Kinard
  0 siblings, 1 reply; 9+ messages in thread
From: Joshua Kinard @ 2017-01-23  1:03 UTC (permalink / raw)
  To: linux-mips

On 01/22/2017 18:28, Joshua Kinard wrote:
> I think I've run into a really odd gcc-6.3.x miscompile bug here on IP27.
> But I'm not sure.  I've reproduced the issue on 4.9.5, 4.8.17, and now
> 4.7.10 (which I KNOW should boot).  If I recompile the same 4.7.10 kernel
> with gcc-5.4.0, though, it boots as expected.  The fault appears to be in
> the assembly for _raw_spin_lock_irq.
> 

Figured it out.  Not 100% sure WHY, but gcc-6.3.x is causing kbuild to parse
the arch/mips/sgi-ip32/Platform file for some reason on both IP27 and IP30
builds, and is thusly appending -mr10k-cache-barrier=load-store to the kernel
CFLAGS.  It did this on my Octane's kernel as well, but the Octane seems to be
unaffected by the extraneous cache barriers.  I sent a fix in for this a long
time ago, but it never got accepted.  So I'll try again...


* Re: gcc-6.3.x miscompiling code for IP27?
  2017-01-23  1:03 ` Joshua Kinard
@ 2017-01-23  2:24   ` Joshua Kinard
  2017-01-23 18:00     ` Joshua Kinard
  0 siblings, 1 reply; 9+ messages in thread
From: Joshua Kinard @ 2017-01-23  2:24 UTC (permalink / raw)
  To: linux-mips

On 01/22/2017 20:03, Joshua Kinard wrote:
> On 01/22/2017 18:28, Joshua Kinard wrote:
>> I think I've run into a really odd gcc-6.3.x miscompile bug here on IP27.
>> But I'm not sure.  I've reproduced the issue on 4.9.5, 4.8.17, and now
>> 4.7.10 (which I KNOW should boot).  If I recompile the same 4.7.10 kernel
>> with gcc-5.4.0, though, it boots as expected.  The fault appears to be in
>> the assembly for _raw_spin_lock_irq.
>>
> 
> Figured it out.  Not 100% sure WHY, but gcc-6.3.x is causing kbuild to parse
> the arch/mips/sgi-ip32/Platform file for some reason on both IP27 and IP30
> builds, and is thusly appending -mr10k-cache-barrier=load-store to the kernel
> CFLAGS.  It did this on my Octane's kernel as well, but the Octane seems to be
> unaffected by the extraneous cache barriers.  I sent a fix in for this a long
> time ago, but it never got accepted.  So I'll try again...
> 

Nope.  I was wrong.  Still happens even after fixing the erroneous
mr10k-cache-barrier thing.  I'll send a patch in for that later now, but
looking at other sections of disassembly, I am seeing a pattern of this "sd
zero,..." instruction being placed at the beginning of most functions, before
most "daddiu" instructions.  I even test-compiled a vanilla kernel as well, and
the same issue is happening there when looking at disassembly (test boot also
Oopses):

Examples:

a80000000001c400 <run_init_process>:
a80000000001c400:       ffa0bff0        sd      zero,-16400(sp)
a80000000001c404:       67bdfff0        daddiu  sp,sp,-16

a80000000001d740 <per_cpu_init>:
a80000000001d740:       ffa0bfc0        sd      zero,-16448(sp)
a80000000001d744:       2405ffc9        li      a1,-55
a80000000001d748:       67bdffc0        daddiu  sp,sp,-64

a80000000001cea0 <ip27_be_handler>:
a80000000001cea0:       ffa0bfe0        sd      zero,-16416(sp)
a80000000001cea4:       67bdffe0        daddiu  sp,sp,-32

a8000000000256c0 <__compute_return_epc>:
a8000000000256c0:       ffa0bff0        sd      zero,-16400(sp)
a8000000000256c4:       67bdfff0        daddiu  sp,sp,-16

a80000000001c5b0 <name_to_dev_t>:
a80000000001c5b0:       ffa0bf90        sd      zero,-16496(sp)
a80000000001c5b4:       3c05a800        lui     a1,0xa800
a80000000001c5b8:       3c020074        lui     v0,0x74
a80000000001c5bc:       64a50000        daddiu  a1,a1,0
a80000000001c5c0:       64424840        daddiu  v0,v0,18496
a80000000001c5c4:       0005283c        dsll32  a1,a1,0x0
a80000000001c5c8:       67bdff90        daddiu  sp,sp,-112


I am not sure what to call this.  This is definitely not happening with a
gcc-5.4.x-built kernel, so it's a code-generation issue of some kind:

a80000000001c400 <run_init_process>:
a80000000001c400:	67bdfff0 	daddiu	sp,sp,-16
a80000000001c404:	3c02007b 	lui	v0,0x7b

a80000000001cec0 <ip27_be_handler>:
a80000000001cec0:	67bdffe0 	daddiu	sp,sp,-32
a80000000001cec4:	ffb00000 	sd	s0,0(sp)

a80000000001c5a0 <name_to_dev_t>:
a80000000001c5a0:	3c05a800 	lui	a1,0xa800
a80000000001c5a4:	3c020075 	lui	v0,0x75
a80000000001c5a8:	64a50000 	daddiu	a1,a1,0
a80000000001c5ac:	64423f40 	daddiu	v0,v0,16192
a80000000001c5b0:	0005283c 	dsll32	a1,a1,0x0
a80000000001c5b4:	67bdff90 	daddiu	sp,sp,-112


Oddly enough, Octane is definitely not bothered by this extraneous
store-doubleword instruction.  Only IP27 appears to be, which may explain why
it's gone unnoticed thus far.  Maybe NUMA-related?
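
One pattern is visible in the gcc-6.3 listings above: in every example, the
offset of the "sd zero" store differs from the function's own stack
adjustment by the same amount.  A small Python sketch of that arithmetic,
using only the numbers copied from the disassembly (the table and function
names are made up for illustration):

```python
# (offset of "sd zero,N(sp)", immediate of "daddiu sp,sp,M") per function,
# copied from the gcc-6.3 disassembly above.
PROLOGUES = {
    "run_init_process": (-16400, -16),
    "per_cpu_init": (-16448, -64),
    "ip27_be_handler": (-16416, -32),
    "__compute_return_epc": (-16400, -16),
    "name_to_dev_t": (-16496, -112),
}


def store_gap(store_offset, frame_adjust):
    """Distance in bytes between the new frame bottom and the extra store."""
    return frame_adjust - store_offset


gaps = {name: store_gap(*pair) for name, pair in PROLOGUES.items()}
for name, gap in gaps.items():
    print(f"{name:22s} gap = {gap} bytes ({gap:#x})")
```

In all five cases the store lands exactly 16384 bytes (0x4000) below where
the new stack frame will end.  That constant is read off the listings here,
not taken from the gcc documentation.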


* Re: gcc-6.3.x miscompiling code for IP27?
  2017-01-23  2:24   ` Joshua Kinard
@ 2017-01-23 18:00     ` Joshua Kinard
  2017-01-24 15:45       ` James Hogan
  0 siblings, 1 reply; 9+ messages in thread
From: Joshua Kinard @ 2017-01-23 18:00 UTC (permalink / raw)
  To: linux-mips

On 01/22/2017 21:24, Joshua Kinard wrote:
> On 01/22/2017 20:03, Joshua Kinard wrote:
>> On 01/22/2017 18:28, Joshua Kinard wrote:
>>> I think I've run into a really odd gcc-6.3.x miscompile bug here on IP27.
>>> But I'm not sure.  I've reproduced the issue on 4.9.5, 4.8.17, and now
>>> 4.7.10 (which I KNOW should boot).  If I recompile the same 4.7.10 kernel
>>> with gcc-5.4.0, though, it boots as expected.  The fault appears to be in
>>> the assembly for _raw_spin_lock_irq.

[snip]

> I am not sure what to call this.  This is definitely not happening with a
> gcc-5.4.x-built kernel, so it's a code-generation issue of some kind:

With some help from the gcc mailing list, the "sd" instructions emitted at the
beginning of every function are from the -fstack-check flag.  It enables
stack-probing to ensure there is sufficient space on the stack, else it'll
trigger a SEGV.  My guess is for IP27, at this early point in boot, the memory
system isn't up and running yet, due to IP27's somewhat-unique nature.  Thus,
this stack-probe will fail and trigger the NULL deref in _raw_spin_lock_irq().

The workaround I am going to go with is to add -fno-stack-check to IP27's
arch/mips/sgi-ip27/Platform file to disable stack-checking.  As far as I can
tell, this solves the issue on IP27 by not emitting these instructions anymore
(verified w/ objdump).  Not sure where this flag is getting set from.  Could be
set in my distro's toolchain somewhere.  But since at least IP30 is not
affected, I am going to assume it's specific to IP27 for now.

I'll send a patch in later this week...


* Re: gcc-6.3.x miscompiling code for IP27?
  2017-01-23 18:00     ` Joshua Kinard
@ 2017-01-24 15:45       ` James Hogan
  2017-01-24 15:45         ` James Hogan
  2017-01-25 19:15         ` Joshua Kinard
  0 siblings, 2 replies; 9+ messages in thread
From: James Hogan @ 2017-01-24 15:45 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: linux-mips


Hi Joshua,

On Mon, Jan 23, 2017 at 01:00:52PM -0500, Joshua Kinard wrote:
> On 01/22/2017 21:24, Joshua Kinard wrote:
> > On 01/22/2017 20:03, Joshua Kinard wrote:
> >> On 01/22/2017 18:28, Joshua Kinard wrote:
> >>> I think I've run into a really odd gcc-6.3.x miscompile bug here on IP27.
> >>> But I'm not sure.  I've reproduced the issue on 4.9.5, 4.8.17, and now
> >>> 4.7.10 (which I KNOW should boot).  If I recompile the same 4.7.10 kernel
> >>> with gcc-5.4.0, though, it boots as expected.  The fault appears to be in
> >>> the assembly for _raw_spin_lock_irq.
> 
> [snip]
> 
> > I am not sure what to call this.  This is definitely not happening with a
> > gcc-5.4.x-built kernel, so it's a code-generation issue of some kind:
> 
> With some help from the gcc mailing list, the "sd" instructions emitted at the
> beginning of every function are from the -fstack-check flag.  It enables
> stack-probing to ensure there is sufficient space on the stack, else it'll
> trigger a SEGV.  My guess is for IP27, at this early point in boot, the memory
> system isn't up and running yet, due to IP27's somewhat-unique nature.  Thus,
> this stack-probe will fail and trigger the NULL deref in _raw_spin_lock_irq().
> 
> The workaround I am going to go with is to add -fno-stack-check to IP27's
> arch/mips/sgi-ip27/Platform file to disable stack-checking.  As far as I can
> tell, this solves the issue on IP27 by not emitting these instructions anymore
> (verified w/ objdump).  Not sure where this flag is getting set from.  Could be
> set in my distro's toolchain somewhere.  But since at least IP30 is not
> affected, I am going to assume it's specific to IP27 for now.

Interesting. It definitely looks like an option we should not be using
in the kernel, regardless of platform, since kernel stacks are
relatively small, put in unmapped virtual memory and don't grow when
overflowed. At least until we get mapped stacks (which has its own set
of hurdles) it's more likely to just corrupt other kernel memory and
cause seemingly random crashes like this one.
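
As a rough illustration of that concern, the numbers from the original Oops
can be plugged in directly.  The Python sketch below assumes that the stack
pointer at the fault is the function-entry value minus the 16-byte frame of
_raw_spin_lock_irq, and that the "ti:" address marks the lowest address of
that task's kernel stack; both are assumptions about this dump, not
something the thread states.

```python
# Values copied from the Oops and the gcc-6.3 disassembly of
# _raw_spin_lock_irq earlier in the thread.
SP_AT_FAULT = 0xA8000000FE6BBDA0   # $29 in the register dump
THREAD_INFO = 0xA8000000FE6B8000   # "ti:" in the Oops, also $28
FRAME_SIZE = 16                    # from "daddiu sp,sp,-16"
PROBE_OFFSET = -16400              # from "sd zero,-16400(sp)"

# Assumption: sp at the fault is the entry sp minus this function's frame.
sp_on_entry = SP_AT_FAULT + FRAME_SIZE
probe_addr = sp_on_entry + PROBE_OFFSET

# Assumption: thread_info sits at the bottom of the kernel stack, so any
# address below it is outside the stack of this task.
shortfall = THREAD_INFO - probe_addr

print(f"probe address : {probe_addr:#018x}")
print(f"thread_info   : {THREAD_INFO:#018x}")
print(f"probe lands {shortfall} bytes below thread_info")
```

Under those assumptions the zero store for this particular call lands 608
bytes below the thread_info base, that is, in whatever memory happens to sit
beneath this task's stack.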

Cheers
James

> 
> I'll send a patch in later this week...
> 

* Re: gcc-6.3.x miscompiling code for IP27?
  2017-01-24 15:45       ` James Hogan
  2017-01-24 15:45         ` James Hogan
@ 2017-01-25 19:15         ` Joshua Kinard
  2017-01-26 16:17           ` Maciej W. Rozycki
  1 sibling, 1 reply; 9+ messages in thread
From: Joshua Kinard @ 2017-01-25 19:15 UTC (permalink / raw)
  To: James Hogan; +Cc: linux-mips

On 01/24/2017 10:45, James Hogan wrote:
> Hi Joshua,
> 
> On Mon, Jan 23, 2017 at 01:00:52PM -0500, Joshua Kinard wrote:
>> On 01/22/2017 21:24, Joshua Kinard wrote:
>>> On 01/22/2017 20:03, Joshua Kinard wrote:
>>>> On 01/22/2017 18:28, Joshua Kinard wrote:
>>>>> I think I've run into a really odd gcc-6.3.x miscompile bug here on IP27.
>>>>> But I'm not sure.  I've reproduced the issue on 4.9.5, 4.8.17, and now
>>>>> 4.7.10 (which I KNOW should boot).  If I recompile the same 4.7.10 kernel
>>>>> with gcc-5.4.0, though, it boots as expected.  The fault appears to be in
>>>>> the assembly for _raw_spin_lock_irq.
>>
>> [snip]
>>
>>> I am not sure what to call this.  This is definitely not happening with a
>>> gcc-5.4.x-built kernel, so it's a code-generation issue of some kind:
>>
>> With some help from the gcc mailing list, the "sd" instructions emitted at the
>> beginning of every function are from the -fstack-check flag.  It enables
>> stack-probing to ensure there is sufficient space on the stack, else it'll
>> trigger a SEGV.  My guess is for IP27, at this early point in boot, the memory
>> system isn't up and running yet, due to IP27's somewhat-unique nature.  Thus,
>> this stack-probe will fail and trigger the NULL deref in _raw_spin_lock_irq().
>>
>> The workaround I am going to go with is to add -fno-stack-check to IP27's
>> arch/mips/sgi-ip27/Platform file to disable stack-checking.  As far as I can
>> tell, this solves the issue on IP27 by not emitting these instructions anymore
>> (verified w/ objdump).  Not sure where this flag is getting set from.  Could be
>> set in my distro's toolchain somewhere.  But since at least IP30 is not
>> affected, I am going to assume it's specific to IP27 for now.
> 
> Interesting. It definitely looks like an option we should not be using
> in the kernel, regardless of platform, since kernel stacks are
> relatively small, put in unmapped virtual memory and don't grow when
> overflowed. At least until we get mapped stacks (which has its own set
> of hurdles) it's more likely to just corrupt other kernel memory and
> cause seemingly random crashes like this one.
> 
> Cheers
> James

Instead of making -fno-stack-check IP27-only, I can do a patch for the main
arch/mips/Makefile instead to turn it off globally.  It looks like this option
has been available in gcc as far back as at least 3.0.4, so would any kind of
compatibility/version check for gcc be needed?  I'm not sure what the oldest
gcc supported by the MIPS code currently is.


* Re: gcc-6.3.x miscompiling code for IP27?
  2017-01-25 19:15         ` Joshua Kinard
@ 2017-01-26 16:17           ` Maciej W. Rozycki
  2017-01-26 16:17             ` Maciej W. Rozycki
  0 siblings, 1 reply; 9+ messages in thread
From: Maciej W. Rozycki @ 2017-01-26 16:17 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: James Hogan, linux-mips

On Wed, 25 Jan 2017, Joshua Kinard wrote:

> Instead of making -fno-stack-check IP27-only, I can do a patch for the main
> arch/mips/Makefile instead to turn it off globally.  It looks like this option
> has been available in gcc as far back as at least 3.0.4, so would any kind of
> compatibility/version check for gcc be needed?  I'm not sure what the oldest
> gcc supported by the MIPS code currently is.

 Wrapping a compiler option into `$(call cc-option,...)' is always safe to 
do if unsure.  In this case however Documentation/Changes states 3.2 as 
the minimum GCC version so it looks to me like no such check is required.
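
For readers unfamiliar with the idea behind cc-option, here is a rough
Python sketch of the general technique: try to compile an empty input with
the candidate flag and only keep the flag if the compiler accepts it.  This
is an illustration only, not kbuild's actual implementation; the function
name and the exact compiler arguments are assumptions.

```python
import os
import subprocess
import tempfile


def cc_option(cc, flag, fallback=""):
    """Return flag if the compiler cc accepts it, otherwise fallback."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "probe.o")
        # Compile an empty C input; -Werror makes unknown-flag warnings fatal.
        cmd = [cc, "-Werror", flag, "-c", "-x", "c", os.devnull, "-o", out]
        try:
            result = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
        except OSError:
            # Compiler binary missing or not executable: treat as unsupported.
            return fallback
    return flag if result.returncode == 0 else fallback


# Example: probe whichever compiler is called "gcc" on this machine.
print(repr(cc_option("gcc", "-fno-stack-check")))
```

The result depends on the local toolchain: the flag comes back unchanged
when the compiler accepts it, and the fallback (an empty string by default)
comes back otherwise.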

  Maciej

