linux-mips.vger.kernel.org archive mirror
* ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
@ 2017-03-15 20:11 Joshua Kinard
  2017-03-16  3:50 ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-15 20:11 UTC
  To: Linux/MIPS

I've reported in the past that turning on CONFIG_DEBUG_LOCK_ALLOC produces a
kernel that can't boot on several SGI platforms.  It turns out that using
arcload (Stan's bootloader originally written for IP30), I can get some
debugging out on why.  I am still puzzled, but maybe this information can be
interpreted by someone else into something meaningful?

All addresses printed out of arcload are physical addresses.

ARCS Memory Map as printed by some debugging I added to the arcload binary:

0x00000000 - 0x00001000 ExceptionBlock
0x00001000 - 0x00002000 SystemParameterBlock
0x00002000 - 0x00004000 FirmwarePermanent
0x20004000 - 0x20f00000 FreeMemory***
0x20f00000 - 0x21000000 FirmwareTemporary
0x21000000 - 0x5fff0000 FreeMemory
0x5fff0000 - 0x5ffff000 LoadedProgram
0x5ffff000 - 0x60000000 FreeMemory
0x60000000 - 0xa0000000 FirmwarePermanent

The ***'ed FreeMemory segment is where the kernel is supposed to load.  Here's
the debugging for a kernel WITHOUT CONFIG_DEBUG_LOCK_ALLOC enabled (4102norm):

ELF Start: 0x20004000
Elf End  : 0x20a6fdd0
Size     : 0x00a6bdd0 (~10.4 MiB)

# ls -l 4102norm
-rwxr-xr-x 1 root root 28M Mar 15 15:12 4102norm*


And the debugging kernel compiled with CONFIG_DEBUG_LOCK_ALLOC=y (no other
config changes from above):

ELF Start: 0x20004000
Elf End  : 0x2148bf80
Size     : 0x01487f80 (~20.5 MiB)

# ls -l 4102dbg
-rwxr-xr-x 1 root root 29M Mar 15 15:21 4102dbg*


I am only using the traditional "vmlinux" make target, so there shouldn't be
any compression involved here.  Yet according to ARCS, CONFIG_DEBUG_LOCK_ALLOC
adds roughly 10MB of "something", while the vmlinux file itself grows by only
about 1MB.

If I examine both kernels with readelf and dump the program headers, I can see
these two sizes reflected under "MemSiz":

# mips64-unknown-linux-gnu-readelf -l 4102norm

Elf file type is EXEC (Executable file)
Entry point 0xa800000020700450
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000004000 0xa800000020004000 0xa800000020004000
                 0x00000000009a5030 0x0000000000a6bdd0  RWE    10000
  NOTE           0x0000000000714bb0 0xa800000020714bb0 0xa800000020714bb0
                 0x0000000000000024 0x0000000000000024  R      4

# mips64-unknown-linux-gnu-readelf -l 4102dbg

Elf file type is EXEC (Executable file)
Entry point 0xa800000020749c80
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000004000 0xa800000020004000 0xa800000020004000
                 0x0000000000a05850 0x0000000001487f80  RWE    10000
  NOTE           0x000000000075e330 0xa80000002075e330 0xa80000002075e330
                 0x0000000000000024 0x0000000000000024  R      4
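
As a rough cross-check of the two readelf dumps above, the gap between MemSiz
and FileSiz can be computed directly.  In an ELF LOAD segment, the bytes beyond
FileSiz are not stored in the file; the loader zero-fills them (typically
.bss).  The sketch below is only arithmetic on the values printed above; the
helper name zero_fill_bytes is hypothetical, and nothing here establishes which
kernel structures account for the extra space.

```python
# FileSiz / MemSiz of the LOAD program header, copied from the readelf -l
# output above.  The dictionary layout and helper name are illustrative only.
SEGMENTS = {
    "4102norm": {"filesz": 0x9A5030, "memsz": 0xA6BDD0},
    "4102dbg": {"filesz": 0xA05850, "memsz": 0x1487F80},
}


def zero_fill_bytes(filesz: int, memsz: int) -> int:
    """Bytes the loader must reserve beyond what is stored in the file."""
    return memsz - filesz


for name, seg in SEGMENTS.items():
    extra = zero_fill_bytes(seg["filesz"], seg["memsz"])
    print(
        f"{name}: FileSiz={seg['filesz']:#x} MemSiz={seg['memsz']:#x} "
        f"zero-fill={extra:#x} ({extra / 2**20:.1f} MiB)"
    )
```

With these inputs the zero-filled portion comes to about 0.8 MiB for 4102norm
and about 10.5 MiB for 4102dbg, which would be consistent with the file growing
by only about 1MB while the in-memory image grows by about 10MB.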


So I'm not quite certain why ARCS or arcload dislike kernels with
CONFIG_DEBUG_LOCK_ALLOC=y.  This issue has been known on at least the IP27 and
IP30 platforms for the past few years, and it's been quite a hindrance to
debugging locks.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-15 20:11 ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel Joshua Kinard
@ 2017-03-16  3:50 ` Joshua Kinard
  2017-03-16 14:09   ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-16  3:50 UTC
  To: linux-mips

On 03/15/2017 16:11, Joshua Kinard wrote:
> I've reported in the past that turning on CONFIG_DEBUG_LOCK_ALLOC produces a
> kernel that can't boot on several SGI platforms.  It turns out that using
> arcload (Stan's bootloader originally written for IP30), I can get some
> debugging out on why.  I am still puzzled, but maybe this information can be
> interpreted by someone else into something meaningful?
> 
> All addresses printed out of arcload are physical addresses.
> 
> ARCS Memory Map as printed by some debugging I added to the arcload binary:
> 
> 0x00000000 - 0x00001000 ExceptionBlock
> 0x00001000 - 0x00002000 SystemParameterBlock
> 0x00002000 - 0x00004000 FirmwarePermanent
> 0x20004000 - 0x20f00000 FreeMemory***
> 0x20f00000 - 0x21000000 FirmwareTemporary
> 0x21000000 - 0x5fff0000 FreeMemory
> 0x5fff0000 - 0x5ffff000 LoadedProgram
> 0x5ffff000 - 0x60000000 FreeMemory
> 0x60000000 - 0xa0000000 FirmwarePermanent

So it turns out I can get away, on Octane at least, with changing the load
address from 0x20004000 to an arbitrary value in the other FreeMemory segment
from 0x21000000 - 0x5fff0000.  Specifically, using 0x21004000 appears to work
without any ill effects.
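
To make the effect of the two load addresses concrete, here is a minimal Python
sketch that lists which entries of the memory map quoted above an image would
overlap.  It assumes each map entry's end address is exclusive and that the
MemSiz of the CONFIG_DEBUG_LOCK_ALLOC=y kernel (0x01487f80, from the readelf
output earlier in the thread) stays the same when the load address changes.
The names MEMORY_MAP and regions_touched are hypothetical.

```python
# ARCS memory map as printed by arcload (physical addresses).
# Assumption: each end address is exclusive.
MEMORY_MAP = [
    (0x00000000, 0x00001000, "ExceptionBlock"),
    (0x00001000, 0x00002000, "SystemParameterBlock"),
    (0x00002000, 0x00004000, "FirmwarePermanent"),
    (0x20004000, 0x20F00000, "FreeMemory"),
    (0x20F00000, 0x21000000, "FirmwareTemporary"),
    (0x21000000, 0x5FFF0000, "FreeMemory"),
    (0x5FFF0000, 0x5FFFF000, "LoadedProgram"),
    (0x5FFFF000, 0x60000000, "FreeMemory"),
    (0x60000000, 0xA0000000, "FirmwarePermanent"),
]

DBG_MEMSZ = 0x1487F80  # MemSiz of the CONFIG_DEBUG_LOCK_ALLOC=y kernel


def regions_touched(load_addr, memsz):
    """Return the map entries that [load_addr, load_addr + memsz) overlaps."""
    end = load_addr + memsz
    return [(s, e, kind) for (s, e, kind) in MEMORY_MAP if s < end and load_addr < e]


for load_addr in (0x20004000, 0x21004000):
    print(f"load at {load_addr:#010x}:")
    for s, e, kind in regions_touched(load_addr, DBG_MEMSZ):
        print(f"  {s:#010x} - {e:#010x} {kind}")
```

Under those assumptions, the image loaded at 0x20004000 runs past the first
FreeMemory entry, across FirmwareTemporary, and into the next FreeMemory entry,
whereas at 0x21004000 it stays inside the single 0x21000000 - 0x5fff0000 entry.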

The 0x20004000 value is the address used by IRIX to load (with symon, it
becomes 0x200800000 instead).  I'll have to try this on the IP27 later on as
well.  On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
issues yet.  Probably need to hammer the disks with bonnie++ or such.  At least
I can get back to the BRIDGE/PCI mess now...

-- 
Joshua Kinard

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16  3:50 ` Joshua Kinard
@ 2017-03-16 14:09   ` Ralf Baechle
  2017-03-16 17:50     ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-16 14:09 UTC
  To: Joshua Kinard; +Cc: linux-mips

On Wed, Mar 15, 2017 at 11:50:44PM -0400, Joshua Kinard wrote:

> So it turns out I can get away, on Octane at least, with changing the load
> address from 0x20004000 to an arbitrary value in the other FreeMemory segment
> from 0x21000000 - 0x5fff0000.  Specifically, using 0x21004000 appears to work
> without any ill effects.
> 
> The 0x20004000 value is the address used by IRIX to load (with symon, it
> becomes 0x200800000 instead).  I'll have to try this on the IP27 later on as
> well.  On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
> issues yet.  Probably need to hammer the disks with bonnie++ or such.  At least
> I can get back to the BRIDGE/PCI mess now...

I'm wondering where the ARC stack is on kernel entry, and whether the
ARC stack might have corrupted the kernel.  If possible, can you get your
kernel or a test program to compute a checksum over itself to see
if it has been corrupted?
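
One way to get a reference value for such a check is to compute a checksum over
the file-backed part of each LOAD segment on the build host and compare it with
whatever the kernel or test program computes over the same bytes in memory.
The following is only a host-side sketch under stated assumptions: CRC32 is an
arbitrary choice of checksum, the function name load_segment_crc32 is
hypothetical, and the in-memory side of the comparison is not shown.

```python
import struct
import zlib

PT_LOAD = 1  # program header type for loadable segments


def load_segment_crc32(image):
    """Return (p_paddr, p_filesz, crc32) for each PT_LOAD segment of an ELF64 image.

    Only the file-backed bytes (p_filesz) are covered; the zero-filled tail
    up to p_memsz is not part of the file and is skipped here.
    """
    if image[:4] != b"\x7fELF" or image[4] != 2:
        raise ValueError("not an ELF64 image")
    endian = "<" if image[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    (e_phoff,) = struct.unpack_from(endian + "Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", image, 0x36)
    results = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, _vaddr, p_paddr, p_filesz = struct.unpack_from(
            endian + "IIQQQQ", image, base
        )
        if p_type == PT_LOAD:
            data = image[p_offset : p_offset + p_filesz]
            results.append((p_paddr, p_filesz, zlib.crc32(data)))
    return results


# Example use on the build host (file name taken from the thread):
#   with open("4102dbg", "rb") as f:
#       for paddr, size, crc in load_segment_crc32(f.read()):
#           print(f"{paddr:#018x} {size:#x} crc32={crc:#010x}")
```

A mismatch between this value and one computed at kernel entry would point
toward corruption after loading; a match would not by itself rule out damage to
the zero-filled region, which this sketch does not cover.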

Let me repeat my ARC(S) mantra again, ARC(S) is broken, ARC(S) lies.
Trust is futile.  Even if ARC(S) claims something is free I'd rather
not rely on it.

  Ralf

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 14:09   ` Ralf Baechle
@ 2017-03-16 17:50     ` Joshua Kinard
  2017-03-16 19:06       ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-16 17:50 UTC
  To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 10:09, Ralf Baechle wrote:
> I'm wondering where the ARC stack is on kernel entry, and whether the
> ARC stack might have corrupted the kernel.  If possible, can you get your
> kernel or a test program to compute a checksum over itself to see
> if it has been corrupted?

As far as I can tell, it really does seem that it is a sizing issue.  I don't
have the time to dive into exactly what CONFIG_DEBUG_LOCK_ALLOC is doing, but I
found one hit on LKML (lost the URL) that indicates it fluffs up a particular
struct that is very common and so introduces a fair bit of bloat, and it seems
possible that the 0x20004000-0x20f00000 segment really is too small.  I wouldn't
rule out the possibility that SGI designed ARCS on the Octane to allow only IRIX
to load at this particular address and Linux has just gotten lucky thus far.

As for whether loading at the next FreeMemory segment in 0x21000000-0x5fff0000
smashes any ARCS segments, that I am not sure about.  A kernel booting in that
segment does boot, and seems to behave no differently than a kernel booting in
the other segment, including exhibiting the same bugs.  Like IP27, Octane
doesn't have a need for ARCS after the kernel boots, as resetting the system
can be done by flipping a bit in HEART, and power down is handled by the RTC
driver (this feature broke, though, and I haven't chased down why yet).  So if
we're clobbering ARCS using this load address...well, it can't be all that bad
</famous-last-words>

I'll see what IP27 does, assuming it even has a large enough FreeMemory segment
to work with.


> Let me repeat my ARC(S) mantra again, ARC(S) is broken, ARC(S) lies.
> Trust is futile.  Even if ARC(S) claims something is free I'd rather
> not rely on it.

Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
RAM.  All remaining RAM installed in the system is marked as FirmwarePermanent
and mapped into 0x60000000 on up.


-- 
Joshua Kinard

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 17:50     ` Joshua Kinard
@ 2017-03-16 19:06       ` Ralf Baechle
  2017-03-16 20:02         ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-16 19:06 UTC
  To: Joshua Kinard; +Cc: linux-mips

On Thu, Mar 16, 2017 at 01:50:42PM -0400, Joshua Kinard wrote:

> As far as I can tell, it really does seem that it is a sizing issue.  I don't
> have the time to dive into what CONFIG_DEBUG_LOCK_ALLOC is exactly doing, but I
> found one hit on LKML (lost the URL) that indicates it fluffs up a particular
> struct that is very common and so introduces a fair bit of bloat, and it seems
> possible that the 0x20004000-0x20f00000 really is too small.  I wouldn't rule
> out the possibility that SGI designed ARCS on the Octane to allow only IRIX to
> load at this particular address and Linux has just gotten lucky thus far.
> 
> As for whether loading at the next FreeMemory segment in 0x21000000-0x5fff0000
> smashes any ARCS segments, that I am not sure about.  A kernel booting in that
> segment does boot, and seems to behave no differently than a kernel booting in
> the other segment, including exhibiting the same bugs.  Like IP27, Octane
> doesn't have a need for ARCS after the kernel boots, as resetting the system
> can be done by flipping a bit in HEART, and power down is handled by the RTC
> driver (this feature broke, though, and I haven't chased down why yet).  So if
> we're clobbering ARCS using this load address...well, it can't be all that bad
> </famous-last-words>
> 
> I'll see what IP27 does, assuming it even has a large enough FreeMemory segment
> to work with.
> 
> 
> > Let me repeat my ARC(S) mantra again, ARC(S) is broken, ARC(S) lies.
> > Trust is futile.  Even if ARC(S) claims something is free I'd rather
> > not rely on it.
> 
> Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
> RAM.  All remaining RAM installed in the system is marked as FirmwarePermanent
> and mapped into 0x60000000 on up.

I think on IP27 it was only the first 32MB that are somehow used by
ARCS.  Everything else is entirely ignored and the OS is supposed to
use klconfig to query the hardware configuration.  That said, klconfig
is infinitely better than ARCS, it actually works and is easy to
use.  What it does not provide is information on how firmware or
other loaded programs are using memory - it's really just a hardware
inventory.

  Ralf

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 19:06       ` Ralf Baechle
@ 2017-03-16 20:02         ` Joshua Kinard
  2017-03-16 20:50           ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-16 20:02 UTC
  To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 15:06, Ralf Baechle wrote:
> I think on IP27 it was only the first 32MB that are somehow used by
> ARCS.  Everything else is entirely ignored and the OS is supposed to
> use klconfig to query the hardware configuration.  That said, klconfig
> is infinitely better than ARCS, it actually works and is easy to
> use.  What it does not provide is information on how firmware or
> other loaded programs are using memory - it's really just a hardware
> inventory.

IIRC, the first 32MB is reserved for use as directory memory on systems with
less than 32 CPUs.  For 32 or more CPUs, I believe you have to start populating
the special directory memory DIMM slots.

This does remind me, though, when I installed a router board I found for cheap,
the kernel, regardless of configuration, wouldn't load at the address defined
in IP27's Platform file, as ARCS said it was too large.  If I can find a larger
ARCS segment to load into, I'll have to test that again as well...

-- 
Joshua Kinard

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 20:02         ` Joshua Kinard
@ 2017-03-16 20:50           ` Ralf Baechle
  2017-03-17  3:01             ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-16 20:50 UTC
  To: Joshua Kinard; +Cc: linux-mips

On Thu, Mar 16, 2017 at 04:02:48PM -0400, Joshua Kinard wrote:

> IIRC, the first 32MB is reserved for use as directory memory on systems with
> less than 32 CPUs.  For 32 or more CPUs, I believe you have to start populating
> the special directory memory DIMM slots.

Completely wrong.  IP27's special memory modules contain the directory for
each 128 byte S-cache line.  This is similar to how other memory controllers
include an ECC for each line of memory.  The directory size and format of
standard memory modules is sufficient for up to 16 nodes.  Note the limit
is about nodes, not processors.

For larger systems IP27 node boards need to be populated with
``premium directory DIMMs'' which extend the number of directory bits to
cover systems up to 64 nodes (which would be 128 CPUs).  For the few
systems that exceed even that size (we're talking about > 9 full size
racks!), either each of the 64 directory bits represents a node in a
particular 128 part of the system, or coarse mode is used, where each bit
represents eight nodes, thus allowing for 8 * 64 = 512 nodes = 1024 CPUs.
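
The node and CPU limits described above can be tabulated with a few lines of
Python.  This is only the arithmetic from the message restated; the figure of
two CPUs per node is inferred from "64 nodes (which would be 128 CPUs)", and
the names used are hypothetical.

```python
# Figures taken from the message above.
CPUS_PER_NODE = 2          # inferred from "64 nodes (which would be 128 CPUs)"
DIRECTORY_BITS = 64        # premium directory DIMMs
NODES_PER_BIT_COARSE = 8   # coarse mode: one directory bit covers eight nodes


def max_cpus(nodes: int) -> int:
    """CPU count for a given number of nodes."""
    return nodes * CPUS_PER_NODE


LIMITS = {
    "standard DIMMs": 16,
    "premium directory DIMMs": DIRECTORY_BITS,
    "premium, coarse mode": DIRECTORY_BITS * NODES_PER_BIT_COARSE,
}

for name, nodes in LIMITS.items():
    print(f"{name:26s} {nodes:4d} nodes  {max_cpus(nodes):5d} CPUs")
```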

> This does remind me, though, when I installed a router board I found for cheap,
> the kernel, regardless of configuration, wouldn't load at the address defined
> in IP27's Platform file, as ARCS said it was too large.  If I can find a larger
> ARCS segment to load into, I'll have to test that again as well...

The load address used for the IP27 vmlinux was mindlessly copied from
either sash or vmunix itself.  I'd not call that a scientific method
and I never had access to ARC(S) source.

Your system is large enough to require a router board?  I hope you
got cheap power :)

  Ralf

* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 20:50           ` Ralf Baechle
@ 2017-03-17  3:01             ` Joshua Kinard
  2017-03-18 23:42               ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-17  3:01 UTC
  To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 16:50, Ralf Baechle wrote:
>>> I think on IP27 it was only the first 32MB that are somehow used by
>>> ARCS.  Everything else is entirely ignored and the OS is supposed to
>>> use klconfig to query the hardware configuration.  That said, klconfig
>>> is an infinitely better than ARCS, it actually works and is easy to
>>> use.  What it does not provide is information on how firmware or
>>> other loaded programs are using memory - it's really just a hardware
>>> inventory.
>>
>> IIRC, the first 32MB is reserved for use as directory memory on systems with
>> less than 32 CPUs.  For 32 or more CPUs, I believe you have to start populating
>> the special directory memory DIMM slots.
> 
> Completly wrong.  IP27's special memory modules contain the directory for
> each 128 byte S-cache line.  This is similar to how other memory controllers
> include an ECC for each line of memory.  The directory size and format of
> standard memory modules is sufficient for up to 16 nodes.  Note the limit
> is about nodes, not processors.
> 
> For larger systems IP27 node boards need to be populated with with
> ``premium directory DIMMs'' which extend the number of directory bits to
> cover systems up to 64 nodes (which would be 128 CPUs).  For the few
> systems that exceed even that size (we're talking about > 9 full size
> racks!) each of the 64 directory bits represents a node in a particular
> 128 part of the system or coarse mode where each bit represents eight
> nodes, thus allowing for 8 * 64 = 512 nodes = 1024 CPUs.

Ah, good to know!  I've seen in the PROM startup messages where ARCS reserves
the first 32MB for something.  I thought it was for the directory memory stuff.
Perhaps that's where klconfig's data is stored?  Something to dig into some
day, maybe.


>> This does remind me, though, when I installed a router board I found for cheap,
>> the kernel, regardless of configuration, wouldn't load at the address defined
>> in IP27's Platform file, as ARCS said it was too large.  If I can find a larger
>> ARCS segment to load into, I'll have to test that again as well...
> 
> The load address used for the IP27 vmlinux was mindlessly copied from
> either sash or vmunix itself.  I'd not call that a scientific method
> and I never had access to ARC(S) source.

I'll see if I can find a better load address that can be used then.  I have
headers from IRIX 6.5.31 still around somewhere, and can probably get a memory
map out of ARCS with the same arcload hack I used on IP30.
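
For what it's worth, a candidate load address can be sanity-checked against
an ARCS memory map before building anything.  The sketch below is only an
illustration: it uses the Octane FreeMemory ranges and the two ELF sizes
from the first message in this thread, treats the end addresses as
exclusive (an assumption), and the helper name fits() is made up here, not
something arcload provides.

```python
# Octane (IP30) FreeMemory segments, physical addresses, copied from the
# arcload dump in the first message of this thread.
FREE_SEGMENTS = [
    (0x20004000, 0x20f00000),
    (0x21000000, 0x5fff0000),
    (0x5ffff000, 0x60000000),
]

NORMAL_SIZE = 0x00a6bdd0  # 4102norm, size reported by arcload
DEBUG_SIZE = 0x01487f80   # 4102dbg, CONFIG_DEBUG_LOCK_ALLOC=y


def fits(load_addr, image_size, segments=FREE_SEGMENTS):
    """True if [load_addr, load_addr + image_size) lies inside one segment.

    fits() is a hypothetical helper for illustration only.
    """
    end = load_addr + image_size
    return any(load_addr >= start and stop >= end for start, stop in segments)


if __name__ == "__main__":
    print("normal @ 0x20004000:", fits(0x20004000, NORMAL_SIZE))
    print("debug  @ 0x20004000:", fits(0x20004000, DEBUG_SIZE))
    print("debug  @ 0x21004000:", fits(0x21004000, DEBUG_SIZE))
```

On those numbers, only the non-debug kernel fits at 0x20004000, while
0x21004000 leaves room for the debug kernel.  The same check should apply
to any other platform once its memory map is known.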


> Your system is large enough to require a router board?  I hope you
> got cheap power :)

Only two nodes right now.  The router board was ~$20 on eBay, and I thought
it'd be cool to have more blinky lights.  Someday, I might get my hands on a
full rack, though...


-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-17  3:01             ` Joshua Kinard
@ 2017-03-18 23:42               ` Joshua Kinard
  2017-03-19  7:23                 ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-18 23:42 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 23:01, Joshua Kinard wrote:
> On 03/16/2017 16:50, Ralf Baechle wrote:
>> On Thu, Mar 16, 2017 at 04:02:48PM -0400, Joshua Kinard wrote:
>>
>>> On 03/16/2017 15:06, Ralf Baechle wrote:
>>>> On Thu, Mar 16, 2017 at 01:50:42PM -0400, Joshua Kinard wrote:
>>>>
>>>>> On 03/16/2017 10:09, Ralf Baechle wrote:
>>>>>> On Wed, Mar 15, 2017 at 11:50:44PM -0400, Joshua Kinard wrote:
>>>>>>
>>>>>>> On 03/15/2017 16:11, Joshua Kinard wrote:
>>>>>>>> I've reported in the past that turning on CONFIG_DEBUG_LOCK_ALLOC produces a
>>>>>>>> kernel that can't boot on several SGI platforms.  It turns out that using
>>>>>>>> arcload (Stan's bootloader originally written for IP30), I can get some
>>>>>>>> debugging out on why.  I am still puzzled, but maybe this information can be
>>>>>>>> interpreted by someone else into something meaningful?
>>>>>>>>
>>>>>>>> All addresses printed out of arcload are physical address.
>>>>>>>>
>>>>>>>> ARCS Memory Map as printed by some debugging I added to the arcload binary:
>>>>>>>>
>>>>>>>> 0x00000000 - 0x00001000 ExceptionBlock
>>>>>>>> 0x00001000 - 0x00002000 SystemParameterBlock
>>>>>>>> 0x00002000 - 0x00004000 FirmwarePermanent
>>>>>>>> 0x20004000 - 0x20f00000 FreeMemory***
>>>>>>>> 0x20f00000 - 0x21000000 FirmwareTemporary
>>>>>>>> 0x21000000 - 0x5fff0000 FreeMemory
>>>>>>>> 0x5fff0000 - 0x5ffff000 LoadedProgram
>>>>>>>> 0x5ffff000 - 0x60000000 FreeMemory
>>>>>>>> 0x60000000 - 0xa0000000 FirmwarePermanent
>>>>>>>
>>>>>>> So it turns out I can get away, on Octane at least, by changing the load
>>>>>>> address from 0x20004000 to an arbitrary value in the other FreeMemory segment
>>>>>>> from 0x21000000 - 0x5fff0000.  Specifically, using 0x21004000 appears to work
>>>>>>> without any ill effects.
>>>>>>>
>>>>>>> The 0x20004000 value is the address used by IRIX to load (with symon, it
>>>>>>> becomes 0x200800000 instead).  I'll have to try this on the IP27 later on as
>>>>>>> well.  On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
>>>>>>> issues yet.  Probably need to hammer the disks with bonnie++ or such.  At least
>>>>>>> I can get back to the BRIDGE/PCI mess now...
>>>>>>
>>>>>> I'm wondering where the ARC stack is on kernel entry if maybe the
>>>>>> ARC stack has corrupted the kernel?  If possible, can you get your
>>>>>> kernel or a test program to compute a checksum over itself to see
>>>>>> if it has been corrupted?
>>>>>
>>>>> As far as I can tell, it really does seem that it is a sizing issue.  I don't
>>>>> have the time to dive into what CONFIG_DEBUG_LOCK_ALLOC is exactly doing, but I
>>>>> found one hit on LKML (lost the URL) that indicates it fluffs up a particular
>>>>> struct that is very common and so introduces a fair bit of bloat, and it seems
>>>>> possible that the 0x20004000-0x20f00000 really is too small.  I wouldn't rule
>>>>> out the possibility that SGI designed ARCS on the Octane to allow only IRIX to
>>>>> load at this particular address and Linux has just gotten lucky thus far.
>>>>>
>>>>> As for whether loading at the next FreeMemory segment in 0x21000000-0x5fff0000
>>>>> smashes any ARCS segments, that I am not sure about.  A kernel booting in that
>>>>> segment does boot, and seems to behave no differently than a kernel booting in
>>>>> the other segment, including exhibiting the same bugs.  Like IP27, Octane
>>>>> doesn't have a need for ARCS after the kernel boots, as resetting the system
>>>>> can be done by flipping a bit in HEART, and power down is handled by the RTC
>>>>> driver (this feature broke, though, and I haven't chased down why yet).  So if
>>>>> we're clobbering ARCS using this load address...well, it can't be all that bad
>>>>> </famous-last-words>
>>>>>
>>>>> I'll see what IP27 does, assuming it even has a large enough FreeMemory segment
>>>>> to work with.
>>>>>
>>>>>
>>>>>> Let me repeat my ARC(S) mantra again, ARC(S) is broken, ARC(S) lies.
>>>>>> Trust is futile.  Even if ARC(S) claims something is free I'd rather
>>>>>> not rely on it.
>>>>>
>>>>> Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
>>>>> RAM.  All remaining RAM installed in the system is marked as FirmwarePermanent
>>>>> and mapped into 0x60000000 on up.
>>>>
>>>> I think on IP27 it was only the first 32MB that are somehow used by
>>>> ARCS.  Everything else is entirely ignored and the OS is supposed to
>>>> use klconfig to query the hardware configuration.  That said, klconfig
>>>> is an infinitely better than ARCS, it actually works and is easy to
>>>> use.  What it does not provide is information on how firmware or
>>>> other loaded programs are using memory - it's really just a hardware
>>>> inventory.
>>>
>>> IIRC, the first 32MB is reserved for use as directory memory on systems with
>>> less than 32 CPUs.  For 32 or more CPUs, I believe you have to start populating
>>> the special directory memory DIMM slots.
>>
>> Completly wrong.  IP27's special memory modules contain the directory for
>> each 128 byte S-cache line.  This is similar to how other memory controllers
>> include an ECC for each line of memory.  The directory size and format of
>> standard memory modules is sufficient for up to 16 nodes.  Note the limit
>> is about nodes, not processors.
>>
>> For larger systems IP27 node boards need to be populated with with
>> ``premium directory DIMMs'' which extend the number of directory bits to
>> cover systems up to 64 nodes (which would be 128 CPUs).  For the few
>> systems that exceed even that size (we're talking about > 9 full size
>> racks!) each of the 64 directory bits represents a node in a particular
>> 128 part of the system or coarse mode where each bit represents eight
>> nodes, thus allowing for 8 * 64 = 512 nodes = 1024 CPUs.
> 
> Ah, good to know!  I've seen in the PROM startup messages where ARCS reserves
> the first 32MB for something.  I thought it was for the directory memory stuff.
>  Perhaps that's where klconfig's data is stored?  Something to dig into later
> one day maybe.
> 
> 
>>> This does remind me, though, when I installed a router board I found for cheap,
>>> the kernel, regardless of configuration, wouldn't load at the address defined
>>> in IP27's Platform file, as ARCS said it was too large.  If I can find a larger
>>> ARCS segment to load into, I'll have to test that again as well...
>>
>> The load address used for the IP27 vmlinux was mindlessly copied from
>> either sash or vmunix itself.  I'd not call that a scientific method
>> and I never had access to ARC(S) source.
> 
> I'll see if I can find a better load address that can be used then.  I have
> headers from IRIX 6.5.31 still around somewhere, and can probably get a memory
> map out of ARCS with the same arcload hack I used on IP30.

Futzing around with the load address on IP27 doesn't work the same as on
Octane.  IP27 has a much smaller window of FreeMemory available versus the
Octane, based on this dump I got out of arcload:

ARCS Memory Map
0x0 - 0x1000 (ExceptionBlock)
0x1000 - 0x2000 (SystemParameterBlock)
0x19000 - 0x12f0000 (FreeMemory)
0x12f0000 - 0x12ff000 (LoadedProgram)
0x12ff000 - 0x1300000 (FreeMemory)
0x1300000 - 0x1400000 (FirmwareTemporary)
0x1400000 - 0x1500000 (FreeMemory)
0x1500000 - 0x1800000 (FirmwareTemporary)
0x1800000 - 0x1a00000 (FirmwareTemporary)
0x1a00000 - 0x1b00000 (FirmwarePermanent)
0x1c00000 - 0x1e00000 (FreeMemory)
0x1c01000 - 0x1f66000 (FirmwareTemporary)
0x1f80000 - 0x1fa0000 (FirmwareTemporary)

Going by that, I was finally able to strip a kernel down small enough to
contain both CONFIG_DEBUG_LOCK_ALLOC and the absolute bare minimum
functionality to boot to login on IP27, and I have about 3.5KB to spare.  The
only thing I've seen thus far after several reboots is a single spinlock lockup
in generic code, but that was on a kernel using my patches, and I couldn't
reproduce it a second time.  So I'm switching to as pure a mainline kernel
as I can to see if I can trip things up there.
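
A rough way to see how little room IP27 offers is to size up the FreeMemory
entries from the dump above.  This is a sketch under assumptions: the
tuples are copied by hand from the arcload output, end addresses are taken
as exclusive, and the helper names are invented for illustration.  It also
flags entries whose ranges overlap, since the map as printed is not
guaranteed to be self-consistent.

```python
# IP27 ARCS memory map, copied by hand from the arcload dump above.
# Each entry is (start, end, type); end is assumed to be exclusive.
IP27_MAP = [
    (0x0000000, 0x0001000, "ExceptionBlock"),
    (0x0001000, 0x0002000, "SystemParameterBlock"),
    (0x0019000, 0x12f0000, "FreeMemory"),
    (0x12f0000, 0x12ff000, "LoadedProgram"),
    (0x12ff000, 0x1300000, "FreeMemory"),
    (0x1300000, 0x1400000, "FirmwareTemporary"),
    (0x1400000, 0x1500000, "FreeMemory"),
    (0x1500000, 0x1800000, "FirmwareTemporary"),
    (0x1800000, 0x1a00000, "FirmwareTemporary"),
    (0x1a00000, 0x1b00000, "FirmwarePermanent"),
    (0x1c00000, 0x1e00000, "FreeMemory"),
    (0x1c01000, 0x1f66000, "FirmwareTemporary"),
    (0x1f80000, 0x1fa0000, "FirmwareTemporary"),
]


def free_sizes(mem_map):
    """Return (start, end, size) for every FreeMemory entry."""
    return [(s, e, e - s) for s, e, kind in mem_map if kind == "FreeMemory"]


def overlapping(mem_map):
    """Return pairs of entries whose address ranges intersect."""
    ordered = sorted(mem_map)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b[0] >= a[1]:
                break  # sorted by start, so nothing later can overlap a
            pairs.append((a, b))
    return pairs


largest = max(free_sizes(IP27_MAP), key=lambda seg: seg[2])

if __name__ == "__main__":
    for start, end, size in free_sizes(IP27_MAP):
        print("0x%07x - 0x%07x  %6.2f MiB" % (start, end, size / 2**20))
    for a, b in overlapping(IP27_MAP):
        print("overlap:", a[2], hex(a[0]), "with", b[2], hex(b[0]))
```

On the map as printed, the largest FreeMemory entry is 0x19000 - 0x12f0000
(0x12d7000 bytes), and the FreeMemory entry at 0x1c00000 overlaps the
FirmwareTemporary entry that starts at 0x1c01000.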

Also trying to get kgdb to work, but something isn't right with it.  Seems like
the kgdboc= boot parameter isn't being parsed/honored, so I have to force it
manually by writing to /sys/module/kgdboc/parameters/kgdboc before the SysRq-g
option becomes available.  I am hoping there's nothing special I need to do to
IOC3 to get a debugger attached and working, but we'll see.  The kdb frontend
appears to be out of the question, as it adds ~6-7KB of extra code.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-18 23:42               ` Joshua Kinard
@ 2017-03-19  7:23                 ` Joshua Kinard
  2017-03-19  8:55                   ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-19  7:23 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On 03/18/2017 19:42, Joshua Kinard wrote:
> 
> Futzing around with the load address on IP27 doesn't work the same as on
> Octane.  IP27 has a much smaller window of FreeMemory available versus the
> Octane, based on this dump I got out of arcload:
> 
> ARCS Memory Map
> 0x0 - 0x1000 (ExceptionBlock)
> 0x1000 - 0x2000 (SystemParameterBlock)
> 0x19000 - 0x12f0000 (FreeMemory)
> 0x12f0000 - 0x12ff000 (LoadedProgram)
> 0x12ff000 - 0x1300000 (FreeMemory)
> 0x1300000 - 0x1400000 (FirmwareTemporary)
> 0x1400000 - 0x1500000 (FreeMemory)
> 0x1500000 - 0x1800000 (FirmwareTemporary)
> 0x1800000 - 0x1a00000 (FirmwareTemporary)
> 0x1a00000 - 0x1b00000 (FirmwarePermanent)
> 0x1c00000 - 0x1e00000 (FreeMemory)
> 0x1c01000 - 0x1f66000 (FirmwareTemporary)
> 0x1f80000 - 0x1fa0000 (FirmwareTemporary)
> 
> Going by that, I was finally able to strip a kernel down small enough to
> contain both CONFIG_DEBUG_LOCK_ALLOC and the absolute bare minimum
> functionality to boot to login on IP27, and I have about ~3.5KB to spare.  The
> only thing I've seen thus far after several reboots is a single spinlock lockup
> in generic code, but that was on a kernel using my patches, and I couldn't
> reproduce it a second time.  So I'm switching to as pure of a mainline kernel
> as I can to see if I can trip things up there.
> 
> Also trying to get kgdb to work, but something isn't right with it.  Seems like
> the kgdboc= boot parameter isn't being parsed/honored, so I have to force it
> manually by writing to /sys/module/kgdboc/parameters/kgdboc before the SysRq-g
> option becomes available.  I am hoping there's nothing special I need to do to
> IOC3 to get a debugger attached and working, but we'll see.  The kdb frontend
> appears to be out of the question, as it adds ~6-7KB of extra code.

It looks like kgdb won't work with the IOC3 metadriver, but the existing IOC3
code in ioc3-eth.c that handles serial will work.  I was able to get gdb on my
Octane to connect to it, though one has to use ~4800 baud to make it reliable
(could be the 30ft cat5 cable I'm using that dislikes 9600 baud).

Whatever this deadlock issue is, it locks the kernel pretty hard: even after
stopping with SysRq-g and then continuing via gdb, once the deadlock happens
I cannot break into the debugger at all.  Even triggering an NMI via the MSC
dumps nothing out of the kernel before the PROM resets.

The closest I've gotten to extracting info on the state of the machine is to
set the MSC debug switches to 0x1018 and then issue an immediate reset to have
it drop into POD dirty-exclusive as soon as possible.  Then running "why"
sometimes nets me a valid kernel address in EPC that tells me where the POD CPU
last was.  The downside: I have four CPUs, and MSC POD locks up if I try
switching to any of the other CPUs, so I can't get a register dump off of the
other three.

Another interesting note: sometimes when this deadlock happens, a soft reset
doesn't work.  It seems like one of the HUBs is locked up, because the PROM is
unable to communicate with it:

2A 000: Done initializing klconfig.
2A 000: Discovering NUMAlink connectivity .........             DONE
2A 000: Found 2 objects (2 hubs, 0 routers) in 511413 usec
1B 000: Testing/Initializing memory ...............             DONE
2A 000: Waiting for peers to complete discovery....             Reading link 0
(addr 0x92000000
2A 000: 00000004) failed
1B 000: CPU B switching to UALIAS
1B 000: CPU B now running out of UALIAS
2A 000: Reading link 0 (addr 0x9200000000000004) failed
1B 000: Skipping secondary cache diags
1B 000: CPU B switching stack into UALIAS and invalidating D-cache
1B 000: CPU B switching into node 0 cached RAM
1B 000: CPU B running cached
2A 000: Reading link 0 (addr 0x9200000000000004) failed
2A 000: Reading link 0 (addr 0x9200000000000004) failed


Then it gets a general exception and drops to POD Dex:
1B 000: Local Slave : Waiting for my NASID ...
1B 000: CPU B switching to UALIAS
1B 000: CPU B running in UALIAS
1B 000: CPU B Flushing and invalidating caches
1B 000: CPU B switching to node 0 cached RAM
1B 000: CPU B running cached
1A 000:
1A 000: *** General Exception on node 0
1A 000: *** EPC: 0xc00000001fc473dc (0xc00000001fc473dc)
1A 000: *** Press ENTER to continue.
1A 000: POD MSC Dex>

If this is a hardware lock up, that might explain why kgdb isn't useful at that
point.  POD lets me dump the CRBs and PI error spool, but I'm not sure how
useful that information is w/o SGI's internal documents.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-19  7:23                 ` Joshua Kinard
@ 2017-03-19  8:55                   ` Ralf Baechle
  2017-03-21 21:52                     ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-19  8:55 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: linux-mips

On Sun, Mar 19, 2017 at 03:23:39AM -0400, Joshua Kinard wrote:

> The closest I've gotten to extracting info on the state of the machine is to
> set the MSC debug switches to 0x1018 and then issue an immediate reset to have
> it drop into POD dirty-exclusive as soon as possible.  Then running "why"
> sometimes nets me a valid kernel address in EPC that tells me where the POD CPU
> was last at.  Downside, I have four CPUs and MSC POD locks up if I try
> switching to any of the other CPUs.  So I can't get a register dump off of the
> other three.

Have you tried to send an NMI from the MSC?  The PoD debugger is actually
a fairly handy tool in such cases.

> 2A 000: Done initializing klconfig.
> 2A 000: Discovering NUMAlink connectivity .........             DONE
> 2A 000: Found 2 objects (2 hubs, 0 routers) in 511413 usec
> 1B 000: Testing/Initializing memory ...............             DONE
> 2A 000: Waiting for peers to complete discovery....             Reading link 0
> (addr 0x92000000
> 2A 000: 00000004) failed
> 1B 000: CPU B switching to UALIAS
> 1B 000: CPU B now running out of UALIAS
> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
> 1B 000: Skipping secondary cache diags
> 1B 000: CPU B switching stack into UALIAS and invalidating D-cache
> 1B 000: CPU B switching into node 0 cached RAM
> 1B 000: CPU B running cached
> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
> 2A 000: Reading link 0 (addr 0x9200000000000004) failed

I thought that kind of message indicated a hardware issue.

> Then it gets a general exception and drops to POD Dex:
> 1B 000: Local Slave : Waiting for my NASID ...
> 1B 000: CPU B switching to UALIAS
> 1B 000: CPU B running in UALIAS
> 1B 000: CPU B Flushing and invalidating caches
> 1B 000: CPU B switching to node 0 cached RAM
> 1B 000: CPU B running cached
> 1A 000:
> 1A 000: *** General Exception on node 0
> 1A 000: *** EPC: 0xc00000001fc473dc (0xc00000001fc473dc)
> 1A 000: *** Press ENTER to continue.
> 1A 000: POD MSC Dex>
> 
> If this is a hardware lock up, that might explain why kgdb isn't useful at that
> point.  POD lets me dump the CRBs and PI error spool, but I'm not sure how
> useful that information is w/o SGI's internal documents.

I still haven't forgotten everything (I hope) so maybe you could post that
information anyway, just on the small chance there might be something
useful in there?

  Ralf


* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-19  8:55                   ` Ralf Baechle
@ 2017-03-21 21:52                     ` Joshua Kinard
  0 siblings, 0 replies; 12+ messages in thread
From: Joshua Kinard @ 2017-03-21 21:52 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On 03/19/2017 04:55, Ralf Baechle wrote:
> On Sun, Mar 19, 2017 at 03:23:39AM -0400, Joshua Kinard wrote:
> 
>> The closest I've gotten to extracting info on the state of the machine is to
>> set the MSC debug switches to 0x1018 and then issue an immediate reset to have
>> it drop into POD dirty-exclusive as soon as possible.  Then running "why"
>> sometimes nets me a valid kernel address in EPC that tells me where the POD CPU
>> was last at.  Downside, I have four CPUs and MSC POD locks up if I try
>> switching to any of the other CPUs.  So I can't get a register dump off of the
>> other three.
> 
> Have you tried to send an NMI fro the MSC?  The PoD debugger is actually
> a fairly handy tool in such cases.

Oddly enough, sending NMIs from POD lets me switch CPUs and run "why" on
each of them.  The POD "cpu" command doesn't appear to work right; it'll just
lock up that console instead.  The NMI route allows me to grab the address
in EPC from all four CPUs after a lockup, and that's how I've been doing my
debugging all weekend.


>> 2A 000: Done initializing klconfig.
>> 2A 000: Discovering NUMAlink connectivity .........             DONE
>> 2A 000: Found 2 objects (2 hubs, 0 routers) in 511413 usec
>> 1B 000: Testing/Initializing memory ...............             DONE
>> 2A 000: Waiting for peers to complete discovery....             Reading link 0
>> (addr 0x92000000
>> 2A 000: 00000004) failed
>> 1B 000: CPU B switching to UALIAS
>> 1B 000: CPU B now running out of UALIAS
>> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
>> 1B 000: Skipping secondary cache diags
>> 1B 000: CPU B switching stack into UALIAS and invalidating D-cache
>> 1B 000: CPU B switching into node 0 cached RAM
>> 1B 000: CPU B running cached
>> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
>> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
> 
> I thought that kind of messages was indicating a hardware issue.

As far as I can tell, it doesn't appear to be.  A cold boot usually
resolves this issue.  It happens randomly, and not at all on Sunday.


>> Then it gets a general exception and drops to POD Dex:
>> 1B 000: Local Slave : Waiting for my NASID ...
>> 1B 000: CPU B switching to UALIAS
>> 1B 000: CPU B running in UALIAS
>> 1B 000: CPU B Flushing and invalidating caches
>> 1B 000: CPU B switching to node 0 cached RAM
>> 1B 000: CPU B running cached
>> 1A 000:
>> 1A 000: *** General Exception on node 0
>> 1A 000: *** EPC: 0xc00000001fc473dc (0xc00000001fc473dc)
>> 1A 000: *** Press ENTER to continue.
>> 1A 000: POD MSC Dex>
>>
>> If this is a hardware lock up, that might explain why kgdb isn't useful at that
>> point.  POD lets me dump the CRBs and PI error spool, but I'm not sure how
>> useful that information is w/o SGI's internal documents.
> 
> I still haven't forgotten everything (I hope) so maybe you could post that
> information anyway just to use the small chance there ight be something
> useful in there?

I hope you haven't forgotten everything.  How many people left actually
still care about these platforms? :)

That said, CRBs didn't contain anything useful.  All initialized to zero,
except CRB D, which only held 0x00000000000000ff in all 15 of its
registers.

Here's what I spent all of Sunday and a little of Monday night doing:

1. Boot into userland, allow md to finish any resyncs, delete the dummy
file "FOO" being created, sync

2. Re-run this dd command:
dd if=/dev/urandom of=/usr/space/bonnie++/FOO bs=1M count=24000 status=progress

If that command actually completes, it takes about 35-40 minutes.  In a couple
of instances, it actually finished, but in most instances, the system would
lock up anywhere from 5 seconds after launching to 25 minutes later.  I even
made a video capture of my two serial windows and ssh console when the
system locks up that I can put onto YouTube if anyone is interested.
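
For scale, the run times above translate into a rough end-to-end write
rate.  This is plain arithmetic on the numbers in this message (bs=1M
count=24000, 35 to 40 minutes); it includes the time spent generating data
from /dev/urandom, so it is not a measurement of the disks alone.

```python
MIB_WRITTEN = 24000  # dd bs=1M count=24000


def mib_per_sec(minutes):
    """Average rate in MiB/s for a run that took the given minutes."""
    return MIB_WRITTEN / (minutes * 60)


if __name__ == "__main__":
    print(round(mib_per_sec(35), 1))  # 11.4
    print(round(mib_per_sec(40), 1))  # 10.0
```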

Once the system locked up, I:

1. Switched to the MSC.
2. Set debug bits 0x1018 to force POD Dex mode.
3. Reset the system.
4. Ran "why" on the first CPU to come up and recorded the value of EPC.
5. Issued an NMI to the other CPU on that node to force it into POD.
6. Ran "why" and recorded the value of EPC.
7. Switched to the IOC3 console.
8. Issued an NMI to the other node to force it into POD on MSC.
9. Ran "why" and recorded EPC.
10. Repeated for the last CPU.
11. Cleared the debug bits, reset the system, and repeated all over again.

I don't know why issuing NMIs works to switch the active CPU in POD
around.  POD's own "cpu" command just locks the console up when you try to
switch CPUs.  Not sure if it's a bug in the POD software I have in my PROM
image (which is 6.156, the latest SGI made AFAIK).

I've also determined that EPC is the only register that's reliable to hold
data left over from the lockup.  The PROM scrambles the remaining
registers.  I verified this by periodically recompiling my kernel with
different options or a different compiler version to make sure the function
addresses were different, and EPC would still point at the addresses of
very specific functions when the system locked up.

Using the EPC values, I looked those up in GDB for each cycle and kept a
listing of where all four CPUs were when the machine stopped.
Going by that, I've compiled a non-exhaustive list of the most common
functions, one or more of which might be where the underlying problem
lies:

mem_serial_in (./arch/mips/include/asm/io.h:428)
ioc3_mdio_read (drivers/net/ethernet/sgi/ioc3-eth.c:478)
arch_local_irq_save (arch/mips/lib/mips-atomic.c:66)
arch_local_irq_restore (arch/mips/lib/mips-atomic.c:109)
do_idle (kernel/sched/idle.c:154)
hub_rt_read (arch/mips/sgi-ip27/ip27-timer.c:145)
spin_dump (kernel/locking/spinlock_debug.c:58)

The ones that stand out the most to me are arch_local_irq_restore and
spin_dump.  This is what GDB said about spin_dump:

(gdb) l *0xa800000000096ba0
0xa800000000096ba0 is in spin_dump (kernel/locking/spinlock_debug.c:56).
51
52      static void spin_dump(raw_spinlock_t *lock, const char *msg)
53      {
54              struct task_struct *owner = NULL;
55
56              if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT)
57                      owner = lock->owner;
58              printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
59                      msg, raw_smp_processor_id(),
60                      current->comm, task_pid_nr(current));

That tells me the system was about to report a spinlock lockup somewhere
before it halted.  I tried setting a kgdb breakpoint on line 56, but the
machine halts before it can actually hit the breakpoint: in one of my
runs, one of the CPUs was literally in the middle of executing
"nmi_handler" from genex.S, and I only saw this when I had an active
breakpoint set.

So I think my suspicion of this being a spinlock lockup is on the right
track, and it's grave enough that it halts the entire machine.  Problem is,
we're locking up so fast that existing kernel infrastructure can't report
*which* spinlock is the one deadlocking.  I can't rely on leftover data
in the CPU's own registers, and I don't know if it's possible to dump
kernel memory from POD mode in a way that can be decoded into something
useful to trace down what happened.

---

The other standout function is arch_local_irq_restore.  This is what that
looks like in GDB:

0xa800000000325148 is in arch_local_irq_restore (arch/mips/lib/mips-atomic.c:109).
104             "       .set    pop                                             \n"
105             : [flags] "=r" (__tmp1)
106             : "0" (flags)
107             : "memory");
108
109             preempt_enable();
110     }
111     EXPORT_SYMBOL(arch_local_irq_restore);
112
113     #endif /* !CONFIG_CPU_MIPSR2 && !CONFIG_CPU_MIPSR6 */

Or the asm dump:
a800000000325130 <arch_local_irq_restore>:
a800000000325130:       40016000        mfc0    at,$12
a800000000325134:       30840001        andi    a0,a0,0x1
a800000000325138:       3421001f        ori     at,at,0x1f
a80000000032513c:       3821001f        xori    at,at,0x1f
a800000000325140:       00812025        or      a0,a0,at
a800000000325144:       40846000        mtc0    a0,$12
a800000000325148:       03e00008        jr      ra
a80000000032514c:       00000000        nop

Each time the machine has locked up and EPC points at
arch_local_irq_restore, it's always been on the address of that "jr"
instruction, regardless of which CPU I got that EPC value off of.  But I
don't know if that instruction is responsible for the lockup (I doubt it).
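
As a cross-check on the listing, the raw instruction words can be decoded
by hand.  The sketch below is a deliberately tiny decoder that only knows
the three encodings needed here (jr, mfc0, mtc0); the function names are
made up for illustration.  It confirms that the word at the recurring EPC,
0x03e00008, is "jr $31" (ra), and that the word just before it,
0x40846000, is the mtc0 to CP0 register $12, which is Status.

```python
def fields(word):
    """Split a 32-bit MIPS instruction word into its standard bit fields."""
    return {
        "opcode": (word >> 26) & 0x3f,
        "rs": (word >> 21) & 0x1f,
        "rt": (word >> 16) & 0x1f,
        "rd": (word >> 11) & 0x1f,
        "funct": word & 0x3f,
    }


def describe(word):
    """Decode only jr, mfc0 and mtc0; anything else is 'unhandled'."""
    f = fields(word)
    if f["opcode"] == 0x00 and f["funct"] == 0x08:
        return "jr $%d" % f["rs"]
    if f["opcode"] == 0x10 and f["rs"] == 0x00:   # COP0, MF
        return "mfc0 $%d, $%d" % (f["rt"], f["rd"])
    if f["opcode"] == 0x10 and f["rs"] == 0x04:   # COP0, MT
        return "mtc0 $%d, $%d" % (f["rt"], f["rd"])
    return "unhandled"


if __name__ == "__main__":
    for word in (0x40016000, 0x40846000, 0x03e00008):
        print("%08x  %s" % (word, describe(word)))
```

So EPC lands on the instruction immediately after Status is rewritten.
Whether that points at anything more than "this is where interrupts get
re-enabled" is a guess at this stage.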

I am also noticing that sometimes when a lockup happens, one or more of the
other CPUs is in the middle of a read[bwlq]/write[bwlq] operation.  In a
test I just ran before writing this e-mail, two of the CPUs were in the
middle of read/write tasks (this is from a kernel with my patchset on top,
so this function is from the IOC3 metadriver, but the same problem has been
seen in mem_serial_in from a vanilla lmo git tree):

CPU 2B:
0xa8000000003a6138 is in ioc3_serial_in (./arch/mips/include/asm/io.h:428).
423                                                                             \
424     __BUILD_MEMORY_PFX(__raw_, bwlq, type)                                  \
425     __BUILD_MEMORY_PFX(, bwlq, type)                                        \
426     __BUILD_MEMORY_PFX(__mem_, bwlq, type)                                  \
427
428     BUILDIO_MEM(b, u8)
429     BUILDIO_MEM(w, u16)
430     BUILDIO_MEM(l, u32)
431     BUILDIO_MEM(q, u64)
432

a8000000003a6120 <ioc3_serial_in>:
a8000000003a6120:       908200b1        lbu     v0,177(a0)
a8000000003a6124:       dc830020        ld      v1,32(a0)
a8000000003a6128:       00452804        sllv    a1,a1,v0
a8000000003a612c:       38a50003        xori    a1,a1,0x3
a8000000003a6130:       0065282d        daddu   a1,v1,a1
a8000000003a6134:       38a50003        xori    a1,a1,0x3
a8000000003a6138:       90a20000        lbu     v0,0(a1)
a8000000003a613c:       03e00008        jr      ra
a8000000003a6140:       304200ff        andi    v0,v0,0xff
a8000000003a6144:       00000000        nop


CPU 2A:
0xa8000000003e6410 is in qla1280_queuecommand (./arch/mips/include/asm/io.h:429).
424     __BUILD_MEMORY_PFX(__raw_, bwlq, type)                                  \
425     __BUILD_MEMORY_PFX(, bwlq, type)                                        \
426     __BUILD_MEMORY_PFX(__mem_, bwlq, type)                                  \
427
428     BUILDIO_MEM(b, u8)
429     BUILDIO_MEM(w, u16)
430     BUILDIO_MEM(l, u32)
431     BUILDIO_MEM(q, u64)
432
433     #define __BUILD_IOPORT_PFX(bus, bwlq, type)                             \

a8000000003e6360 <qla1280_queuecommand>:
a8000000003e6360:       67bdff90        daddiu  sp,sp,-112
a8000000003e6364:       ffbf0068        sd      ra,104(sp)
...
a8000000003e6404:       97c41728        lhu     a0,5928(s8)
a8000000003e6408:       64630078        daddiu  v1,v1,120
a8000000003e640c:       38630002        xori    v1,v1,0x2
a8000000003e6410:       94650000        lhu     a1,0(v1)
a8000000003e6414:       30a5ffff        andi    a1,a1,0xffff
a8000000003e6418:       0085182a        slt     v1,a0,a1
a8000000003e641c:       14600002        bnez    v1,a8000000003e6428 <qla1280_queuecommand+0xc8>
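
As a reading aid for the ioc3_serial_in dump above, the address arithmetic
before the faulting lbu can be written out in C.  This is a sketch of what
the disassembly computes, not the driver source; the parameter names are
placeholders, since the mail does not say what the fields at offsets 177
and 32 of the a0 structure are.  Sign extension of the 32-bit sllv result
is ignored, which only matters for offsets with bit 31 set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the address computed in ioc3_serial_in:
 *   shift = byte loaded from 177(a0), base = doubleword from 32(a0),
 *   reg   = the register offset passed in a1.
 */
static uint64_t model_serial_in_addr(uint64_t base, uint32_t reg,
                                     unsigned shift)
{
    uint64_t addr;

    assert(shift < 32);               /* sllv uses only the low 5 bits */

    addr = (uint64_t)(reg << shift);  /* sllv  a1,a1,v0  */
    addr ^= 0x3;                      /* xori  a1,a1,0x3 */
    addr += base;                     /* daddu a1,v1,a1  */
    addr ^= 0x3;                      /* xori  a1,a1,0x3 */
    return addr;                      /* lbu   v0,0(a1)  */
}
```

In this model the two xori steps cancel whenever base is 4-byte aligned, so
the load address is simply base + (reg << shift); with an unaligned base
they do not cancel.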

---

I've also tried dumping the HUB error state after reset by setting the
debug bits to 0x1000, which prints this:

Erecting partition fences ................                        DONE
nasid 0 Reading peer hub nasid: 0x9200000021600000
xbow_update: updating slave nasid 1 on link 10
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
 Hardware Error State at System Reset
 +  Errors on node Nasid 0x0 (0)
 +    IP27 in /hw/module/1/slot/n2
 +      HUB signalled following errors.
 +        HUB error interrupt register: 0x200000
 +          21: CPU A received uncorrectable error during uncached load
 End Hardware Error State (at System Reset)

But I'm not 100% sure I can trust that, since I believe I've seen it after
a clean reboot from userland as well.  Both vanilla lmo git and my patched
tree throw a panic if the kernel gets a HUB error interrupt.  I have
noticed that sometimes, when the system panics, it can halt the machine
before any panic info can be written out to the console.  But given
multiple runs, there's still a chance one of the panics will output
something.  Given the number of times I've crashed this machine lately, by
now, I should have gotten *something* out of it.  So I am discounting it
being a HUB error interrupt locking things up.  Just to be sure, I replaced
the panic() call with a basic printk(), and still didn't get any output on
the console.
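
As a cross-check of the reset dump above, the register value 0x200000 has
exactly one bit set, bit 21, which matches the "21:" line in the decode.
A small helper along these lines (hypothetical, not taken from the PROM or
the kernel) lists the set bits of such a register value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the indices of the set bits of reg in out[], lowest bit first.
 * At most cap indices are stored; the return value is the number stored.
 */
static size_t model_set_bits(uint64_t reg, unsigned out[], size_t cap)
{
    size_t n = 0;
    unsigned bit;

    assert(out != NULL || cap == 0);

    for (bit = 0; bit < 64; bit++) {
        if (((reg >> bit) & 1u) && n < cap)
            out[n++] = bit;
    }
    return n;
}
```

Called with 0x200000 it stores a single index, 21.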

I'll also have to enable the heavy diagnostics mode and let the machine
check itself, and maybe run bist from POD to be really sure it's not a
hardware issue.  I've got spare node boards in a closet that I can swap
the CPU PIMMs on if I suspect something's wrong with one of the HUBs.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens
us.  And our lives slip away, moment by moment, lost in that vast,
terrible in-between."

--Emperor Turhan, Centauri Republic

end of thread, other threads:[~2017-03-21 21:53 UTC | newest]

Thread overview: 12+ messages
2017-03-15 20:11 ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel Joshua Kinard
2017-03-16  3:50 ` Joshua Kinard
2017-03-16 14:09   ` Ralf Baechle
2017-03-16 17:50     ` Joshua Kinard
2017-03-16 19:06       ` Ralf Baechle
2017-03-16 20:02         ` Joshua Kinard
2017-03-16 20:50           ` Ralf Baechle
2017-03-17  3:01             ` Joshua Kinard
2017-03-18 23:42               ` Joshua Kinard
2017-03-19  7:23                 ` Joshua Kinard
2017-03-19  8:55                   ` Ralf Baechle
2017-03-21 21:52                     ` Joshua Kinard
