* ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
@ 2017-03-15 20:11 Joshua Kinard
  2017-03-16  3:50 ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-15 20:11 UTC
To: Linux/MIPS

I've reported in the past that turning on CONFIG_DEBUG_LOCK_ALLOC produces a
kernel that can't boot on several SGI platforms. It turns out that, using
arcload (Stan's bootloader originally written for IP30), I can get some
debugging output on why. I am still puzzled, but maybe someone else can
interpret this information into something meaningful?

All addresses printed by arcload are physical addresses.

ARCS Memory Map as printed by some debugging I added to the arcload binary:

0x00000000 - 0x00001000 ExceptionBlock
0x00001000 - 0x00002000 SystemParameterBlock
0x00002000 - 0x00004000 FirmwarePermanent
0x20004000 - 0x20f00000 FreeMemory***
0x20f00000 - 0x21000000 FirmwareTemporary
0x21000000 - 0x5fff0000 FreeMemory
0x5fff0000 - 0x5ffff000 LoadedProgram
0x5ffff000 - 0x60000000 FreeMemory
0x60000000 - 0xa0000000 FirmwarePermanent

The ***'ed FreeMemory segment is where the kernel is supposed to load.

Here's the debugging for a kernel WITHOUT CONFIG_DEBUG_LOCK_ALLOC enabled
(4102norm):

ELF Start: 0x20004000
Elf End  : 0x20a6fdd0
Size     : 0x00a6bdd0 (~10MB?)

# ls -l 4102norm
-rwxr-xr-x 1 root root 28M Mar 15 15:12 4102norm*

And the debugging kernel compiled with CONFIG_DEBUG_LOCK_ALLOC=y (no other
config changes from above):

ELF Start: 0x20004000
Elf End  : 0x2148bf80
Size     : 0x01487f80 (~20MB?)

# ls -l 4102dbg
-rwxr-xr-x 1 root root 29M Mar 15 15:21 4102dbg*

I am only using the traditional "vmlinux" make target, so there shouldn't be
any compression involved here. Yet, according to ARCS anyway, it looks like
CONFIG_DEBUG_LOCK_ALLOC is adding an additional 10MB of "something", while the
vmlinux file only grows by roughly 1MB.

If I examine both kernels with readelf and dump the program headers, I can see
these two sizes reflected under "MemSiz":

# mips64-unknown-linux-gnu-readelf -l 4102norm

Elf file type is EXEC (Executable file)
Entry point 0xa800000020700450
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000004000 0xa800000020004000 0xa800000020004000
                 0x00000000009a5030 0x0000000000a6bdd0  RWE    10000
  NOTE           0x0000000000714bb0 0xa800000020714bb0 0xa800000020714bb0
                 0x0000000000000024 0x0000000000000024  R      4

# mips64-unknown-linux-gnu-readelf -l 4102dbg

Elf file type is EXEC (Executable file)
Entry point 0xa800000020749c80
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000004000 0xa800000020004000 0xa800000020004000
                 0x0000000000a05850 0x0000000001487f80  RWE    10000
  NOTE           0x000000000075e330 0xa80000002075e330 0xa80000002075e330
                 0x0000000000000024 0x0000000000000024  R      4

So I'm not quite certain why ARCS or arcload dislike kernels with
CONFIG_DEBUG_LOCK_ALLOC=y. This issue has been known on at least the IP27 and
IP30 platforms for the past few years, and it's been quite a hindrance to
debugging locks.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
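
The numbers in this message can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of arcload: the names `MEMORY_MAP`, `KERNELS`, `overlapping`, and `report` are made up for illustration. It takes the ARCS map and the LOAD header sizes quoted above and reports which map entries each image would cover when placed at 0x20004000, plus the part of the image that exists only in memory (MemSiz minus FileSiz).

```python
# ARCS memory map as printed by the patched arcload (physical addresses).
MEMORY_MAP = [
    (0x00000000, 0x00001000, "ExceptionBlock"),
    (0x00001000, 0x00002000, "SystemParameterBlock"),
    (0x00002000, 0x00004000, "FirmwarePermanent"),
    (0x20004000, 0x20f00000, "FreeMemory"),
    (0x20f00000, 0x21000000, "FirmwareTemporary"),
    (0x21000000, 0x5fff0000, "FreeMemory"),
    (0x5fff0000, 0x5ffff000, "LoadedProgram"),
    (0x5ffff000, 0x60000000, "FreeMemory"),
    (0x60000000, 0xa0000000, "FirmwarePermanent"),
]

# (FileSiz, MemSiz) of the LOAD program header, copied from the readelf output.
KERNELS = {
    "4102norm": (0x009a5030, 0x00a6bdd0),
    "4102dbg": (0x00a05850, 0x01487f80),
}

LOAD_ADDR = 0x20004000  # physical load address reported as "ELF Start"


def overlapping(start, size, memory_map=MEMORY_MAP):
    """Return the map entries touched by the half-open range [start, start + size)."""
    end = start + size
    return [(lo, hi, kind) for lo, hi, kind in memory_map if lo < end and start < hi]


def report(name):
    """Summarise where an image ends and which ARCS segments it would cover."""
    filesz, memsz = KERNELS[name]
    kinds = [kind for _, _, kind in overlapping(LOAD_ADDR, memsz)]
    return {"end": LOAD_ADDR + memsz, "mem_only": memsz - filesz, "segments": kinds}


if __name__ == "__main__":
    for name in KERNELS:
        r = report(name)
        print(f"{name}: end={r['end']:#x} mem_only={r['mem_only']:#x} {r['segments']}")
```

With these inputs the non-debug image ends at 0x20a6fdd0, inside the ***'ed FreeMemory segment, while the debug image ends at 0x2148bf80 and so runs across the FirmwareTemporary entry into the next FreeMemory segment. Most of the growth is in MemSiz rather than FileSiz (0xa82730 bytes of memory-only data versus 0xc6da0), which would be consistent with the file on disk growing by only about 1MB.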
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-15 20:11 ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel Joshua Kinard
@ 2017-03-16  3:50 ` Joshua Kinard
  2017-03-16 14:09   ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-16  3:50 UTC
To: linux-mips

On 03/15/2017 16:11, Joshua Kinard wrote:
> ARCS Memory Map as printed by some debugging I added to the arcload binary:
>
> 0x00000000 - 0x00001000 ExceptionBlock
> 0x00001000 - 0x00002000 SystemParameterBlock
> 0x00002000 - 0x00004000 FirmwarePermanent
> 0x20004000 - 0x20f00000 FreeMemory***
> 0x20f00000 - 0x21000000 FirmwareTemporary
> 0x21000000 - 0x5fff0000 FreeMemory
> 0x5fff0000 - 0x5ffff000 LoadedProgram
> 0x5ffff000 - 0x60000000 FreeMemory
> 0x60000000 - 0xa0000000 FirmwarePermanent

So it turns out I can get away with it, on Octane at least, by changing the
load address from 0x20004000 to an arbitrary value in the other FreeMemory
segment, 0x21000000 - 0x5fff0000. Specifically, using 0x21004000 appears to
work without any ill effects.

The 0x20004000 value is the address used by IRIX to load (with symon, it
becomes 0x200800000 instead). I'll have to try this on the IP27 later on as
well. On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
issues yet. Probably need to hammer the disks with bonnie++ or such. At least
I can get back to the BRIDGE/PCI mess now...
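
A rough check of the relocated load address, as a Python sketch. The 0xa800000000000000 base is inferred from the readelf output in the first message (VirtAddr 0xa800000020004000 for physical 0x20004000). The helper names are hypothetical, and the MemSiz is that of the 4102dbg image linked at the old address, on the assumption that relinking at a new address does not change its size.

```python
# Base of the window the kernel is linked into, inferred from the readelf
# output (VirtAddr 0xa800000020004000 <-> physical 0x20004000).
XKPHYS_BASE = 0xa800000000000000

FIRST_FREE = (0x20004000, 0x20f00000)    # the ***'ed FreeMemory entry
SECOND_FREE = (0x21000000, 0x5fff0000)   # the next FreeMemory entry
DEBUG_MEMSZ = 0x01487f80                 # MemSiz of 4102dbg (assumed unchanged by relinking)


def to_virtual(phys):
    """Map a physical load address to the corresponding link address."""
    return XKPHYS_BASE | phys


def fits(load_phys, memsz, segment):
    """True if [load_phys, load_phys + memsz) lies entirely inside segment."""
    lo, hi = segment
    return lo <= load_phys and load_phys + memsz <= hi


if __name__ == "__main__":
    print(hex(to_virtual(0x21004000)))
    print(fits(0x20004000, DEBUG_MEMSZ, FIRST_FREE))
    print(fits(0x21004000, DEBUG_MEMSZ, SECOND_FREE))
```

This only shows that the image fits inside what ARCS reports as free. It says nothing about whether ARCS actually leaves that memory alone.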
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16  3:50 ` Joshua Kinard
@ 2017-03-16 14:09   ` Ralf Baechle
  2017-03-16 17:50     ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-16 14:09 UTC
To: Joshua Kinard; +Cc: linux-mips

On Wed, Mar 15, 2017 at 11:50:44PM -0400, Joshua Kinard wrote:

> So it turns out I can get away with it, on Octane at least, by changing the
> load address from 0x20004000 to an arbitrary value in the other FreeMemory
> segment, 0x21000000 - 0x5fff0000. Specifically, using 0x21004000 appears to
> work without any ill effects.
>
> The 0x20004000 value is the address used by IRIX to load (with symon, it
> becomes 0x200800000 instead). I'll have to try this on the IP27 later on as
> well. On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
> issues yet. Probably need to hammer the disks with bonnie++ or such. At least
> I can get back to the BRIDGE/PCI mess now...

I'm wondering where the ARC stack is on kernel entry, and whether the
ARC stack may have corrupted the kernel. If possible, can you get your
kernel or a test program to compute a checksum over itself to see
if it has been corrupted?

Let me repeat my ARC(S) mantra again: ARC(S) is broken, ARC(S) lies.
Trust is futile. Even if ARC(S) claims something is free I'd rather
not rely on it.

  Ralf
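
The self-check suggested here could take roughly the following shape. This is only a Python sketch of the idea, not kernel code: in a real kernel the reference value would have to be recorded at build time and the check written in C or assembly over sections that are not modified during boot. CRC32 via `zlib` is an arbitrary choice of checksum, and `reference_checksum` and `is_corrupted` are hypothetical names.

```python
import zlib


def reference_checksum(image: bytes) -> int:
    """Checksum recorded over the pristine image (for example at build time)."""
    return zlib.crc32(image) & 0xffffffff


def is_corrupted(loaded: bytes, expected: int) -> bool:
    """Recompute the checksum over the loaded copy and compare with the reference."""
    return (zlib.crc32(loaded) & 0xffffffff) != expected


if __name__ == "__main__":
    pristine = bytes(range(256)) * 16   # stand-in for read-only kernel text
    expected = reference_checksum(pristine)

    clobbered = bytearray(pristine)
    clobbered[100] ^= 0xff              # simulate a stray write by firmware

    print(is_corrupted(pristine, expected))
    print(is_corrupted(bytes(clobbered), expected))
```

A mismatch would point at something writing into the image after it was loaded. A match would support the theory that the image is simply too large for the segment.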
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 14:09   ` Ralf Baechle
@ 2017-03-16 17:50     ` Joshua Kinard
  2017-03-16 19:06       ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-16 17:50 UTC
To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 10:09, Ralf Baechle wrote:
>
> I'm wondering where the ARC stack is on kernel entry, and whether the
> ARC stack may have corrupted the kernel. If possible, can you get your
> kernel or a test program to compute a checksum over itself to see
> if it has been corrupted?

As far as I can tell, it really does seem to be a sizing issue. I don't
have the time to dive into what exactly CONFIG_DEBUG_LOCK_ALLOC is doing, but I
found one hit on LKML (lost the URL) that indicates it fluffs up a particular
struct that is very common and so introduces a fair bit of bloat, and it seems
possible that the 0x20004000-0x20f00000 segment really is too small. I wouldn't
rule out the possibility that SGI designed ARCS on the Octane to allow only
IRIX to load at this particular address and Linux has just gotten lucky thus
far.

As for whether loading at the next FreeMemory segment in 0x21000000-0x5fff0000
smashes any ARCS segments, that I am not sure about. A kernel loaded in that
segment does boot, and seems to behave no differently than a kernel loaded in
the other segment, including exhibiting the same bugs. Like IP27, Octane
doesn't have a need for ARCS after the kernel boots, as resetting the system
can be done by flipping a bit in HEART, and power down is handled by the RTC
driver (this feature broke, though, and I haven't chased down why yet). So if
we're clobbering ARCS using this load address...well, it can't be all that bad
</famous-last-words>

I'll see what IP27 does, assuming it even has a large enough FreeMemory segment
to work with.

> Let me repeat my ARC(S) mantra again: ARC(S) is broken, ARC(S) lies.
> Trust is futile. Even if ARC(S) claims something is free I'd rather
> not rely on it.

Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
RAM. All remaining RAM installed in the system is marked as FirmwarePermanent
and mapped into 0x60000000 on up.
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 17:50     ` Joshua Kinard
@ 2017-03-16 19:06       ` Ralf Baechle
  2017-03-16 20:02         ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-16 19:06 UTC
To: Joshua Kinard; +Cc: linux-mips

On Thu, Mar 16, 2017 at 01:50:42PM -0400, Joshua Kinard wrote:
>
> Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
> RAM. All remaining RAM installed in the system is marked as FirmwarePermanent
> and mapped into 0x60000000 on up.

I think on IP27 it was only the first 32MB that are somehow used by
ARCS. Everything else is entirely ignored and the OS is supposed to
use klconfig to query the hardware configuration. That said, klconfig
is infinitely better than ARCS: it actually works and is easy to
use. What it does not provide is information on how firmware or
other loaded programs are using memory - it's really just a hardware
inventory.

  Ralf
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 19:06       ` Ralf Baechle
@ 2017-03-16 20:02         ` Joshua Kinard
  2017-03-16 20:50           ` Ralf Baechle
  0 siblings, 1 reply; 12+ messages in thread
From: Joshua Kinard @ 2017-03-16 20:02 UTC
To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 15:06, Ralf Baechle wrote:
>
> I think on IP27 it was only the first 32MB that are somehow used by
> ARCS. Everything else is entirely ignored and the OS is supposed to
> use klconfig to query the hardware configuration. That said, klconfig
> is infinitely better than ARCS: it actually works and is easy to
> use. What it does not provide is information on how firmware or
> other loaded programs are using memory - it's really just a hardware
> inventory.

IIRC, the first 32MB is reserved for use as directory memory on systems with
fewer than 32 CPUs. For 32 or more CPUs, I believe you have to start populating
the special directory memory DIMM slots.

This does remind me, though, when I installed a router board I found for cheap,
the kernel, regardless of configuration, wouldn't load at the address defined
in IP27's Platform file, as ARCS said it was too large. If I can find a larger
ARCS segment to load into, I'll have to test that again as well...
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
  2017-03-16 20:02         ` Joshua Kinard
@ 2017-03-16 20:50           ` Ralf Baechle
  2017-03-17  3:01             ` Joshua Kinard
  0 siblings, 1 reply; 12+ messages in thread
From: Ralf Baechle @ 2017-03-16 20:50 UTC
To: Joshua Kinard; +Cc: linux-mips

On Thu, Mar 16, 2017 at 04:02:48PM -0400, Joshua Kinard wrote:

> IIRC, the first 32MB is reserved for use as directory memory on systems with
> fewer than 32 CPUs. For 32 or more CPUs, I believe you have to start populating
> the special directory memory DIMM slots.

Completely wrong. IP27's special memory modules contain the directory for
each 128 byte S-cache line. This is similar to how other memory controllers
include an ECC for each line of memory. The directory size and format of
standard memory modules is sufficient for up to 16 nodes. Note the limit
is about nodes, not processors.

For larger systems IP27 node boards need to be populated with
``premium directory DIMMs'' which extend the number of directory bits to
cover systems up to 64 nodes (which would be 128 CPUs). For the few
systems that exceed even that size (we're talking about > 9 full size
racks!) each of the 64 directory bits represents a node in a particular
128 part of the system or coarse mode where each bit represents eight
nodes, thus allowing for 8 * 64 = 512 nodes = 1024 CPUs.

> This does remind me, though, when I installed a router board I found for cheap,
> the kernel, regardless of configuration, wouldn't load at the address defined
> in IP27's Platform file, as ARCS said it was too large. If I can find a larger
> ARCS segment to load into, I'll have to test that again as well...

The load address used for the IP27 vmlinux was mindlessly copied from
either sash or vmunix itself. I'd not call that a scientific method
and I never had access to ARC(S) source.

Your system is large enough to require a router board? I hope you
got cheap power :)

  Ralf
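
The node and CPU counts in this message follow from simple multiplication. A small Python sketch, assuming two CPUs per node as implied by "64 nodes (which would be 128 CPUs)". The function name is illustrative, and treating the standard-DIMM case as one directory bit per node for 16 nodes is an assumption, since the message only gives the node limit.

```python
CPUS_PER_NODE = 2  # implied by "64 nodes (which would be 128 CPUs)"


def max_system(directory_bits, nodes_per_bit=1):
    """Return (nodes, cpus) that a given directory layout can track."""
    nodes = directory_bits * nodes_per_bit
    return nodes, nodes * CPUS_PER_NODE


if __name__ == "__main__":
    # Standard DIMMs: 16-node limit (one bit per node is assumed here).
    print(max_system(16))
    # Premium directory DIMMs: 64 directory bits, one node each.
    print(max_system(64))
    # Coarse mode: each of the 64 bits stands for eight nodes.
    print(max_system(64, nodes_per_bit=8))
```

The 16-node limit of standard modules works out to 32 CPUs under the same assumption, which may be where the "32 CPUs" figure in the quoted text came from.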
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
From: Joshua Kinard @ 2017-03-17 3:01 UTC
To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 16:50, Ralf Baechle wrote:
> On Thu, Mar 16, 2017 at 04:02:48PM -0400, Joshua Kinard wrote:
>
>> On 03/16/2017 15:06, Ralf Baechle wrote:
>>> On Thu, Mar 16, 2017 at 01:50:42PM -0400, Joshua Kinard wrote:
>>>
>>>> On 03/16/2017 10:09, Ralf Baechle wrote:
>>>>> On Wed, Mar 15, 2017 at 11:50:44PM -0400, Joshua Kinard wrote:
>>>>>
>>>>>> On 03/15/2017 16:11, Joshua Kinard wrote:
>>>>>>> I've reported in the past that turning on CONFIG_DEBUG_LOCK_ALLOC produces a
>>>>>>> kernel that can't boot on several SGI platforms. It turns out that using
>>>>>>> arcload (Stan's bootloader originally written for IP30), I can get some
>>>>>>> debugging out on why. I am still puzzled, but maybe this information can be
>>>>>>> interpreted by someone else into something meaningful?
>>>>>>>
>>>>>>> All addresses printed out of arcload are physical addresses.
>>>>>>>
>>>>>>> ARCS Memory Map as printed by some debugging I added to the arcload binary:
>>>>>>>
>>>>>>> 0x00000000 - 0x00001000 ExceptionBlock
>>>>>>> 0x00001000 - 0x00002000 SystemParameterBlock
>>>>>>> 0x00002000 - 0x00004000 FirmwarePermanent
>>>>>>> 0x20004000 - 0x20f00000 FreeMemory***
>>>>>>> 0x20f00000 - 0x21000000 FirmwareTemporary
>>>>>>> 0x21000000 - 0x5fff0000 FreeMemory
>>>>>>> 0x5fff0000 - 0x5ffff000 LoadedProgram
>>>>>>> 0x5ffff000 - 0x60000000 FreeMemory
>>>>>>> 0x60000000 - 0xa0000000 FirmwarePermanent
>>>>>>
>>>>>> So it turns out I can get away, on Octane at least, by changing the load
>>>>>> address from 0x20004000 to an arbitrary value in the other FreeMemory segment
>>>>>> from 0x21000000 - 0x5fff0000. Specifically, using 0x21004000 appears to work
>>>>>> without any ill effects.
>>>>>>
>>>>>> The 0x20004000 value is the address used by IRIX to load (with symon, it
>>>>>> becomes 0x200800000 instead). I'll have to try this on the IP27 later on as
>>>>>> well. On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
>>>>>> issues yet. Probably need to hammer the disks with bonnie++ or such. At least
>>>>>> I can get back to the BRIDGE/PCI mess now...
>>>>>
>>>>> I'm wondering where the ARC stack is on kernel entry if maybe the
>>>>> ARC stack has corrupted the kernel? If possible, can you get your
>>>>> kernel or a test program to compute a checksum over itself to see
>>>>> if it has been corrupted?
>>>>
>>>> As far as I can tell, it really does seem that it is a sizing issue. I don't
>>>> have the time to dive into what CONFIG_DEBUG_LOCK_ALLOC is exactly doing, but I
>>>> found one hit on LKML (lost the URL) that indicates it fluffs up a particular
>>>> struct that is very common and so introduces a fair bit of bloat, and it seems
>>>> possible that the 0x20004000-0x20f00000 really is too small. I wouldn't rule
>>>> out the possibility that SGI designed ARCS on the Octane to allow only IRIX to
>>>> load at this particular address and Linux has just gotten lucky thus far.
>>>>
>>>> As for whether loading at the next FreeMemory segment in 0x21000000-0x5fff0000
>>>> smashes any ARCS segments, that I am not sure about. A kernel booting in that
>>>> segment does boot, and seems to behave no differently than a kernel booting in
>>>> the other segment, including exhibiting the same bugs. Like IP27, Octane
>>>> doesn't have a need for ARCS after the kernel boots, as resetting the system
>>>> can be done by flipping a bit in HEART, and power down is handled by the RTC
>>>> driver (this feature broke, though, and I haven't chased down why yet). So if
>>>> we're clobbering ARCS using this load address...well, it can't be all that bad
>>>> </famous-last-words>
>>>>
>>>> I'll see what IP27 does, assuming it even has a large enough FreeMemory segment
>>>> to work with.
>>>>
>>>>
>>>>> Let me repeat my ARC(S) mantra again, ARC(S) is broken, ARC(S) lies.
>>>>> Trust is futile. Even if ARC(S) claims something is free I'd rather
>>>>> not rely on it.
>>>>
>>>> Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
>>>> RAM. All remaining RAM installed in the system is marked as FirmwarePermanent
>>>> and mapped into 0x60000000 on up.
>>>
>>> I think on IP27 it was only the first 32MB that are somehow used by
>>> ARCS. Everything else is entirely ignored and the OS is supposed to
>>> use klconfig to query the hardware configuration. That said, klconfig
>>> is infinitely better than ARCS, it actually works and is easy to
>>> use. What it does not provide is information on how firmware or
>>> other loaded programs are using memory - it's really just a hardware
>>> inventory.
>>
>> IIRC, the first 32MB is reserved for use as directory memory on systems with
>> less than 32 CPUs. For 32 or more CPUs, I believe you have to start populating
>> the special directory memory DIMM slots.
>
> Completely wrong. IP27's special memory modules contain the directory for
> each 128 byte S-cache line. This is similar to how other memory controllers
> include an ECC for each line of memory. The directory size and format of
> standard memory modules is sufficient for up to 16 nodes. Note the limit
> is about nodes, not processors.
>
> For larger systems IP27 node boards need to be populated with
> ``premium directory DIMMs'' which extend the number of directory bits to
> cover systems up to 64 nodes (which would be 128 CPUs). For the few
> systems that exceed even that size (we're talking about > 9 full size
> racks!) each of the 64 directory bits represents a node in a particular
> 128 part of the system or coarse mode where each bit represents eight
> nodes, thus allowing for 8 * 64 = 512 nodes = 1024 CPUs.

Ah, good to know! I've seen in the PROM startup messages where ARCS reserves
the first 32MB for something. I thought it was for the directory memory stuff.
Perhaps that's where klconfig's data is stored? Something to dig into later
one day maybe.


>> This does remind me, though, when I installed a router board I found for cheap,
>> the kernel, regardless of configuration, wouldn't load at the address defined
>> in IP27's Platform file, as ARCS said it was too large. If I can find a larger
>> ARCS segment to load into, I'll have to test that again as well...
>
> The load address used for the IP27 vmlinux was mindlessly copied from
> either sash or vmunix itself. I'd not call that a scientific method
> and I never had access to ARC(S) source.

I'll see if I can find a better load address that can be used then. I have
headers from IRIX 6.5.31 still around somewhere, and can probably get a memory
map out of ARCS with the same arcload hack I used on IP30.


> Your system is large enough to require a router board? I hope you
> got cheap power :)

Only two nodes right now. The router board was ~$20 on eBay, and I thought
it'd be cool to have more blinky lights. Someday, I might get my hands on a
full rack, though...

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
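The sizing theory discussed above can be checked with plain arithmetic. The following is a minimal sketch, not anything arcload itself does: it takes the two Octane FreeMemory segments and the two MemSiz values reported in the first message of this thread and tests whether the kernel's LOAD segment fits at each candidate load address. The helper names `fits` and `report` are made up for illustration.

```python
# Sketch: does a kernel's LOAD segment fit inside an ARCS FreeMemory segment?
# All addresses and sizes are copied from the arcload/readelf output posted
# in the first message of this thread. Helper names are hypothetical.

FREE_SEGMENTS = [
    (0x20004000, 0x20f00000),  # first FreeMemory segment (the IRIX load area)
    (0x21000000, 0x5fff0000),  # second, much larger FreeMemory segment
]

KERNELS = {
    "4102norm": 0x00a6bdd0,  # MemSiz without CONFIG_DEBUG_LOCK_ALLOC
    "4102dbg": 0x01487f80,   # MemSiz with CONFIG_DEBUG_LOCK_ALLOC=y
}

LOAD_ADDRESSES = (0x20004000, 0x21004000)  # original and relocated load address


def fits(load_addr, memsiz, segment):
    """Return True if [load_addr, load_addr + memsiz) lies inside segment."""
    start, end = segment
    return start <= load_addr and load_addr + memsiz <= end


def report():
    """Return (kernel, load address, end address, fits-anywhere) tuples."""
    rows = []
    for name, memsiz in KERNELS.items():
        for load_addr in LOAD_ADDRESSES:
            ok = any(fits(load_addr, memsiz, seg) for seg in FREE_SEGMENTS)
            rows.append((name, load_addr, load_addr + memsiz, ok))
    return rows


if __name__ == "__main__":
    for name, load_addr, end, ok in report():
        print(f"{name} @ {load_addr:#x}: ends at {end:#x}, fits={ok}")
```

With these numbers, only the debug kernel at 0x20004000 fails the check: it ends at 0x2148bf80, which is 0x58bf80 bytes past the end of the first segment and runs into the FirmwareTemporary region. That is consistent with the sizing explanation, though it says nothing about whether ARCS's map can be trusted.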
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
From: Joshua Kinard @ 2017-03-18 23:42 UTC
To: Ralf Baechle; +Cc: linux-mips

On 03/16/2017 23:01, Joshua Kinard wrote:
> On 03/16/2017 16:50, Ralf Baechle wrote:
>> The load address used for the IP27 vmlinux was mindlessly copied from
>> either sash or vmunix itself. I'd not call that a scientific method
>> and I never had access to ARC(S) source.
>
> I'll see if I can find a better load address that can be used then. I have
> headers from IRIX 6.5.31 still around somewhere, and can probably get a memory
> map out of ARCS with the same arcload hack I used on IP30.

Futzing around with the load address on IP27 doesn't work the same as on
Octane. IP27 has a much smaller window of FreeMemory available versus the
Octane, based on this dump I got out of arcload:

ARCS Memory Map
0x0 - 0x1000 (ExceptionBlock)
0x1000 - 0x2000 (SystemParameterBlock)
0x19000 - 0x12f0000 (FreeMemory)
0x12f0000 - 0x12ff000 (LoadedProgram)
0x12ff000 - 0x1300000 (FreeMemory)
0x1300000 - 0x1400000 (FirmwareTemporary)
0x1400000 - 0x1500000 (FreeMemory)
0x1500000 - 0x1800000 (FirmwareTemporary)
0x1800000 - 0x1a00000 (FirmwareTemporary)
0x1a00000 - 0x1b00000 (FirmwarePermanent)
0x1c00000 - 0x1e00000 (FreeMemory)
0x1c01000 - 0x1f66000 (FirmwareTemporary)
0x1f80000 - 0x1fa0000 (FirmwareTemporary)

Going by that, I was finally able to strip a kernel down small enough to
contain both CONFIG_DEBUG_LOCK_ALLOC and the absolute bare minimum
functionality to boot to login on IP27, and I have about 3.5KB to spare. The
only thing I've seen thus far after several reboots is a single spinlock lockup
in generic code, but that was on a kernel using my patches, and I couldn't
reproduce it a second time. So I'm switching to as pure of a mainline kernel
as I can to see if I can trip things up there.

Also trying to get kgdb to work, but something isn't right with it. Seems like
the kgdboc= boot parameter isn't being parsed/honored, so I have to force it
manually by writing to /sys/module/kgdboc/parameters/kgdboc before the SysRq-g
option becomes available. I am hoping there's nothing special I need to do to
IOC3 to get a debugger attached and working, but we'll see. The kdb frontend
appears to be out of the question, as it adds ~6-7KB of extra code.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
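To put numbers on "a much smaller window", here is a rough sketch that parses the IP27 dump above and lists the size of each FreeMemory segment. The parsing regex and the helper names `parse_map` and `free_segments` are assumptions for illustration; arcload's real output format may differ from the text as pasted in the mail.

```python
import re

# The IP27 ARCS memory map exactly as pasted in the message above.
ARCS_MAP = """\
0x0 - 0x1000 (ExceptionBlock)
0x1000 - 0x2000 (SystemParameterBlock)
0x19000 - 0x12f0000 (FreeMemory)
0x12f0000 - 0x12ff000 (LoadedProgram)
0x12ff000 - 0x1300000 (FreeMemory)
0x1300000 - 0x1400000 (FirmwareTemporary)
0x1400000 - 0x1500000 (FreeMemory)
0x1500000 - 0x1800000 (FirmwareTemporary)
0x1800000 - 0x1a00000 (FirmwareTemporary)
0x1a00000 - 0x1b00000 (FirmwarePermanent)
0x1c00000 - 0x1e00000 (FreeMemory)
0x1c01000 - 0x1f66000 (FirmwareTemporary)
0x1f80000 - 0x1fa0000 (FirmwareTemporary)
"""

# Hypothetical line format: "<start> - <end> (<type>)" with hex addresses.
LINE_RE = re.compile(r"\s*(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)\s*\((\w+)\)")


def parse_map(text):
    """Return a list of (start, end, type) tuples, one per map line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            start, end, kind = match.groups()
            entries.append((int(start, 16), int(end, 16), kind))
    return entries


def free_segments(entries):
    """Return (start, end, size) for every FreeMemory entry."""
    return [(s, e, e - s) for s, e, kind in entries if kind == "FreeMemory"]


if __name__ == "__main__":
    for start, end, size in free_segments(parse_map(ARCS_MAP)):
        print(f"{start:#010x} - {end:#010x}: {size / (1 << 20):.2f} MiB")
```

The largest window, 0x19000 - 0x12f0000, is 0x12d7000 bytes (a little under 19 MiB), compared with roughly 1 GB in Octane's second FreeMemory segment. How much of that window is actually usable depends on the IP27 load address, which is not stated in this message.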
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
From: Joshua Kinard @ 2017-03-19 7:23 UTC
To: Ralf Baechle; +Cc: linux-mips

On 03/18/2017 19:42, Joshua Kinard wrote:
> Also trying to get kgdb to work, but something isn't right with it. Seems like
> the kgdboc= boot parameter isn't being parsed/honored, so I have to force it
> manually by writing to /sys/module/kgdboc/parameters/kgdboc before the SysRq-g
> option becomes available. I am hoping there's nothing special I need to do to
> IOC3 to get a debugger attached and working, but we'll see. The kdb frontend
> appears to be out of the question, as it adds ~6-7KB of extra code.

It looks like kgdb won't work with the IOC3 metadriver, but the existing IOC3
code in ioc3-eth.c that handles serial will work. I was able to get gdb on my
Octane to connect to it, though one has to use ~4800 baud to make it reliable
(could be the 30ft cat5 cable I'm using that dislikes 9600 baud).

Whatever this deadlock issue is, it locks the kernel pretty hard: even after
stopping with SysRq-g and then continuing via gdb, once the deadlock happens I
cannot break into the debugger at all. Even triggering an NMI via the MSC dumps
nothing out of the kernel before the PROM resets.

The closest I've gotten to extracting info on the state of the machine is to
set the MSC debug switches to 0x1018 and then issue an immediate reset to have
it drop into POD dirty-exclusive as soon as possible. Then running "why"
sometimes nets me a valid kernel address in EPC that tells me where the POD CPU
was last at. Downside, I have four CPUs and MSC POD locks up if I try
switching to any of the other CPUs. So I can't get a register dump off of the
other three.

Other interesting note, sometimes when this deadlock happens, a soft reset
doesn't work. It seems like one of the HUBs is locked up, because the PROM is
unable to communicate with it:

2A 000: Done initializing klconfig.
2A 000: Discovering NUMAlink connectivity ......... DONE
2A 000: Found 2 objects (2 hubs, 0 routers) in 511413 usec
1B 000: Testing/Initializing memory ............... DONE
2A 000: Waiting for peers to complete discovery.... Reading link 0 (addr 0x92000000
2A 000: 00000004) failed
1B 000: CPU B switching to UALIAS
1B 000: CPU B now running out of UALIAS
2A 000: Reading link 0 (addr 0x9200000000000004) failed
1B 000: Skipping secondary cache diags
1B 000: CPU B switching stack into UALIAS and invalidating D-cache
1B 000: CPU B switching into node 0 cached RAM
1B 000: CPU B running cached
2A 000: Reading link 0 (addr 0x9200000000000004) failed
2A 000: Reading link 0 (addr 0x9200000000000004) failed

Then it gets a general exception and drops to POD Dex:

1B 000: Local Slave : Waiting for my NASID ...
1B 000: CPU B switching to UALIAS
1B 000: CPU B running in UALIAS
1B 000: CPU B Flushing and invalidating caches
1B 000: CPU B switching to node 0 cached RAM
1B 000: CPU B running cached
1A 000:
1A 000: *** General Exception on node 0
1A 000: *** EPC: 0xc00000001fc473dc (0xc00000001fc473dc)
1A 000: *** Press ENTER to continue.
1A 000: POD MSC Dex>

If this is a hardware lock up, that might explain why kgdb isn't useful at that
point. POD lets me dump the CRBs and PI error spool, but I'm not sure how
useful that information is w/o SGI's internal documents.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
From: Ralf Baechle @ 2017-03-19 8:55 UTC
To: Joshua Kinard; +Cc: linux-mips

On Sun, Mar 19, 2017 at 03:23:39AM -0400, Joshua Kinard wrote:

> The closest I've gotten to extracting info on the state of the machine is to
> set the MSC debug switches to 0x1018 and then issue an immediate reset to have
> it drop into POD dirty-exclusive as soon as possible. Then running "why"
> sometimes nets me a valid kernel address in EPC that tells me where the POD CPU
> was last at. Downside, I have four CPUs and MSC POD locks up if I try
> switching to any of the other CPUs. So I can't get a register dump off of the
> other three.

Have you tried to send an NMI from the MSC? The PoD debugger is actually
a fairly handy tool in such cases.

> 2A 000: Done initializing klconfig.
> 2A 000: Discovering NUMAlink connectivity ......... DONE
> 2A 000: Found 2 objects (2 hubs, 0 routers) in 511413 usec
> 1B 000: Testing/Initializing memory ............... DONE
> 2A 000: Waiting for peers to complete discovery.... Reading link 0
> (addr 0x92000000
> 2A 000: 00000004) failed
> 1B 000: CPU B switching to UALIAS
> 1B 000: CPU B now running out of UALIAS
> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
> 1B 000: Skipping secondary cache diags
> 1B 000: CPU B switching stack into UALIAS and invalidating D-cache
> 1B 000: CPU B switching into node 0 cached RAM
> 1B 000: CPU B running cached
> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
> 2A 000: Reading link 0 (addr 0x9200000000000004) failed

I thought that kind of message indicated a hardware issue.

> Then it gets a general exception and drops to POD Dex:
> 1B 000: Local Slave : Waiting for my NASID ...
> 1B 000: CPU B switching to UALIAS
> 1B 000: CPU B running in UALIAS
> 1B 000: CPU B Flushing and invalidating caches
> 1B 000: CPU B switching to node 0 cached RAM
> 1B 000: CPU B running cached
> 1A 000:
> 1A 000: *** General Exception on node 0
> 1A 000: *** EPC: 0xc00000001fc473dc (0xc00000001fc473dc)
> 1A 000: *** Press ENTER to continue.
> 1A 000: POD MSC Dex>
>
> If this is a hardware lock up, that might explain why kgdb isn't useful at that
> point. POD lets me dump the CRBs and PI error spool, but I'm not sure how
> useful that information is w/o SGI's internal documents.

I still haven't forgotten everything (I hope), so maybe you could post that
information anyway, on the small chance there might be something useful in
there?

  Ralf
* Re: ARCS can't load CONFIG_DEBUG_LOCK_ALLOC kernel
From: Joshua Kinard @ 2017-03-21 21:52 UTC
To: Ralf Baechle; +Cc: linux-mips

On 03/19/2017 04:55, Ralf Baechle wrote:
> On Sun, Mar 19, 2017 at 03:23:39AM -0400, Joshua Kinard wrote:
>
>> The closest I've gotten to extracting info on the state of the machine is to
>> set the MSC debug switches to 0x1018 and then issue an immediate reset to have
>> it drop into POD dirty-exclusive as soon as possible. Then running "why"
>> sometimes nets me a valid kernel address in EPC that tells me where the POD CPU
>> was last at. Downside, I have four CPUs and MSC POD locks up if I try
>> switching to any of the other CPUs. So I can't get a register dump off of the
>> other three.
>
> Have you tried to send an NMI from the MSC? The PoD debugger is actually
> a fairly handy tool in such cases.

Oddly enough, sending NMIs from POD lets me switch to the other CPUs and run
"why" on them. The POD "cpu" command doesn't appear to work right. It'll just
lock up that console instead. This allows me to grab the address in EPC from
all four CPUs after a lock up and that's how I've been doing my debugging all
weekend.


>> 2A 000: Done initializing klconfig.
>> 2A 000: Discovering NUMAlink connectivity ......... DONE
>> 2A 000: Found 2 objects (2 hubs, 0 routers) in 511413 usec
>> 1B 000: Testing/Initializing memory ............... DONE
>> 2A 000: Waiting for peers to complete discovery.... Reading link 0
>> (addr 0x92000000
>> 2A 000: 00000004) failed
>> 1B 000: CPU B switching to UALIAS
>> 1B 000: CPU B now running out of UALIAS
>> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
>> 1B 000: Skipping secondary cache diags
>> 1B 000: CPU B switching stack into UALIAS and invalidating D-cache
>> 1B 000: CPU B switching into node 0 cached RAM
>> 1B 000: CPU B running cached
>> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
>> 2A 000: Reading link 0 (addr 0x9200000000000004) failed
>
> I thought that kind of message indicated a hardware issue.

As far as I can tell, it doesn't appear to be. A cold boot usually resolves
this issue. It happens randomly, and not at all on Sunday.


>> Then it gets a general exception and drops to POD Dex:
>> 1B 000: Local Slave : Waiting for my NASID ...
>> 1B 000: CPU B switching to UALIAS
>> 1B 000: CPU B running in UALIAS
>> 1B 000: CPU B Flushing and invalidating caches
>> 1B 000: CPU B switching to node 0 cached RAM
>> 1B 000: CPU B running cached
>> 1A 000:
>> 1A 000: *** General Exception on node 0
>> 1A 000: *** EPC: 0xc00000001fc473dc (0xc00000001fc473dc)
>> 1A 000: *** Press ENTER to continue.
>> 1A 000: POD MSC Dex>
>>
>> If this is a hardware lock up, that might explain why kgdb isn't useful at that
>> point. POD lets me dump the CRBs and PI error spool, but I'm not sure how
>> useful that information is w/o SGI's internal documents.
>
> I still haven't forgotten everything (I hope), so maybe you could post that
> information anyway, on the small chance there might be something useful in
> there?

I hope you haven't forgotten everything. How many people left actually still
care about these platforms? :)

That said, CRBs didn't contain anything useful. All initialized to zero,
except CRB D, which only held 0x00000000000000ff in all 15 of its registers.

So that said, here's what I spent all of Sunday and a little of Monday night
doing:

1. Boot into userland, allow md to finish any resyncs, delete the dummy file
   "FOO" being created, sync
2. Re-run this dd command:
   dd if=/dev/urandom of=/usr/space/bonnie++/FOO bs=1M count=24000 status=progress

If that command actually completes, it takes about 35-40mins. In a couple of
instances, it actually finished, but in most instances, the system would lock
up anywhere from 5 seconds after launching to 25mins later. I even made a
video capture of my two serial windows and ssh console when the system locks up
that I can put onto Youtube if interested.

Once the system locked up, I:

1. Switched to the MSC.
2. Set debug bits 0x1018 to force POD Dex mode.
3. Reset the system.
4. Run "why" on the first CPU to come up and record the value of EPC.
5. Issue an NMI to the other CPU on that node to force it into POD
6. Run "why" and record the value of EPC.
7. Switch to IOC3 console.
8. Issue an NMI to the other node to force it into POD on MSC.
9. Run "why" and record EPC.
10. Repeat for last CPU.
11. Clear debug bits, reset system, repeat all over again.

I don't know why issuing the NMIs works to switch the active CPU in POD
around. POD's own "cpu" command just locks the console up when you try to
switch CPUs. Not sure if it's a bug in the POD software I have in my PROM
image (which is 6.156, the latest SGI made AFAIK).

I've also determined that EPC is the only register that's reliable to hold data
left over from the lockup. The PROM scrambles the remaining registers. I
verified this by periodically recompiling my kernel with different options or a
different compiler version to make sure the function addresses were different,
and EPC would still point at the addresses of very specific functions when the
system locked up.

Using the EPC values, I looked those up in GDB for each cycle and kept a
listing of where all four CPUs were last at when the machine stopped. Going by
that, I've compiled a non-exhaustive list of the most common functions, one or
more of which might be where the underlying problem is at:

mem_serial_in (./arch/mips/include/asm/io.h:428)
ioc3_mdio_read (drivers/net/ethernet/sgi/ioc3-eth.c:478)
arch_local_irq_save (arch/mips/lib/mips-atomic.c:66)
arch_local_irq_restore (arch/mips/lib/mips-atomic.c:109)
do_idle (kernel/sched/idle.c:154)
hub_rt_read (arch/mips/sgi-ip27/ip27-timer.c:145)
spin_dump (kernel/locking/spinlock_debug.c:58)

The ones that stand out the most to me are arch_local_irq_restore and
spin_dump. This is what GDB said about spin_dump:

(gdb) l *0xa800000000096ba0
0xa800000000096ba0 is in spin_dump (kernel/locking/spinlock_debug.c:56).
51
52      static void spin_dump(raw_spinlock_t *lock, const char *msg)
53      {
54              struct task_struct *owner = NULL;
55
56              if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT)
57                      owner = lock->owner;
58              printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
59                      msg, raw_smp_processor_id(),
60                      current->comm, task_pid_nr(current));

Which tells me the system was about to report a spinlock lockup somewhere
before it halted. I tried setting a kgdb breakpoint on line 56, but the
machine is halting before it can actually execute the breakpoint, as in one of
my runs, one of the CPUs was literally in the middle of executing "nmi_handler"
from genex.S, and I only saw this when I had an active breakpoint set.

So I think my suspicion of this being a spinlock lockup is on the right path,
and it's grave enough that it halts the entire machine. Problem is, we're
locking up so fast that existing kernel infrastructure can't report //what//
spinlock is the one deadlocking. I can't rely on leftover data in the CPU's
own registers, and I don't know if it's possible to dump kernel memory from POD
mode in a way that can be decoded into something useful to trace down what
happened.

---

The other standout function is arch_local_irq_restore. This is what that looks
like in GDB:

0xa800000000325148 is in arch_local_irq_restore (arch/mips/lib/mips-atomic.c:109).
104             " .set pop \n"
105             : [flags] "=r" (__tmp1)
106             : "0" (flags)
107             : "memory");
108
109             preempt_enable();
110     }
111     EXPORT_SYMBOL(arch_local_irq_restore);
112
113     #endif /* !CONFIG_CPU_MIPSR2 && !CONFIG_CPU_MIPSR6 */

Or the asm dump:

a800000000325130 <arch_local_irq_restore>:
a800000000325130:       40016000        mfc0    at,$12
a800000000325134:       30840001        andi    a0,a0,0x1
a800000000325138:       3421001f        ori     at,at,0x1f
a80000000032513c:       3821001f        xori    at,at,0x1f
a800000000325140:       00812025        or      a0,a0,at
a800000000325144:       40846000        mtc0    a0,$12
a800000000325148:       03e00008        jr      ra
a80000000032514c:       00000000        nop

Each time the machine has locked up and EPC points at arch_local_irq_restore,
it's always been on the address of that "jr" instruction, regardless of which
CPU I got that EPC value off of. But I don't know if that instruction is
responsible for the lock up (I doubt it).

I am also noticing that sometimes when a lockup happens, one or more of the
other CPUs is in the middle of a read[bwlq]/write[bwlq] operation. In a test I
just ran before writing this e-mail, two of the CPUs were in the middle of
read/write tasks (this is from a kernel with my patchset on top, so this
function is from the IOC3 metadriver, but the same problem has been seen in
mem_serial_in from a vanilla lmo git tree):

CPU 2B:
0xa8000000003a6138 is in ioc3_serial_in (./arch/mips/include/asm/io.h:428).
423     \
424     __BUILD_MEMORY_PFX(__raw_, bwlq, type) \
425     __BUILD_MEMORY_PFX(, bwlq, type) \
426     __BUILD_MEMORY_PFX(__mem_, bwlq, type) \
427
428     BUILDIO_MEM(b, u8)
429     BUILDIO_MEM(w, u16)
430     BUILDIO_MEM(l, u32)
431     BUILDIO_MEM(q, u64)
432

a8000000003a6120 <ioc3_serial_in>:
a8000000003a6120:       908200b1        lbu     v0,177(a0)
a8000000003a6124:       dc830020        ld      v1,32(a0)
a8000000003a6128:       00452804        sllv    a1,a1,v0
a8000000003a612c:       38a50003        xori    a1,a1,0x3
a8000000003a6130:       0065282d        daddu   a1,v1,a1
a8000000003a6134:       38a50003        xori    a1,a1,0x3
a8000000003a6138:       90a20000        lbu     v0,0(a1)
a8000000003a613c:       03e00008        jr      ra
a8000000003a6140:       304200ff        andi    v0,v0,0xff
a8000000003a6144:       00000000        nop

CPU 2A:
0xa8000000003e6410 is in qla1280_queuecommand (./arch/mips/include/asm/io.h:429).
424     __BUILD_MEMORY_PFX(__raw_, bwlq, type) \
425     __BUILD_MEMORY_PFX(, bwlq, type) \
426     __BUILD_MEMORY_PFX(__mem_, bwlq, type) \
427
428     BUILDIO_MEM(b, u8)
429     BUILDIO_MEM(w, u16)
430     BUILDIO_MEM(l, u32)
431     BUILDIO_MEM(q, u64)
432
433     #define __BUILD_IOPORT_PFX(bus, bwlq, type) \

a8000000003e6360 <qla1280_queuecommand>:
a8000000003e6360:       67bdff90        daddiu  sp,sp,-112
a8000000003e6364:       ffbf0068        sd      ra,104(sp)
...
a8000000003e6404:       97c41728        lhu     a0,5928(s8)
a8000000003e6408:       64630078        daddiu  v1,v1,120
a8000000003e640c:       38630002        xori    v1,v1,0x2
a8000000003e6410:       94650000        lhu     a1,0(v1)
a8000000003e6414:       30a5ffff        andi    a1,a1,0xffff
a8000000003e6418:       0085182a        slt     v1,a0,a1
a8000000003e641c:       14600002        bnez    v1,a8000000003e6428 <qla1280_queuecommand+0xc8>

---

I've also tried dumping the HUB error state after reset by setting the debug
bits to 0x1000, and that says this:

Erecting partition fences ................ DONE
nasid 0 Reading peer hub nasid: 0x9200000021600000
xbow_update: updating slave nasid 1 on link 10
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
Hardware Error State at System Reset
+ Errors on node Nasid 0x0 (0)
+ IP27 in /hw/module/1/slot/n2
+ HUB signalled following errors.
+ HUB error interrupt register: 0x200000
+ 21: CPU A received uncorrectable error during uncached load
End Hardware Error State (at System Reset)

But I'm not 100% sure I can trust that, since I believe I've seen it after a
clean reboot from userland as well. Both vanilla lmo git and my patched tree
throw a panic if the kernel gets a HUB error interrupt. I have noticed that
sometimes, when the system panics, it can halt the machine before any panic
info can be written out to the console. But given multiple runs, there's still
a chance one of the panics will output something. Given the number of times
I've crashed this machine lately, by now, I should have gotten *something* out
of it. So I am discounting it being a HUB error interrupt locking things up.
Just to be sure, I replaced the panic() call with a basic printk(), and still
didn't get any output on the console.

I'll also have to enable the heavy diagnostics mode and let the machine check
itself, and maybe run bist from POD to be really sure it's not a hardware
issue. I've got spare node boards in a closet that I can swap the CPU PIMMS
on if I suspect something's wrong with one of the HUBs.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
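Two of the raw values in the message above can be cross-checked by hand. The sketch below assumes the standard MIPS R-type field layout (opcode in bits 31-26, rs in bits 25-21, funct in bits 5-0) and simply lists which bits are set in the HUB error interrupt register value. The helper names `decode_rtype` and `set_bits` are hypothetical, and nothing here says why the machine stops at that address.

```python
# Sketch: sanity-check two raw values from the dumps above.
# Assumes the standard MIPS R-type encoding; helper names are made up.

SPECIAL_OPCODE = 0x00  # opcode shared by R-type ALU and jump-register ops
FUNCT_JR = 0x08        # function code for "jr" under the SPECIAL opcode
REG_RA = 31            # register $31 is the return address register, ra


def decode_rtype(word):
    """Split a 32-bit instruction word into (opcode, rs, funct)."""
    opcode = (word >> 26) & 0x3f
    rs = (word >> 21) & 0x1f
    funct = word & 0x3f
    return opcode, rs, funct


def set_bits(value):
    """Return the positions of all set bits in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Instruction word at a800000000325148 in arch_local_irq_restore.
    opcode, rs, funct = decode_rtype(0x03e00008)
    is_jr_ra = (opcode, rs, funct) == (SPECIAL_OPCODE, REG_RA, FUNCT_JR)
    print(f"0x03e00008 decodes as jr ra: {is_jr_ra}")

    # HUB error interrupt register value from the reset-time error dump.
    print(f"bits set in 0x200000: {set_bits(0x200000)}")
```

The instruction word 0x03e00008 decodes as `jr ra`, matching the disassembly, and 0x200000 has only bit 21 set, matching the "21: CPU A received uncorrectable error during uncached load" line in the PROM output.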