* [parisc-linux] 64-bit kernel crashes on my c3600
@ 2004-10-19 17:54 Matthew Wilcox
  2004-10-20 15:24 ` Carlos O'Donell
  2004-10-31  6:29 ` Randolph Chung
  0 siblings, 2 replies; 3+ messages in thread
From: Matthew Wilcox @ 2004-10-19 17:54 UTC
  To: parisc-linux


One of the problems with this crash is that enabling EARLY_CONSOLE
doesn't help.  The exact same configuration boots fine in 32-bit mode.
I'm building from the same tree (with O=) so there's no question of patch
skew.  Turning on DISCONTIGMEM does not help.  The HPMC points inside
the code generated by the save_general macro just past skip_save_ior
inside the intr_save function in entry.S

I'm not even sure how to start debugging.  My initial thought is that r29
seems awfully high to be a good memory address.
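
A quick way to test that hunch, as a rough sketch only: compare r29 and r30 from the dump below against the "Available Memory" value the PIM reports. This assumes that field is a byte count and that RAM is addressed contiguously from physical 0, neither of which the dump confirms. The function names are made up for illustration.

```python
# Values copied from the PIM dump in this message.
GR29 = 0x000000020ffffc40
GR30 = 0x000000020fffff80              # r30 is the stack pointer in the PA-RISC ABI
AVAILABLE_MEMORY = 0x0000000200000000  # "Available Memory" field

def is_beyond_memory(addr, mem_size=AVAILABLE_MEMORY):
    """True if addr is at or past mem_size.

    Assumption: mem_size is a byte count and RAM starts at physical 0.
    """
    return addr >= mem_size

def overshoot(addr, mem_size=AVAILABLE_MEMORY):
    """Distance of addr from the assumed end of memory (negative if inside)."""
    return addr - mem_size

if __name__ == "__main__":
    for name, value in (("r29", GR29), ("r30", GR30)):
        print(name, hex(value), "beyond:", is_beyond_memory(value),
              "by", hex(overshoot(value)))
```

Under those assumptions both registers land a little under 256MB past the reported memory size, so neither would be a usable physical address.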

Here's the HPMC if it's useful.  BTW, the "system responder address" is
MEM_CONTROL_0 inside the memory controller block of Astro's config space.


Service Menu: Enter command > pim hpmc

PROCESSOR PIM INFORMATION

-----------------  Processor 0 HPMC Information ------------------

Timestamp =
  Tue Oct  19 15:58:28 GMT 2004    (20:04:10:19:15:58:28)

HPMC Chassis Codes = 2cbf0  2500b  2cbf4  2cbfc

General Registers 0 - 31
00-03   0000000000000000  0000000000000080  000000000010012c  fffffff0f0000018
04-07   00000000004cd000  00000000004cf220  00000000fffffff0  00000000f0002f68
08-11   0000000000000006  00000001ffffff80  000000000804000e  000000001062c564
12-15   0000000000000000  00000000ffffffff  0000000000000000  00000000f0400004
16-19   0000000000000000  00000000f000017c  00000000f0000174  0000000000000000
20-23   0000000000000000  00000000fee003f8  00000000fee003fd  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000006  0000000010612ac0
28-31   0000000000000000  000000020ffffc40  000000020fffff80  0300000000802204

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  000000000000001f
12-15   0000000000000000  0000000000000000  0000000000106000  0000000000000000
16-19   0000000a99eb6986  0000000000000000  0000000010107678  0000000043ffff80
20-23   0000000000000000  0000000000000000  000000ff08007f00  8000000000000000
24-27   00000000004cd000  00000000004cd000  000000007fffffff  000000007fdfffff
28-31   000000007fffffff  000000007fffffff  00000000105c8000  00000000105cc000

Space Registers 0 - 7
00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x000000001010767c
Check Type                   = 0x20000000
CPU State                    = 0x9e000004
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x003010bb
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00031800
System Responder Address     = 0xfffffffffed10200
System Requestor Address     = 0xfffffffffffa0000

Floating-Point Registers 0 - 31
00-03   0000001f00000000  0000000000000000  0000000000000000  0000000000000000
04-07   00001e84000f41fa  0000007810179ac8  00000000000e4de0  104270101052b810
08-11   12ae1e4000000002  eff1700000000002  0000000030433480  000f41fa10425000
12-15   1052bcb400000002  eff1700000000002  0000000000000001  12b1414000000000
16-19   f00008c41052b810  104270103b9aca00  104251601052bc80  30433480000f41fa
20-23   104250001052bcb4  1052bc801016533c  08a00000052d8e00  00000000431bde83
24-27   20e6da0000000000  0000008000000000  eff6a9d400000000  12ad5c40effc18dc
28-31   eff8cbc0ffffffff  ffffffff10176990  ffffffff7fffffff  fffffb7dffffffff

'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:

Check Summary                = 0xcb81041000000000
Available Memory             = 0x0000000200000000
CPU Diagnose Register 2      = 0x0300000000802204
CPU Status Register 0        = 0x2420c20000000000
CPU Status Register 1        = 0x8080000000000000
SADD LOG                     = 0x0000000000000000
Read Short LOG               = 0xc13ff0f0f000a1b8
ERROR_STATUS                 = 0x0000000000000010
MEM_ADDR                     = 0x000001ff3fffffff
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x000001ff3fffffff
MEM_SYND_CORR                = 0x0000000000000000
RUN_DATA_HIGH                = 0xc1bff0fffed08040
RUN_DATA_LOW                 = 0xc1bff0fffed08040
RUN_CTRL                     = 0x0000021c00001418
RUN_ADDR                     = 0xc1bff0fffed08040
System Responder Path        = 0x00ffffffffffffff


HPMC PIM Analysis Information:

Timestamp =
  Tue Oct  19 15:58:28 GMT 2004    (20:04:10:19:15:58:28)


'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:

A Data Miss Timeout occurred while CPU 0 was
requesting information.


Memory/IO Controller Error Analysis Information:

The Memory/IO Controller only observed the Broadcast Error.  It did not log
any additional information about the HPMC.

Memory Error Log Information:

Timestamp =
  Tue Oct  19 15:58:28 GMT 2004    (20:04:10:19:15:58:28)


'9000/785 B,C,J Workstation Memory Error Log', rev 0, 64 bytes:

   No memory errors logged


I/O Module Error Log Information:

Timestamp =
  Tue Oct  19 15:58:28 GMT 2004    (20:04:10:19:15:58:28)


'9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:

 Rope     Word1        Word2            Word3
------ ------------ ------------ ------------------
   0    0x00000000   0x0e0cc009   0x00000000fed30048
   1    0x00000000   0x1e0cc009   0x00000000fed32048
   2    ----------   0x2e0cc009   ------------------
   3    ----------   0x3e0cc009   ------------------
   4    0x00000000   0x4e0cc009   0x00000000fed38048
   5    ----------   0x5e0cc009   ------------------
   6    0x00000000   0x6e0cc009   0x00000000fed3c048
   7    ----------   0x7e0cc009   ------------------

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux


* Re: [parisc-linux] 64-bit kernel crashes on my c3600
From: Carlos O'Donell @ 2004-10-20 15:24 UTC
  To: Matthew Wilcox; +Cc: parisc-linux

On Tue, Oct 19, 2004 at 06:54:40PM +0100, Matthew Wilcox wrote:
> One of the problems with this crash is that enabling EARLY_CONSOLE
> doesn't help.  The exact same configuration boots fine in 32-bit mode.
> I'm building from the same tree (with O=) so there's no question of patch
> skew.  Turning on DISCONTIGMEM does not help.  The HPMC points inside
> the code generated by the save_general macro just past skip_save_ior
> inside the intr_save function in entry.S

This is just before the call to handle_interruption, so it looks like you
took an interrupt before something was set up properly?

These sorts of problems are very messy to debug if they are
non-deterministic. Just stick an infinite loop in a portion of code you
expect to run before the HPMC, then run, TOC, check, and move the loop.
That was my normal procedure when I had to debug similar stuff to prove
some lws code.
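
The loop-and-TOC procedure can be modelled as a search over an ordered list of places where the loop could go. The sketch below swaps in a binary search to decide where to move the loop next; that choice is an assumption, not something stated above, and it only works if the crash is deterministic. `loop_reached` is a hypothetical callback standing in for "build, boot, TOC, and see whether the PC is sitting in the loop".

```python
def last_checkpoint_reached(n_checkpoints, loop_reached):
    """Return the index of the last checkpoint the machine reaches.

    loop_reached(i) must return True if an infinite loop placed at
    checkpoint i is reached (the TOC shows the PC in the loop) and False
    if the machine HPMCs first. Returns -1 if no checkpoint is reached.
    """
    lo, hi = -1, n_checkpoints      # lo: known reached, hi: known not reached
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loop_reached(mid):
            lo = mid                # crash happens after this checkpoint
        else:
            hi = mid                # crash happens before this checkpoint
    return lo

if __name__ == "__main__":
    # Toy stand-in: pretend the machine dies between checkpoints 5 and 6.
    print(last_checkpoint_reached(8, lambda i: i <= 5))
```

Each call to `loop_reached` costs one rebuild and reboot, so halving the range keeps the number of boots to roughly log2 of the number of candidate locations.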
 
> I'm not even sure how to start debugging.  My initial thought is that r29
> seems awfully high to be a good memory address.

Why do you think that?

I'm interested in r2 which is a userspace address. Did this box make it
to userspace?

c.


* Re: [parisc-linux] 64-bit kernel crashes on my c3600
From: Randolph Chung @ 2004-10-31  6:29 UTC
  To: Matthew Wilcox; +Cc: parisc-linux

> One of the problems with this crash is that enabling EARLY_CONSOLE
> doesn't help.  The exact same configuration boots fine in 32-bit mode.
> I'm building from the same tree (with O=) so there's no question of patch
> skew.  Turning on DISCONTIGMEM does not help.  The HPMC points inside
> the code generated by the save_general macro just past skip_save_ior
> inside the intr_save function in entry.S

i've found out some more info about this problem, but still no clue why
it's happening....

at the end of head.S, when we branch to virtual space, the first virtual
insn access (to start_kernel) causes an itlb miss fault (as expected).
For some reason, the itlb handler is not able to find the page for
start_kernel in the page table, so it attempts to call the fault handler
(handle_interruption, via intr_save). However, in intr_save, as soon as
we switch to virtual space (virt_map, right before the save_general
macro call), we immediately cause another itlb miss fault, which fails,
and calls intr_save again. Each time intr_save is called, we create a
new stack frame. Eventually, the stack pointer points past valid phys
addr space, and the machine HPMCs.
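
The runaway described above can be pictured with a toy model. This is a sketch under stated assumptions, not the entry.S logic: the frame size and starting stack pointer are hypothetical, and the only architectural fact relied on is that the PA-RISC stack grows toward higher addresses.

```python
def faults_until_overflow(initial_sp, frame_size, phys_limit):
    """Count nested intr_save entries before sp leaves valid memory.

    Each failed itlb miss re-enters intr_save, which pushes one more
    frame of frame_size bytes (hypothetical size). The stack grows
    upward, so sp increases until it reaches phys_limit.
    """
    sp = initial_sp
    entries = 0
    while sp < phys_limit:
        sp += frame_size    # new frame created by this entry to intr_save
        entries += 1
    return entries, sp

if __name__ == "__main__":
    # Toy numbers only; none of these come from the thread.
    entries, sp = faults_until_overflow(0x1000, 0x100, 0x2000)
    print(entries, hex(sp))
```

Nothing in the loop ever pops a frame, which is the point: a fault handler that faults on its own first virtual access never returns, so the first sign of trouble is the stack pointer leaving memory.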

The question is, why does the itlb miss handler fail to find the mapping
for start_kernel? On my kernel, start_kernel is at 0x1056xxxx, which is
well within the 16MB initially mapped in head.S. I went through the code
in head.S several times and it seems to be correct. I also don't quite
understand how this part of the code, which is all in assembly, can
behave differently between gcc-3.3 and gcc-3.4. I tried to move the
initial-VM initialization code in head.S much closer to the rfi (with
the hypothesis that some intervening code had trashed the page table)
but that doesn't seem to help. I also had a theory that perhaps the
different gcc versions were expanding the #defines differently for
offsets.h, but that doesn't seem to be the case either... so i'm out of
ideas :(
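
The "well within the 16MB" claim is easy to check as arithmetic. The sketch below assumes the kernel's virtual base is 0x10000000, which the thread does not state, and because start_kernel is only given as 0x1056xxxx it tests both ends of that range instead of guessing the missing digits.

```python
PAGE_OFFSET = 0x10000000             # ASSUMED kernel virtual base, not from the thread
INITIAL_MAP_SIZE = 16 * 1024 * 1024  # "the 16MB initially mapped in head.S"

def in_initial_mapping(vaddr):
    """True if vaddr falls inside the assumed initial 16MB mapping."""
    return PAGE_OFFSET <= vaddr < PAGE_OFFSET + INITIAL_MAP_SIZE

# start_kernel is 0x1056xxxx; the low 16 bits are unknown.
START_KERNEL_LOW = 0x10560000
START_KERNEL_HIGH = 0x1056ffff

if __name__ == "__main__":
    print(in_initial_mapping(START_KERNEL_LOW),
          in_initial_mapping(START_KERNEL_HIGH))
```

With that base, any address in the 0x1056xxxx range sits between 5MB and 6MB into the mapping, so the address itself should not be what makes the lookup fail.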

if it helps, what i see is that in L2_ptep, 
    ldw,s \index(\pmd),\pmd
is returning with \pmd == 0
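
To make the symptom concrete, here is a toy two-level lookup in which a zero entry at the upper level ends the walk, which is the situation described above. The bit widths and table representation are hypothetical and are not the parisc page table layout; the address is just the low end of the 0x1056xxxx range.

```python
# Hypothetical address split; the real shifts are not given in the thread.
PAGE_SHIFT = 12
PTE_BITS = 10
PMD_BITS = 10

def walk(top_table, vaddr):
    """Two-level lookup. Returns the pte, or None if the walk misses.

    top_table[i] is either 0 (nothing mapped) or a list of ptes.
    """
    top_index = (vaddr >> (PAGE_SHIFT + PTE_BITS)) & ((1 << PMD_BITS) - 1)
    pte_index = (vaddr >> PAGE_SHIFT) & ((1 << PTE_BITS) - 1)
    pmd = top_table[top_index]
    if pmd == 0:
        return None         # the reported case: the loaded entry is 0
    pte = pmd[pte_index]
    return pte if pte != 0 else None

def make_tables(mapped_vaddr, pte_value):
    """Build a top-level table that maps exactly one page."""
    top = [0] * (1 << PMD_BITS)
    ptes = [0] * (1 << PTE_BITS)
    ptes[(mapped_vaddr >> PAGE_SHIFT) & ((1 << PTE_BITS) - 1)] = pte_value
    top[(mapped_vaddr >> (PAGE_SHIFT + PTE_BITS)) & ((1 << PMD_BITS) - 1)] = ptes
    return top

if __name__ == "__main__":
    vaddr = 0x10560000
    print(walk(make_tables(vaddr, 0xABC), vaddr))   # populated table
    print(walk([0] * (1 << PMD_BITS), vaddr))       # empty table -> miss
```

In this model a miss at the upper level can only mean the entry was never written, was written at a different index than the one the walker computes, or was overwritten afterwards.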

weird....

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/