linuxppc-dev.lists.ozlabs.org archive mirror
* Maple PPC970 kexec crash-dump problems
@ 2009-01-23 20:59 Benjamin Walsh
  2009-01-24  7:52 ` Milton Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Walsh @ 2009-01-23 20:59 UTC
  To: linuxppc-dev

Hi all,

I am trying to use kexec with a crash dump kernel on a Maple board (Motorola
ATCA6101 to be precise). This board is running a two-CPU PPC970FX. I am
running a 2.6.27-10 kernel and have tried both older kexec-tools and the
newest ones. I have tried SMP and non-SMP kernels.

Using kexec -l to fast boot works correctly. However, loading a crash dump
kernel and triggering a crash via echo c > /proc/sysrq-trigger simply hangs
the board. I have traced the sequence down to after the call to
kexec_copy_flush(), when the CPU returns to real-address mode (bl
real_mode). At this point I have no further debugging information.

Two things could help me:

- Getting the fix if this is a known issue and a fix exists. I have looked
at recent patches and nothing leapt to mind, mostly relocatable kernel
support.
- Obtaining the address of the serial port @3f8 in real mode. The init
sequence with udbg ON says that the physical address of the port is
0xf40003f8; however, setting it up in poll mode and trying to stuff
characters in the tx buffer doesn't produce anything.

Has anyone recently tried to use the serial port in real mode?

Thanks for any help.

Ben

* Re: Maple PPC970 kexec crash-dump problems
  2009-01-23 20:59 Maple PPC970 kexec crash-dump problems Benjamin Walsh
@ 2009-01-24  7:52 ` Milton Miller
  2009-02-04 18:48   ` Benjamin Walsh
  0 siblings, 1 reply; 4+ messages in thread
From: Milton Miller @ 2009-01-24  7:52 UTC
  To: Benjamin Walsh; +Cc: linuxppc-dev list

On Sat Jan 24 at 07:59:47 EST in 2009, Benjamin Walsh wrote:
> I am trying to use kexec with a crash dump kernel on a Maple board
> (Motorola ATCA6101 to be precise). This board is running a two-CPU
> PPC970FX. I am running a 2.6.27-10 kernel and have tried both older
> kexec-tools and the newest ones. I have tried SMP and non-SMP kernels.

Once you start the second CPU, it is likely executing instructions
somewhere.

Prior to 2.6.27 you had to compile a fixed-offset kernel to run
kdump. With 2.6.27 that option was removed and replaced with the
relocatable kernel. However, because of the way Linux interacts with
Open Firmware, the kernel will still move itself to 0 unless a specific
flag is set. The location of the flag was changed twice during the
merge process, and the patches for kexec-tools were not made until
early this year.

> Using kexec -l to fast boot works correctly. However, loading a crash
> dump kernel and triggering a crash via echo c > /proc/sysrq-trigger
> simply hangs the board. I have traced the sequence down to after the
> call to kexec_copy_flush(), when the CPU returns to real-address mode
> (bl real_mode). At this point I have no further debugging information.


> Two things could help me:
>
> - Getting the fix if this is a known issue and a fix exists. I have
> looked at recent patches and nothing leapt to mind, mostly relocatable
> kernel support.

That is a major change.

That said, I don't know if anyone has tested kexec panic beyond pseries
for 64-bit powerpc.

I know Paul originally prototyped the relocatable patch on a powermac,
but I don't know what, if any, SMP testing he performed. And you said
you are actually on maple, not a powermac, so the startup issues are
different.

> - Obtaining the address of the serial port @3f8 in real mode. The init
> sequence with udbg ON says that the physical address of the port is
> 0xf40003f8; however, setting it up in poll mode and trying to stuff
> characters in the tx buffer doesn't produce anything.

Ah yes. In real mode you can only talk to cacheable memory without
implementation-specific assistance. However, if you look in the kernel
for the maple early udbg support, you will find the code you need to
talk to that serial port in real mode.

>
> Has anyone recently tried to use the serial port in real mode ?
>
> Thanks for any help.
>
> Ben

Hope this gets you started. I wrote a lot of the kernel code, but I
had the advantage of external JTAG access to the processor to see where
it ended up when it went astray.

milton

* Re: Maple PPC970 kexec crash-dump problems
  2009-01-24  7:52 ` Milton Miller
@ 2009-02-04 18:48   ` Benjamin Walsh
  2009-02-06 16:53     ` Milton Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Walsh @ 2009-02-04 18:48 UTC
  To: Milton Miller; +Cc: linuxppc-dev list

Hi Milton,

I've tracked it down to the device tree passed to the second kernel being
screwed up when patched by kexec-tools. Namely, it was creating
linux,usable-memory entries that were wrong, and the MMU initialization hung
when it failed to allocate the page tables. I hacked the tool and got
past that point in the init sequence, but the very first IO-mapped access
fails, so the MMU doesn't seem to be set up correctly.

Anyway, on to my question: is the crash dump (kdump) kernel supposed to use
the memory reserved for it by the first kernel as its working memory? For
example, on that board I have 0->2GB and 4->6GB for a total of 4GB of RAM.
Let's say I reserve 128M@32M; that's 0x2000000->0xa000000. Is the second
kernel supposed to use

(0x2000000+<kernel size>) -> 0xa000000

for its memory pool and leave everything else:

0->0x2000000, 0xa000000 -> 0x80000000, 0x100000000 -> 0x180000000

as memory that is from the first kernel, used to debug it?

Basically, I am trying to figure out if I patched the tool correctly.

Thanks,
Ben

* Re: Maple PPC970 kexec crash-dump problems
  2009-02-04 18:48   ` Benjamin Walsh
@ 2009-02-06 16:53     ` Milton Miller
  0 siblings, 0 replies; 4+ messages in thread
From: Milton Miller @ 2009-02-06 16:53 UTC
  To: Benjamin Walsh; +Cc: linuxppc-dev list


On Feb 4, 2009, at 12:48 PM, Benjamin Walsh wrote:

> Hi Milton,
>
> I've tracked it down to the device tree passed to the second kernel
> being screwed-up when patched by kexec-tools. Namely, it was creating
> linux,usable-memory entries that were wrong, and the MMU
> initialization hung when it failed allocating for the page tables. I
> hacked the tool, and got past that point in the init sequence, but
> the very first IO mapped access fails, so the MMU doesn't seem to be
> set up correctly.

I would need more details on exactly what you think is wrong.

How does it fail?

If the first IO-mapped access fails, then I would ask if you are using
the IOMMU. It is quite possible that the DART IOMMU code needs to be
modified to use the existing mapping table instead of allocating a new
table; otherwise any existing mappings being used by in-flight DMA would
fail, and that might cause MMIO loads to wait for uncompletable DMA
writes. Just a theory, given the lack of information you gave me.


>
> Anyway, up to my question: is the crash dump (kdump) kernel supposed
> to use the memory reserved for it by the first kernel for its working
> memory ? e.g. On that board, I have 0->2GB and 4->6GB for a total of
> 4GB of RAM. Let's say I reserve 128M@32M, that's 0x2000000->0xa000000.
> Is the second kernel supposed to use
>
> (0x2000000+<kernel size>) -> 0xa000000
>
> for its memory pool and leave everything else:
>
> 0->0x2000000, 0xa000000 -> 80000000, 0x100000000 -> 0x180000000
>
> as memory that is from the first kernel, used to debug it ?


Yes, but that is not quite how the device tree is formed.

The second kernel will also use the interrupt vector area at address 0.
Therefore that area is saved as the backup region by purgatory, to an
address allocated in the kdump region. The device tree is then
created with linux,usable-memory regions extending the kdump region
back to 0, and a reserve entry marking the area as reserved.

In addition, the device tree gets the memory backing the TCE tables for
pseries SMP mode. It may need the page with the DART table marked
reserved, so that the table gets added to the linear map -- except it
should be mapped cache-inhibited, so that may not work either.

>
> Basically, I am trying to figure out if I patched the tool correctly.
>
> Thanks,
> Ben
>
