* Possbile bug in hashtable.S : add_hash_page
@ 2004-06-28 18:40 Bob Doiron
  2004-06-28 19:28 ` Dan Malek
  2004-06-29  8:41 ` Paul Mackerras
  0 siblings, 2 replies; 6+ messages in thread
From: Bob Doiron @ 2004-06-28 18:40 UTC
  To: linuxppc-embedded


Hi all. I'm working on an embedded ppc / linux project and I've run into
what looks like a possible bug.

On entry to add_hash_page the return address is stored on the stack like so:

   _GLOBAL(add_hash_page)
	mflr	r0
	stw	r0,4(r1)	<-- store LR at address r1 + 4

	< do some magic >
	< then disable interrupts >


then upon return, it's retrieved like so:

	mtmsr	r10		<-- interrupts enabled here
	SYNC_601
	isync

	lwz	r0,4(r1)	<-- load LR from address r1 + 4
	mtlr	r0
	blr


However - r1 remains unchanged within the function. As it stands, it seems
that an interrupt could come along and scribble on our return address by
using the unchanged stack pointer. What I've been seeing is that when under
heavy interrupt load (nfs mount kernel compile for example) my ppc gets
stuck in a "here: jump here" loop right at the blr instruction listed above.
Inspecting the stack @ 4(r1) showed that it did in fact have the address of
the blr instruction there, which led me to believe that it is getting
corrupted by an interrupt. Simply changing the "4" offset in 4(r1) to 12
makes for a rock stable kernel, but it's a horrific hack.

I guess my question is, am I missing something? Or is this a real bug? My
extremely limited knowledge of ppc assembly suggests that r1 should be moved
prior to using stack space...

--Bob


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: Possbile bug in hashtable.S : add_hash_page
  2004-06-28 18:40 Possbile bug in hashtable.S : add_hash_page Bob Doiron
@ 2004-06-28 19:28 ` Dan Malek
  2004-06-28 19:43   ` Bob Doiron
  2004-06-29  8:41 ` Paul Mackerras
  1 sibling, 1 reply; 6+ messages in thread
From: Dan Malek @ 2004-06-28 19:28 UTC
  To: Bob Doiron; +Cc: linuxppc-embedded


On Jun 28, 2004, at 2:40 PM, Bob Doiron wrote:

>
> Hi all. I'm working on an embedded ppc / linux project and I've run
> into
> what looks like a possible bug.

What kernel?  Where did you get it?  What processor?

> 	< do some magic >

can only be explained if we know more details.


	-- Dan



* RE: Possbile bug in hashtable.S : add_hash_page
  2004-06-28 19:28 ` Dan Malek
@ 2004-06-28 19:43   ` Bob Doiron
  0 siblings, 0 replies; 6+ messages in thread
From: Bob Doiron @ 2004-06-28 19:43 UTC
  To: Dan Malek; +Cc: linuxppc-embedded


I started with the vanilla 2.6.6 kernel, then I compared hashtable.S to the
one found here: http://ppc.bkbits.net:8080/linuxppc-2.5 and saw that it had
the same stack/interrupt handling within the add_hash_page function. The
processor is an MPC7457.

I guess I was hoping the ppc assembly gurus would check their current ppc
hashtable.S to see if the code was the same, and if so, take a stab at
whether it looks reasonable given my problem explanation.

-Bob




* Re: Possbile bug in hashtable.S : add_hash_page
  2004-06-28 18:40 Possbile bug in hashtable.S : add_hash_page Bob Doiron
  2004-06-28 19:28 ` Dan Malek
@ 2004-06-29  8:41 ` Paul Mackerras
  1 sibling, 0 replies; 6+ messages in thread
From: Paul Mackerras @ 2004-06-29  8:41 UTC
  To: Bob Doiron; +Cc: linuxppc-embedded


Bob Doiron writes:

> However - r1 remains unchanged within the function. As it stands, it seems
> that an interrupt could come along and scribble on our return address by
> using the unchanged stack pointer. What I've been seeing is that when under

No - if an interrupt happens, it should decrement the stack pointer
before writing to the stack.  An interrupt should never modify
anything at or above the current stack pointer.
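The rule above can be sketched with a toy model in C. This is only an illustration, not the kernel's exception entry code: the struct toy_cpu type, the 32-byte frame size and the function names are all assumptions made up for this sketch. The stack is an array of 32-bit words, r1 is a byte offset into it, and the modelled interrupt moves r1 down before it writes anything.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64

/* Toy model (hypothetical): the stack is an array of 32-bit words and
 * r1 is a byte offset into it. Real exception entry is far more involved. */
struct toy_cpu {
    uint32_t stack[STACK_WORDS];
    uint32_t r1;                /* current stack pointer, in bytes */
};

/* Frame size chosen only for illustration. */
#define INT_FRAME_BYTES 32u

/* A well-behaved interrupt: decrement r1 first, then write only inside
 * the newly allocated frame, then restore r1 on return. */
void toy_interrupt(struct toy_cpu *cpu)
{
    uint32_t old_r1 = cpu->r1;

    assert(old_r1 >= INT_FRAME_BYTES);
    cpu->r1 = old_r1 - INT_FRAME_BYTES;                 /* like stwu r1,-32(r1) */
    cpu->stack[cpu->r1 / 4] = old_r1;                   /* back chain */
    for (uint32_t off = 4; off < INT_FRAME_BYTES; off += 4)
        cpu->stack[(cpu->r1 + off) / 4] = 0xDEADBEEFu;  /* saved state */
    cpu->r1 = old_r1;                                   /* frame popped */
}

/* Returns 1 if the word at 4(r1) survives the interrupt, 0 otherwise. */
int lr_slot_survives_interrupt(void)
{
    struct toy_cpu cpu;

    memset(&cpu, 0, sizeof cpu);
    cpu.r1 = 128;                           /* arbitrary aligned offset */
    cpu.stack[(cpu.r1 + 4) / 4] = 0x1234;   /* stw r0,4(r1): saved LR */
    toy_interrupt(&cpu);
    return cpu.stack[(cpu.r1 + 4) / 4] == 0x1234;
}
```

In this model every write made by the interrupt lands strictly below the interrupted r1, so the word at 4(r1) is left alone even though the interrupted function never moved r1 itself.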

> heavy interrupt load (nfs mount kernel compile for example) my ppc gets
> stuck in a "here: jump here" loop right at the blr instruction listed above.
> Inspecting the stack @ 4(r1) showed that it did in fact have the address of
> the blr instruction there, which led me to believe that it is getting
> corrupted by an interrupt. Simply changing the "4" offset in 4(r1) to 12
> makes for a rock stable kernel, but it's a horrific hack.

Well, it might be getting corrupted by an interrupt, in which case the
most likely cause is some code using a local array variable and
accessing outside the bounds of the array.  It's hard to see what that
code would be doing to put the address of the blr in that word of
memory though.
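As a rough illustration of that kind of overrun, here is a toy C model. The 16-byte frame layout, the names buggy_callee and lr_slot_after, and the fill value are assumptions invented for this sketch and are not taken from any kernel code. A small local array sits at the top of a frame allocated just below the caller's r1, and writing past its end walks upward into the caller's back chain word and then into the LR save word at 4(r1).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64
#define FRAME_BYTES 16u   /* hypothetical frame: back chain, LR word, 2 locals */

/* Writes count words into a 2-word local array that sits at 8(r1) of a
 * frame allocated just below caller_r1. The array bound is deliberately
 * not checked: that is the bug being modelled. The assert only keeps the
 * model itself inside the toy stack. */
void buggy_callee(uint32_t *stack, uint32_t caller_r1, uint32_t count)
{
    uint32_t r1, locals;

    assert(caller_r1 >= FRAME_BYTES);
    r1 = caller_r1 - FRAME_BYTES;
    locals = r1 + 8;
    assert(locals + 4 * count <= 4 * STACK_WORDS);

    stack[r1 / 4] = caller_r1;                  /* back chain */
    for (uint32_t i = 0; i < count; i++)
        stack[(locals + 4 * i) / 4] = 0xBADC0DEu;
}

/* Returns the word at 4(caller_r1) after the callee wrote count words. */
uint32_t lr_slot_after(uint32_t count)
{
    uint32_t stack[STACK_WORDS];
    uint32_t caller_r1 = 128;

    memset(stack, 0, sizeof stack);
    stack[(caller_r1 + 4) / 4] = 0x1234;        /* saved return address */
    buggy_callee(stack, caller_r1, count);
    return stack[(caller_r1 + 4) / 4];
}
```

With this layout, writing two words stays inside the array, the third overwrites the caller's back chain, and the fourth replaces the saved return address. The model only shows how an overrun reaches that word; it does not explain why the value found there would be the address of the blr.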

Paul.


* RE: Possbile bug in hashtable.S : add_hash_page
  2004-06-28 20:16 VanBaren, Gerald (AGRE)
@ 2004-06-28 20:30 ` Bob Doiron
  0 siblings, 0 replies; 6+ messages in thread
From: Bob Doiron @ 2004-06-28 20:30 UTC
  To: VanBaren, Gerald (AGRE), linuxppc-embedded


Thanks. That's exactly the kind of answer I was looking for ;)
--Bob



* RE: Possbile bug in hashtable.S : add_hash_page
@ 2004-06-28 20:16 VanBaren, Gerald (AGRE)
  2004-06-28 20:30 ` Bob Doiron
  0 siblings, 1 reply; 6+ messages in thread
From: VanBaren, Gerald (AGRE) @ 2004-06-28 20:16 UTC
  To: linuxppc-embedded



Hi Bob,

I don't claim to be a PPC guru, but the PPC stack is rather unconventional.

This is based on the PowerPC Embedded Application Binary Interface (EABI) and the ABI.

The link register (return address) is stored in the second entry of the previous stack frame (frame pointers and return addresses from the current and previous stack frames get intertwined in a horribly confusing way). This is odd to us traditional stack types, but it is the PPC convention: the caller has reserved an unused 32-bit word on its stack for this.
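A small C model may make the convention easier to follow. It is a sketch under assumptions: struct toy_cpu, prologue, epilogue and the 16-byte frame size are names and values chosen for illustration, and the instruction order in real compiler output can differ. The stack is an array of 32-bit words, and r1 and lr are plain fields.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model (hypothetical): r1 is a byte offset into stack[]. */
struct toy_cpu {
    uint32_t stack[STACK_WORDS];
    uint32_t r1;    /* stack pointer */
    uint32_t lr;    /* link register */
};

/* Roughly what a non-leaf prologue does:
 *     mflr r0 ; stw r0,4(r1) ; stwu r1,-size(r1)
 * The return address lands in the CALLER's frame, at old r1 + 4. */
void prologue(struct toy_cpu *cpu, uint32_t size)
{
    uint32_t old_r1 = cpu->r1;

    assert(old_r1 >= size);
    cpu->stack[(old_r1 + 4) / 4] = cpu->lr;     /* LR save word */
    cpu->r1 = old_r1 - size;
    cpu->stack[cpu->r1 / 4] = old_r1;           /* back chain */
}

/* Matching epilogue: follow the back chain, then reload LR from 4(r1). */
void epilogue(struct toy_cpu *cpu)
{
    cpu->r1 = cpu->stack[cpu->r1 / 4];
    cpu->lr = cpu->stack[(cpu->r1 + 4) / 4];
}

/* Runs one call/return and reports whether LR and r1 round-trip. */
int lr_round_trips(void)
{
    struct toy_cpu cpu = { .r1 = 128, .lr = 0x1000 };

    prologue(&cpu, 16);
    cpu.lr = 0x2000;            /* clobbered by a nested bl */
    epilogue(&cpu);
    return cpu.lr == 0x1000 && cpu.r1 == 128;
}
```

The point of the sketch is that the callee never stores its return address in its own frame. It uses the word the caller left free just above the caller's back chain pointer.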

The only problem you can run into is if someone doesn't reserve the extra room. This happens in power-up initialization or hand-written assembly language routines: you need to have the reserved room in your stack, and it is easy to forget. Your initial stack should have two entries of zeros (EABI stacks are 8-byte aligned; ABI-conformant stacks are 16-byte aligned). One entry is reserved for the callee's return address and one entry terminates the back chain.

EABI:
          +----------+
     xxxC | 00000000 | reserved for callee's return address
r1-> xxx8 | 00000000 | back chain pointer (zero terminates the back chain)
        ^
        +- address (least significant nibble)

ABI:
          +----------+
     xxxC | 00000000 | \ alignment padding
     xxx8 | 00000000 | /
     xxx4 | 00000000 | reserved for callee's return address
r1-> xxx0 | 00000000 | back chain pointer (zero terminates the back chain)
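
The two layouts above can be computed with a helper along these lines. This is a minimal sketch, assuming a stack region whose base is already suitably aligned. The function name init_stack and the use of a byte offset in place of a real r1 value are inventions for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the initial r1, as a byte offset into mem, for a stack that
 * occupies mem[0..size), or UINT32_MAX if the region is too small.
 * align is 8 for EABI or 16 for the ABI. The base of mem is assumed to
 * be aligned to at least align bytes. */
uint32_t init_stack(uint8_t *mem, uint32_t size, uint32_t align)
{
    uint32_t r1;

    assert(align >= 8 && (align & (align - 1)) == 0);   /* power of two */
    if (size < align)
        return UINT32_MAX;

    /* Highest aligned offset that still leaves align bytes above it. */
    r1 = (size - align) & ~(align - 1);

    /* Zero the terminating back chain word at r1, the reserved return
     * address word at r1 + 4, and any alignment padding above them. */
    memset(mem + r1, 0, align);
    return r1;
}
```

For a 256-byte region this gives an offset of 248 with 8-byte alignment (r1 ends in 8, matching the EABI picture) and 240 with 16-byte alignment (r1 ends in 0, matching the ABI picture).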

In your hash example, I'm assuming it is a leaf function -- it doesn't call anybody else -- in which case it would not need to allocate any stack space for subroutines because it doesn't call any.

You will need to look at your interrupt handler support to see what the interrupt handler does with the stack.  This will be even more confusing and it is different for the different flavors of PPC processors (PPCs are application code compatible but not system level code compatible).

gvb

