All of lore.kernel.org
 help / color / mirror / Atom feed
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
@ 2011-05-18 15:25 Frank Hofmann
  2011-05-18 16:06 ` Frank Hofmann
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-18 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,


with the work on the hibernation support code, I've managed to make it all 
C-with-inline-asm. This happens to work fine on OMAP3, but there's an 
issue when I build it for Samsung 64xx. It's about this code:


int __naked swsusp_arch_resume(void)
{
 	register struct pbe *pbe;
 	register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
 	register void *saved_ctx asm ("r0") = ctx;

 	__asm__ __volatile__("" : "+sp"(sp));		/* don't optimize away ! */

 	for (pbe = restore_pblist; pbe; pbe = pbe->next)
 		copy_page(pbe->orig_address, pbe->address);

 	__asm__ __volatile__(
 		"mov	r1, #" __stringify(SYSTEM_MODE) "\n\t"
 		"msr	cpsr_c, r1\n\t"
 		"ldr	r1, [r0], #4\n\t"
 		"msr	cpsr_cxsf, r1\n\t"
 		"ldm	r0!, {r4-r12,lr}\n\t"
 		"ldr	sp, [r0], #4\n\t"
 		"mov	r2, #" __stringify(SVC_MODE) "\n\t"
 		"msr	cpsr_c, r2\n\t"
 		"ldm	r0!, {r2, lr}\n\t"
 		"ldr	sp, [r0], #4\n\t"
 		"msr	spsr_cxsf, r2\n\t"
 		"msr	cpsr, r1\n\t"
 		"mov	r1, #0\n\t"
 		"push	{r1, lr}\n\t"
 		"bl	__restore_processor_state\n\t"
 		"pop	{r0, pc}\n\t"
 		: "+r"(saved_ctx)
 		:
 		: "memory", "cc",
 		"r1", "r2", "r3", "r4", "r5", "r6", "r7",
 		"r8", "r9", "r10", "ip", "fp" "sp", "lr", "pc");

 	return 0;
}


This has the special requirement to run on a separate stack, because the 
copy loop will blot over whatever there is. Clobbering everything is part 
of the design/intent.



On OMAP (where I compile _without_ CONFIG_FTRACE, but even when 
switching it on there's no change), this creates the following assembly, 
which is ok:

00005bb0 <swsusp_arch_resume>:
     5bb0:       e59fd074        ldr     sp, [pc, #116]  ; 5c2c <swsusp_arch_resume+0x7c>
     5bb4:       e3003000        movw    r3, #0  ; 0x0
     5bb8:       e3403000        movt    r3, #0  ; 0x0
     5bbc:       e5934000        ldr     r4, [r3]
     5bc0:       e3540000        cmp     r4, #0  ; 0x0
     5bc4:       0a000005        beq     5be0 <swsusp_arch_resume+0x30>
     5bc8:       e5940004        ldr     r0, [r4, #4]
     5bcc:       e5941000        ldr     r1, [r4]
     5bd0:       ebfffffe        bl      162b80 <copy_page>
     5bd4:       e5944008        ldr     r4, [r4, #8]
     5bd8:       e3540000        cmp     r4, #0  ; 0x0
     5bdc:       1afffff9        bne     5bc8 <swsusp_arch_resume+0x18>
     5be0:       e3000000        movw    r0, #0  ; 0x0
     5be4:       e3400000        movt    r0, #0  ; 0x0
     5be8:       e3a0101f        mov     r1, #31 ; 0x1f
     5bec:       e121f001        msr     CPSR_c, r1
[ ... inline as above follows ... ]
     5c1c:       e92d4002        push    {r1, lr}
     5c20:       ebfffffe        bl      1538c <__restore_processor_state>
     5c24:       e8bd8001        pop     {r0, pc}
     5c28:       e3a00000        mov     r0, #0  ; 0x0
     5c2c:       00000ff8        .word   0x00000ff8

Notice how the for() loop has been coded into using R4.



When I build it for S5P64xx though (where we have CONFIG_FTRACE, but see 
below), I get a compile error (same gcc, just different kernel defconfig 
target, ARM11xx instead of Cortex-A8):

arch/arm/kernel/hibernate.c: In function 'swsusp_arch_resume':
arch/arm/kernel/hibernate.c:136: error: fp cannot be used in asm here

That's the '}' of the function. Strange enough ... the compiler wants to 
do something with FP and isn't willing to give that up.


If I take "fp" out of the clobber list then it compiles.

But the generated assembly code is incorrect:

0000757c <swsusp_arch_resume>:
     757c:       e59fd078        ldr     sp, [pc, #120]  ; 75fc <swsusp_arch_resume+0x80>
     7580:       e59f3078        ldr     r3, [pc, #120]  ; 7600 <swsusp_arch_resume+0x84>
     7584:       e5933000        ldr     r3, [r3]
     7588:       ea000005        b       75a4 <swsusp_arch_resume+0x28>
     758c:       e59b3000        ldr     r3, [fp]
     7590:       e5930004        ldr     r0, [r3, #4]
     7594:       e5931000        ldr     r1, [r3]
     7598:       ebfffffe        bl      1647a0 <copy_page>
     759c:       e59b3000        ldr     r3, [fp]
     75a0:       e5933008        ldr     r3, [r3, #8]
     75a4:       e58b3000        str     r3, [fp]
     75a8:       e59b3000        ldr     r3, [fp]
     75ac:       e3530000        cmp     r3, #0  ; 0x0
     75b0:       1afffff5        bne     758c <swsusp_arch_resume+0x10>
     75b4:       e59f0048        ldr     r0, [pc, #72]   ; 7604 <swsusp_arch_resume+0x88>
     75b8:       e3a0101f        mov     r1, #31 ; 0x1f
     75bc:       e121f001        msr     CPSR_c, r1
[ ... inline as above follows ... ]
     75ec:       e92d4002        push    {r1, lr}
     75f0:       ebfffffe        bl      dc58 <__restore_processor_state>
     75f4:       e8bd8001        pop     {r0, pc}
     75f8:       e59b0000        ldr     r0, [fp]
     75fc:       00000ff8        .word   0x00000ff8
     7600:       00000000        .word   0x00000000
     7604:       00002000        .word   0x00002000


This assumes FP is set up so that the local variable held in R3 (that the 
OMAP compile stuffs into R4) can be stored there, to preserve over the 
function call.

Adding a register asm("r4") to the variable declaration changes nothing, 
it still insists on saving it to/from the stack via [fp].


Now this means there's something wrong here; either it's lucky that my 
compiler creates the OMAP (armv7) code that it does and the 6450 (armv6) 
is actually correct in assuming this reg/mem location is available/usable 
for it.

Or there's a problem with the ARM11xx / armv6 code generation.


I'm tending towards believing that the OMAP result is luck, and to make 
sure the stack switch fools the entire ABI convention into working the 
stackframe setup needs to initialize more than just SP.


I can hack around the issue by explicitly initializing FP via:

         __asm__ __volatile__("add fp, %0, #4\n\t" : "+sp"(sp));

instead, but that's sneaky in the sense that I'm not telling the compiler 
I've done so (adding "fp" to the clobber fails compile again). Also, while 
this specific incantation uses one word, how can I be sure that that'll be 
it and no other gcc version will pull more [fp] usage out of some hat ?

Note that while GCC chose to use [fp], it hasn't actually put proper 
offsets in; it uses the same [fp] for both 'pbe' and the zero return 
value; that's definitely not ok, but again is an artifact of the __naked.



Another option I've found is:

int __naked swsusp_arch_resume(void)
{
         register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
         __asm__ __volatile__("" : "+sp"(sp));

         return resume_internal();
}

and make resume_internal() the actual function, non-naked; this creates a 
normal prologue which sets up FP. resume_internal() cannot be made static, 
though, or else the compiler inlines it - without the prologues, and it's 
back to #1^2.



Re, the use of CONFIG_FTRACE: Just switching that off on S5P6450 results 
in using [sp #4] instead of [fp]. Still incorrect code (albeit it works 
out of the box), and still not possible to put "fp" into the clobber list.




I am wondering if there's a way to convince gcc to do the right thing 
here, __naked notwithstanding.

Is the use of a naked "bounce function" for this specific purpose (switch 
the stackpointer) acceptable, and guaranteed to create the correct code ?


It'd be nice not having to dump this piece into assembly...



Thanks for any ideas !
FrankH.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
@ 2011-05-18 16:06 ` Frank Hofmann
  2011-05-18 16:50 ` Mikael Pettersson
  2011-05-18 17:45 ` Dave Martin
  2 siblings, 0 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-18 16:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,


I've found that the key difference to code generation in this case seems 
to be the use of the register clobber list. While gcc targeting armv7 / 
omap3 looks to be pretty happy about that matching reality (it still 
allocates local vars to registers alright - i.e. it recoginzes that the 
clobber happens after), that's not the case for armv6 / s5p6450. On those, 
seeing the r0-r15 clobber list, gcc insists on having "one tiny reg, just 
one little teeny tinsy reg for itself", with FP as last resort ... and the 
shown consequences for code generation if one lets fp through.


Now since the inline asm statement is finalizing the function anyway, 
there's no real harm in not specifying a clobber list at all
there's never anything after it that could depend on saved state.
Doing so makes the generated armv6 code use the same regs as the armv7 
does.

Is it, for the sake of fooling gcc under all circumstances, necessary to 
go via the naked trampoline ?


FrankH.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
  2011-05-18 16:06 ` Frank Hofmann
@ 2011-05-18 16:50 ` Mikael Pettersson
  2011-05-18 17:45 ` Dave Martin
  2 siblings, 0 replies; 8+ messages in thread
From: Mikael Pettersson @ 2011-05-18 16:50 UTC (permalink / raw)
  To: linux-arm-kernel

Frank Hofmann writes:
 > with the work on the hibernation support code, I've managed to make it all 
 > C-with-inline-asm. This happens to work fine on OMAP3, but there's an 
 > issue when I build it for Samsung 64xx. It's about this code:
 > 
 > 
 > int __naked swsusp_arch_resume(void)
 > {
 >  	register struct pbe *pbe;
 >  	register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
 >  	register void *saved_ctx asm ("r0") = ctx;
 > 
 >  	__asm__ __volatile__("" : "+sp"(sp));		/* don't optimize away ! */
...
 > This has the special requirement to run on a separate stack, because the 
 > copy loop will blot over whatever there is. Clobbering everything is part 
 > of the design/intent.
...
 > I am wondering if there's a way to convince gcc to do the right thing 
 > here, __naked notwithstanding.
 > 
 > Is the use of a naked "bounce function" for this specific purpose (switch 
 > the stackpointer) acceptable, and guaranteed to create the correct code ?
 > 
 > 
 > It'd be nice not having to dump this piece into assembly...

Just write the stack switching fragment in proper assembly, and
keep both sides of it in normal C.  End of problem.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
  2011-05-18 16:06 ` Frank Hofmann
  2011-05-18 16:50 ` Mikael Pettersson
@ 2011-05-18 17:45 ` Dave Martin
  2011-05-19  8:46   ` Frank Hofmann
  2 siblings, 1 reply; 8+ messages in thread
From: Dave Martin @ 2011-05-18 17:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
> Hi,
> 
> 
> with the work on the hibernation support code, I've managed to make
> it all C-with-inline-asm. This happens to work fine on OMAP3, but
> there's an issue when I build it for Samsung 64xx. It's about this
> code:
> 
> 
> int __naked swsusp_arch_resume(void)
> {
> 	register struct pbe *pbe;
> 	register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;

(Why - 8?)

> 	register void *saved_ctx asm ("r0") = ctx;
> 
> 	__asm__ __volatile__("" : "+sp"(sp));		/* don't optimize away ! */

Is this an attempt to force gcc to change sp before the subsequent loop?
I think you don't mean "+sp".  Something like "+r" might be appropriate.

This feels subtle and fragile to me...

> 
> 	for (pbe = restore_pblist; pbe; pbe = pbe->next)
> 		copy_page(pbe->orig_address, pbe->address);
> 
> 	__asm__ __volatile__(
> 		"mov	r1, #" __stringify(SYSTEM_MODE) "\n\t"
> 		"msr	cpsr_c, r1\n\t"
> 		"ldr	r1, [r0], #4\n\t"
> 		"msr	cpsr_cxsf, r1\n\t"
> 		"ldm	r0!, {r4-r12,lr}\n\t"
> 		"ldr	sp, [r0], #4\n\t"
> 		"mov	r2, #" __stringify(SVC_MODE) "\n\t"
> 		"msr	cpsr_c, r2\n\t"
> 		"ldm	r0!, {r2, lr}\n\t"
> 		"ldr	sp, [r0], #4\n\t"
> 		"msr	spsr_cxsf, r2\n\t"
> 		"msr	cpsr, r1\n\t"
> 		"mov	r1, #0\n\t"
> 		"push	{r1, lr}\n\t"
> 		"bl	__restore_processor_state\n\t"
> 		"pop	{r0, pc}\n\t"
> 		: "+r"(saved_ctx)
> 		:
> 		: "memory", "cc",
> 		"r1", "r2", "r3", "r4", "r5", "r6", "r7",
> 		"r8", "r9", "r10", "ip", "fp" "sp", "lr", "pc");
> 
> 	return 0;
> }
> 
> 
> This has the special requirement to run on a separate stack, because
> the copy loop will blot over whatever there is. Clobbering
> everything is part of the design/intent.
> 
> 
> 
> On OMAP (where I compile _without_ CONFIG_FTRACE, but even when
> switching it on there's no change), this creates the following
> assembly, which is ok:
> 
> 00005bb0 <swsusp_arch_resume>:
>     5bb0:       e59fd074        ldr     sp, [pc, #116]  ; 5c2c <swsusp_arch_resume+0x7c>
>     5bb4:       e3003000        movw    r3, #0  ; 0x0
>     5bb8:       e3403000        movt    r3, #0  ; 0x0
>     5bbc:       e5934000        ldr     r4, [r3]
>     5bc0:       e3540000        cmp     r4, #0  ; 0x0
>     5bc4:       0a000005        beq     5be0 <swsusp_arch_resume+0x30>
>     5bc8:       e5940004        ldr     r0, [r4, #4]
>     5bcc:       e5941000        ldr     r1, [r4]
>     5bd0:       ebfffffe        bl      162b80 <copy_page>
>     5bd4:       e5944008        ldr     r4, [r4, #8]
>     5bd8:       e3540000        cmp     r4, #0  ; 0x0
>     5bdc:       1afffff9        bne     5bc8 <swsusp_arch_resume+0x18>
>     5be0:       e3000000        movw    r0, #0  ; 0x0
>     5be4:       e3400000        movt    r0, #0  ; 0x0
>     5be8:       e3a0101f        mov     r1, #31 ; 0x1f
>     5bec:       e121f001        msr     CPSR_c, r1
> [ ... inline as above follows ... ]
>     5c1c:       e92d4002        push    {r1, lr}
>     5c20:       ebfffffe        bl      1538c <__restore_processor_state>
>     5c24:       e8bd8001        pop     {r0, pc}
>     5c28:       e3a00000        mov     r0, #0  ; 0x0
>     5c2c:       00000ff8        .word   0x00000ff8
> 
> Notice how the for() loop has been coded into using R4.
> 
> 
> 
> When I build it for S5P64xx though (where we have CONFIG_FTRACE, but
> see below), I get a compile error (same gcc, just different kernel
> defconfig target, ARM11xx instead of Cortex-A8):
> 
> arch/arm/kernel/hibernate.c: In function 'swsusp_arch_resume':
> arch/arm/kernel/hibernate.c:136: error: fp cannot be used in asm here

Probably this is a side-effect of building for ARM versus Thumb-2.
ARM1176 only supports ARM, and fp is used as the framepointer when
building a kernel in ARM.

Building that file with -fomit-frame-pointer avoids the error,
but that's best avoided unless there's a really good reason.

You would get the same problem when building for Thumb-2 if you pass
-fno-omit-frame-pointer to the compiler and attempt to clobber r7
(by default, Thumb-2 code has no framepointer).

> 
> That's the '}' of the function. Strange enough ... the compiler
> wants to do something with FP and isn't willing to give that up.
> 
> 
> If I take "fp" out of the clobber list then it compiles.
> 
> But the generated assembly code is incorrect:
> 
> 0000757c <swsusp_arch_resume>:
>     757c:       e59fd078        ldr     sp, [pc, #120]  ; 75fc <swsusp_arch_resume+0x80>
>     7580:       e59f3078        ldr     r3, [pc, #120]  ; 7600 <swsusp_arch_resume+0x84>
>     7584:       e5933000        ldr     r3, [r3]
>     7588:       ea000005        b       75a4 <swsusp_arch_resume+0x28>
>     758c:       e59b3000        ldr     r3, [fp]
>     7590:       e5930004        ldr     r0, [r3, #4]
>     7594:       e5931000        ldr     r1, [r3]
>     7598:       ebfffffe        bl      1647a0 <copy_page>
>     759c:       e59b3000        ldr     r3, [fp]
>     75a0:       e5933008        ldr     r3, [r3, #8]
>     75a4:       e58b3000        str     r3, [fp]
>     75a8:       e59b3000        ldr     r3, [fp]
>     75ac:       e3530000        cmp     r3, #0  ; 0x0
>     75b0:       1afffff5        bne     758c <swsusp_arch_resume+0x10>
>     75b4:       e59f0048        ldr     r0, [pc, #72]   ; 7604 <swsusp_arch_resume+0x88>
>     75b8:       e3a0101f        mov     r1, #31 ; 0x1f
>     75bc:       e121f001        msr     CPSR_c, r1
> [ ... inline as above follows ... ]
>     75ec:       e92d4002        push    {r1, lr}
>     75f0:       ebfffffe        bl      dc58 <__restore_processor_state>
>     75f4:       e8bd8001        pop     {r0, pc}
>     75f8:       e59b0000        ldr     r0, [fp]
>     75fc:       00000ff8        .word   0x00000ff8
>     7600:       00000000        .word   0x00000000
>     7604:       00002000        .word   0x00002000
> 
> 
> This assumes FP is set up so that the local variable held in R3
> (that the OMAP compile stuffs into R4) can be stored there, to
> preserve over the function call.
> 
> Adding a register asm("r4") to the variable declaration changes
> nothing, it still insists on saving it to/from the stack via [fp].
> 
> 
> Now this means there's something wrong here; either it's lucky that
> my compiler creates the OMAP (armv7) code that it does and the 6450
> (armv6) is actually correct in assuming this reg/mem location is
> available/usable for it.
> 
> Or there's a problem with the ARM11xx / armv6 code generation.
> 
> 
> I'm tending towards believing that the OMAP result is luck, and to
> make sure the stack switch fools the entire ABI convention into
> working the stackframe setup needs to initialize more than just SP.
> 
> 
> I can hack around the issue by explicitly initializing FP via:
> 
>         __asm__ __volatile__("add fp, %0, #4\n\t" : "+sp"(sp));
> 
> instead, but that's sneaky in the sense that I'm not telling the
> compiler I've done so (adding "fp" to the clobber fails compile
> again). Also, while this specific incantation uses one word, how can
> I be sure that that'll be it and no other gcc version will pull more
> [fp] usage out of some hat ?
> 
> Note that while GCC chose to use [fp], it hasn't actually put proper
> offsets in; it uses the same [fp] for both 'pbe' and the zero return
> value; that's definitely not ok, but again is an artifact of the
> __naked.
> 
> 
> 
> Another option I've found is:
> 
> int __naked swsusp_arch_resume(void)
> {
>         register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
>         __asm__ __volatile__("" : "+sp"(sp));
> 
>         return resume_internal();
> }
> 
> and make resume_internal() the actual function, non-naked; this
> creates a normal prologue which sets up FP. resume_internal() cannot
> be made static, though, or else the compiler inlines it - without
> the prologues, and it's back to #1^2.
> 
> 
> 
> Re, the use of CONFIG_FTRACE: Just switching that off on S5P6450
> results in using [sp #4] instead of [fp]. Still incorrect code
> (albeit it works out of the box), and still not possible to put "fp"
> into the clobber list.
> 
> 
> 
> 
> I am wondering if there's a way to convince gcc to do the right
> thing here, __naked notwithstanding.

I think basically luck is the answer.  If you ask the compiler to saw
its own head off it may occasionally make a neat job of it, but don't
bet on that happening every time...

> 
> Is the use of a naked "bounce function" for this specific purpose
> (switch the stackpointer) acceptable, and guaranteed to create the
> correct code ?
> 
> 
> It'd be nice not having to dump this piece into assembly...

Using out-of-line assembler avoids the ABI issues, and is often
more readable IMHO.

On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
[...]
> Just write the stack switching fragment in proper assembly, and
> keep both sides of it in normal C.  End of problem.

Indeed, though there is only a little section of C, in the middle
of some assembler which is not implementable in C.  


Maybe something like this:


ENTRY(swsusp_arch_resume)
	ldr	sp, resume_sp
	bl	__swsusp_arch_resume_copy_pages
	/* the rest of your code code goes here ... */

.align 2
resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
ENDPROC(swsusp_arch_resume)


Then in C:

asmlinkage void __swsusp_arch_resume_copy_pages(void)
{
 	struct pbe *pbe;

 	for (pbe = restore_pblist; pbe; pbe = pbe->next)
 		copy_page(pbe->orig_address, pbe->address);
}


There's now nothing for the compiler to get wrong...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-18 17:45 ` Dave Martin
@ 2011-05-19  8:46   ` Frank Hofmann
  2011-05-19  9:47     ` Dave Martin
  2011-05-19 11:08     ` Måns Rullgård
  0 siblings, 2 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-19  8:46 UTC (permalink / raw)
  To: linux-arm-kernel



On Wed, 18 May 2011, Dave Martin wrote:

> On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
>> Hi,
>>
>>
>> with the work on the hibernation support code, I've managed to make
>> it all C-with-inline-asm. This happens to work fine on OMAP3, but
>> there's an issue when I build it for Samsung 64xx. It's about this
>> code:
>>
>>
>> int __naked swsusp_arch_resume(void)
>> {
>> 	register struct pbe *pbe;
>> 	register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
>
> (Why - 8?)

Sorry, my mistake, thought PUSH is post-decrement (empty descending). From 
that assumption and the 8-byte-alignment-at-call, the offset were 
necessary. You're right it's not needed.
Thanks for spotting.

>
>> 	register void *saved_ctx asm ("r0") = ctx;
>>
>> 	__asm__ __volatile__("" : "+sp"(sp));		/* don't optimize away ! */
>
> Is this an attempt to force gcc to change sp before the subsequent loop?

That's exactly what it does.

> I think you don't mean "+sp".  Something like "+r" might be appropriate.
>
> This feels subtle and fragile to me.

Looked this up; again you're right the direct use of "rXX" (or the higher 
reg names) for gcc ARM inline assembly input/output constraints is 
undocumented, but the "Extended ASM" section in the manual,

 	http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

states that the way to force a variable into a specific register is:

 	register <type> variable asm("register") = ...;

and then to make sure it happens, to create an empty-asm-with-constraint 
that forces a register write,

 	asm volatile("" : "+r"(variable));

For the paranoid, one could do:

 	asm volatile("mov sp, %0\n\t" : "+r"(variable) : : "sp");

which is, unlike the use of "sp" as input/output register, actually 
documented explicitly in the "Extended ASM" section of gcc's manual, 
quote:

 	If you refer to a particular hardware register from the
 	assembler code, you will probably have to list the register
 	after the third colon to tell the compiler the register's
 	value is modified. In some assemblers, the register names
 	begin with `%'; to produce one `%' in the assembler code,
 	you must write `%%' in the input.

(That's the only place where the doc acknowledges gcc's inline assembler's 
knowledge of machine register names as constraints - in the clobber list)

[ ... ]
> Probably this is a side-effect of building for ARM versus Thumb-2.
> ARM1176 only supports ARM, and fp is used as the framepointer when
> building a kernel in ARM.

Both targets in my compile are ARM, but yes you're right framepointer or 
no is a difference in the two defconfigs.

>
> Building that file with -fomit-frame-pointer avoids the error,
> but that's best avoided unless there's a really good reason.

The very specific purpose of this function ... it'd never be possible to 
backtrace over it in any case.

>
> You would get the same problem when building for Thumb-2 if you pass
> -fno-omit-frame-pointer to the compiler and attempt to clobber r7
> (by default, Thumb-2 code has no framepointer).
>
[ ... ]
>> I am wondering if there's a way to convince gcc to do the right
>> thing here, __naked notwithstanding.
>
> I think basically luck is the answer.  If you ask the compiler to saw
> its own head off it may occasionally make a neat job of it, but don't
> bet on that happening every time...

The head-sawing-off piece works fine (substituting the stackpointer), but 
to convince it that it shouldn't attempt to facepalm itself afterwards 
looks like a bit harder ;-)


>
>>
>> Is the use of a naked "bounce function" for this specific purpose
>> (switch the stackpointer) acceptable, and guaranteed to create the
>> correct code ?
>>
>>
>> It'd be nice not having to dump this piece into assembly...
>
> Using out-of-line assembler avoids the ABI issues, and is often
> more readable IMHO.
>
> On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
> [...]
>> Just write the stack switching fragment in proper assembly, and
>> keep both sides of it in normal C.  End of problem.
>
> Indeed, though there is only a little section of C, in the middle
> of some assembler which is not implementable in C.

The stack switching segment isn't the problem ... it's the C bit, as long 
as it lives in the same func.

>
>
> Maybe something like this:
>
>
> ENTRY(swsusp_arch_resume)
> 	ldr	sp, resume_sp
> 	bl	__swsusp_arch_resume_copy_pages
> 	/* the rest of your code code goes here ... */
>
> .align 2
> resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
> ENDPROC(swsusp_arch_resume)

Yow, morale of the exercise, I guess: It's possible to shoehorn GCC into 
doing what I want, but the shoe isn't a good fit ;-)

I'm still not convinced why:

void __naked swsusp_arch_resume(void)
{
 	register void *sp asm("sp") = ...;
 	asm volatile("" : "+r"(sp));
 	__swsusp_arch_resume_copy_pages();
 	__swsusp_arch_resume_cpu();
}

with two explicit "noinline" funcs wouldn't be ok.

>
>
> Then in C:
>
> asmlinkage void __swsusp_arch_resume_copy_pages(void)
> {
> 	struct pbe *pbe;
>
> 	for (pbe = restore_pblist; pbe; pbe = pbe->next)
> 		copy_page(pbe->orig_address, pbe->address);
> }
>
>
> There's now nothing for the compiler to get wrong...

Maybe I'm trying too hard ;-) all other architectures supporting 
suspend-to-disk haven't dared to put this thing into C ... even though 
they all do this very operation as well. But then, none of the other 
swsusp_arch_resume elsewhere has done the stackpointer switch either.

>
> Cheers
> ---Dave
>
>


Thanks for the extensive help !
FrankH.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-19  8:46   ` Frank Hofmann
@ 2011-05-19  9:47     ` Dave Martin
  2011-05-19 15:25       ` Frank Hofmann
  2011-05-19 11:08     ` Måns Rullgård
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Martin @ 2011-05-19  9:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 19, 2011 at 09:46:30AM +0100, Frank Hofmann wrote:
> 
> 
> On Wed, 18 May 2011, Dave Martin wrote:
> 
> >On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
> >>Hi,
> >>
> >>
> >>with the work on the hibernation support code, I've managed to make
> >>it all C-with-inline-asm. This happens to work fine on OMAP3, but
> >>there's an issue when I build it for Samsung 64xx. It's about this
> >>code:
> >>
> >>
> >>int __naked swsusp_arch_resume(void)
> >>{
> >>	register struct pbe *pbe;
> >>	register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
> >
> >(Why - 8?)
> 
> Sorry, my mistake, thought PUSH is post-decrement (empty
> descending). From that assumption and the 8-byte-alignment-at-call,
> the offset were necessary. You're right it's not needed.
> Thanks for spotting.

OK -- I thoughr you might be using the top two words for something else.

push is just the same as stmfd sp!,

> 
> >
> >>	register void *saved_ctx asm ("r0") = ctx;
> >>
> >>	__asm__ __volatile__("" : "+sp"(sp));		/* don't optimize away ! */
> >
> >Is this an attempt to force gcc to change sp before the subsequent loop?
> 
> That's exactly what it does.
> 
> >I think you don't mean "+sp".  Something like "+r" might be appropriate.
> >
> >This feels subtle and fragile to me.
> 
> Looked this up; again you're right the direct use of "rXX" (or the
> higher reg names) for gcc ARM inline assembly input/output
> constraints is undocumented, but the "Extended ASM" section in the
> manual,
> 
> 	http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> 
> states that the way to force a variable into a specific register is:
> 
> 	register <type> variable asm("register") = ...;
> 
> and then to make sure it happens, to create an
> empty-asm-with-constraint that forces a register write,
> 
> 	asm volatile("" : "+r"(variable));
> 
> For the paranoid, one could do:
> 
> 	asm volatile("mov sp, %0\n\t" : "+r"(variable) : : "sp");
> 
> which is, unlike the use of "sp" as input/output register, actually
> documented explicitly in the "Extended ASM" section of gcc's manual,
> quote:
> 
> 	If you refer to a particular hardware register from the
> 	assembler code, you will probably have to list the register
> 	after the third colon to tell the compiler the register's
> 	value is modified. In some assemblers, the register names
> 	begin with `%'; to produce one `%' in the assembler code,
> 	you must write `%%' in the input.
> 
> (That's the only place where the doc acknowledges gcc's inline
> assembler's knowledge of machine register names as constraints - in
> the clobber list)

I think the clobber list is a different thing from the constraint
lists: in the clobber list, you can only have register names and a few
special things like "cc" and "memory".

In the constraint lists, you can only have constraints, not register
names.  On some arches, there are constraints which map to a specific
register (such as the "a", "b", "c", "d" constraints on x86), but
I don't think there's anything like this for ARM.

As you observe, the register ... asm("rX") declarations are needed
instead to map arguments to specific registers.

> 
> [ ... ]
> >Probably this is a side-effect of building for ARM versus Thumb-2.
> >ARM1176 only supports ARM, and fp is used as the framepointer when
> >building a kernel in ARM.
> 
> Both targets in my compile are ARM, but yes you're right
> framepointer or no is a difference in the two defconfigs.
> 
> >
> >Building that file with -fomit-frame-pointer avoids the error,
> >but that's best avoided unless there's a really good reason.
> 
> The very specific purpose of this function ... it'd never be
> possible to backtrace over it in any case.

Yep, that was my assumption.  If a fault happens in this code, the
battle is probably already lost.

> 
> >
> >You would get the same problem when building for Thumb-2 if you pass
> >-fno-omit-frame-pointer to the compiler and attempt to clobber r7
> >(by default, Thumb-2 code has no framepointer).
> >
> [ ... ]
> >>I am wondering if there's a way to convince gcc to do the right
> >>thing here, __naked notwithstanding.
> >
> >I think basically luck is the answer.  If you ask the compiler to saw
> >its own head off it may occasionally make a neat job of it, but don't
> >bet on that happening every time...
> 
> The head-sawing-off piece works fine (substituting the
> stackpointer), but to convince it that it shouldn't attempt to
> facepalm itself afterwards looks like a bit harder ;-)

Well, no analogy is perfect ;)

> 
> 
> >
> >>
> >>Is the use of a naked "bounce function" for this specific purpose
> >>(switch the stackpointer) acceptable, and guaranteed to create the
> >>correct code ?
> >>
> >>
> >>It'd be nice not having to dump this piece into assembly...
> >
> >Using out-of-line assembler avoids the ABI issues, and is often
> >more readable IMHO.
> >
> >On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
> >[...]
> >>Just write the stack switching fragment in proper assembly, and
> >>keep both sides of it in normal C.  End of problem.
> >
> >Indeed, though there is only a little section of C, in the middle
> >of some assembler which is not implementable in C.
> 
> The stack switching segment isn't the problem ... it's the C bit, as
> long as it lives in the same func.
> 
> >
> >
> >Maybe something like this:
> >
> >
> >ENTRY(swsusp_arch_resume)
> >	ldr	sp, resume_sp
> >	bl	__swsusp_arch_resume_copy_pages
> >	/* the rest of your code code goes here ... */
> >
> >.align 2
> >resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
> >ENDPROC(swsusp_arch_resume)
> 
> Yow, morale of the exercise, I guess: It's possible to shoehorn GCC
> into doing what I want, but the shoe isn't a good fit ;-)

> 
> I'm still not convinced why:
> 
> void __naked swsusp_arch_resume(void)
> {
> 	register void *sp asm("sp") = ...;
> 	asm volatile("" : "+r"(sp));
> 	__swsusp_arch_resume_copy_pages();
> 	__swsusp_arch_resume_cpu();
> }
> 
> with two explicit "noinline" funcs wouldn't be ok.

As I said, it may work, today.  It just feels more complex and
failure-prone, because it reiles on details of the compiler behaviour.
If doing this provided a real benefit, like better maintainability or
performance, then it would be worth trying.  But I feel we don't get
any real advantages in this case.

That's just my opinion, but I know from experience how hard it is to get
compiler guys on your side when this kind of hack goes wrong...

> 
> >
> >
> >Then in C:
> >
> >asmlinkage void __swsusp_arch_resume_copy_pages(void)
> >{
> >	struct pbe *pbe;
> >
> >	for (pbe = restore_pblist; pbe; pbe = pbe->next)
> >		copy_page(pbe->orig_address, pbe->address);
> >}
> >
> >
> >There's now nothing for the compiler to get wrong...
> 
> Maybe I'm trying too hard ;-) all other architectures supporting
> suspend-to-disk haven't dared to put this thing into C ... even
> though they all do this very operation as well. But then, none of
> the other swsusp_arch_resume elsewhere has done the stackpointer
> switch either.

Well, the more we can put in C the better, because C code is easier to
maintain and reuse -- hence it's good to be calling copy_pages() instead
of re-coding this in the assembly code.  But there's a difference between
coding in C and shoehorning assembler into a .c file.

For totally arch-specific parts like saving/restoring the processor
state, at least some assembler is unavoidable...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-19  8:46   ` Frank Hofmann
  2011-05-19  9:47     ` Dave Martin
@ 2011-05-19 11:08     ` Måns Rullgård
  1 sibling, 0 replies; 8+ messages in thread
From: Måns Rullgård @ 2011-05-19 11:08 UTC (permalink / raw)
  To: linux-arm-kernel

Frank Hofmann <frank.hofmann@tomtom.com> writes:

>>> 	register void *saved_ctx asm ("r0") = ctx;
>>>
>>> 	__asm__ __volatile__("" : "+sp"(sp));		/* don't optimize away ! */
>>
>> Is this an attempt to force gcc to change sp before the subsequent loop?
>
> That's exactly what it does.
>
>> I think you don't mean "+sp".  Something like "+r" might be appropriate.
>>
>> This feels subtle and fragile to me.
>
> Looked this up; again you're right the direct use of "rXX" (or the
> higher reg names) for gcc ARM inline assembly input/output constraints
> is undocumented, but the "Extended ASM" section in the manual,
>
> 	http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
>
> states that the way to force a variable into a specific register is:
>
> 	register <type> variable asm("register") = ...;
>
> and then to make sure it happens, to create an
> empty-asm-with-constraint that forces a register write,
>
> 	asm volatile("" : "+r"(variable));

This is not how gcc inline asm works.  A variable declared with an
asm("rXX") tag will be placed in that register when used as a parameter
of an asm code statement.  Outside that statement no promises are made.
It is possible, and it does happen, that gcc moves the value to the
requested register immediately before asm statement and moves it back
where it came from immediately after.

> For the paranoid, one could do:
>
> 	asm volatile("mov sp, %0\n\t" : "+r"(variable) : : "sp");
>
> which is, unlike the use of "sp" as input/output register, actually
> documented explicitly in the "Extended ASM" section of gcc's manual,
> quote:
>
> 	If you refer to a particular hardware register from the
> 	assembler code, you will probably have to list the register
> 	after the third colon to tell the compiler the register's
> 	value is modified. In some assemblers, the register names
> 	begin with `%'; to produce one `%' in the assembler code,
> 	you must write `%%' in the input.
>
> (That's the only place where the doc acknowledges gcc's inline
> assembler's knowledge of machine register names as constraints - in
> the clobber list)

The clobber list enumerates all explicitly named registers (not from
parameters) used in the code section, and the compiler will assume these
contain garbage after the asm statement, reloading them from memory or
otherwise repopulating them with good values.  Whatever values they were
given in the asm statement are lost.

>>> Is the use of a naked "bounce function" for this specific purpose
>>> (switch the stackpointer) acceptable, and guaranteed to create the
>>> correct code ?
>>>
>>>
>>> It'd be nice not having to dump this piece into assembly...
>>
>> Using out-of-line assembler avoids the ABI issues, and is often
>> more readable IMHO.
>>
>> On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
>> [...]
>>> Just write the stack switching fragment in proper assembly, and
>>> keep both sides of it in normal C.  End of problem.
>>
>> Indeed, though there is only a little section of C, in the middle
>> of some assembler which is not implementable in C.
>
> The stack switching segment isn't the problem ... it's the C bit, as
> long as it lives in the same func.

A stack switching trampoline in pure asm seems to me the cleaner
solution by far.

> Yow, morale of the exercise, I guess: It's possible to shoehorn GCC
> into doing what I want, but the shoe isn't a good fit ;-)
>
> I'm still not convinced why:
>
> void __naked swsusp_arch_resume(void)
> {
> 	register void *sp asm("sp") = ...;
> 	asm volatile("" : "+r"(sp));
> 	__swsusp_arch_resume_copy_pages();
> 	__swsusp_arch_resume_cpu();
> }
>
> with two explicit "noinline" funcs wouldn't be ok.

I'm not convinced it _would_ be OK.  Messing with the stack pointer from
inline asm usually causes all kinds of bad things to happen.  Been
there, done that.

-- 
M?ns Rullg?rd
mans at mansr.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
  2011-05-19  9:47     ` Dave Martin
@ 2011-05-19 15:25       ` Frank Hofmann
  0 siblings, 0 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-19 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 19 May 2011, Dave Martin wrote:

> On Thu, May 19, 2011 at 09:46:30AM +0100, Frank Hofmann wrote:
>>
>>
>> On Wed, 18 May 2011, Dave Martin wrote:
>>
>>> On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
>>>> Hi,
[ ... ]
>
> As I said, it may work, today.  It just feels more complex and
> failure-prone, because it reiles on details of the compiler behaviour.
> If doing this provided a real benefit, like better maintainability or
> performance, then it would be worth trying.  But I feel we don't get
> any real advantages in this case.
>
> That's just my opinion, but I know from experience how hard it is to get
> compiler guys on your side when this kind of hack goes wrong...

That bit of _experience_ is what I'm still building up, on ARM ... a great 
thanks to you, M?ns and Mikael for providing that.

GCC is capricious wrt. to inline assembly, I understand now.

[ ... ]
> For totally arch-specific parts like saving/restoring the processor
> state, at least some assembler is unavoidable...

I'll keep the two files (one for the pure assembly, one for C only 
parts) for now.

Best wishes,
FrankH.

>
> Cheers
> ---Dave
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-05-19 15:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
2011-05-18 16:50 ` Mikael Pettersson
2011-05-18 17:45 ` Dave Martin
2011-05-19  8:46   ` Frank Hofmann
2011-05-19  9:47     ` Dave Martin
2011-05-19 15:25       ` Frank Hofmann
2011-05-19 11:08     ` Måns Rullgård

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.