* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
@ 2011-05-18 15:25 Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-18 15:25 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
with the work on the hibernation support code, I've managed to make it all
C-with-inline-asm. This happens to work fine on OMAP3, but there's an
issue when I build it for Samsung 64xx. It's about this code:
int __naked swsusp_arch_resume(void)
{
register struct pbe *pbe;
register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
register void *saved_ctx asm ("r0") = ctx;
__asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
for (pbe = restore_pblist; pbe; pbe = pbe->next)
copy_page(pbe->orig_address, pbe->address);
__asm__ __volatile__(
"mov r1, #" __stringify(SYSTEM_MODE) "\n\t"
"msr cpsr_c, r1\n\t"
"ldr r1, [r0], #4\n\t"
"msr cpsr_cxsf, r1\n\t"
"ldm r0!, {r4-r12,lr}\n\t"
"ldr sp, [r0], #4\n\t"
"mov r2, #" __stringify(SVC_MODE) "\n\t"
"msr cpsr_c, r2\n\t"
"ldm r0!, {r2, lr}\n\t"
"ldr sp, [r0], #4\n\t"
"msr spsr_cxsf, r2\n\t"
"msr cpsr, r1\n\t"
"mov r1, #0\n\t"
"push {r1, lr}\n\t"
"bl __restore_processor_state\n\t"
"pop {r0, pc}\n\t"
: "+r"(saved_ctx)
:
: "memory", "cc",
"r1", "r2", "r3", "r4", "r5", "r6", "r7",
"r8", "r9", "r10", "ip", "fp" "sp", "lr", "pc");
return 0;
}
This has the special requirement to run on a separate stack, because the
copy loop will blot over whatever there is. Clobbering everything is part
of the design/intent.
On OMAP (where I compile _without_ CONFIG_FTRACE, but even when
switching it on there's no change), this creates the following assembly,
which is ok:
00005bb0 <swsusp_arch_resume>:
5bb0: e59fd074 ldr sp, [pc, #116] ; 5c2c <swsusp_arch_resume+0x7c>
5bb4: e3003000 movw r3, #0 ; 0x0
5bb8: e3403000 movt r3, #0 ; 0x0
5bbc: e5934000 ldr r4, [r3]
5bc0: e3540000 cmp r4, #0 ; 0x0
5bc4: 0a000005 beq 5be0 <swsusp_arch_resume+0x30>
5bc8: e5940004 ldr r0, [r4, #4]
5bcc: e5941000 ldr r1, [r4]
5bd0: ebfffffe bl 162b80 <copy_page>
5bd4: e5944008 ldr r4, [r4, #8]
5bd8: e3540000 cmp r4, #0 ; 0x0
5bdc: 1afffff9 bne 5bc8 <swsusp_arch_resume+0x18>
5be0: e3000000 movw r0, #0 ; 0x0
5be4: e3400000 movt r0, #0 ; 0x0
5be8: e3a0101f mov r1, #31 ; 0x1f
5bec: e121f001 msr CPSR_c, r1
[ ... inline as above follows ... ]
5c1c: e92d4002 push {r1, lr}
5c20: ebfffffe bl 1538c <__restore_processor_state>
5c24: e8bd8001 pop {r0, pc}
5c28: e3a00000 mov r0, #0 ; 0x0
5c2c: 00000ff8 .word 0x00000ff8
Notice how the for() loop has been coded into using R4.
When I build it for S5P64xx though (where we have CONFIG_FTRACE, but see
below), I get a compile error (same gcc, just different kernel defconfig
target, ARM11xx instead of Cortex-A8):
arch/arm/kernel/hibernate.c: In function 'swsusp_arch_resume':
arch/arm/kernel/hibernate.c:136: error: fp cannot be used in asm here
That's the '}' of the function. Strange enough ... the compiler wants to
do something with FP and isn't willing to give that up.
If I take "fp" out of the clobber list then it compiles.
But the generated assembly code is incorrect:
0000757c <swsusp_arch_resume>:
757c: e59fd078 ldr sp, [pc, #120] ; 75fc <swsusp_arch_resume+0x80>
7580: e59f3078 ldr r3, [pc, #120] ; 7600 <swsusp_arch_resume+0x84>
7584: e5933000 ldr r3, [r3]
7588: ea000005 b 75a4 <swsusp_arch_resume+0x28>
758c: e59b3000 ldr r3, [fp]
7590: e5930004 ldr r0, [r3, #4]
7594: e5931000 ldr r1, [r3]
7598: ebfffffe bl 1647a0 <copy_page>
759c: e59b3000 ldr r3, [fp]
75a0: e5933008 ldr r3, [r3, #8]
75a4: e58b3000 str r3, [fp]
75a8: e59b3000 ldr r3, [fp]
75ac: e3530000 cmp r3, #0 ; 0x0
75b0: 1afffff5 bne 758c <swsusp_arch_resume+0x10>
75b4: e59f0048 ldr r0, [pc, #72] ; 7604 <swsusp_arch_resume+0x88>
75b8: e3a0101f mov r1, #31 ; 0x1f
75bc: e121f001 msr CPSR_c, r1
[ ... inline as above follows ... ]
75ec: e92d4002 push {r1, lr}
75f0: ebfffffe bl dc58 <__restore_processor_state>
75f4: e8bd8001 pop {r0, pc}
75f8: e59b0000 ldr r0, [fp]
75fc: 00000ff8 .word 0x00000ff8
7600: 00000000 .word 0x00000000
7604: 00002000 .word 0x00002000
This assumes FP is set up so that the local variable held in R3 (that the
OMAP compile stuffs into R4) can be stored there, to preserve over the
function call.
Adding a register asm("r4") to the variable declaration changes nothing,
it still insists on saving it to/from the stack via [fp].
Now this means there's something wrong here; either it's lucky that my
compiler creates the OMAP (armv7) code that it does and the 6450 (armv6)
is actually correct in assuming this reg/mem location is available/usable
for it.
Or there's a problem with the ARM11xx / armv6 code generation.
I'm tending towards believing that the OMAP result is luck, and to make
sure the stack switch fools the entire ABI convention into working the
stackframe setup needs to initialize more than just SP.
I can hack around the issue by explicitly initializing FP via:
__asm__ __volatile__("add fp, %0, #4\n\t" : "+sp"(sp));
instead, but that's sneaky in the sense that I'm not telling the compiler
I've done so (adding "fp" to the clobber fails compile again). Also, while
this specific incantation uses one word, how can I be sure that that'll be
it and no other gcc version will pull more [fp] usage out of some hat ?
Note that while GCC chose to use [fp], it hasn't actually put proper
offsets in; it uses the same [fp] for both 'pbe' and the zero return
value; that's definitely not ok, but again is an artifact of the __naked.
Another option I've found is:
int __naked swsusp_arch_resume(void)
{
register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
__asm__ __volatile__("" : "+sp"(sp));
return resume_internal();
}
and make resume_internal() the actual function, non-naked; this creates a
normal prologue which sets up FP. resume_internal() cannot be made static,
though, or else the compiler inlines it - without the prologues, and it's
back to #1^2.
Re, the use of CONFIG_FTRACE: Just switching that off on S5P6450 results
in using [sp #4] instead of [fp]. Still incorrect code (albeit it works
out of the box), and still not possible to put "fp" into the clobber list.
I am wondering if there's a way to convince gcc to do the right thing
here, __naked notwithstanding.
Is the use of a naked "bounce function" for this specific purpose (switch
the stackpointer) acceptable, and guaranteed to create the correct code ?
It'd be nice not having to dump this piece into assembly...
Thanks for any ideas !
FrankH.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
@ 2011-05-18 16:06 ` Frank Hofmann
2011-05-18 16:50 ` Mikael Pettersson
2011-05-18 17:45 ` Dave Martin
2 siblings, 0 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-18 16:06 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
I've found that the key difference to code generation in this case seems
to be the use of the register clobber list. While gcc targeting armv7 /
omap3 looks to be pretty happy about that matching reality (it still
allocates local vars to registers alright - i.e. it recoginzes that the
clobber happens after), that's not the case for armv6 / s5p6450. On those,
seeing the r0-r15 clobber list, gcc insists on having "one tiny reg, just
one little teeny tinsy reg for itself", with FP as last resort ... and the
shown consequences for code generation if one lets fp through.
Now since the inline asm statement is finalizing the function anyway,
there's no real harm in not specifying a clobber list at all
there's never anything after it that could depend on saved state.
Doing so makes the generated armv6 code use the same regs as the armv7
does.
Is it, for the sake of fooling gcc under all circumstances, necessary to
go via the naked trampoline ?
FrankH.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
@ 2011-05-18 16:50 ` Mikael Pettersson
2011-05-18 17:45 ` Dave Martin
2 siblings, 0 replies; 8+ messages in thread
From: Mikael Pettersson @ 2011-05-18 16:50 UTC (permalink / raw)
To: linux-arm-kernel
Frank Hofmann writes:
> with the work on the hibernation support code, I've managed to make it all
> C-with-inline-asm. This happens to work fine on OMAP3, but there's an
> issue when I build it for Samsung 64xx. It's about this code:
>
>
> int __naked swsusp_arch_resume(void)
> {
> register struct pbe *pbe;
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
> register void *saved_ctx asm ("r0") = ctx;
>
> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
...
> This has the special requirement to run on a separate stack, because the
> copy loop will blot over whatever there is. Clobbering everything is part
> of the design/intent.
...
> I am wondering if there's a way to convince gcc to do the right thing
> here, __naked notwithstanding.
>
> Is the use of a naked "bounce function" for this specific purpose (switch
> the stackpointer) acceptable, and guaranteed to create the correct code ?
>
>
> It'd be nice not having to dump this piece into assembly...
Just write the stack switching fragment in proper assembly, and
keep both sides of it in normal C. End of problem.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
2011-05-18 16:50 ` Mikael Pettersson
@ 2011-05-18 17:45 ` Dave Martin
2011-05-19 8:46 ` Frank Hofmann
2 siblings, 1 reply; 8+ messages in thread
From: Dave Martin @ 2011-05-18 17:45 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
> Hi,
>
>
> with the work on the hibernation support code, I've managed to make
> it all C-with-inline-asm. This happens to work fine on OMAP3, but
> there's an issue when I build it for Samsung 64xx. It's about this
> code:
>
>
> int __naked swsusp_arch_resume(void)
> {
> register struct pbe *pbe;
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
(Why - 8?)
> register void *saved_ctx asm ("r0") = ctx;
>
> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
Is this an attempt to force gcc to change sp before the subsequent loop?
I think you don't mean "+sp". Something like "+r" might be appropriate.
This feels subtle and fragile to me...
>
> for (pbe = restore_pblist; pbe; pbe = pbe->next)
> copy_page(pbe->orig_address, pbe->address);
>
> __asm__ __volatile__(
> "mov r1, #" __stringify(SYSTEM_MODE) "\n\t"
> "msr cpsr_c, r1\n\t"
> "ldr r1, [r0], #4\n\t"
> "msr cpsr_cxsf, r1\n\t"
> "ldm r0!, {r4-r12,lr}\n\t"
> "ldr sp, [r0], #4\n\t"
> "mov r2, #" __stringify(SVC_MODE) "\n\t"
> "msr cpsr_c, r2\n\t"
> "ldm r0!, {r2, lr}\n\t"
> "ldr sp, [r0], #4\n\t"
> "msr spsr_cxsf, r2\n\t"
> "msr cpsr, r1\n\t"
> "mov r1, #0\n\t"
> "push {r1, lr}\n\t"
> "bl __restore_processor_state\n\t"
> "pop {r0, pc}\n\t"
> : "+r"(saved_ctx)
> :
> : "memory", "cc",
> "r1", "r2", "r3", "r4", "r5", "r6", "r7",
> "r8", "r9", "r10", "ip", "fp" "sp", "lr", "pc");
>
> return 0;
> }
>
>
> This has the special requirement to run on a separate stack, because
> the copy loop will blot over whatever there is. Clobbering
> everything is part of the design/intent.
>
>
>
> On OMAP (where I compile _without_ CONFIG_FTRACE, but even when
> switching it on there's no change), this creates the following
> assembly, which is ok:
>
> 00005bb0 <swsusp_arch_resume>:
> 5bb0: e59fd074 ldr sp, [pc, #116] ; 5c2c <swsusp_arch_resume+0x7c>
> 5bb4: e3003000 movw r3, #0 ; 0x0
> 5bb8: e3403000 movt r3, #0 ; 0x0
> 5bbc: e5934000 ldr r4, [r3]
> 5bc0: e3540000 cmp r4, #0 ; 0x0
> 5bc4: 0a000005 beq 5be0 <swsusp_arch_resume+0x30>
> 5bc8: e5940004 ldr r0, [r4, #4]
> 5bcc: e5941000 ldr r1, [r4]
> 5bd0: ebfffffe bl 162b80 <copy_page>
> 5bd4: e5944008 ldr r4, [r4, #8]
> 5bd8: e3540000 cmp r4, #0 ; 0x0
> 5bdc: 1afffff9 bne 5bc8 <swsusp_arch_resume+0x18>
> 5be0: e3000000 movw r0, #0 ; 0x0
> 5be4: e3400000 movt r0, #0 ; 0x0
> 5be8: e3a0101f mov r1, #31 ; 0x1f
> 5bec: e121f001 msr CPSR_c, r1
> [ ... inline as above follows ... ]
> 5c1c: e92d4002 push {r1, lr}
> 5c20: ebfffffe bl 1538c <__restore_processor_state>
> 5c24: e8bd8001 pop {r0, pc}
> 5c28: e3a00000 mov r0, #0 ; 0x0
> 5c2c: 00000ff8 .word 0x00000ff8
>
> Notice how the for() loop has been coded into using R4.
>
>
>
> When I build it for S5P64xx though (where we have CONFIG_FTRACE, but
> see below), I get a compile error (same gcc, just different kernel
> defconfig target, ARM11xx instead of Cortex-A8):
>
> arch/arm/kernel/hibernate.c: In function 'swsusp_arch_resume':
> arch/arm/kernel/hibernate.c:136: error: fp cannot be used in asm here
Probably this is a side-effect of building for ARM versus Thumb-2.
ARM1176 only supports ARM, and fp is used as the framepointer when
building a kernel in ARM.
Building that file with -fomit-frame-pointer avoids the error,
but that's best avoided unless there's a really good reason.
You would get the same problem when building for Thumb-2 if you pass
-fno-omit-frame-pointer to the compiler and attempt to clobber r7
(by default, Thumb-2 code has no framepointer).
>
> That's the '}' of the function. Strange enough ... the compiler
> wants to do something with FP and isn't willing to give that up.
>
>
> If I take "fp" out of the clobber list then it compiles.
>
> But the generated assembly code is incorrect:
>
> 0000757c <swsusp_arch_resume>:
> 757c: e59fd078 ldr sp, [pc, #120] ; 75fc <swsusp_arch_resume+0x80>
> 7580: e59f3078 ldr r3, [pc, #120] ; 7600 <swsusp_arch_resume+0x84>
> 7584: e5933000 ldr r3, [r3]
> 7588: ea000005 b 75a4 <swsusp_arch_resume+0x28>
> 758c: e59b3000 ldr r3, [fp]
> 7590: e5930004 ldr r0, [r3, #4]
> 7594: e5931000 ldr r1, [r3]
> 7598: ebfffffe bl 1647a0 <copy_page>
> 759c: e59b3000 ldr r3, [fp]
> 75a0: e5933008 ldr r3, [r3, #8]
> 75a4: e58b3000 str r3, [fp]
> 75a8: e59b3000 ldr r3, [fp]
> 75ac: e3530000 cmp r3, #0 ; 0x0
> 75b0: 1afffff5 bne 758c <swsusp_arch_resume+0x10>
> 75b4: e59f0048 ldr r0, [pc, #72] ; 7604 <swsusp_arch_resume+0x88>
> 75b8: e3a0101f mov r1, #31 ; 0x1f
> 75bc: e121f001 msr CPSR_c, r1
> [ ... inline as above follows ... ]
> 75ec: e92d4002 push {r1, lr}
> 75f0: ebfffffe bl dc58 <__restore_processor_state>
> 75f4: e8bd8001 pop {r0, pc}
> 75f8: e59b0000 ldr r0, [fp]
> 75fc: 00000ff8 .word 0x00000ff8
> 7600: 00000000 .word 0x00000000
> 7604: 00002000 .word 0x00002000
>
>
> This assumes FP is set up so that the local variable held in R3
> (that the OMAP compile stuffs into R4) can be stored there, to
> preserve over the function call.
>
> Adding a register asm("r4") to the variable declaration changes
> nothing, it still insists on saving it to/from the stack via [fp].
>
>
> Now this means there's something wrong here; either it's lucky that
> my compiler creates the OMAP (armv7) code that it does and the 6450
> (armv6) is actually correct in assuming this reg/mem location is
> available/usable for it.
>
> Or there's a problem with the ARM11xx / armv6 code generation.
>
>
> I'm tending towards believing that the OMAP result is luck, and to
> make sure the stack switch fools the entire ABI convention into
> working the stackframe setup needs to initialize more than just SP.
>
>
> I can hack around the issue by explicitly initializing FP via:
>
> __asm__ __volatile__("add fp, %0, #4\n\t" : "+sp"(sp));
>
> instead, but that's sneaky in the sense that I'm not telling the
> compiler I've done so (adding "fp" to the clobber fails compile
> again). Also, while this specific incantation uses one word, how can
> I be sure that that'll be it and no other gcc version will pull more
> [fp] usage out of some hat ?
>
> Note that while GCC chose to use [fp], it hasn't actually put proper
> offsets in; it uses the same [fp] for both 'pbe' and the zero return
> value; that's definitely not ok, but again is an artifact of the
> __naked.
>
>
>
> Another option I've found is:
>
> int __naked swsusp_arch_resume(void)
> {
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
> __asm__ __volatile__("" : "+sp"(sp));
>
> return resume_internal();
> }
>
> and make resume_internal() the actual function, non-naked; this
> creates a normal prologue which sets up FP. resume_internal() cannot
> be made static, though, or else the compiler inlines it - without
> the prologues, and it's back to #1^2.
>
>
>
> Re, the use of CONFIG_FTRACE: Just switching that off on S5P6450
> results in using [sp #4] instead of [fp]. Still incorrect code
> (albeit it works out of the box), and still not possible to put "fp"
> into the clobber list.
>
>
>
>
> I am wondering if there's a way to convince gcc to do the right
> thing here, __naked notwithstanding.
I think basically luck is the answer. If you ask the compiler to saw
its own head off it may occasionally make a neat job of it, but don't
bet on that happening every time...
>
> Is the use of a naked "bounce function" for this specific purpose
> (switch the stackpointer) acceptable, and guaranteed to create the
> correct code ?
>
>
> It'd be nice not having to dump this piece into assembly...
Using out-of-line assembler avoids the ABI issues, and is often
more readable IMHO.
On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
[...]
> Just write the stack switching fragment in proper assembly, and
> keep both sides of it in normal C. End of problem.
Indeed, though there is only a little section of C, in the middle
of some assembler which is not implementable in C.
Maybe something like this:
ENTRY(swsusp_arch_resume)
ldr sp, resume_sp
bl __swsusp_arch_resume_copy_pages
/* the rest of your code code goes here ... */
.align 2
resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
ENDPROC(swsusp_arch_resume)
Then in C:
asmlinkage void __swsusp_arch_resume_copy_pages(void)
{
struct pbe *pbe;
for (pbe = restore_pblist; pbe; pbe = pbe->next)
copy_page(pbe->orig_address, pbe->address);
}
There's now nothing for the compiler to get wrong...
Cheers
---Dave
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-18 17:45 ` Dave Martin
@ 2011-05-19 8:46 ` Frank Hofmann
2011-05-19 9:47 ` Dave Martin
2011-05-19 11:08 ` Måns Rullgård
0 siblings, 2 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-19 8:46 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 18 May 2011, Dave Martin wrote:
> On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
>> Hi,
>>
>>
>> with the work on the hibernation support code, I've managed to make
>> it all C-with-inline-asm. This happens to work fine on OMAP3, but
>> there's an issue when I build it for Samsung 64xx. It's about this
>> code:
>>
>>
>> int __naked swsusp_arch_resume(void)
>> {
>> register struct pbe *pbe;
>> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
>
> (Why - 8?)
Sorry, my mistake, thought PUSH is post-decrement (empty descending). From
that assumption and the 8-byte-alignment-at-call, the offset were
necessary. You're right it's not needed.
Thanks for spotting.
>
>> register void *saved_ctx asm ("r0") = ctx;
>>
>> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
>
> Is this an attempt to force gcc to change sp before the subsequent loop?
That's exactly what it does.
> I think you don't mean "+sp". Something like "+r" might be appropriate.
>
> This feels subtle and fragile to me.
Looked this up; again you're right the direct use of "rXX" (or the higher
reg names) for gcc ARM inline assembly input/output constraints is
undocumented, but the "Extended ASM" section in the manual,
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
states that the way to force a variable into a specific register is:
register <type> variable asm("register") = ...;
and then to make sure it happens, to create an empty-asm-with-constraint
that forces a register write,
asm volatile("" : "+r"(variable));
For the paranoid, one could do:
asm volatile("mov sp, %0\n\t" : "+r"(variable) : : "sp");
which is, unlike the use of "sp" as input/output register, actually
documented explicitly in the "Extended ASM" section of gcc's manual,
quote:
If you refer to a particular hardware register from the
assembler code, you will probably have to list the register
after the third colon to tell the compiler the register's
value is modified. In some assemblers, the register names
begin with `%'; to produce one `%' in the assembler code,
you must write `%%' in the input.
(That's the only place where the doc acknowledges gcc's inline assembler's
knowledge of machine register names as constraints - in the clobber list)
[ ... ]
> Probably this is a side-effect of building for ARM versus Thumb-2.
> ARM1176 only supports ARM, and fp is used as the framepointer when
> building a kernel in ARM.
Both targets in my compile are ARM, but yes you're right framepointer or
no is a difference in the two defconfigs.
>
> Building that file with -fomit-frame-pointer avoids the error,
> but that's best avoided unless there's a really good reason.
The very specific purpose of this function ... it'd never be possible to
backtrace over it in any case.
>
> You would get the same problem when building for Thumb-2 if you pass
> -fno-omit-frame-pointer to the compiler and attempt to clobber r7
> (by default, Thumb-2 code has no framepointer).
>
[ ... ]
>> I am wondering if there's a way to convince gcc to do the right
>> thing here, __naked notwithstanding.
>
> I think basically luck is the answer. If you ask the compiler to saw
> its own head off it may occasionally make a neat job of it, but don't
> bet on that happening every time...
The head-sawing-off piece works fine (substituting the stackpointer), but
to convince it that it shouldn't attempt to facepalm itself afterwards
looks like a bit harder ;-)
>
>>
>> Is the use of a naked "bounce function" for this specific purpose
>> (switch the stackpointer) acceptable, and guaranteed to create the
>> correct code ?
>>
>>
>> It'd be nice not having to dump this piece into assembly...
>
> Using out-of-line assembler avoids the ABI issues, and is often
> more readable IMHO.
>
> On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
> [...]
>> Just write the stack switching fragment in proper assembly, and
>> keep both sides of it in normal C. End of problem.
>
> Indeed, though there is only a little section of C, in the middle
> of some assembler which is not implementable in C.
The stack switching segment isn't the problem ... it's the C bit, as long
as it lives in the same func.
>
>
> Maybe something like this:
>
>
> ENTRY(swsusp_arch_resume)
> ldr sp, resume_sp
> bl __swsusp_arch_resume_copy_pages
> /* the rest of your code code goes here ... */
>
> .align 2
> resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
> ENDPROC(swsusp_arch_resume)
Yow, morale of the exercise, I guess: It's possible to shoehorn GCC into
doing what I want, but the shoe isn't a good fit ;-)
I'm still not convinced why:
void __naked swsusp_arch_resume(void)
{
register void *sp asm("sp") = ...;
asm volatile("" : "+r"(sp));
__swsusp_arch_resume_copy_pages();
__swsusp_arch_resume_cpu();
}
with two explicit "noinline" funcs wouldn't be ok.
>
>
> Then in C:
>
> asmlinkage void __swsusp_arch_resume_copy_pages(void)
> {
> struct pbe *pbe;
>
> for (pbe = restore_pblist; pbe; pbe = pbe->next)
> copy_page(pbe->orig_address, pbe->address);
> }
>
>
> There's now nothing for the compiler to get wrong...
Maybe I'm trying too hard ;-) all other architectures supporting
suspend-to-disk haven't dared to put this thing into C ... even though
they all do this very operation as well. But then, none of the other
swsusp_arch_resume elsewhere has done the stackpointer switch either.
>
> Cheers
> ---Dave
>
>
Thanks for the extensive help !
FrankH.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-19 8:46 ` Frank Hofmann
@ 2011-05-19 9:47 ` Dave Martin
2011-05-19 15:25 ` Frank Hofmann
2011-05-19 11:08 ` Måns Rullgård
1 sibling, 1 reply; 8+ messages in thread
From: Dave Martin @ 2011-05-19 9:47 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, May 19, 2011 at 09:46:30AM +0100, Frank Hofmann wrote:
>
>
> On Wed, 18 May 2011, Dave Martin wrote:
>
> >On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
> >>Hi,
> >>
> >>
> >>with the work on the hibernation support code, I've managed to make
> >>it all C-with-inline-asm. This happens to work fine on OMAP3, but
> >>there's an issue when I build it for Samsung 64xx. It's about this
> >>code:
> >>
> >>
> >>int __naked swsusp_arch_resume(void)
> >>{
> >> register struct pbe *pbe;
> >> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
> >
> >(Why - 8?)
>
> Sorry, my mistake, thought PUSH is post-decrement (empty
> descending). From that assumption and the 8-byte-alignment-at-call,
> the offset were necessary. You're right it's not needed.
> Thanks for spotting.
OK -- I thoughr you might be using the top two words for something else.
push is just the same as stmfd sp!,
>
> >
> >> register void *saved_ctx asm ("r0") = ctx;
> >>
> >> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
> >
> >Is this an attempt to force gcc to change sp before the subsequent loop?
>
> That's exactly what it does.
>
> >I think you don't mean "+sp". Something like "+r" might be appropriate.
> >
> >This feels subtle and fragile to me.
>
> Looked this up; again you're right the direct use of "rXX" (or the
> higher reg names) for gcc ARM inline assembly input/output
> constraints is undocumented, but the "Extended ASM" section in the
> manual,
>
> http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
>
> states that the way to force a variable into a specific register is:
>
> register <type> variable asm("register") = ...;
>
> and then to make sure it happens, to create an
> empty-asm-with-constraint that forces a register write,
>
> asm volatile("" : "+r"(variable));
>
> For the paranoid, one could do:
>
> asm volatile("mov sp, %0\n\t" : "+r"(variable) : : "sp");
>
> which is, unlike the use of "sp" as input/output register, actually
> documented explicitly in the "Extended ASM" section of gcc's manual,
> quote:
>
> If you refer to a particular hardware register from the
> assembler code, you will probably have to list the register
> after the third colon to tell the compiler the register's
> value is modified. In some assemblers, the register names
> begin with `%'; to produce one `%' in the assembler code,
> you must write `%%' in the input.
>
> (That's the only place where the doc acknowledges gcc's inline
> assembler's knowledge of machine register names as constraints - in
> the clobber list)
I think the clobber list is a different thing from the constraint
lists: in the clobber list, you can only have register names and a few
special things like "cc" and "memory".
In the constraint lists, you can only have constraints, not register
names. On some arches, there are constraints which map to a specific
register (such as the "a", "b", "c", "d" constraints on x86), but
I don't think there's anything like this for ARM.
As you observe, the register ... asm("rX") declarations are needed
instead to map arguments to specific registers.
>
> [ ... ]
> >Probably this is a side-effect of building for ARM versus Thumb-2.
> >ARM1176 only supports ARM, and fp is used as the framepointer when
> >building a kernel in ARM.
>
> Both targets in my compile are ARM, but yes you're right
> framepointer or no is a difference in the two defconfigs.
>
> >
> >Building that file with -fomit-frame-pointer avoids the error,
> >but that's best avoided unless there's a really good reason.
>
> The very specific purpose of this function ... it'd never be
> possible to backtrace over it in any case.
Yep, that was my assumption. If a fault happens in this code, the
battle is probably already lost.
>
> >
> >You would get the same problem when building for Thumb-2 if you pass
> >-fno-omit-frame-pointer to the compiler and attempt to clobber r7
> >(by default, Thumb-2 code has no framepointer).
> >
> [ ... ]
> >>I am wondering if there's a way to convince gcc to do the right
> >>thing here, __naked notwithstanding.
> >
> >I think basically luck is the answer. If you ask the compiler to saw
> >its own head off it may occasionally make a neat job of it, but don't
> >bet on that happening every time...
>
> The head-sawing-off piece works fine (substituting the
> stackpointer), but to convince it that it shouldn't attempt to
> facepalm itself afterwards looks like a bit harder ;-)
Well, no analogy is perfect ;)
>
>
> >
> >>
> >>Is the use of a naked "bounce function" for this specific purpose
> >>(switch the stackpointer) acceptable, and guaranteed to create the
> >>correct code ?
> >>
> >>
> >>It'd be nice not having to dump this piece into assembly...
> >
> >Using out-of-line assembler avoids the ABI issues, and is often
> >more readable IMHO.
> >
> >On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
> >[...]
> >>Just write the stack switching fragment in proper assembly, and
> >>keep both sides of it in normal C. End of problem.
> >
> >Indeed, though there is only a little section of C, in the middle
> >of some assembler which is not implementable in C.
>
> The stack switching segment isn't the problem ... it's the C bit, as
> long as it lives in the same func.
>
> >
> >
> >Maybe something like this:
> >
> >
> >ENTRY(swsusp_arch_resume)
> > ldr sp, resume_sp
> > bl __swsusp_arch_resume_copy_pages
> > /* the rest of your code code goes here ... */
> >
> >.align 2
> >resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
> >ENDPROC(swsusp_arch_resume)
>
> Yow, morale of the exercise, I guess: It's possible to shoehorn GCC
> into doing what I want, but the shoe isn't a good fit ;-)
>
> I'm still not convinced why:
>
> void __naked swsusp_arch_resume(void)
> {
> register void *sp asm("sp") = ...;
> asm volatile("" : "+r"(sp));
> __swsusp_arch_resume_copy_pages();
> __swsusp_arch_resume_cpu();
> }
>
> with two explicit "noinline" funcs wouldn't be ok.
As I said, it may work, today. It just feels more complex and
failure-prone, because it reiles on details of the compiler behaviour.
If doing this provided a real benefit, like better maintainability or
performance, then it would be worth trying. But I feel we don't get
any real advantages in this case.
That's just my opinion, but I know from experience how hard it is to get
compiler guys on your side when this kind of hack goes wrong...
>
> >
> >
> >Then in C:
> >
> >asmlinkage void __swsusp_arch_resume_copy_pages(void)
> >{
> > struct pbe *pbe;
> >
> > for (pbe = restore_pblist; pbe; pbe = pbe->next)
> > copy_page(pbe->orig_address, pbe->address);
> >}
> >
> >
> >There's now nothing for the compiler to get wrong...
>
> Maybe I'm trying too hard ;-) all other architectures supporting
> suspend-to-disk haven't dared to put this thing into C ... even
> though they all do this very operation as well. But then, none of
> the other swsusp_arch_resume elsewhere has done the stackpointer
> switch either.
Well, the more we can put in C the better, because C code is easier to
maintain and reuse -- hence it's good to be calling copy_pages() instead
of re-coding this in the assembly code. But there's a difference between
coding in C and shoehorning assembler into a .c file.
For totally arch-specific parts like saving/restoring the processor
state, at least some assembler is unavoidable...
Cheers
---Dave
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-19 8:46 ` Frank Hofmann
2011-05-19 9:47 ` Dave Martin
@ 2011-05-19 11:08 ` Måns Rullgård
1 sibling, 0 replies; 8+ messages in thread
From: Måns Rullgård @ 2011-05-19 11:08 UTC (permalink / raw)
To: linux-arm-kernel
Frank Hofmann <frank.hofmann@tomtom.com> writes:
>>> register void *saved_ctx asm ("r0") = ctx;
>>>
>>> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
>>
>> Is this an attempt to force gcc to change sp before the subsequent loop?
>
> That's exactly what it does.
>
>> I think you don't mean "+sp". Something like "+r" might be appropriate.
>>
>> This feels subtle and fragile to me.
>
> Looked this up; again you're right the direct use of "rXX" (or the
> higher reg names) for gcc ARM inline assembly input/output constraints
> is undocumented, but the "Extended ASM" section in the manual,
>
> http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
>
> states that the way to force a variable into a specific register is:
>
> register <type> variable asm("register") = ...;
>
> and then to make sure it happens, to create an
> empty-asm-with-constraint that forces a register write,
>
> asm volatile("" : "+r"(variable));
This is not how gcc inline asm works. A variable declared with an
asm("rXX") tag will be placed in that register when used as a parameter
of an asm code statement. Outside that statement no promises are made.
It is possible, and it does happen, that gcc moves the value to the
requested register immediately before asm statement and moves it back
where it came from immediately after.
> For the paranoid, one could do:
>
> asm volatile("mov sp, %0\n\t" : "+r"(variable) : : "sp");
>
> which is, unlike the use of "sp" as input/output register, actually
> documented explicitly in the "Extended ASM" section of gcc's manual,
> quote:
>
> If you refer to a particular hardware register from the
> assembler code, you will probably have to list the register
> after the third colon to tell the compiler the register's
> value is modified. In some assemblers, the register names
> begin with `%'; to produce one `%' in the assembler code,
> you must write `%%' in the input.
>
> (That's the only place where the doc acknowledges gcc's inline
> assembler's knowledge of machine register names as constraints - in
> the clobber list)
The clobber list enumerates all explicitly named registers (not from
parameters) used in the code section, and the compiler will assume these
contain garbage after the asm statement, reloading them from memory or
otherwise repopulating them with good values. Whatever values they were
given in the asm statement are lost.
>>> Is the use of a naked "bounce function" for this specific purpose
>>> (switch the stackpointer) acceptable, and guaranteed to create the
>>> correct code ?
>>>
>>>
>>> It'd be nice not having to dump this piece into assembly...
>>
>> Using out-of-line assembler avoids the ABI issues, and is often
>> more readable IMHO.
>>
>> On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
>> [...]
>>> Just write the stack switching fragment in proper assembly, and
>>> keep both sides of it in normal C. End of problem.
>>
>> Indeed, though there is only a little section of C, in the middle
>> of some assembler which is not implementable in C.
>
> The stack switching segment isn't the problem ... it's the C bit, as
> long as it lives in the same func.
A stack switching trampoline in pure asm seems to me the cleaner
solution by far.
> Yow, morale of the exercise, I guess: It's possible to shoehorn GCC
> into doing what I want, but the shoe isn't a good fit ;-)
>
> I'm still not convinced why:
>
> void __naked swsusp_arch_resume(void)
> {
> register void *sp asm("sp") = ...;
> asm volatile("" : "+r"(sp));
> __swsusp_arch_resume_copy_pages();
> __swsusp_arch_resume_cpu();
> }
>
> with two explicit "noinline" funcs wouldn't be ok.
I'm not convinced it _would_ be OK. Messing with the stack pointer from
inline asm usually causes all kinds of bad things to happen. Been
there, done that.
--
M?ns Rullg?rd
mans at mansr.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Question on ARM inline assembly - stackframes, sp/fp usage, __naked
2011-05-19 9:47 ` Dave Martin
@ 2011-05-19 15:25 ` Frank Hofmann
0 siblings, 0 replies; 8+ messages in thread
From: Frank Hofmann @ 2011-05-19 15:25 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 19 May 2011, Dave Martin wrote:
> On Thu, May 19, 2011 at 09:46:30AM +0100, Frank Hofmann wrote:
>>
>>
>> On Wed, 18 May 2011, Dave Martin wrote:
>>
>>> On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
>>>> Hi,
[ ... ]
>
> As I said, it may work, today. It just feels more complex and
> failure-prone, because it reiles on details of the compiler behaviour.
> If doing this provided a real benefit, like better maintainability or
> performance, then it would be worth trying. But I feel we don't get
> any real advantages in this case.
>
> That's just my opinion, but I know from experience how hard it is to get
> compiler guys on your side when this kind of hack goes wrong...
That bit of _experience_ is what I'm still building up, on ARM ... a great
thanks to you, M?ns and Mikael for providing that.
GCC is capricious wrt. to inline assembly, I understand now.
[ ... ]
> For totally arch-specific parts like saving/restoring the processor
> state, at least some assembler is unavoidable...
I'll keep the two files (one for the pure assembly, one for C only
parts) for now.
Best wishes,
FrankH.
>
> Cheers
> ---Dave
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-05-19 15:25 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
2011-05-18 16:50 ` Mikael Pettersson
2011-05-18 17:45 ` Dave Martin
2011-05-19 8:46 ` Frank Hofmann
2011-05-19 9:47 ` Dave Martin
2011-05-19 15:25 ` Frank Hofmann
2011-05-19 11:08 ` Måns Rullgård
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.