linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
@ 2018-10-15 22:16 Stefan Agner
  2018-10-15 22:23 ` Russell King - ARM Linux
  2018-10-16  8:00 ` Linus Walleij
  0 siblings, 2 replies; 21+ messages in thread
From: Stefan Agner @ 2018-10-15 22:16 UTC (permalink / raw)
  To: linux, ulli.kroll
  Cc: joel, nico, arnd, linus.walleij, linux-arm-kernel, linux-kernel,
	Stefan Agner

When a function's incoming parameters are not in the input operands list,
gcc 4.5 does not load the parameters into registers before calling the
function, but the inline assembly assumes valid addresses inside the
function. This breaks the code because r0 and r1 are invalid when
execution enters v4wb_copy_user_page().

Also the constant needs to be used as third input operand so account
for that as well.

This does to copypage-fa.c what was previously done for the other
copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
and kfrom to input operands list.").

Signed-off-by: Stefan Agner <stefan@agner.ch>
---
 arch/arm/mm/copypage-fa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
index d130a5ece5d5..ec6501308c60 100644
--- a/arch/arm/mm/copypage-fa.c
+++ b/arch/arm/mm/copypage-fa.c
@@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
 {
 	asm("\
 	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %0				@ 1\n\
+	mov	r2, %2				@ 1\n\
 1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
 	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
 	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
@@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
 	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
 	ldmfd	sp!, {r4, pc}			@ 3"
 	:
-	: "I" (PAGE_SIZE / 32));
+	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
 }
 
 void fa_copy_user_highpage(struct page *to, struct page *from,
-- 
2.19.1



* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-15 22:16 [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list Stefan Agner
@ 2018-10-15 22:23 ` Russell King - ARM Linux
  2018-10-15 22:39   ` Stefan Agner
  2018-10-16  8:00 ` Linus Walleij
  1 sibling, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2018-10-15 22:23 UTC (permalink / raw)
  To: Stefan Agner
  Cc: ulli.kroll, joel, nico, arnd, linus.walleij, linux-arm-kernel,
	linux-kernel

On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> When functions incoming parameters are not in input operands list gcc
> 4.5 does not load the parameters into registers before calling this
> function but the inline assembly assumes valid addresses inside this
> function. This breaks the code because r0 and r1 are invalid when
> execution enters v4wb_copy_user_page ()

NAK.  Naked functions must never be inlined.  Please add a "noinline"
attribute to the function rather than making things more complex.

The GCC manual states:

`naked'
     Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
     and SPU ports to indicate that the specified function does not
     need prologue/epilogue sequences generated by the compiler.  It is
     up to the programmer to provide these sequences. The only
                                                      ^^^^^^^^
     statements that can be safely included in naked functions are
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     `asm' statements that do not have operands.  All other statements,
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     including declarations of local variables, `if' statements, and so
     forth, should be avoided.  Naked functions should be used to
     implement the body of an assembly function, while allowing the
     compiler to construct the requisite function declaration for the
     assembler.

The 'I' attribute is fine here because it is a constant that is not
allowed to be in a register (and hence has no code generation side
effects.)

Adding operands for the input parameters, however, isn't going to
work around the fact that _this_ assembly is written to be out of
line and so it must never be inlined by the compiler.
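
A minimal sketch of the "noinline" suggestion, written in portable C
rather than ARM assembly so that it builds with any GCC or Clang. The
names demo_copy_page, demo_copy_highpage and DEMO_PAGE_SIZE are
hypothetical and are not kernel symbols; the kernel has its own noinline
macro for the attribute shown here.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096	/* hypothetical page size for this sketch */

/*
 * noinline asks GCC/Clang not to inline this function into its callers,
 * so every use is a real call and the arguments are passed as the
 * calling convention prescribes.
 */
static __attribute__((noinline)) void
demo_copy_page(void *kto, const void *kfrom)
{
	memcpy(kto, kfrom, DEMO_PAGE_SIZE);
}

/* Same-file caller: without noinline this call is an inlining candidate. */
void demo_copy_highpage(void *to, const void *from)
{
	demo_copy_page(to, from);
}
```

The body is plain memcpy() only to keep the sketch runnable; the point is
where the attribute sits, not how the copy is done.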

> Also the constant needs to be used as third input operand so account
> for that as well.
> 
> This fixes copypage-fa.c what has previously done before for the other
> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> and kfrom to input operands list.").
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>
> ---
>  arch/arm/mm/copypage-fa.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> index d130a5ece5d5..ec6501308c60 100644
> --- a/arch/arm/mm/copypage-fa.c
> +++ b/arch/arm/mm/copypage-fa.c
> @@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>  {
>  	asm("\
>  	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %0				@ 1\n\
> +	mov	r2, %2				@ 1\n\
>  1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
>  	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
>  	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> @@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>  	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
>  	ldmfd	sp!, {r4, pc}			@ 3"
>  	:
> -	: "I" (PAGE_SIZE / 32));
> +	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
>  }
>  
>  void fa_copy_user_highpage(struct page *to, struct page *from,
> -- 
> 2.19.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-15 22:23 ` Russell King - ARM Linux
@ 2018-10-15 22:39   ` Stefan Agner
  2018-10-15 22:46     ` Russell King - ARM Linux
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Agner @ 2018-10-15 22:39 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: raj.khem, ulli.kroll, joel, nico, arnd, linus.walleij,
	linux-arm-kernel, linux-kernel

On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
>> When functions incoming parameters are not in input operands list gcc
>> 4.5 does not load the parameters into registers before calling this
>> function but the inline assembly assumes valid addresses inside this
>> function. This breaks the code because r0 and r1 are invalid when
>> execution enters v4wb_copy_user_page ()
> 
> NAK.  Naked functions must never be inlined.  Please add a "noinline"
> attribute to the function rather than making things more complex.
> 

To be honest, I did not put much thought into this commit since it is
just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
kfrom to input operands list.") did to the other copypage
implementations...

[adding Khem]

> The GCC manual states:
> 
> `naked'
>      Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
>      and SPU ports to indicate that the specified function does not
>      need prologue/epilogue sequences generated by the compiler.  It is
>      up to the programmer to provide these sequences. The only
>                                                       ^^^^^^^^
>      statements that can be safely included in naked functions are
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      `asm' statements that do not have operands.  All other statements,
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      including declarations of local variables, `if' statements, and so
>      forth, should be avoided.  Naked functions should be used to
>      implement the body of an assembly function, while allowing the
>      compiler to construct the requisite function declaration for the
>      assembler.
> 
> The 'I' attribute is fine here because it is a constant that is not
> allowed to be in a register (and hence has no code generation side
> effects.)
> 
> Adding operands for the input parameters, however, isn't going to
> work around the fact that _this_ assembly is written to be out of
> line and so it must never be inlined by the compiler.

I briefly looked at a disassembled version after applying both patches,
it indeed leads to inlining. However, the code seems to be working
(thanks to asm volatile?)...

Anyway, my goal is actually what patch 2 ("ARM: copypage: do not use
naked functions") is doing: Make Clang happy. As a matter of fact,
reverting 9a40ac86152c actually fixes compilation for Clang too, and
seems to lead to a working Kernel (tested with versatile_defconfig in
Qemu), so maybe that is what we should do here?

--
Stefan

> 
>> Also the constant needs to be used as third input operand so account
>> for that as well.
>>
>> This fixes copypage-fa.c what has previously done before for the other
>> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
>> and kfrom to input operands list.").
>>
>> Signed-off-by: Stefan Agner <stefan@agner.ch>
>> ---
>>  arch/arm/mm/copypage-fa.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
>> index d130a5ece5d5..ec6501308c60 100644
>> --- a/arch/arm/mm/copypage-fa.c
>> +++ b/arch/arm/mm/copypage-fa.c
>> @@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>>  {
>>  	asm("\
>>  	stmfd	sp!, {r4, lr}			@ 2\n\
>> -	mov	r2, %0				@ 1\n\
>> +	mov	r2, %2				@ 1\n\
>>  1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
>>  	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
>>  	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
>> @@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>>  	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
>>  	ldmfd	sp!, {r4, pc}			@ 3"
>>  	:
>> -	: "I" (PAGE_SIZE / 32));
>> +	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
>>  }
>>
>>  void fa_copy_user_highpage(struct page *to, struct page *from,
>> --
>> 2.19.1
>>


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-15 22:39   ` Stefan Agner
@ 2018-10-15 22:46     ` Russell King - ARM Linux
  2018-10-15 22:52       ` Stefan Agner
  0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2018-10-15 22:46 UTC (permalink / raw)
  To: Stefan Agner
  Cc: raj.khem, ulli.kroll, joel, nico, arnd, linus.walleij,
	linux-arm-kernel, linux-kernel

On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> >> When functions incoming parameters are not in input operands list gcc
> >> 4.5 does not load the parameters into registers before calling this
> >> function but the inline assembly assumes valid addresses inside this
> >> function. This breaks the code because r0 and r1 are invalid when
> >> execution enters v4wb_copy_user_page ()
> > 
> > NAK.  Naked functions must never be inlined.  Please add a "noinline"
> > attribute to the function rather than making things more complex.
> > 
> 
> To be honest, I did not put much thought into this commit since it is
> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
> kfrom to input operands list.") has been done to the other copypage
> implementations...
> 
> [adding Khem]
> 
> > The GCC manual states:
> > 
> > `naked'
> >      Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> >      and SPU ports to indicate that the specified function does not
> >      need prologue/epilogue sequences generated by the compiler.  It is
> >      up to the programmer to provide these sequences. The only
> >                                                       ^^^^^^^^
> >      statements that can be safely included in naked functions are
> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >      `asm' statements that do not have operands.  All other statements,
> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >      including declarations of local variables, `if' statements, and so
> >      forth, should be avoided.  Naked functions should be used to
> >      implement the body of an assembly function, while allowing the
> >      compiler to construct the requisite function declaration for the
> >      assembler.
> > 
> > The 'I' attribute is fine here because it is a constant that is not
> > allowed to be in a register (and hence has no code generation side
> > effects.)
> > 
> > Adding operands for the input parameters, however, isn't going to
> > work around the fact that _this_ assembly is written to be out of
> > line and so it must never be inlined by the compiler.
> 
> I briefly looked at a disassembled version after applying both patches,
> it indeed leads to inlining. However, the code seems to be working
> (thanks to asm volatile?)...

Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
Clang inlining these static functions that are only used through
function pointers?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-15 22:46     ` Russell King - ARM Linux
@ 2018-10-15 22:52       ` Stefan Agner
  2018-10-15 23:03         ` Russell King - ARM Linux
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Agner @ 2018-10-15 22:52 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: raj.khem, ulli.kroll, joel, nico, arnd, linus.walleij,
	linux-arm-kernel, linux-kernel

On 16.10.2018 00:46, Russell King - ARM Linux wrote:
> On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
>> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
>> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
>> >> When functions incoming parameters are not in input operands list gcc
>> >> 4.5 does not load the parameters into registers before calling this
>> >> function but the inline assembly assumes valid addresses inside this
>> >> function. This breaks the code because r0 and r1 are invalid when
>> >> execution enters v4wb_copy_user_page ()
>> >
>> > NAK.  Naked functions must never be inlined.  Please add a "noinline"
>> > attribute to the function rather than making things more complex.
>> >
>>
>> To be honest, I did not put much thought into this commit since it is
>> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
>> kfrom to input operands list.") has been done to the other copypage
>> implementations...
>>
>> [adding Khem]
>>
>> > The GCC manual states:
>> >
>> > `naked'
>> >      Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
>> >      and SPU ports to indicate that the specified function does not
>> >      need prologue/epilogue sequences generated by the compiler.  It is
>> >      up to the programmer to provide these sequences. The only
>> >                                                       ^^^^^^^^
>> >      statements that can be safely included in naked functions are
>> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >      `asm' statements that do not have operands.  All other statements,
>> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >      including declarations of local variables, `if' statements, and so
>> >      forth, should be avoided.  Naked functions should be used to
>> >      implement the body of an assembly function, while allowing the
>> >      compiler to construct the requisite function declaration for the
>> >      assembler.
>> >
>> > The 'I' attribute is fine here because it is a constant that is not
>> > allowed to be in a register (and hence has no code generation side
>> > effects.)
>> >
>> > Adding operands for the input parameters, however, isn't going to
>> > work around the fact that _this_ assembly is written to be out of
>> > line and so it must never be inlined by the compiler.
>>
>> I briefly looked at a disassembled version after applying both patches,
>> it indeed leads to inlining. However, the code seems to be working
>> (thanks to asm volatile?)...
> 
> Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
> Clang inlining these static functions that are only used through
> function pointers?

I only looked at copypage-xscale.c (the mc_copy_user_page() case)...

--
Stefan


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-15 22:52       ` Stefan Agner
@ 2018-10-15 23:03         ` Russell King - ARM Linux
  0 siblings, 0 replies; 21+ messages in thread
From: Russell King - ARM Linux @ 2018-10-15 23:03 UTC (permalink / raw)
  To: Stefan Agner
  Cc: raj.khem, ulli.kroll, joel, nico, arnd, linus.walleij,
	linux-arm-kernel, linux-kernel

On Tue, Oct 16, 2018 at 12:52:58AM +0200, Stefan Agner wrote:
> On 16.10.2018 00:46, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
> >> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> >> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> >> >> When functions incoming parameters are not in input operands list gcc
> >> >> 4.5 does not load the parameters into registers before calling this
> >> >> function but the inline assembly assumes valid addresses inside this
> >> >> function. This breaks the code because r0 and r1 are invalid when
> >> >> execution enters v4wb_copy_user_page ()
> >> >
> >> > NAK.  Naked functions must never be inlined.  Please add a "noinline"
> >> > attribute to the function rather than making things more complex.
> >> >
> >>
> >> To be honest, I did not put much thought into this commit since it is
> >> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
> >> kfrom to input operands list.") has been done to the other copypage
> >> implementations...
> >>
> >> [adding Khem]
> >>
> >> > The GCC manual states:
> >> >
> >> > `naked'
> >> >      Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> >> >      and SPU ports to indicate that the specified function does not
> >> >      need prologue/epilogue sequences generated by the compiler.  It is
> >> >      up to the programmer to provide these sequences. The only
> >> >                                                       ^^^^^^^^
> >> >      statements that can be safely included in naked functions are
> >> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> >      `asm' statements that do not have operands.  All other statements,
> >> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> >      including declarations of local variables, `if' statements, and so
> >> >      forth, should be avoided.  Naked functions should be used to
> >> >      implement the body of an assembly function, while allowing the
> >> >      compiler to construct the requisite function declaration for the
> >> >      assembler.
> >> >
> >> > The 'I' attribute is fine here because it is a constant that is not
> >> > allowed to be in a register (and hence has no code generation side
> >> > effects.)
> >> >
> >> > Adding operands for the input parameters, however, isn't going to
> >> > work around the fact that _this_ assembly is written to be out of
> >> > line and so it must never be inlined by the compiler.
> >>
> >> I briefly looked at a disassembled version after applying both patches,
> >> it indeed leads to inlining. However, the code seems to be working
> >> (thanks to asm volatile?)...
> > 
> > Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
> > Clang inlining these static functions that are only used through
> > function pointers?
> 
> I only looked at copypage-xscale.c (the mc_copy_user_page() case)...

The two I mention are different from the rest, because they are used
from other functions within the same file.  The rest are all used
through function pointers and should, therefore, never be inlined.
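
A sketch of the "used through function pointers" case, again in portable
C. The struct demo_user_fns table and every demo_* name are hypothetical
and only stand in for the kind of per-CPU method table being discussed.
The sketch shows the call shape only; it does not demonstrate what any
particular compiler's optimiser does with it.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096	/* hypothetical page size for this sketch */

/* Hypothetical method table holding a pointer to the copy routine. */
struct demo_user_fns {
	void (*copy_page)(void *kto, const void *kfrom);
};

static void demo_fa_copy_page(void *kto, const void *kfrom)
{
	memcpy(kto, kfrom, DEMO_PAGE_SIZE);
}

/* Only the address of demo_fa_copy_page is taken here. */
const struct demo_user_fns demo_fa_user_fns = {
	.copy_page = demo_fa_copy_page,
};

/* Callers reach the routine through the pointer, not by name. */
void demo_copy_via_table(const struct demo_user_fns *fns,
			 void *kto, const void *kfrom)
{
	fns->copy_page(kto, kfrom);
}
```

A direct call from another function in the same file, as in the two
cases named above, is what gives the compiler the callee's name at the
call site.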

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-15 22:16 [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list Stefan Agner
  2018-10-15 22:23 ` Russell King - ARM Linux
@ 2018-10-16  8:00 ` Linus Walleij
  2018-10-16  8:44   ` Russell King - ARM Linux
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Walleij @ 2018-10-16  8:00 UTC (permalink / raw)
  To: Stefan Agner
  Cc: Russell King, Hans Ulli Kroll, Joel Stanley, Nicolas Pitre,
	Arnd Bergmann, Linux ARM, linux-kernel, Roman Yeryomin

On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:

> When functions incoming parameters are not in input operands list gcc
> 4.5 does not load the parameters into registers before calling this
> function but the inline assembly assumes valid addresses inside this
> function. This breaks the code because r0 and r1 are invalid when
> execution enters v4wb_copy_user_page ()
>
> Also the constant needs to be used as third input operand so account
> for that as well.
>
> This fixes copypage-fa.c what has previously done before for the other
> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> and kfrom to input operands list.").
>
> Signed-off-by: Stefan Agner <stefan@agner.ch>

Please add:
Cc: stable@vger.kernel.org

I am in deep water with ARM assembly, admittedly. So I wanted to
ask: OpenWRT has this cache patch:
https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
I do not know why (sorry).

Do you think that patch is actually a hack to hide the problem
fixed with this patch? (OK maybe stupid question but...)
it appeared anonymously in OpenWRT with the commit message
"add v3.18 support" at one point.

Yours,
Linus Walleij


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-16  8:00 ` Linus Walleij
@ 2018-10-16  8:44   ` Russell King - ARM Linux
  2018-10-16 11:35     ` Linus Walleij
  2018-10-16 20:43     ` Nicolas Pitre
  0 siblings, 2 replies; 21+ messages in thread
From: Russell King - ARM Linux @ 2018-10-16  8:44 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Stefan Agner, Hans Ulli Kroll, Joel Stanley, Nicolas Pitre,
	Arnd Bergmann, Linux ARM, linux-kernel, Roman Yeryomin

On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:
> 
> > When functions incoming parameters are not in input operands list gcc
> > 4.5 does not load the parameters into registers before calling this
> > function but the inline assembly assumes valid addresses inside this
> > function. This breaks the code because r0 and r1 are invalid when
> > execution enters v4wb_copy_user_page ()
> >
> > Also the constant needs to be used as third input operand so account
> > for that as well.
> >
> > This fixes copypage-fa.c what has previously done before for the other
> > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> > and kfrom to input operands list.").
> >
> > Signed-off-by: Stefan Agner <stefan@agner.ch>
> 
> Please add:
> Cc: stable@vger.kernel.org

It's not obvious yet whether this is right - it contradicts the GCC
manual, but then we have evidence that it's required for some GCC
versions where GCC may clone the function, or if the function is
used within the same file.

> I am on deep waters with ARM assembly, admittedly. So I wanted to
> ask: OpenWRT has this cache patch:
> https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
> I do not know why (sorry).
> 
> Do you think that patch is actually a hack to hide the problem
> fixed with this patch? (OK maybe stupid question but...)

No, it looks to me like a hack to make DMA cache handling "more
efficient" by cleaning/invalidating the entire cache when dealing
with large streaming buffers.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-16  8:44   ` Russell King - ARM Linux
@ 2018-10-16 11:35     ` Linus Walleij
  2018-10-16 20:43     ` Nicolas Pitre
  1 sibling, 0 replies; 21+ messages in thread
From: Linus Walleij @ 2018-10-16 11:35 UTC (permalink / raw)
  To: Russell King
  Cc: Stefan Agner, Hans Ulli Kroll, Joel Stanley, Nicolas Pitre,
	Arnd Bergmann, Linux ARM, linux-kernel, Roman Yeryomin

On Tue, Oct 16, 2018 at 10:44 AM Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:

> > I am on deep waters with ARM assembly, admittedly. So I wanted to
> > ask: OpenWRT has this cache patch:
> > https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
> > I do not know why (sorry).
> >
> > Do you think that patch is actually a hack to hide the problem
> > fixed with this patch? (OK maybe stupid question but...)
>
> No, it looks to me like a hack to make DMA cache handling "more
> efficient" by cleaning/invalidating the entire cache when dealing
> with large streaming buffers.

Aha that makes a lot of sense.

I will attempt to drop it from OpenWRT in the next kernel upgrade
unless benchmarks can show that it is worth it.

Thanks Russell!

Linus Walleij


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-16  8:44   ` Russell King - ARM Linux
  2018-10-16 11:35     ` Linus Walleij
@ 2018-10-16 20:43     ` Nicolas Pitre
  2018-10-16 21:59       ` Stefan Agner
                         ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Nicolas Pitre @ 2018-10-16 20:43 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Linus Walleij, Stefan Agner, Hans Ulli Kroll, Joel Stanley,
	Arnd Bergmann, Linux ARM, linux-kernel, Roman Yeryomin

On Tue, 16 Oct 2018, Russell King - ARM Linux wrote:

> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:
> > 
> > > When functions incoming parameters are not in input operands list gcc
> > > 4.5 does not load the parameters into registers before calling this
> > > function but the inline assembly assumes valid addresses inside this
> > > function. This breaks the code because r0 and r1 are invalid when
> > > execution enters v4wb_copy_user_page ()
> > >
> > > Also the constant needs to be used as third input operand so account
> > > for that as well.
> > >
> > > This fixes copypage-fa.c what has previously done before for the other
> > > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> > > and kfrom to input operands list.").
> > >
> > > Signed-off-by: Stefan Agner <stefan@agner.ch>
> > 
> > Please add:
> > Cc: stable@vger.kernel.org
> 
> It's not obvious yet whether this is right - it contradicts the GCC
> manual, but then we have evidence that it's required for some GCC
> versions where GCC may clone the function, or if the function is
> used within the same file.

Why not get rid of __naked altogether? Here's what I suggest:

----- >8
Subject: [PATCH] ARM: remove naked function usage

Convert page copy functions not to rely on the naked function attribute.

This attribute is known to confuse some gcc versions when function
arguments aren't explicitly listed as inline assembly operands despite
the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
6164/1: Add kto and kfrom to input operands list.").

Yet that commit has problems of its own by having assembly operand
constraints completely wrong. If the generated code has been OK since
then, it is due to luck rather than correctness. So this patch provides
proper assembly operand usage, and removes two instances of redundant
register duplications in the implementation while at it.

Inspection of the generated code with this patch doesn't show any obvious
quality degradation either, so not relying on __naked at all will make
the code less fragile, and more likely to be compilable with clang.

The only remaining __naked instances (excluding the kprobes test cases)
are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
cases only the function address is used by the compiler with no chance of
inlining it by mistake.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/mm/copypage-fa.c       | 34 ++++++------
 arch/arm/mm/copypage-feroceon.c | 97 +++++++++++++++++------------------
 arch/arm/mm/copypage-v4mc.c     | 18 +++----
 arch/arm/mm/copypage-v4wb.c     | 40 +++++++--------
 arch/arm/mm/copypage-v4wt.c     | 36 ++++++-------
 arch/arm/mm/copypage-xsc3.c     | 70 +++++++++++--------------
 arch/arm/mm/copypage-xscale.c   | 70 ++++++++++++-------------
 7 files changed, 171 insertions(+), 194 deletions(-)

diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
index d130a5ece5..453a3341ca 100644
--- a/arch/arm/mm/copypage-fa.c
+++ b/arch/arm/mm/copypage-fa.c
@@ -17,26 +17,24 @@
 /*
  * Faraday optimised copy_user_page
  */
-static void __naked
-fa_copy_user_page(void *kto, const void *kfrom)
+static void fa_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %0				@ 1\n\
-1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
-	add	r0, r0, #16			@ 1\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
-	add	r0, r0, #16			@ 1\n\
-	subs	r2, r2, #1			@ 1\n\
+	int tmp;
+	asm volatile ("\
+1:	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
+	add	%0, %0, #16			@ 1\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
+	add	%0, %0, #16			@ 1\n\
+	subs	%2, %2, #1			@ 1\n\
 	bne	1b				@ 1\n\
-	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
-	ldmfd	sp!, {r4, pc}			@ 3"
-	:
-	: "I" (PAGE_SIZE / 32));
+	mcr	p15, 0, %2, c7, c10, 4		@ 1   drain WB"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 32)
+	: "r3", "r4", "ip", "lr");
 }
 
 void fa_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
index 49ee0c1a72..1349430c63 100644
--- a/arch/arm/mm/copypage-feroceon.c
+++ b/arch/arm/mm/copypage-feroceon.c
@@ -13,58 +13,55 @@
 #include <linux/init.h>
 #include <linux/highmem.h>
 
-static void __naked
-feroceon_copy_user_page(void *kto, const void *kfrom)
+static void feroceon_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4-r9, lr}		\n\
-	mov	ip, %2				\n\
-1:	mov	lr, r1				\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	pld	[lr, #32]			\n\
-	pld	[lr, #64]			\n\
-	pld	[lr, #96]			\n\
-	pld	[lr, #128]			\n\
-	pld	[lr, #160]			\n\
-	pld	[lr, #192]			\n\
-	pld	[lr, #224]			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	subs	ip, ip, #(32 * 8)		\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
+	int tmp;
+	asm volatile ("\
+1:	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	pld	[%1, #0]			\n\
+	pld	[%1, #32]			\n\
+	pld	[%1, #64]			\n\
+	pld	[%1, #96]			\n\
+	pld	[%1, #128]			\n\
+	pld	[%1, #160]			\n\
+	pld	[%1, #192]			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	subs	%2, %2, #(32 * 8)		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
 	bne	1b				\n\
-	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
-	ldmfd	sp!, {r4-r9, pc}"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
+	mcr	p15, 0, %2, c7, c10, 4		@ drain WB"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE)
+	: "r2", "r3", "r4", "r5", "r6", "r7", "ip", "lr");
 }
 
 void feroceon_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index 0224416cba..494ddc435a 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -40,12 +40,10 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
  * instruction.  If your processor does not supply this, you have to write your
  * own copy_user_highpage that does the right thing.
  */
-static void __naked
-mc_copy_user_page(void *from, void *to)
+static void mc_copy_user_page(void *from, void *to)
 {
-	asm volatile(
-	"stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r4, %2				@ 1\n\
+	int tmp;
+	asm volatile ("\
 	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
 1:	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
 	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
@@ -55,13 +53,13 @@ mc_copy_user_page(void *from, void *to)
 	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
 	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
 	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
-	subs	r4, r4, #1			@ 1\n\
+	subs	%2, %2, #1			@ 1\n\
 	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
 	ldmneia	%0!, {r2, r3, ip, lr}		@ 4\n\
-	bne	1b				@ 1\n\
-	ldmfd	sp!, {r4, pc}			@ 3"
-	:
-	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64));
+	bne	1b				@ "
+	: "+&r" (from), "+&r" (to), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64)
+	: "r2", "r3", "ip", "lr");
 }
 
 void v4_mc_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4wb.c b/arch/arm/mm/copypage-v4wb.c
index 067d0fdd63..cf064ac6fc 100644
--- a/arch/arm/mm/copypage-v4wb.c
+++ b/arch/arm/mm/copypage-v4wb.c
@@ -22,29 +22,27 @@
  * instruction.  If your processor does not supply this, you have to write your
  * own copy_user_highpage that does the right thing.
  */
-static void __naked
-v4wb_copy_user_page(void *kto, const void *kfrom)
+static void v4wb_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %2				@ 1\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-1:	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	subs	r2, r2, #1			@ 1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
+	int tmp;
+	asm volatile ("\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+1:	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	subs	%2, %2, #1			@ 1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
 	bne	1b				@ 1\n\
-	mcr	p15, 0, r1, c7, c10, 4		@ 1   drain WB\n\
-	ldmfd	 sp!, {r4, pc}			@ 3"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
+	mcr	p15, 0, %1, c7, c10, 4		@ 1   drain WB"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64)
+	: "r3", "r4", "ip", "lr");
 }
 
 void v4wb_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4wt.c b/arch/arm/mm/copypage-v4wt.c
index b85c5da2e5..66745bd3a6 100644
--- a/arch/arm/mm/copypage-v4wt.c
+++ b/arch/arm/mm/copypage-v4wt.c
@@ -20,27 +20,25 @@
  * dirty data in the cache.  However, we do have to ensure that
  * subsequent reads are up to date.
  */
-static void __naked
-v4wt_copy_user_page(void *kto, const void *kfrom)
+static void v4wt_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %2				@ 1\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-1:	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	subs	r2, r2, #1			@ 1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
+	int tmp;
+	asm volatile ("\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+1:	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	subs	%2, %2, #1			@ 1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
 	bne	1b				@ 1\n\
-	mcr	p15, 0, r2, c7, c7, 0		@ flush ID cache\n\
-	ldmfd	sp!, {r4, pc}			@ 3"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
+	mcr	p15, 0, %2, c7, c7, 0		@ flush ID cache"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64)
+	: "r3", "r4", "ip", "lr");
 }
 
 void v4wt_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-xsc3.c b/arch/arm/mm/copypage-xsc3.c
index 03a2042ace..727a02c149 100644
--- a/arch/arm/mm/copypage-xsc3.c
+++ b/arch/arm/mm/copypage-xsc3.c
@@ -21,53 +21,45 @@
 
 /*
  * XSC3 optimised copy_user_highpage
- *  r0 = destination
- *  r1 = source
  *
  * The source page may have some clean entries in the cache already, but we
  * can safely ignore them - break_cow() will flush them out of the cache
  * if we eventually end up using our copied page.
  *
  */
-static void __naked
-xsc3_mc_copy_user_page(void *kto, const void *kfrom)
+static void xsc3_mc_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, r5, lr}		\n\
-	mov	lr, %2				\n\
-						\n\
-	pld	[r1, #0]			\n\
-	pld	[r1, #32]			\n\
-1:	pld	[r1, #64]			\n\
-	pld	[r1, #96]			\n\
+	int tmp;
+	asm volatile ("\
+	pld	[%1, #0]			\n\
+	pld	[%1, #32]			\n\
+1:	pld	[%1, #64]			\n\
+	pld	[%1, #96]			\n\
 						\n\
-2:	ldrd	r2, [r1], #8			\n\
-	mov	ip, r0				\n\
-	ldrd	r4, [r1], #8			\n\
-	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
-	strd	r2, [r0], #8			\n\
-	ldrd	r2, [r1], #8			\n\
-	strd	r4, [r0], #8			\n\
-	ldrd	r4, [r1], #8			\n\
-	strd	r2, [r0], #8			\n\
-	strd	r4, [r0], #8			\n\
-	ldrd	r2, [r1], #8			\n\
-	mov	ip, r0				\n\
-	ldrd	r4, [r1], #8			\n\
-	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
-	strd	r2, [r0], #8			\n\
-	ldrd	r2, [r1], #8			\n\
-	subs	lr, lr, #1			\n\
-	strd	r4, [r0], #8			\n\
-	ldrd	r4, [r1], #8			\n\
-	strd	r2, [r0], #8			\n\
-	strd	r4, [r0], #8			\n\
+2:	ldrd	r2, [%1], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
+	strd	r2, [%0], #8			\n\
+	ldrd	r2, [%1], #8			\n\
+	strd	r4, [%0], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	strd	r2, [%0], #8			\n\
+	strd	r4, [%0], #8			\n\
+	ldrd	r2, [%1], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
+	strd	r2, [%0], #8			\n\
+	ldrd	r2, [%1], #8			\n\
+	subs	%2, %2, #1			\n\
+	strd	r4, [%0], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	strd	r2, [%0], #8			\n\
+	strd	r4, [%0], #8			\n\
 	bgt	1b				\n\
-	beq	2b				\n\
-						\n\
-	ldmfd	sp!, {r4, r5, pc}"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
+	beq	2b				"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64 - 1)
+	: "r2", "r3", "r4", "r5");
 }
 
 void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
@@ -85,8 +77,6 @@ void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
 
 /*
  * XScale optimised clear_user_page
- *  r0 = destination
- *  r1 = virtual user address of ultimate destination page
  */
 void xsc3_mc_clear_user_highpage(struct page *page, unsigned long vaddr)
 {
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index 97972379f4..fa0be66082 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -36,52 +36,50 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
  * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
  * and merged as appropriate.
  */
-static void __naked
-mc_copy_user_page(void *from, void *to)
+static void mc_copy_user_page(void *from, void *to)
 {
+	int tmp;
 	/*
 	 * Strangely enough, best performance is achieved
 	 * when prefetching destination as well.  (NP)
 	 */
-	asm volatile(
-	"stmfd	sp!, {r4, r5, lr}		\n\
-	mov	lr, %2				\n\
-	pld	[r0, #0]			\n\
-	pld	[r0, #32]			\n\
-	pld	[r1, #0]			\n\
-	pld	[r1, #32]			\n\
-1:	pld	[r0, #64]			\n\
-	pld	[r0, #96]			\n\
-	pld	[r1, #64]			\n\
-	pld	[r1, #96]			\n\
-2:	ldrd	r2, [r0], #8			\n\
-	ldrd	r4, [r0], #8			\n\
-	mov	ip, r1				\n\
-	strd	r2, [r1], #8			\n\
-	ldrd	r2, [r0], #8			\n\
-	strd	r4, [r1], #8			\n\
-	ldrd	r4, [r0], #8			\n\
-	strd	r2, [r1], #8			\n\
-	strd	r4, [r1], #8			\n\
+	asm volatile ("\
+	pld	[%0, #0]			\n\
+	pld	[%0, #32]			\n\
+	pld	[%1, #0]			\n\
+	pld	[%1, #32]			\n\
+1:	pld	[%0, #64]			\n\
+	pld	[%0, #96]			\n\
+	pld	[%1, #64]			\n\
+	pld	[%1, #96]			\n\
+2:	ldrd	r2, [%0], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	mov	ip, %1				\n\
+	strd	r2, [%1], #8			\n\
+	ldrd	r2, [%0], #8			\n\
+	strd	r4, [%1], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	strd	r2, [%1], #8			\n\
+	strd	r4, [%1], #8			\n\
 	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
-	ldrd	r2, [r0], #8			\n\
+	ldrd	r2, [%0], #8			\n\
 	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
-	ldrd	r4, [r0], #8			\n\
-	mov	ip, r1				\n\
-	strd	r2, [r1], #8			\n\
-	ldrd	r2, [r0], #8			\n\
-	strd	r4, [r1], #8			\n\
-	ldrd	r4, [r0], #8			\n\
-	strd	r2, [r1], #8			\n\
-	strd	r4, [r1], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	mov	ip, %1				\n\
+	strd	r2, [%1], #8			\n\
+	ldrd	r2, [%0], #8			\n\
+	strd	r4, [%1], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	strd	r2, [%1], #8			\n\
+	strd	r4, [%1], #8			\n\
 	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
-	subs	lr, lr, #1			\n\
+	subs	%2, %2, #1			\n\
 	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
 	bgt	1b				\n\
-	beq	2b				\n\
-	ldmfd	sp!, {r4, r5, pc}		"
-	:
-	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64 - 1));
+	beq	2b				"
+	: "+&r" (from), "+&r" (to), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64 - 1)
+	: "r2", "r3", "r4", "r5", "ip");
 }
 
 void xscale_mc_copy_user_highpage(struct page *to, struct page *from,


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-16 20:43     ` Nicolas Pitre
@ 2018-10-16 21:59       ` Stefan Agner
  2018-10-17  8:58       ` Arnd Bergmann
  2018-11-05 23:00       ` Stefan Agner
  2 siblings, 0 replies; 21+ messages in thread
From: Stefan Agner @ 2018-10-16 21:59 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Linus Walleij, Hans Ulli Kroll,
	Joel Stanley, Arnd Bergmann, Linux ARM, linux-kernel,
	Roman Yeryomin

On 16.10.2018 22:43, Nicolas Pitre wrote:
> On Tue, 16 Oct 2018, Russell King - ARM Linux wrote:
> 
>> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
>> > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:
>> >
>> > > When functions incoming parameters are not in input operands list gcc
>> > > 4.5 does not load the parameters into registers before calling this
>> > > function but the inline assembly assumes valid addresses inside this
>> > > function. This breaks the code because r0 and r1 are invalid when
>> > > execution enters v4wb_copy_user_page ()
>> > >
>> > > Also the constant needs to be used as third input operand so account
>> > > for that as well.
>> > >
>> > > This fixes copypage-fa.c what has previously done before for the other
>> > > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
>> > > and kfrom to input operands list.").
>> > >
>> > > Signed-off-by: Stefan Agner <stefan@agner.ch>
>> >
>> > Please add:
>> > Cc: stable@vger.kernel.org
>>
>> It's not obvious yet whether this is right - it contradicts the GCC
>> manual, but then we have evidence that it's required for some GCC
>> versions where GCC may clone the function, or if the function is
>> used within the same file.
> 
> Why not getting rid of __naked altogether? Here's what I suggest:
> 
> ----- >8
> Subject: [PATCH] ARM: remove naked function usage
> 
> Convert page copy functions not to rely on the naked function attribute.
> 
> This attribute is known to confuse some gcc versions when function
> arguments aren't explicitly listed as inline assembly operands despite
> the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.").
> 
> Yet that commit has problems of its own by having assembly operand
> constraints completely wrong. If the generated code has been OK since
> then, it is due to luck rather than correctness. So this patch provides
> proper assembly operand usage, and removes two instances of redundant
> register duplications in the implementation while at it.
> 
> Inspection of the generated code with this patch doesn't show any obvious
> quality degradation either, so not relying on __naked at all will make
> the code less fragile, and more likely to be compilable with clang.
> 
> The only remaining __naked instances (excluding the kprobes test cases)
> are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
> cases only the function address is used by the compiler with no chance of
> inlining it by mistake.
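
For anyone following the constraint discussion, here is a minimal sketch (not part of the patch) of the operand layout the commit message describes: both pointers are read-write operands, and the loop counter is a scratch output whose initial value is supplied through a matching input. The names demo_operands and DEMO_PAGE_SIZE are hypothetical, and the template is left empty so the snippet builds with gcc or clang on any architecture.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE; 4096 is an assumption. */
#define DEMO_PAGE_SIZE 4096

/*
 * Hypothetical helper, not taken from the patch: it only demonstrates
 * the operand layout.  The template is empty, so no register is
 * modified and the function returns the initial loop count that the
 * compiler loaded into operand 2.
 */
static int demo_operands(void *kto, const void *kfrom)
{
	int tmp;

	/* __asm__ __volatile__ is the ISO-safe spelling of "asm volatile". */
	__asm__ __volatile__ (""
	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)	/* %0, %1 read-write; %2 scratch */
	: "2" (DEMO_PAGE_SIZE / 32));			/* initial value for %2 */

	/* The tied input must have arrived in the register behind tmp. */
	assert(tmp == DEMO_PAGE_SIZE / 32);
	return tmp;
}
```

The real loops additionally name the registers they use as temporaries (r3, r4, ip, lr and so on) in the clobber list, which is what lets the compiler save and restore them instead of the hand-written stmfd/ldmfd pair.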

Tested using Qemu mainstone and versatileab (pxa_defconfig-CONFIG_FTRACE
and versatile_defconfig) compiled with Clang 7.0. Both configurations
compile and boot fine.

So from that perspective:

Tested-by: Stefan Agner <stefan@agner.ch>

--
Stefan

> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  arch/arm/mm/copypage-fa.c       | 34 ++++++------
>  arch/arm/mm/copypage-feroceon.c | 97 +++++++++++++++++------------------
>  arch/arm/mm/copypage-v4mc.c     | 18 +++----
>  arch/arm/mm/copypage-v4wb.c     | 40 +++++++--------
>  arch/arm/mm/copypage-v4wt.c     | 36 ++++++-------
>  arch/arm/mm/copypage-xsc3.c     | 70 +++++++++++--------------
>  arch/arm/mm/copypage-xscale.c   | 70 ++++++++++++-------------
>  7 files changed, 171 insertions(+), 194 deletions(-)
> 
> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> index d130a5ece5..453a3341ca 100644
> --- a/arch/arm/mm/copypage-fa.c
> +++ b/arch/arm/mm/copypage-fa.c
> @@ -17,26 +17,24 @@
>  /*
>   * Faraday optimised copy_user_page
>   */
> -static void __naked
> -fa_copy_user_page(void *kto, const void *kfrom)
> +static void fa_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %0				@ 1\n\
> -1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> -	add	r0, r0, #16			@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> -	add	r0, r0, #16			@ 1\n\
> -	subs	r2, r2, #1			@ 1\n\
> +	int tmp;
> +	asm volatile ("\
> +1:	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> +	add	%0, %0, #16			@ 1\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> +	add	%0, %0, #16			@ 1\n\
> +	subs	%2, %2, #1			@ 1\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "I" (PAGE_SIZE / 32));
> +	mcr	p15, 0, %2, c7, c10, 4		@ 1   drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 32)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void fa_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
> index 49ee0c1a72..1349430c63 100644
> --- a/arch/arm/mm/copypage-feroceon.c
> +++ b/arch/arm/mm/copypage-feroceon.c
> @@ -13,58 +13,55 @@
>  #include <linux/init.h>
>  #include <linux/highmem.h>
>  
> -static void __naked
> -feroceon_copy_user_page(void *kto, const void *kfrom)
> +static void feroceon_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4-r9, lr}		\n\
> -	mov	ip, %2				\n\
> -1:	mov	lr, r1				\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	pld	[lr, #32]			\n\
> -	pld	[lr, #64]			\n\
> -	pld	[lr, #96]			\n\
> -	pld	[lr, #128]			\n\
> -	pld	[lr, #160]			\n\
> -	pld	[lr, #192]			\n\
> -	pld	[lr, #224]			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	subs	ip, ip, #(32 * 8)		\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> +	int tmp;
> +	asm volatile ("\
> +1:	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
> +	pld	[%1, #128]			\n\
> +	pld	[%1, #160]			\n\
> +	pld	[%1, #192]			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	subs	%2, %2, #(32 * 8)		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
>  	bne	1b				\n\
> -	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
> -	ldmfd	sp!, {r4-r9, pc}"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
> +	mcr	p15, 0, %2, c7, c10, 4		@ drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE)
> +	: "r2", "r3", "r4", "r5", "r6", "r7", "ip", "lr");
>  }
>  
>  void feroceon_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
> index 0224416cba..494ddc435a 100644
> --- a/arch/arm/mm/copypage-v4mc.c
> +++ b/arch/arm/mm/copypage-v4mc.c
> @@ -40,12 +40,10 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>   * instruction.  If your processor does not supply this, you have to write your
>   * own copy_user_highpage that does the right thing.
>   */
> -static void __naked
> -mc_copy_user_page(void *from, void *to)
> +static void mc_copy_user_page(void *from, void *to)
>  {
> -	asm volatile(
> -	"stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r4, %2				@ 1\n\
> +	int tmp;
> +	asm volatile ("\
>  	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
>  1:	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
> @@ -55,13 +53,13 @@ mc_copy_user_page(void *from, void *to)
>  	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
>  	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
> -	subs	r4, r4, #1			@ 1\n\
> +	subs	%2, %2, #1			@ 1\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
>  	ldmneia	%0!, {r2, r3, ip, lr}		@ 4\n\
> -	bne	1b				@ 1\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64));
> +	bne	1b				@ "
> +	: "+&r" (from), "+&r" (to), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r2", "r3", "ip", "lr");
>  }
>  
>  void v4_mc_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4wb.c b/arch/arm/mm/copypage-v4wb.c
> index 067d0fdd63..cf064ac6fc 100644
> --- a/arch/arm/mm/copypage-v4wb.c
> +++ b/arch/arm/mm/copypage-v4wb.c
> @@ -22,29 +22,27 @@
>   * instruction.  If your processor does not supply this, you have to write your
>   * own copy_user_highpage that does the right thing.
>   */
> -static void __naked
> -v4wb_copy_user_page(void *kto, const void *kfrom)
> +static void v4wb_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %2				@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -1:	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	subs	r2, r2, #1			@ 1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
> +	int tmp;
> +	asm volatile ("\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +1:	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	subs	%2, %2, #1			@ 1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r1, c7, c10, 4		@ 1   drain WB\n\
> -	ldmfd	 sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> +	mcr	p15, 0, %1, c7, c10, 4		@ 1   drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void v4wb_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4wt.c b/arch/arm/mm/copypage-v4wt.c
> index b85c5da2e5..66745bd3a6 100644
> --- a/arch/arm/mm/copypage-v4wt.c
> +++ b/arch/arm/mm/copypage-v4wt.c
> @@ -20,27 +20,25 @@
>   * dirty data in the cache.  However, we do have to ensure that
>   * subsequent reads are up to date.
>   */
> -static void __naked
> -v4wt_copy_user_page(void *kto, const void *kfrom)
> +static void v4wt_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %2				@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -1:	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	subs	r2, r2, #1			@ 1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
> +	int tmp;
> +	asm volatile ("\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +1:	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	subs	%2, %2, #1			@ 1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r2, c7, c7, 0		@ flush ID cache\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> +	mcr	p15, 0, %2, c7, c7, 0		@ flush ID cache"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void v4wt_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-xsc3.c b/arch/arm/mm/copypage-xsc3.c
> index 03a2042ace..727a02c149 100644
> --- a/arch/arm/mm/copypage-xsc3.c
> +++ b/arch/arm/mm/copypage-xsc3.c
> @@ -21,53 +21,45 @@
>  
>  /*
>   * XSC3 optimised copy_user_highpage
> - *  r0 = destination
> - *  r1 = source
>   *
>   * The source page may have some clean entries in the cache already, but we
>   * can safely ignore them - break_cow() will flush them out of the cache
>   * if we eventually end up using our copied page.
>   *
>   */
> -static void __naked
> -xsc3_mc_copy_user_page(void *kto, const void *kfrom)
> +static void xsc3_mc_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, r5, lr}		\n\
> -	mov	lr, %2				\n\
> -						\n\
> -	pld	[r1, #0]			\n\
> -	pld	[r1, #32]			\n\
> -1:	pld	[r1, #64]			\n\
> -	pld	[r1, #96]			\n\
> +	int tmp;
> +	asm volatile ("\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +1:	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
>  						\n\
> -2:	ldrd	r2, [r1], #8			\n\
> -	mov	ip, r0				\n\
> -	ldrd	r4, [r1], #8			\n\
> -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> -	strd	r2, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r4, [r1], #8			\n\
> -	strd	r2, [r0], #8			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	mov	ip, r0				\n\
> -	ldrd	r4, [r1], #8			\n\
> -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> -	strd	r2, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	subs	lr, lr, #1			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r4, [r1], #8			\n\
> -	strd	r2, [r0], #8			\n\
> -	strd	r4, [r0], #8			\n\
> +2:	ldrd	r2, [%1], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> +	strd	r2, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	strd	r2, [%0], #8			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> +	strd	r2, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	subs	%2, %2, #1			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	strd	r2, [%0], #8			\n\
> +	strd	r4, [%0], #8			\n\
>  	bgt	1b				\n\
> -	beq	2b				\n\
> -						\n\
> -	ldmfd	sp!, {r4, r5, pc}"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
> +	beq	2b				"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64 - 1)
> +	: "r2", "r3", "r4", "r5");
>  }
>  
>  void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
> @@ -85,8 +77,6 @@ void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
>  
>  /*
>   * XScale optimised clear_user_page
> - *  r0 = destination
> - *  r1 = virtual user address of ultimate destination page
>   */
>  void xsc3_mc_clear_user_highpage(struct page *page, unsigned long vaddr)
>  {
> diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
> index 97972379f4..fa0be66082 100644
> --- a/arch/arm/mm/copypage-xscale.c
> +++ b/arch/arm/mm/copypage-xscale.c
> @@ -36,52 +36,50 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>   * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
>   * and merged as appropriate.
>   */
> -static void __naked
> -mc_copy_user_page(void *from, void *to)
> +static void mc_copy_user_page(void *from, void *to)
>  {
> +	int tmp;
>  	/*
>  	 * Strangely enough, best performance is achieved
>  	 * when prefetching destination as well.  (NP)
>  	 */
> -	asm volatile(
> -	"stmfd	sp!, {r4, r5, lr}		\n\
> -	mov	lr, %2				\n\
> -	pld	[r0, #0]			\n\
> -	pld	[r0, #32]			\n\
> -	pld	[r1, #0]			\n\
> -	pld	[r1, #32]			\n\
> -1:	pld	[r0, #64]			\n\
> -	pld	[r0, #96]			\n\
> -	pld	[r1, #64]			\n\
> -	pld	[r1, #96]			\n\
> -2:	ldrd	r2, [r0], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	mov	ip, r1				\n\
> -	strd	r2, [r1], #8			\n\
> -	ldrd	r2, [r0], #8			\n\
> -	strd	r4, [r1], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	strd	r2, [r1], #8			\n\
> -	strd	r4, [r1], #8			\n\
> +	asm volatile ("\
> +	pld	[%0, #0]			\n\
> +	pld	[%0, #32]			\n\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +1:	pld	[%0, #64]			\n\
> +	pld	[%0, #96]			\n\
> +	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
> +2:	ldrd	r2, [%0], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	mov	ip, %1				\n\
> +	strd	r2, [%1], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
> +	strd	r4, [%1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	strd	r2, [%1], #8			\n\
> +	strd	r4, [%1], #8			\n\
>  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
> -	ldrd	r2, [r0], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
>  	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
> -	ldrd	r4, [r0], #8			\n\
> -	mov	ip, r1				\n\
> -	strd	r2, [r1], #8			\n\
> -	ldrd	r2, [r0], #8			\n\
> -	strd	r4, [r1], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	strd	r2, [r1], #8			\n\
> -	strd	r4, [r1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	mov	ip, %1				\n\
> +	strd	r2, [%1], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
> +	strd	r4, [%1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	strd	r2, [%1], #8			\n\
> +	strd	r4, [%1], #8			\n\
>  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
> -	subs	lr, lr, #1			\n\
> +	subs	%2, %2, #1			\n\
>  	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
>  	bgt	1b				\n\
> -	beq	2b				\n\
> -	ldmfd	sp!, {r4, r5, pc}		"
> -	:
> -	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64 - 1));
> +	beq	2b				"
> +	: "+&r" (from), "+&r" (to), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64 - 1)
> +	: "r2", "r3", "r4", "r5", "ip");
>  }
>  
>  void xscale_mc_copy_user_highpage(struct page *to, struct page *from,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-16 20:43     ` Nicolas Pitre
  2018-10-16 21:59       ` Stefan Agner
@ 2018-10-17  8:58       ` Arnd Bergmann
  2018-10-17  9:04         ` [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang Arnd Bergmann
  2018-10-17 14:23         ` [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list Nicolas Pitre
  2018-11-05 23:00       ` Stefan Agner
  2 siblings, 2 replies; 21+ messages in thread
From: Arnd Bergmann @ 2018-10-17  8:58 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Linus Walleij, Stefan Agner,
	Hans Ulli Kroll, Joel Stanley, Linux ARM,
	Linux Kernel Mailing List, Roman Yeryomin

On Tue, Oct 16, 2018 at 10:43 PM Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 16 Oct 2018, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> > > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:
> > It's not obvious yet whether this is right - it contradicts the GCC
> > manual, but then we have evidence that it's required for some GCC
> > versions where GCC may clone the function, or if the function is
> > used within the same file.
>
> Why not getting rid of __naked altogether? Here's what I suggest:
>
> ----- >8
> Subject: [PATCH] ARM: remove naked function usage
>
> Convert page copy functions not to rely on the naked function attribute.
>
> This attribute is known to confuse some gcc versions when function
> arguments aren't explicitly listed as inline assembly operands despite
> the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.").

It's probably worth noting that the minimum gcc version for compiling
the kernel is now gcc-4.6, which I think does not suffer from the gcc-4.5
bug that triggered the change. See in particular commits 9c695203a7dd
("compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functions")
and d124b44f09ca ("Compiler Attributes: naked was fixed in gcc 4.6").

The first one made sure we don't inline these functions, so gcc-4.5
no longer runs into the problem even in the absence of the workaround,
and the second patch reverts that again, noting that gcc-4.6 is fixed.

I don't see anything wrong with converting the functions to not
use __naked at all, but I think we can also just revert the original
commit 9a40ac86152c to get it to build with clang. When I last
played with clang on arm32, that's what I did. I'll reply with the
patch I have in my randconfig tree.

      Arnd

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang
  2018-10-17  8:58       ` Arnd Bergmann
@ 2018-10-17  9:04         ` Arnd Bergmann
  2018-10-17  9:35           ` Russell King - ARM Linux
  2018-10-17 14:23         ` [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list Nicolas Pitre
  1 sibling, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2018-10-17  9:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux, linus.walleij, stefan, ulli.kroll, joel, linux-kernel,
	roman, Arnd Bergmann

clang points out that a naked function should not pass
the function arguments into the inline assembly:

arch/arm/mm/copypage-feroceon.c:67:9: error: parameter references not allowed in naked functions
arch/arm/mm/copypage-v4mc.c:64:9: error: parameter references not allowed in naked functions
arch/arm/mm/copypage-v4wb.c:47:9: error: parameter references not allowed in naked functions
arch/arm/mm/copypage-v4wt.c:43:9: error: parameter references not allowed in naked functions
arch/arm/mm/copypage-xsc3.c:70:9: error: parameter references not allowed in naked functions
arch/arm/mm/copypage-xscale.c:84:9: error: parameter references not allowed in naked functions

The constraints were originally added in commit 9a40ac86152c ("ARM:
6164/1: Add kto and kfrom to input operands list.") as a gcc-4.5
workaround. Another workaround for the same problem was added in commit
9c695203a7dd ("compiler-gcc.h: gcc-4.5 needs noclone and noinline
on __naked functions") and should have obsoleted the first one. That
workaround was subsequently reverted in commit d124b44f09ca ("Compiler
Attributes: naked was fixed in gcc 4.6") as we raised the minimum compiler
level to gcc-4.6.

Remove the extraneous references and use the register numbers
consistently as required by clang.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I've used this on my randconfig build setup, and it makes all
configurations build without warnings, but I have not done
any runtime testing on it.
---
 arch/arm/mm/copypage-feroceon.c |  4 ++--
 arch/arm/mm/copypage-v4mc.c     | 26 +++++++++++++-------------
 arch/arm/mm/copypage-v4wb.c     |  4 ++--
 arch/arm/mm/copypage-v4wt.c     |  4 ++--
 arch/arm/mm/copypage-xsc3.c     |  6 +++---
 arch/arm/mm/copypage-xscale.c   |  4 ++--
 6 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
index 49ee0c1a7209..e69bf2f15f32 100644
--- a/arch/arm/mm/copypage-feroceon.c
+++ b/arch/arm/mm/copypage-feroceon.c
@@ -18,7 +18,7 @@ feroceon_copy_user_page(void *kto, const void *kfrom)
 {
 	asm("\
 	stmfd	sp!, {r4-r9, lr}		\n\
-	mov	ip, %2				\n\
+	mov	ip, %0				\n\
 1:	mov	lr, r1				\n\
 	ldmia	r1!, {r2 - r9}			\n\
 	pld	[lr, #32]			\n\
@@ -64,7 +64,7 @@ feroceon_copy_user_page(void *kto, const void *kfrom)
 	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
 	ldmfd	sp!, {r4-r9, pc}"
 	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
+	: "I" (PAGE_SIZE));
 }
 
 void feroceon_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index 0224416cba3c..5c70e48ad833 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -45,23 +45,23 @@ mc_copy_user_page(void *from, void *to)
 {
 	asm volatile(
 	"stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r4, %2				@ 1\n\
-	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
-1:	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
-	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
-	ldmia	%0!, {r2, r3, ip, lr}		@ 4+1\n\
-	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
-	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
-	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
-	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
-	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
+	mov	r4, %0				@ 1\n\
+	ldmia	r0!, {r2, r3, ip, lr}		@ 4\n\
+1:	mcr	p15, 0, r1, c7, c6, 1		@ 1   invalidate D line\n\
+	stmia	r1!, {r2, r3, ip, lr}		@ 4\n\
+	ldmia	r0!, {r2, r3, ip, lr}		@ 4+1\n\
+	stmia	r1!, {r2, r3, ip, lr}		@ 4\n\
+	ldmia	r0!, {r2, r3, ip, lr}		@ 4\n\
+	mcr	p15, 0, r1, c7, c6, 1		@ 1   invalidate D line\n\
+	stmia	r1!, {r2, r3, ip, lr}		@ 4\n\
+	ldmia	r0!, {r2, r3, ip, lr}		@ 4\n\
 	subs	r4, r4, #1			@ 1\n\
-	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
-	ldmneia	%0!, {r2, r3, ip, lr}		@ 4\n\
+	stmia	r1!, {r2, r3, ip, lr}		@ 4\n\
+	ldmneia	r0!, {r2, r3, ip, lr}		@ 4\n\
 	bne	1b				@ 1\n\
 	ldmfd	sp!, {r4, pc}			@ 3"
 	:
-	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64));
+	: "I" (PAGE_SIZE / 64));
 }
 
 void v4_mc_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4wb.c b/arch/arm/mm/copypage-v4wb.c
index 067d0fdd630c..7ea9cf07bd5c 100644
--- a/arch/arm/mm/copypage-v4wb.c
+++ b/arch/arm/mm/copypage-v4wb.c
@@ -27,7 +27,7 @@ v4wb_copy_user_page(void *kto, const void *kfrom)
 {
 	asm("\
 	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %2				@ 1\n\
+	mov	r2, %0				@ 1\n\
 	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
 1:	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
 	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
@@ -44,7 +44,7 @@ v4wb_copy_user_page(void *kto, const void *kfrom)
 	mcr	p15, 0, r1, c7, c10, 4		@ 1   drain WB\n\
 	ldmfd	 sp!, {r4, pc}			@ 3"
 	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
+	: "I" (PAGE_SIZE / 64));
 }
 
 void v4wb_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4wt.c b/arch/arm/mm/copypage-v4wt.c
index b85c5da2e510..c742ab24efd6 100644
--- a/arch/arm/mm/copypage-v4wt.c
+++ b/arch/arm/mm/copypage-v4wt.c
@@ -25,7 +25,7 @@ v4wt_copy_user_page(void *kto, const void *kfrom)
 {
 	asm("\
 	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %2				@ 1\n\
+	mov	r2, %0				@ 1\n\
 	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
 1:	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
 	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
@@ -40,7 +40,7 @@ v4wt_copy_user_page(void *kto, const void *kfrom)
 	mcr	p15, 0, r2, c7, c7, 0		@ flush ID cache\n\
 	ldmfd	sp!, {r4, pc}			@ 3"
 	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
+	: "I" (PAGE_SIZE / 64));
 }
 
 void v4wt_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-xsc3.c b/arch/arm/mm/copypage-xsc3.c
index 03a2042aced5..9944bdb4721d 100644
--- a/arch/arm/mm/copypage-xsc3.c
+++ b/arch/arm/mm/copypage-xsc3.c
@@ -34,8 +34,8 @@ xsc3_mc_copy_user_page(void *kto, const void *kfrom)
 {
 	asm("\
 	stmfd	sp!, {r4, r5, lr}		\n\
-	mov	lr, %2				\n\
-						\n\
+	mov	lr, %0				\n\
+					\n\
 	pld	[r1, #0]			\n\
 	pld	[r1, #32]			\n\
 1:	pld	[r1, #64]			\n\
@@ -67,7 +67,7 @@ xsc3_mc_copy_user_page(void *kto, const void *kfrom)
 						\n\
 	ldmfd	sp!, {r4, r5, pc}"
 	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
+	: "I" (PAGE_SIZE / 64 - 1));
 }
 
 void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index 97972379f4d6..ef52a052d9bb 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -45,7 +45,7 @@ mc_copy_user_page(void *from, void *to)
 	 */
 	asm volatile(
 	"stmfd	sp!, {r4, r5, lr}		\n\
-	mov	lr, %2				\n\
+	mov	lr, %0				\n\
 	pld	[r0, #0]			\n\
 	pld	[r0, #32]			\n\
 	pld	[r1, #0]			\n\
@@ -81,7 +81,7 @@ mc_copy_user_page(void *from, void *to)
 	beq	2b				\n\
 	ldmfd	sp!, {r4, r5, pc}		"
 	:
-	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64 - 1));
+	: "I" (PAGE_SIZE / 64 - 1));
 }
 
 void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang
  2018-10-17  9:04         ` [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang Arnd Bergmann
@ 2018-10-17  9:35           ` Russell King - ARM Linux
  0 siblings, 0 replies; 21+ messages in thread
From: Russell King - ARM Linux @ 2018-10-17  9:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, linus.walleij, stefan, ulli.kroll, joel,
	linux-kernel, roman

On Wed, Oct 17, 2018 at 11:04:17AM +0200, Arnd Bergmann wrote:
> The constraints were originally added in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.") as a gcc-4.5
> workaround. Another workaround for the same problem was added in commit
> 9c695203a7dd ("compiler-gcc.h: gcc-4.5 needs noclone and noinline
> on __naked functions") and should have obsoleted the first one.

That is an incorrect statement - please read the discussion back then,
particularly Mikael Pettersson's reply:

"I've tested and verified that this bit enables a gcc-4.5 compiled kernel
to boot on TS-119 (Kirkwood) when combined with my fix for __naked.
With neither or only one of the patches applied, the kernel oopses hard
in copy_user_page() as it tries to start /sbin/init."

That makes it very clear that it is not "one or the other" patch, and
it's certainly not true that one patch obsoletes the other.

Mikael is also very clear in the effects that are going on - to re-quote
what I've already quoted (and clearly you missed):

"- the asm() bodies of these __naked functions have inadequate input
  parameter constraints, in particular they fail to declare any
  dependencies on the functions' formal parameters; gcc-4.5 sees this
  and skips the parameter setup before calling these functions, causing
  runtime crashes"

This description makes it clear that it's not the naked function that
is wrong, but the function that _calls_ the naked function - stating
that GCC fails to set up the parameters _for_ _the_ _called_ _naked_
_function_.

So, there are two issues here:

1. gcc-4.5 has been observed to clone and inline naked functions, which
   you claim has been fixed.
2. gcc-4.5 fails to set up parameters for naked functions, and we have
   no idea whether that has been fixed.
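
For issue 1, the mitigation named earlier in this thread (commit
9c695203a7dd) was to mark __naked functions noinline and noclone. As a
rough sketch of just those two attributes, applied here to an ordinary C
function rather than a naked one, something like the following should
keep the compiler from inlining or cloning the callee. The names
sketch_noclone, sketch_out_of_line and sketch_copy_words are made up for
illustration, and none of this says anything about issue 2.

```c
#include <stddef.h>

/*
 * Hypothetical helper macros, for illustration only.
 * noclone is a GCC attribute; it is guarded so that other
 * compilers simply see an empty macro.
 */
#if defined(__GNUC__) && !defined(__clang__)
# define sketch_noclone	__attribute__((__noclone__))
#else
# define sketch_noclone
#endif
#define sketch_out_of_line	__attribute__((__noinline__)) sketch_noclone

/*
 * An ordinary (non-naked) function that must stay a real, out-of-line
 * call, so the caller has to pass its arguments in the usual way.
 */
static sketch_out_of_line size_t
sketch_copy_words(unsigned long *to, const unsigned long *from, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		to[i] = from[i];

	return n;
}
```

Whether these attributes alone are enough is exactly the open question
above; the sketch only shows how they are spelled.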

> I've used this on my randconfig build setup, and it makes all
> configurations build without warnings, but I have not done
> any runtime testing on it.

Since the problem has always been a runtime issue, a build-only test is
insufficient.

Sorry, but no, this is way too risky in its current form.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-17  8:58       ` Arnd Bergmann
  2018-10-17  9:04         ` [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang Arnd Bergmann
@ 2018-10-17 14:23         ` Nicolas Pitre
  1 sibling, 0 replies; 21+ messages in thread
From: Nicolas Pitre @ 2018-10-17 14:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King - ARM Linux, Linus Walleij, Stefan Agner,
	Hans Ulli Kroll, Joel Stanley, Linux ARM,
	Linux Kernel Mailing List, Roman Yeryomin

On Wed, 17 Oct 2018, Arnd Bergmann wrote:

> On Tue, Oct 16, 2018 at 10:43 PM Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Tue, 16 Oct 2018, Russell King - ARM Linux wrote:
> > > On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> > > > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:
> > > It's not obvious yet whether this is right - it contradicts the GCC
> > > manual, but then we have evidence that it's required for some GCC
> > > versions where GCC may clone the function, or if the function is
> > > used within the same file.
> >
> > Why not getting rid of __naked altogether? Here's what I suggest:
> >
> > ----- >8
> > Subject: [PATCH] ARM: remove naked function usage
> >
> > Convert page copy functions not to rely on the naked function attribute.
> >
> > This attribute is known to confuse some gcc versions when function
> > arguments aren't explicitly listed as inline assembly operands despite
> > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> > 6164/1: Add kto and kfrom to input operands list.").
> 
> It's probably worth noting that the minimum gcc version for compiling
> the kernel is now gcc-4.6, which I think does not suffer from the gcc-4.5
> bug that triggered the change. See in particular commits 9c695203a7dd
> ("compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functions")
> and d124b44f09ca ("Compiler Attributes: naked was fixed in gcc 4.6").
> 
> The first one made sure we don't inline these functions, so gcc-4.5
> no longer runs into the problem even in the absence of the workaround,
> and the second patch reverts that again, noting that gcc-4.6 is fixed.
> 
> I don't see anything wrong with converting the functions to not
> use __naked at all, but I think we can also just revert the original
> commit 9a40ac86152c to get it to build with clang. When I last
> played with clang on arm32, that's what I did. I'll reply with the
> patch I have in my randconfig tree.

The __naked attribute has idiosyncrasies of its own, regardless of any
potential bugs, that sometimes make the code harder to maintain and
prevent extra optimizations that the compiler could otherwise take care
of. So I think it is a good thing to get rid of __naked when its usage
isn't necessary, as in the instances in this patch.

The remaining instances are cases where there is simply no stack
available, which makes __naked necessary there.
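
To make the operand scheme of the non-naked version concrete, here is a
rough, portable sketch. It reuses the constraint strings from the patch
("+&r", "=&r" and the matching constraint "2") but with an empty asm
template and a plain C loop standing in for the ARM copy loop, so it
only shows how the operands are tied together, not the real code.
SKETCH_PAGE_SIZE and sketch_copy_user_page are made-up names, and the
ARM register clobbers are left out because they are target specific.

```c
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, illustration only */

static void sketch_copy_user_page(void *kto, const void *kfrom)
{
	int tmp;

	/*
	 * Empty template: only the operand lists matter here.
	 *  "+&r"  kto and kfrom are both read and written by the asm
	 *  "=&r"  tmp is a scratch output with a register of its own
	 *  "2"    the input constant is loaded into operand 2's register,
	 *         so tmp starts out as the loop count
	 */
	__asm__ __volatile__(""
		: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
		: "2" (SKETCH_PAGE_SIZE / 64));

	/* Plain C stand-in for the assembly loop: 64 bytes per pass. */
	while (tmp-- > 0) {
		memcpy(kto, kfrom, 64);
		kto = (char *)kto + 64;
		kfrom = (const char *)kfrom + 64;
	}
}
```

Because the function is no longer naked, the compiler generates the
prologue and epilogue itself and is free to pick the registers behind
%0, %1 and %2.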


Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-10-16 20:43     ` Nicolas Pitre
  2018-10-16 21:59       ` Stefan Agner
  2018-10-17  8:58       ` Arnd Bergmann
@ 2018-11-05 23:00       ` Stefan Agner
  2018-11-06  4:49         ` Nicolas Pitre
  2 siblings, 1 reply; 21+ messages in thread
From: Stefan Agner @ 2018-11-05 23:00 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Linus Walleij, Hans Ulli Kroll,
	Joel Stanley, Arnd Bergmann, Linux ARM, linux-kernel,
	Roman Yeryomin

On 16.10.2018 22:43, Nicolas Pitre wrote:
> On Tue, 16 Oct 2018, Russell King - ARM Linux wrote:
> 
>> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
>> > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner <stefan@agner.ch> wrote:
>> >
>> > > When functions incoming parameters are not in input operands list gcc
>> > > 4.5 does not load the parameters into registers before calling this
>> > > function but the inline assembly assumes valid addresses inside this
>> > > function. This breaks the code because r0 and r1 are invalid when
>> > > execution enters v4wb_copy_user_page ()
>> > >
>> > > Also the constant needs to be used as third input operand so account
>> > > for that as well.
>> > >
>> > > This fixes copypage-fa.c what has previously done before for the other
>> > > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
>> > > and kfrom to input operands list.").
>> > >
>> > > Signed-off-by: Stefan Agner <stefan@agner.ch>
>> >
>> > Please add:
>> > Cc: stable@vger.kernel.org
>>
>> It's not obvious yet whether this is right - it contradicts the GCC
>> manual, but then we have evidence that it's required for some GCC
>> versions where GCC may clone the function, or if the function is
>> used within the same file.
> 
> Why not getting rid of __naked altogether? Here's what I suggest:
> 
> ----- >8
> Subject: [PATCH] ARM: remove naked function usage
> 
> Convert page copy functions not to rely on the naked function attribute.
> 
> This attribute is known to confuse some gcc versions when function
> arguments aren't explicitly listed as inline assembly operands despite
> the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.").
> 
> Yet that commit has problems of its own by having assembly operand
> constraints completely wrong. If the generated code has been OK since
> then, it is due to luck rather than correctness. So this patch provides
> proper assembly operand usage, and removes two instances of redundant
> register duplications in the implementation while at it.
> 
> Inspection of the generated code with this patch doesn't show any obvious
> quality degradation either, so not relying on __naked at all will make
> the code less fragile, and more likely to be compilable with clang.
> 
> The only remaining __naked instances (excluding the kprobes test cases)
> are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
> cases only the function address is used by the compiler with no chance of
> inlining it by mistake.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

As mentioned a couple of weeks ago, I did test this patchset on two
architectures (pxa_defconfig -> copypage-xscale.c and
versatile_defconfig -> copypage-v4wb.c).

I really like this approach, can we move forward with this?

A couple of comments below:


> ---
>  arch/arm/mm/copypage-fa.c       | 34 ++++++------
>  arch/arm/mm/copypage-feroceon.c | 97 +++++++++++++++++------------------
>  arch/arm/mm/copypage-v4mc.c     | 18 +++----
>  arch/arm/mm/copypage-v4wb.c     | 40 +++++++--------
>  arch/arm/mm/copypage-v4wt.c     | 36 ++++++-------
>  arch/arm/mm/copypage-xsc3.c     | 70 +++++++++++--------------
>  arch/arm/mm/copypage-xscale.c   | 70 ++++++++++++-------------
>  7 files changed, 171 insertions(+), 194 deletions(-)
> 
> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> index d130a5ece5..453a3341ca 100644
> --- a/arch/arm/mm/copypage-fa.c
> +++ b/arch/arm/mm/copypage-fa.c
> @@ -17,26 +17,24 @@
>  /*
>   * Faraday optimised copy_user_page
>   */
> -static void __naked
> -fa_copy_user_page(void *kto, const void *kfrom)
> +static void fa_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %0				@ 1\n\
> -1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> -	add	r0, r0, #16			@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> -	add	r0, r0, #16			@ 1\n\
> -	subs	r2, r2, #1			@ 1\n\
> +	int tmp;

There should be an empty line here.

> +	asm volatile ("\
> +1:	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> +	add	%0, %0, #16			@ 1\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> +	add	%0, %0, #16			@ 1\n\
> +	subs	%2, %2, #1			@ 1\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "I" (PAGE_SIZE / 32));
> +	mcr	p15, 0, %2, c7, c10, 4		@ 1   drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" "tmp)

A stray " sneaked in before tmp instead of a (.

> +	: "2" (PAGE_SIZE / 32)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void fa_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
> index 49ee0c1a72..1349430c63 100644
> --- a/arch/arm/mm/copypage-feroceon.c
> +++ b/arch/arm/mm/copypage-feroceon.c
> @@ -13,58 +13,55 @@
>  #include <linux/init.h>
>  #include <linux/highmem.h>
>  
> -static void __naked
> -feroceon_copy_user_page(void *kto, const void *kfrom)
> +static void feroceon_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4-r9, lr}		\n\
> -	mov	ip, %2				\n\
> -1:	mov	lr, r1				\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	pld	[lr, #32]			\n\
> -	pld	[lr, #64]			\n\
> -	pld	[lr, #96]			\n\
> -	pld	[lr, #128]			\n\
> -	pld	[lr, #160]			\n\
> -	pld	[lr, #192]			\n\
> -	pld	[lr, #224]			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	subs	ip, ip, #(32 * 8)		\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> +	int tmp;

Newline here?

> +	asm volatile ("\
> +1:	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
> +	pld	[%1, #128]			\n\
> +	pld	[%1, #160]			\n\
> +	pld	[%1, #192]			\n\

I see you shifted this by 32 bytes, but the stmia/ldmia below actually
move 256 bytes, so we probably should keep pld	[lr, #224] here?
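
One way to check the offsets is to work them out relative to the source
pointer at the top of the loop. The sketch below assumes the hunk is
complete as quoted: the old code saves r1 in lr before the first ldmia
and prefetches at lr + 32 ... lr + 224, while the new code prefetches at
%1 + 0 ... %1 + 192 after "ldmia %1!" has already advanced %1 by eight
4-byte registers. The helper names are made up for illustration.

```c
#include <stdio.h>

enum {
	SKETCH_LDMIA_REGS = 8,	/* registers in each ldmia list */
	SKETCH_REG_BYTES  = 4,	/* size of a 32-bit ARM register */
	SKETCH_PLD_STEP   = 32,	/* distance between successive pld offsets */
	SKETCH_NUM_PLD    = 7	/* pld instructions in either version */
};

/* Old code: pld [lr, #32] ... [lr, #224], with lr = r1 at loop start. */
static int sketch_old_pld_target(int i)
{
	return SKETCH_PLD_STEP * (i + 1);
}

/*
 * New code: pld [%1, #0] ... [%1, #192], issued after the first
 * "ldmia %1!" has moved %1 forward by 8 * 4 = 32 bytes.
 */
static int sketch_new_pld_target(int i)
{
	return SKETCH_LDMIA_REGS * SKETCH_REG_BYTES + SKETCH_PLD_STEP * i;
}

/* Print both sets of targets, as byte offsets from the loop-start pointer. */
static void sketch_print_pld_targets(void)
{
	int i;

	for (i = 0; i < SKETCH_NUM_PLD; i++)
		printf("pld %d: old +%d, new +%d\n", i,
		       sketch_old_pld_target(i), sketch_new_pld_target(i));
}
```

Under those assumptions the two versions touch the same seven addresses,
so the last new pld already lands at +224 from the start of the block.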

> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	subs	%2, %2, #(32 * 8)		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
>  	bne	1b				\n\
> -	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
> -	ldmfd	sp!, {r4-r9, pc}"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
> +	mcr	p15, 0, %2, c7, c10, 4		@ drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: =2" (PAGE_SIZE),

That should be "2" I guess? Also the comma at the end should not be
there.

> +	: "r2", "r3", "r4", "r5", "r6", "r7", "ip", "lr");
>  }
>  
>  void feroceon_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
> index 0224416cba..494ddc435a 100644
> --- a/arch/arm/mm/copypage-v4mc.c
> +++ b/arch/arm/mm/copypage-v4mc.c
> @@ -40,12 +40,10 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>   * instruction.  If your processor does not supply this, you have to write your
>   * own copy_user_highpage that does the right thing.
>   */
> -static void __naked
> -mc_copy_user_page(void *from, void *to)
> +static void mc_copy_user_page(void *from, void *to)
>  {
> -	asm volatile(
> -	"stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r4, %2				@ 1\n\
> +	int tmp;

Newline here?

> +	asm volatile ("\
>  	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
>  1:	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
> @@ -55,13 +53,13 @@ mc_copy_user_page(void *from, void *to)
>  	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
>  	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
> -	subs	r4, r4, #1			@ 1\n\
> +	subs	%2, %2, #1			@ 1\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
>  	ldmneia	%0!, {r2, r3, ip, lr}		@ 4\n\
> -	bne	1b				@ 1\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64));
> +	bne	1b				@ "
> +	: "+&r" (from), "+&r" (to), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r2", "r3", "ip", "lr");
>  }
>  
>  void v4_mc_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4wb.c b/arch/arm/mm/copypage-v4wb.c
> index 067d0fdd63..cf064ac6fc 100644
> --- a/arch/arm/mm/copypage-v4wb.c
> +++ b/arch/arm/mm/copypage-v4wb.c
> @@ -22,29 +22,27 @@
>   * instruction.  If your processor does not supply this, you have to write your
>   * own copy_user_highpage that does the right thing.
>   */
> -static void __naked
> -v4wb_copy_user_page(void *kto, const void *kfrom)
> +static void v4wb_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %2				@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -1:	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	subs	r2, r2, #1			@ 1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
> +	int tmp;

Newline here?

> +	asm volatile ("\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +1:	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	subs	%2, %2, #1			@ 1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r1, c7, c10, 4		@ 1   drain WB\n\
> -	ldmfd	 sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> +	mcr	p15, 0, %1, c7, c10, 4		@ 1   drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void v4wb_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4wt.c b/arch/arm/mm/copypage-v4wt.c
> index b85c5da2e5..66745bd3a6 100644
> --- a/arch/arm/mm/copypage-v4wt.c
> +++ b/arch/arm/mm/copypage-v4wt.c
> @@ -20,27 +20,25 @@
>   * dirty data in the cache.  However, we do have to ensure that
>   * subsequent reads are up to date.
>   */
> -static void __naked
> -v4wt_copy_user_page(void *kto, const void *kfrom)
> +static void v4wt_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %2				@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -1:	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	subs	r2, r2, #1			@ 1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
> +	int tmp;

Newline here

> +	asm volatile ("\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +1:	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	subs	%2, %2, #1			@ 1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r2, c7, c7, 0		@ flush ID cache\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> +	mcr	p15, 0, %2, c7, c7, 0		@ flush ID cache"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void v4wt_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-xsc3.c b/arch/arm/mm/copypage-xsc3.c
> index 03a2042ace..727a02c149 100644
> --- a/arch/arm/mm/copypage-xsc3.c
> +++ b/arch/arm/mm/copypage-xsc3.c
> @@ -21,53 +21,45 @@
>  
>  /*
>   * XSC3 optimised copy_user_highpage
> - *  r0 = destination
> - *  r1 = source
>   *
>   * The source page may have some clean entries in the cache already, but we
>   * can safely ignore them - break_cow() will flush them out of the cache
>   * if we eventually end up using our copied page.
>   *
>   */
> -static void __naked
> -xsc3_mc_copy_user_page(void *kto, const void *kfrom)
> +static void xsc3_mc_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, r5, lr}		\n\
> -	mov	lr, %2				\n\
> -						\n\
> -	pld	[r1, #0]			\n\
> -	pld	[r1, #32]			\n\
> -1:	pld	[r1, #64]			\n\
> -	pld	[r1, #96]			\n\
> +	int tmp;

Newline here

> +	asm volatile ("\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +1:	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
>  						\n\
> -2:	ldrd	r2, [r1], #8			\n\
> -	mov	ip, r0				\n\
> -	ldrd	r4, [r1], #8			\n\
> -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> -	strd	r2, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r4, [r1], #8			\n\
> -	strd	r2, [r0], #8			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	mov	ip, r0				\n\
> -	ldrd	r4, [r1], #8			\n\
> -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> -	strd	r2, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	subs	lr, lr, #1			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r4, [r1], #8			\n\
> -	strd	r2, [r0], #8			\n\
> -	strd	r4, [r0], #8			\n\
> +2:	ldrd	r2, [%1], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> +	strd	r2, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	strd	r2, [%0], #8			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> +	strd	r2, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	subs	%2, %2, #1			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	strd	r2, [%0], #8			\n\
> +	strd	r4, [%0], #8			\n\
>  	bgt	1b				\n\
> -	beq	2b				\n\
> -						\n\
> -	ldmfd	sp!, {r4, r5, pc}"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
> +	beq	2b				"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64 - 1)
> +	: "r2", "r3", "r4", "r5");

r3 and r5 are not used above, so no need to have them in the clobber
list.

>  }
>  
>  void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
> @@ -85,8 +77,6 @@ void xsc3_mc_copy_user_highpage(struct page *to,
> struct page *from,
>  
>  /*
>   * XScale optimised clear_user_page
> - *  r0 = destination
> - *  r1 = virtual user address of ultimate destination page
>   */
>  void xsc3_mc_clear_user_highpage(struct page *page, unsigned long vaddr)
>  {
> diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
> index 97972379f4..fa0be66082 100644
> --- a/arch/arm/mm/copypage-xscale.c
> +++ b/arch/arm/mm/copypage-xscale.c
> @@ -36,52 +36,50 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>   * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
>   * and merged as appropriate.
>   */
> -static void __naked
> -mc_copy_user_page(void *from, void *to)
> +static void mc_copy_user_page(void *from, void *to)
>  {
> +	int tmp;
>  	/*
>  	 * Strangely enough, best performance is achieved
>  	 * when prefetching destination as well.  (NP)
>  	 */
> -	asm volatile(
> -	"stmfd	sp!, {r4, r5, lr}		\n\
> -	mov	lr, %2				\n\
> -	pld	[r0, #0]			\n\
> -	pld	[r0, #32]			\n\
> -	pld	[r1, #0]			\n\
> -	pld	[r1, #32]			\n\
> -1:	pld	[r0, #64]			\n\
> -	pld	[r0, #96]			\n\
> -	pld	[r1, #64]			\n\
> -	pld	[r1, #96]			\n\
> -2:	ldrd	r2, [r0], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	mov	ip, r1				\n\
> -	strd	r2, [r1], #8			\n\
> -	ldrd	r2, [r0], #8			\n\
> -	strd	r4, [r1], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	strd	r2, [r1], #8			\n\
> -	strd	r4, [r1], #8			\n\
> +	asm volatile ("\
> +	pld	[%0, #0]			\n\
> +	pld	[%0, #32]			\n\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +1:	pld	[%0, #64]			\n\
> +	pld	[%0, #96]			\n\
> +	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
> +2:	ldrd	r2, [%0], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	mov	ip, %1				\n\
> +	strd	r2, [%1], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
> +	strd	r4, [%1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	strd	r2, [%1], #8			\n\
> +	strd	r4, [%1], #8			\n\
>  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\

How about using %1 here directly and skip the move to ip, as you did in
copypage-xsc3.c above?

> -	ldrd	r2, [r0], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
>  	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
> -	ldrd	r4, [r0], #8			\n\
> -	mov	ip, r1				\n\
> -	strd	r2, [r1], #8			\n\
> -	ldrd	r2, [r0], #8			\n\
> -	strd	r4, [r1], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	strd	r2, [r1], #8			\n\
> -	strd	r4, [r1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	mov	ip, %1				\n\
> +	strd	r2, [%1], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
> +	strd	r4, [%1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	strd	r2, [%1], #8			\n\
> +	strd	r4, [%1], #8			\n\
>  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
> -	subs	lr, lr, #1			\n\
> +	subs	%2, %2, #1			\n\
>  	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
>  	bgt	1b				\n\
> -	beq	2b				\n\
> -	ldmfd	sp!, {r4, r5, pc}		"
> -	:
> -	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64 - 1));
> +	beq	2b				"
> +	: "+&r" (from), "+&r" (to), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64 - 1)
> +	: "r2", "r3", "r4", "r5", "ip");

r3 and r5 are not used above, so no need in the clobber list...

--
Stefan

>  }
>  
>  void xscale_mc_copy_user_highpage(struct page *to, struct page *from,


* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-11-05 23:00       ` Stefan Agner
@ 2018-11-06  4:49         ` Nicolas Pitre
  2018-11-06 13:16           ` Robin Murphy
  2018-11-07 16:27           ` Stefan Agner
  0 siblings, 2 replies; 21+ messages in thread
From: Nicolas Pitre @ 2018-11-06  4:49 UTC (permalink / raw)
  To: Stefan Agner
  Cc: Russell King - ARM Linux, Linus Walleij, Hans Ulli Kroll,
	Joel Stanley, Arnd Bergmann, Linux ARM, linux-kernel,
	Roman Yeryomin

On Tue, 6 Nov 2018, Stefan Agner wrote:

> On 16.10.2018 22:43, Nicolas Pitre wrote:
> > Subject: [PATCH] ARM: remove naked function usage
> > 
> > Convert page copy functions not to rely on the naked function attribute.
> > 
> > This attribute is known to confuse some gcc versions when function
> > arguments aren't explicitly listed as inline assembly operands despite
> > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> > 6164/1: Add kto and kfrom to input operands list.").
> > 
> > Yet that commit has problems of its own by having assembly operand
> > constraints completely wrong. If the generated code has been OK since
> > then, it is due to luck rather than correctness. So this patch provides
> > proper assembly operand usage, and removes two instances of redundant
> > register duplications in the implementation while at it.
> > 
> > Inspection of the generated code with this patch doesn't show any obvious
> > quality degradation either, so not relying on __naked at all will make
> > the code less fragile, and more likely to be compilable with clang.
> > 
> > The only remaining __naked instances (excluding the kprobes test cases)
> > are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
> > cases only the function address is used by the compiler with no chance of
> > inlining it by mistake.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> As mentioned a couple of weeks ago, I did test this patchset on two
> architectures (pxa_defconfig -> copypage-xscale.c and
> versatile_defconfig -> copypage-v4wb.c).
> 
> I really like this approach, can we move forward with this?

Yes, the patch was submitted to the patch tracker a few days later.

> A couple of comments below:
> 
> 
> > ---
> >  arch/arm/mm/copypage-fa.c       | 34 ++++++------
> >  arch/arm/mm/copypage-feroceon.c | 97 +++++++++++++++++------------------
> >  arch/arm/mm/copypage-v4mc.c     | 18 +++----
> >  arch/arm/mm/copypage-v4wb.c     | 40 +++++++--------
> >  arch/arm/mm/copypage-v4wt.c     | 36 ++++++-------
> >  arch/arm/mm/copypage-xsc3.c     | 70 +++++++++++--------------
> >  arch/arm/mm/copypage-xscale.c   | 70 ++++++++++++-------------
> >  7 files changed, 171 insertions(+), 194 deletions(-)
> > 
> > diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> > index d130a5ece5..453a3341ca 100644
> > --- a/arch/arm/mm/copypage-fa.c
> > +++ b/arch/arm/mm/copypage-fa.c
> > @@ -17,26 +17,24 @@
> >  /*
> >   * Faraday optimised copy_user_page
> >   */
> > -static void __naked
> > -fa_copy_user_page(void *kto, const void *kfrom)
> > +static void fa_copy_user_page(void *kto, const void *kfrom)
> >  {
> > -	asm("\
> > -	stmfd	sp!, {r4, lr}			@ 2\n\
> > -	mov	r2, %0				@ 1\n\
> > -1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> > -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> > -	add	r0, r0, #16			@ 1\n\
> > -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> > -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> > -	add	r0, r0, #16			@ 1\n\
> > -	subs	r2, r2, #1			@ 1\n\
> > +	int tmp;
> 
> There should be an empty line here.

Yeah... there should.

> > +	asm volatile ("\
> > +1:	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> > +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> > +	add	%0, %0, #16			@ 1\n\
> > +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> > +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> > +	add	%0, %0, #16			@ 1\n\
> > +	subs	%2, %2, #1			@ 1\n\
> >  	bne	1b				@ 1\n\
> > -	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
> > -	ldmfd	sp!, {r4, pc}			@ 3"
> > -	:
> > -	: "I" (PAGE_SIZE / 32));
> > +	mcr	p15, 0, %2, c7, c10, 4		@ 1   drain WB"
> > +	: "+&r" (kto), "+&r" (kfrom), "=&r" "tmp)
> 
> There is sneaked in a " before tmp instead of (.

Good catch.

I did compile-test all the existing defconfigs though. Apparently this 
file is not covered?

> > diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
> > index 49ee0c1a72..1349430c63 100644
> > --- a/arch/arm/mm/copypage-feroceon.c
> > +++ b/arch/arm/mm/copypage-feroceon.c
> > @@ -13,58 +13,55 @@
> >  #include <linux/init.h>
> >  #include <linux/highmem.h>
> >  
> > -static void __naked
> > -feroceon_copy_user_page(void *kto, const void *kfrom)
> > +static void feroceon_copy_user_page(void *kto, const void *kfrom)
> >  {
> > -	asm("\
> > -	stmfd	sp!, {r4-r9, lr}		\n\
> > -	mov	ip, %2				\n\
> > -1:	mov	lr, r1				\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	pld	[lr, #32]			\n\
> > -	pld	[lr, #64]			\n\
> > -	pld	[lr, #96]			\n\
> > -	pld	[lr, #128]			\n\
> > -	pld	[lr, #160]			\n\
> > -	pld	[lr, #192]			\n\
> > -	pld	[lr, #224]			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	ldmia	r1!, {r2 - r9}			\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > -	stmia	r0, {r2 - r9}			\n\
> > -	subs	ip, ip, #(32 * 8)		\n\
> > -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> > -	add	r0, r0, #32			\n\
> > +	int tmp;
> 
> Newline here?
> 
> > +	asm volatile ("\
> > +1:	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	pld	[%1, #0]			\n\
> > +	pld	[%1, #32]			\n\
> > +	pld	[%1, #64]			\n\
> > +	pld	[%1, #96]			\n\
> > +	pld	[%1, #128]			\n\
> > +	pld	[%1, #160]			\n\
> > +	pld	[%1, #192]			\n\
> 
> I see you shifted this by 32 bytes, but the stmia/ldmia below actually
> move 256 bytes, so we probably should keep pld	[lr, #224] here?

No. If you look at the original code:

1:	mov     lr, r1                          # lr = r1 = start
	ldmia   r1!, {r2 - r9}                  # now r1 == lr + 32
	pld     [lr, #32]                       # [lr, #32] == [r1, #0]
	pld     [lr, #64]                       # [lr, #64] == [r1, #32]
	pld     [lr, #96]                       # [lr, #96] == [r1, #64]
	...
	pld     [lr, #224]                      # [lr, #224] == [r1, #192]

So the new code gets rid of lr.
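
The equivalence can be checked with plain pointer arithmetic. The sketch
below is not kernel code: old_pld_target() and new_pld_target() are
hypothetical helpers that only model which address each pld would touch,
assuming the 32-byte write-back done by "ldmia r1!, {r2 - r9}" (eight
4-byte registers).

```c
#include <stdint.h>

/* Bytes consumed by "ldmia r1!, {r2 - r9}": eight 32-bit registers. */
#define LDMIA_ADVANCE 32u

/* Old code: lr holds the value of r1 from *before* the ldmia write-back. */
static uintptr_t old_pld_target(uintptr_t start, unsigned int lr_offset)
{
	uintptr_t lr = start;

	return lr + lr_offset;
}

/* New code: the base register has already been advanced by the ldmia. */
static uintptr_t new_pld_target(uintptr_t start, unsigned int base_offset)
{
	uintptr_t base = start + LDMIA_ADVANCE;

	return base + base_offset;
}
```

For every offset k in 0, 32, ..., 192, new_pld_target(p, k) matches
old_pld_target(p, k + 32), so the seven prefetches cover the same
addresses without the extra "mov lr, r1".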

> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> > +	stmia	%0, {r2 - r7, ip, lr}		\n\
> > +	subs	%2, %2, #(32 * 8)		\n\
> > +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> > +	add	%0, %0, #32			\n\
> >  	bne	1b				\n\
> > -	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
> > -	ldmfd	sp!, {r4-r9, pc}"
> > -	:
> > -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
> > +	mcr	p15, 0, %2, c7, c10, 4		@ drain WB"
> > +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> > +	: =2" (PAGE_SIZE),
> 
> That should be "2" I guess? Also the comma at the end should not be
> there.

Wow.  Something was odd with my compile-testing. That should have been 
caught.
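
For reference, the intended operand layout can be tried outside the
kernel. This is only a minimal sketch, assuming GCC or clang extended
asm: tied_counter() is a hypothetical name, and the asm template is
deliberately empty so that it builds on any architecture. It shows that
the "2" input starts out in the same register as output operand 2 (tmp),
and that the operand list has no trailing comma.

```c
/*
 * Hypothetical helper: same operand layout as the copy routines,
 * but with an empty template so it is architecture neutral.
 */
static int tied_counter(void *to, const void *from, int count)
{
	int tmp;

	__asm__ __volatile__ (""
	: "+&r" (to), "+&r" (from), "=&r" (tmp)	/* %0, %1, %2 */
	: "2" (count)				/* preloaded into %2 */
	: "memory");

	/* Nothing in the template modified %2, so tmp still holds count. */
	return tmp;
}
```

In the real routines the template decrements %2 down to zero, which is
why the counter has to be an output operand and cannot stay a plain "I"
immediate input.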

> > +	asm volatile ("\
> > +	pld	[%1, #0]			\n\
> > +	pld	[%1, #32]			\n\
> > +1:	pld	[%1, #64]			\n\
> > +	pld	[%1, #96]			\n\
> >  						\n\
> > -2:	ldrd	r2, [r1], #8			\n\
> > -	mov	ip, r0				\n\
> > -	ldrd	r4, [r1], #8			\n\
> > -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> > -	strd	r2, [r0], #8			\n\
> > -	ldrd	r2, [r1], #8			\n\
> > -	strd	r4, [r0], #8			\n\
> > -	ldrd	r4, [r1], #8			\n\
> > -	strd	r2, [r0], #8			\n\
> > -	strd	r4, [r0], #8			\n\
> > -	ldrd	r2, [r1], #8			\n\
> > -	mov	ip, r0				\n\
> > -	ldrd	r4, [r1], #8			\n\
> > -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> > -	strd	r2, [r0], #8			\n\
> > -	ldrd	r2, [r1], #8			\n\
> > -	subs	lr, lr, #1			\n\
> > -	strd	r4, [r0], #8			\n\
> > -	ldrd	r4, [r1], #8			\n\
> > -	strd	r2, [r0], #8			\n\
> > -	strd	r4, [r0], #8			\n\
> > +2:	ldrd	r2, [%1], #8			\n\
> > +	ldrd	r4, [%1], #8			\n\
> > +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> > +	strd	r2, [%0], #8			\n\
> > +	ldrd	r2, [%1], #8			\n\
> > +	strd	r4, [%0], #8			\n\
> > +	ldrd	r4, [%1], #8			\n\
> > +	strd	r2, [%0], #8			\n\
> > +	strd	r4, [%0], #8			\n\
> > +	ldrd	r2, [%1], #8			\n\
> > +	ldrd	r4, [%1], #8			\n\
> > +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> > +	strd	r2, [%0], #8			\n\
> > +	ldrd	r2, [%1], #8			\n\
> > +	subs	%2, %2, #1			\n\
> > +	strd	r4, [%0], #8			\n\
> > +	ldrd	r4, [%1], #8			\n\
> > +	strd	r2, [%0], #8			\n\
> > +	strd	r4, [%0], #8			\n\
> >  	bgt	1b				\n\
> > -	beq	2b				\n\
> > -						\n\
> > -	ldmfd	sp!, {r4, r5, pc}"
> > -	:
> > -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
> > +	beq	2b				"
> > +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> > +	: "2" (PAGE_SIZE / 64 - 1)
> > +	: "r2", "r3", "r4", "r5");
> 
> r3 and r5 are not used above, so no need to have them in the clobber
> list.

They are used. ldrd and strd instructions always use a pair of 
consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..." 
loads into r4-r5.
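
To make the pairing explicit, here is a toy model in C. It is a sketch
only: regs[], model_ldrd(), model_strd() and model_copy16() are
hypothetical names, and the model assumes the ARM-state rule that the
first register is even and the second one is the next register up.

```c
#include <stdint.h>

static uint32_t regs[16];	/* toy register file, r0..r15 */

/* Toy model of "ldrd rt, [mem]": fills rt and rt + 1. */
static void model_ldrd(unsigned int rt, const uint32_t *mem)
{
	regs[rt] = mem[0];
	regs[rt + 1] = mem[1];
}

/* Toy model of "strd rt, [mem]": stores rt and rt + 1. */
static void model_strd(unsigned int rt, uint32_t *mem)
{
	mem[0] = regs[rt];
	mem[1] = regs[rt + 1];
}

/* Copies 16 bytes the way "ldrd r2 / ldrd r4 / strd r2 / strd r4" does. */
static void model_copy16(uint32_t *dst, const uint32_t *src)
{
	model_ldrd(2, src);
	model_ldrd(4, src + 2);
	model_strd(2, dst);
	model_strd(4, dst + 2);
}
```

After model_copy16() the slots for r3 and r5 have been overwritten even
though only r2 and r4 are named, which is what the clobber list has to
tell the compiler.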

> > diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
> > index 97972379f4..fa0be66082 100644
> > --- a/arch/arm/mm/copypage-xscale.c
> > +++ b/arch/arm/mm/copypage-xscale.c
> > @@ -36,52 +36,50 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
> >   * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
> >   * and merged as appropriate.
> >   */
> > -static void __naked
> > -mc_copy_user_page(void *from, void *to)
> > +static void mc_copy_user_page(void *from, void *to)
> >  {
> > +	int tmp;
> >  	/*
> >  	 * Strangely enough, best performance is achieved
> >  	 * when prefetching destination as well.  (NP)
> >  	 */
> > -	asm volatile(
> > -	"stmfd	sp!, {r4, r5, lr}		\n\
> > -	mov	lr, %2				\n\
> > -	pld	[r0, #0]			\n\
> > -	pld	[r0, #32]			\n\
> > -	pld	[r1, #0]			\n\
> > -	pld	[r1, #32]			\n\
> > -1:	pld	[r0, #64]			\n\
> > -	pld	[r0, #96]			\n\
> > -	pld	[r1, #64]			\n\
> > -	pld	[r1, #96]			\n\
> > -2:	ldrd	r2, [r0], #8			\n\
> > -	ldrd	r4, [r0], #8			\n\
> > -	mov	ip, r1				\n\
> > -	strd	r2, [r1], #8			\n\
> > -	ldrd	r2, [r0], #8			\n\
> > -	strd	r4, [r1], #8			\n\
> > -	ldrd	r4, [r0], #8			\n\
> > -	strd	r2, [r1], #8			\n\
> > -	strd	r4, [r1], #8			\n\
> > +	asm volatile ("\
> > +	pld	[%0, #0]			\n\
> > +	pld	[%0, #32]			\n\
> > +	pld	[%1, #0]			\n\
> > +	pld	[%1, #32]			\n\
> > +1:	pld	[%0, #64]			\n\
> > +	pld	[%0, #96]			\n\
> > +	pld	[%1, #64]			\n\
> > +	pld	[%1, #96]			\n\
> > +2:	ldrd	r2, [%0], #8			\n\
> > +	ldrd	r4, [%0], #8			\n\
> > +	mov	ip, %1				\n\
> > +	strd	r2, [%1], #8			\n\
> > +	ldrd	r2, [%0], #8			\n\
> > +	strd	r4, [%1], #8			\n\
> > +	ldrd	r4, [%0], #8			\n\
> > +	strd	r2, [%1], #8			\n\
> > +	strd	r4, [%1], #8			\n\
> >  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
> 
> How about using %1 here directly and skip the move to ip, as you did in
> copypage-xsc3.c above?

No. The cache line that needs cleaning is the line that we just wrote 
to. %1 is now pointing at the next cache line at this point. That is why 
%1 needs to be preserved into ip before it is incremented.
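
The same point in plain C, as a rough sketch rather than kernel code:
copy_line_and_pick_clean_addr() is a hypothetical helper that models one
32-byte step of the loop and returns the address that the "clean D line"
operation has to receive.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u

/*
 * Hypothetical model of one cache line worth of copying.
 * *from plays the role of %0 and *to the role of %1.
 */
static uintptr_t copy_line_and_pick_clean_addr(unsigned char **to,
					       const unsigned char **from)
{
	unsigned char *line = *to;	/* "mov ip, %1": remember line start */

	memcpy(*to, *from, CACHE_LINE);
	*from += CACHE_LINE;		/* four "ldrd ..., [%0], #8" */
	*to += CACHE_LINE;		/* four "strd ..., [%1], #8" */

	return (uintptr_t)line;		/* not *to: that is the next line now */
}
```

Passing the updated *to instead would clean the line that has not been
written yet and leave the freshly written one dirty.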

So here's the revised patch. It now has full compile-test coverage for 
real this time. Would you mind reviewing it again before I resubmit it 
please?

----- >8
Subject: [PATCH] remove unneeded naked function usage

Convert page copy functions not to rely on the naked function attribute.

This attribute is known to confuse some old gcc versions when function
arguments aren't explicitly listed as inline assembly operands despite
the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
6164/1: Add kto and kfrom to input operands list.").

Yet that commit has problems of its own by having assembly operand
constraints completely wrong. If the generated code has been OK since
then, it is due to luck rather than correctness. So this patch also
provides proper assembly operand constraints, and removes two instances
of redundant register usages in the implementation while at it.

Inspection of the generated code with this patch doesn't show any obvious
quality degradation either, so not relying on __naked at all will make
the code less fragile, and avoid some issues with clang.

The only remaining __naked instances (excluding the kprobes test cases)
are exynos_pm_power_up_setup(), tc2_pm_power_up_setup() and
cci_enable_port_for_self(). But in the first two cases, only the function
address is used by the compiler with no chance of inlining it by 
mistake, and the third case is called from assembly code only.
And the fact that no stack is available when the corresponding code is
executed does warrant the __naked usage in those cases.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
index d130a5ece5..bf24690ec8 100644
--- a/arch/arm/mm/copypage-fa.c
+++ b/arch/arm/mm/copypage-fa.c
@@ -17,26 +17,25 @@
 /*
  * Faraday optimised copy_user_page
  */
-static void __naked
-fa_copy_user_page(void *kto, const void *kfrom)
+static void fa_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %0				@ 1\n\
-1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
-	add	r0, r0, #16			@ 1\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
-	add	r0, r0, #16			@ 1\n\
-	subs	r2, r2, #1			@ 1\n\
+	int tmp;
+
+	asm volatile ("\
+1:	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
+	add	%0, %0, #16			@ 1\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
+	add	%0, %0, #16			@ 1\n\
+	subs	%2, %2, #1			@ 1\n\
 	bne	1b				@ 1\n\
-	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
-	ldmfd	sp!, {r4, pc}			@ 3"
-	:
-	: "I" (PAGE_SIZE / 32));
+	mcr	p15, 0, %2, c7, c10, 4		@ 1   drain WB"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 32)
+	: "r3", "r4", "ip", "lr");
 }
 
 void fa_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
index 49ee0c1a72..cc819732d9 100644
--- a/arch/arm/mm/copypage-feroceon.c
+++ b/arch/arm/mm/copypage-feroceon.c
@@ -13,58 +13,56 @@
 #include <linux/init.h>
 #include <linux/highmem.h>
 
-static void __naked
-feroceon_copy_user_page(void *kto, const void *kfrom)
+static void feroceon_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4-r9, lr}		\n\
-	mov	ip, %2				\n\
-1:	mov	lr, r1				\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	pld	[lr, #32]			\n\
-	pld	[lr, #64]			\n\
-	pld	[lr, #96]			\n\
-	pld	[lr, #128]			\n\
-	pld	[lr, #160]			\n\
-	pld	[lr, #192]			\n\
-	pld	[lr, #224]			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	ldmia	r1!, {r2 - r9}			\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
-	stmia	r0, {r2 - r9}			\n\
-	subs	ip, ip, #(32 * 8)		\n\
-	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
-	add	r0, r0, #32			\n\
+	int tmp;
+
+	asm volatile ("\
+1:	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	pld	[%1, #0]			\n\
+	pld	[%1, #32]			\n\
+	pld	[%1, #64]			\n\
+	pld	[%1, #96]			\n\
+	pld	[%1, #128]			\n\
+	pld	[%1, #160]			\n\
+	pld	[%1, #192]			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	ldmia	%1!, {r2 - r7, ip, lr}		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
+	stmia	%0, {r2 - r7, ip, lr}		\n\
+	subs	%2, %2, #(32 * 8)		\n\
+	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
+	add	%0, %0, #32			\n\
 	bne	1b				\n\
-	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
-	ldmfd	sp!, {r4-r9, pc}"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
+	mcr	p15, 0, %2, c7, c10, 4		@ drain WB"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE)
+	: "r2", "r3", "r4", "r5", "r6", "r7", "ip", "lr");
 }
 
 void feroceon_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index 0224416cba..b03202cddd 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -40,12 +40,11 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
  * instruction.  If your processor does not supply this, you have to write your
  * own copy_user_highpage that does the right thing.
  */
-static void __naked
-mc_copy_user_page(void *from, void *to)
+static void mc_copy_user_page(void *from, void *to)
 {
-	asm volatile(
-	"stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r4, %2				@ 1\n\
+	int tmp;
+
+	asm volatile ("\
 	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
 1:	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
 	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
@@ -55,13 +54,13 @@ mc_copy_user_page(void *from, void *to)
 	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
 	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
 	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
-	subs	r4, r4, #1			@ 1\n\
+	subs	%2, %2, #1			@ 1\n\
 	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
 	ldmneia	%0!, {r2, r3, ip, lr}		@ 4\n\
-	bne	1b				@ 1\n\
-	ldmfd	sp!, {r4, pc}			@ 3"
-	:
-	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64));
+	bne	1b				@ "
+	: "+&r" (from), "+&r" (to), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64)
+	: "r2", "r3", "ip", "lr");
 }
 
 void v4_mc_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4wb.c b/arch/arm/mm/copypage-v4wb.c
index 067d0fdd63..cd3e165afe 100644
--- a/arch/arm/mm/copypage-v4wb.c
+++ b/arch/arm/mm/copypage-v4wb.c
@@ -22,29 +22,28 @@
  * instruction.  If your processor does not supply this, you have to write your
  * own copy_user_highpage that does the right thing.
  */
-static void __naked
-v4wb_copy_user_page(void *kto, const void *kfrom)
+static void v4wb_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %2				@ 1\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-1:	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	subs	r2, r2, #1			@ 1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
+	int tmp;
+
+	asm volatile ("\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+1:	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	subs	%2, %2, #1			@ 1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
 	bne	1b				@ 1\n\
-	mcr	p15, 0, r1, c7, c10, 4		@ 1   drain WB\n\
-	ldmfd	 sp!, {r4, pc}			@ 3"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
+	mcr	p15, 0, %1, c7, c10, 4		@ 1   drain WB"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64)
+	: "r3", "r4", "ip", "lr");
 }
 
 void v4wb_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-v4wt.c b/arch/arm/mm/copypage-v4wt.c
index b85c5da2e5..8614572e12 100644
--- a/arch/arm/mm/copypage-v4wt.c
+++ b/arch/arm/mm/copypage-v4wt.c
@@ -20,27 +20,26 @@
  * dirty data in the cache.  However, we do have to ensure that
  * subsequent reads are up to date.
  */
-static void __naked
-v4wt_copy_user_page(void *kto, const void *kfrom)
+static void v4wt_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %2				@ 1\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-1:	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
-	subs	r2, r2, #1			@ 1\n\
-	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
-	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
+	int tmp;
+
+	asm volatile ("\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+1:	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
+	subs	%2, %2, #1			@ 1\n\
+	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
+	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
 	bne	1b				@ 1\n\
-	mcr	p15, 0, r2, c7, c7, 0		@ flush ID cache\n\
-	ldmfd	sp!, {r4, pc}			@ 3"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
+	mcr	p15, 0, %2, c7, c7, 0		@ flush ID cache"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64)
+	: "r3", "r4", "ip", "lr");
 }
 
 void v4wt_copy_user_highpage(struct page *to, struct page *from,
diff --git a/arch/arm/mm/copypage-xsc3.c b/arch/arm/mm/copypage-xsc3.c
index 03a2042ace..55cbc3a89d 100644
--- a/arch/arm/mm/copypage-xsc3.c
+++ b/arch/arm/mm/copypage-xsc3.c
@@ -21,53 +21,46 @@
 
 /*
  * XSC3 optimised copy_user_highpage
- *  r0 = destination
- *  r1 = source
  *
  * The source page may have some clean entries in the cache already, but we
  * can safely ignore them - break_cow() will flush them out of the cache
  * if we eventually end up using our copied page.
  *
  */
-static void __naked
-xsc3_mc_copy_user_page(void *kto, const void *kfrom)
+static void xsc3_mc_copy_user_page(void *kto, const void *kfrom)
 {
-	asm("\
-	stmfd	sp!, {r4, r5, lr}		\n\
-	mov	lr, %2				\n\
-						\n\
-	pld	[r1, #0]			\n\
-	pld	[r1, #32]			\n\
-1:	pld	[r1, #64]			\n\
-	pld	[r1, #96]			\n\
+	int tmp;
+
+	asm volatile ("\
+	pld	[%1, #0]			\n\
+	pld	[%1, #32]			\n\
+1:	pld	[%1, #64]			\n\
+	pld	[%1, #96]			\n\
 						\n\
-2:	ldrd	r2, [r1], #8			\n\
-	mov	ip, r0				\n\
-	ldrd	r4, [r1], #8			\n\
-	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
-	strd	r2, [r0], #8			\n\
-	ldrd	r2, [r1], #8			\n\
-	strd	r4, [r0], #8			\n\
-	ldrd	r4, [r1], #8			\n\
-	strd	r2, [r0], #8			\n\
-	strd	r4, [r0], #8			\n\
-	ldrd	r2, [r1], #8			\n\
-	mov	ip, r0				\n\
-	ldrd	r4, [r1], #8			\n\
-	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
-	strd	r2, [r0], #8			\n\
-	ldrd	r2, [r1], #8			\n\
-	subs	lr, lr, #1			\n\
-	strd	r4, [r0], #8			\n\
-	ldrd	r4, [r1], #8			\n\
-	strd	r2, [r0], #8			\n\
-	strd	r4, [r0], #8			\n\
+2:	ldrd	r2, [%1], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
+	strd	r2, [%0], #8			\n\
+	ldrd	r2, [%1], #8			\n\
+	strd	r4, [%0], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	strd	r2, [%0], #8			\n\
+	strd	r4, [%0], #8			\n\
+	ldrd	r2, [%1], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
+	strd	r2, [%0], #8			\n\
+	ldrd	r2, [%1], #8			\n\
+	subs	%2, %2, #1			\n\
+	strd	r4, [%0], #8			\n\
+	ldrd	r4, [%1], #8			\n\
+	strd	r2, [%0], #8			\n\
+	strd	r4, [%0], #8			\n\
 	bgt	1b				\n\
-	beq	2b				\n\
-						\n\
-	ldmfd	sp!, {r4, r5, pc}"
-	:
-	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
+	beq	2b				"
+	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64 - 1)
+	: "r2", "r3", "r4", "r5");
 }
 
 void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
@@ -85,8 +78,6 @@ void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
 
 /*
  * XScale optimised clear_user_page
- *  r0 = destination
- *  r1 = virtual user address of ultimate destination page
  */
 void xsc3_mc_clear_user_highpage(struct page *page, unsigned long vaddr)
 {
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index 97972379f4..b0ae8c7acb 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -36,52 +36,51 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
  * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
  * and merged as appropriate.
  */
-static void __naked
-mc_copy_user_page(void *from, void *to)
+static void mc_copy_user_page(void *from, void *to)
 {
+	int tmp;
+
 	/*
 	 * Strangely enough, best performance is achieved
 	 * when prefetching destination as well.  (NP)
 	 */
-	asm volatile(
-	"stmfd	sp!, {r4, r5, lr}		\n\
-	mov	lr, %2				\n\
-	pld	[r0, #0]			\n\
-	pld	[r0, #32]			\n\
-	pld	[r1, #0]			\n\
-	pld	[r1, #32]			\n\
-1:	pld	[r0, #64]			\n\
-	pld	[r0, #96]			\n\
-	pld	[r1, #64]			\n\
-	pld	[r1, #96]			\n\
-2:	ldrd	r2, [r0], #8			\n\
-	ldrd	r4, [r0], #8			\n\
-	mov	ip, r1				\n\
-	strd	r2, [r1], #8			\n\
-	ldrd	r2, [r0], #8			\n\
-	strd	r4, [r1], #8			\n\
-	ldrd	r4, [r0], #8			\n\
-	strd	r2, [r1], #8			\n\
-	strd	r4, [r1], #8			\n\
+	asm volatile ("\
+	pld	[%0, #0]			\n\
+	pld	[%0, #32]			\n\
+	pld	[%1, #0]			\n\
+	pld	[%1, #32]			\n\
+1:	pld	[%0, #64]			\n\
+	pld	[%0, #96]			\n\
+	pld	[%1, #64]			\n\
+	pld	[%1, #96]			\n\
+2:	ldrd	r2, [%0], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	mov	ip, %1				\n\
+	strd	r2, [%1], #8			\n\
+	ldrd	r2, [%0], #8			\n\
+	strd	r4, [%1], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	strd	r2, [%1], #8			\n\
+	strd	r4, [%1], #8			\n\
 	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
-	ldrd	r2, [r0], #8			\n\
+	ldrd	r2, [%0], #8			\n\
 	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
-	ldrd	r4, [r0], #8			\n\
-	mov	ip, r1				\n\
-	strd	r2, [r1], #8			\n\
-	ldrd	r2, [r0], #8			\n\
-	strd	r4, [r1], #8			\n\
-	ldrd	r4, [r0], #8			\n\
-	strd	r2, [r1], #8			\n\
-	strd	r4, [r1], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	mov	ip, %1				\n\
+	strd	r2, [%1], #8			\n\
+	ldrd	r2, [%0], #8			\n\
+	strd	r4, [%1], #8			\n\
+	ldrd	r4, [%0], #8			\n\
+	strd	r2, [%1], #8			\n\
+	strd	r4, [%1], #8			\n\
 	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
-	subs	lr, lr, #1			\n\
+	subs	%2, %2, #1			\n\
 	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
 	bgt	1b				\n\
-	beq	2b				\n\
-	ldmfd	sp!, {r4, r5, pc}		"
-	:
-	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64 - 1));
+	beq	2b				"
+	: "+&r" (from), "+&r" (to), "=&r" (tmp)
+	: "2" (PAGE_SIZE / 64 - 1)
+	: "r2", "r3", "r4", "r5", "ip");
 }
 
 void xscale_mc_copy_user_highpage(struct page *to, struct page *from,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-11-06  4:49         ` Nicolas Pitre
@ 2018-11-06 13:16           ` Robin Murphy
  2018-11-06 13:25             ` Nicolas Pitre
  2018-11-07 16:27           ` Stefan Agner
  1 sibling, 1 reply; 21+ messages in thread
From: Robin Murphy @ 2018-11-06 13:16 UTC (permalink / raw)
  To: Nicolas Pitre, Stefan Agner
  Cc: Arnd Bergmann, Roman Yeryomin, Linus Walleij,
	Russell King - ARM Linux, linux-kernel, Joel Stanley,
	Hans Ulli Kroll, Linux ARM

On 06/11/2018 04:49, Nicolas Pitre wrote:
[...]
>> r3 and r5 are not used above, so no need to have them in the clobber
>> list.
> 
> They are used. ldrd and strd instructions always use a pair of
> consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..."
> loads into r4-r5.

FWIW, since we should now be enabling unified syntax everywhere, I guess 
we could probably rewrite all those ldrd/strd to the UAL 3-operand form 
- i.e. "ldrd r2, r3, [...]" - if we really cared for the extra clarity.

Robin.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-11-06 13:16           ` Robin Murphy
@ 2018-11-06 13:25             ` Nicolas Pitre
  0 siblings, 0 replies; 21+ messages in thread
From: Nicolas Pitre @ 2018-11-06 13:25 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Stefan Agner, Arnd Bergmann, Roman Yeryomin, Linus Walleij,
	Russell King - ARM Linux, linux-kernel, Joel Stanley,
	Hans Ulli Kroll, Linux ARM

On Tue, 6 Nov 2018, Robin Murphy wrote:

> On 06/11/2018 04:49, Nicolas Pitre wrote:
> [...]
> >> r3 and r5 are not used above, so no need to have them in the clobber
> >> list.
> > 
> > They are used. ldrd and strd instructions always use a pair of
> > consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..."
> > loads into r4-r5.
> 
> FWIW, since we should now be enabling unified syntax everywhere, I guess we
> could probably rewrite all those ldrd/strd to the UAL 3-operand form - i.e.
> "ldrd r2, r3, [...]" - if we really cared for the extra clarity.

Good idea. Worthy of a separate patch though.


Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-11-06  4:49         ` Nicolas Pitre
  2018-11-06 13:16           ` Robin Murphy
@ 2018-11-07 16:27           ` Stefan Agner
  2018-11-07 16:58             ` Nicolas Pitre
  1 sibling, 1 reply; 21+ messages in thread
From: Stefan Agner @ 2018-11-07 16:27 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Linus Walleij, Hans Ulli Kroll,
	Joel Stanley, Arnd Bergmann, Linux ARM, linux-kernel,
	Roman Yeryomin

On 06.11.2018 05:49, Nicolas Pitre wrote:
> On Tue, 6 Nov 2018, Stefan Agner wrote:
> 
>> On 16.10.2018 22:43, Nicolas Pitre wrote:
>> > Subject: [PATCH] ARM: remove naked function usage
>> >
>> > Convert page copy functions not to rely on the naked function attribute.
>> >
>> > This attribute is known to confuse some gcc versions when function
>> > arguments aren't explicitly listed as inline assembly operands despite
>> > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
>> > 6164/1: Add kto and kfrom to input operands list.").
>> >
>> > Yet that commit has problems of its own by having assembly operand
>> > constraints completely wrong. If the generated code has been OK since
>> > then, it is due to luck rather than correctness. So this patch provides
>> > proper assembly operand usage, and removes two instances of redundant
>> > register duplications in the implementation while at it.
>> >
>> > Inspection of the generated code with this patch doesn't show any obvious
>> > quality degradation either, so not relying on __naked at all will make
>> > the code less fragile, and more likely to be compilable with clang.
>> >
>> > The only remaining __naked instances (excluding the kprobes test cases)
>> > are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
>> > cases only the function address is used by the compiler with no chance of
>> > inlining it by mistake.
>> >
>> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>
>> As mentioned a couple of weeks ago, I did test this patchset on two
>> architectures (pxa_defconfig -> copypage-xscale.c and
>> versatile_defconfig -> copypage-v4wb.c).
>>
>> I really like this approach, can we move forward with this?
> 
> Yes, the patch was submitted to the patch tracker a few days later.
> 

Oh sorry, didn't realize that!

<snip>
>> > +	asm volatile ("\
>> > +	pld	[%1, #0]			\n\
>> > +	pld	[%1, #32]			\n\
>> > +1:	pld	[%1, #64]			\n\
>> > +	pld	[%1, #96]			\n\
>> >  						\n\
>> > -2:	ldrd	r2, [r1], #8			\n\
>> > -	mov	ip, r0				\n\
>> > -	ldrd	r4, [r1], #8			\n\
>> > -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
>> > -	strd	r2, [r0], #8			\n\
>> > -	ldrd	r2, [r1], #8			\n\
>> > -	strd	r4, [r0], #8			\n\
>> > -	ldrd	r4, [r1], #8			\n\
>> > -	strd	r2, [r0], #8			\n\
>> > -	strd	r4, [r0], #8			\n\
>> > -	ldrd	r2, [r1], #8			\n\
>> > -	mov	ip, r0				\n\
>> > -	ldrd	r4, [r1], #8			\n\
>> > -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
>> > -	strd	r2, [r0], #8			\n\
>> > -	ldrd	r2, [r1], #8			\n\
>> > -	subs	lr, lr, #1			\n\
>> > -	strd	r4, [r0], #8			\n\
>> > -	ldrd	r4, [r1], #8			\n\
>> > -	strd	r2, [r0], #8			\n\
>> > -	strd	r4, [r0], #8			\n\
>> > +2:	ldrd	r2, [%1], #8			\n\
>> > +	ldrd	r4, [%1], #8			\n\
>> > +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
>> > +	strd	r2, [%0], #8			\n\
>> > +	ldrd	r2, [%1], #8			\n\
>> > +	strd	r4, [%0], #8			\n\
>> > +	ldrd	r4, [%1], #8			\n\
>> > +	strd	r2, [%0], #8			\n\
>> > +	strd	r4, [%0], #8			\n\
>> > +	ldrd	r2, [%1], #8			\n\
>> > +	ldrd	r4, [%1], #8			\n\
>> > +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
>> > +	strd	r2, [%0], #8			\n\
>> > +	ldrd	r2, [%1], #8			\n\
>> > +	subs	%2, %2, #1			\n\
>> > +	strd	r4, [%0], #8			\n\
>> > +	ldrd	r4, [%1], #8			\n\
>> > +	strd	r2, [%0], #8			\n\
>> > +	strd	r4, [%0], #8			\n\
>> >  	bgt	1b				\n\
>> > -	beq	2b				\n\
>> > -						\n\
>> > -	ldmfd	sp!, {r4, r5, pc}"
>> > -	:
>> > -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
>> > +	beq	2b				"
>> > +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
>> > +	: "2" (PAGE_SIZE / 64 - 1)
>> > +	: "r2", "r3", "r4", "r5");
>>
>> r3 and r5 are not used above, so no need to have them in the clobber
>> list.
> 
> They are used. ldrd and strd instructions always use a pair of 
> consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..." 
> loads into r4-r5.

Oh I see. The clobber list is fine then!
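
For anyone else tripping over the same thing, here is a toy C model of the register pairing Nicolas describes. It is only a sketch: regs[], model_ldrd_post8() and ldrd_r2_touches_r3() are made-up names for illustration, not kernel code, and the model only mimics the pre-UAL form "ldrd rT, [rN], #8".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy register file: regs[0] models r0, regs[1] models r1, and so on. */
static uint32_t regs[16];

/*
 * Hypothetical model of "ldrd rT, [rN], #8": two consecutive words are
 * loaded into rT and rT+1, then the base pointer is advanced by 8 bytes
 * (post-indexed addressing).
 */
static void model_ldrd_post8(int rt, const uint32_t **rn)
{
	memcpy(&regs[rt], *rn, 2 * sizeof(uint32_t));
	*rn += 2;
}

/* Returns nonzero when "ldrd r2, ..." also wrote r3 and advanced the base. */
static int ldrd_r2_touches_r3(void)
{
	static const uint32_t src[2] = { 0x11111111u, 0x22222222u };
	const uint32_t *p = src;

	memset(regs, 0, sizeof(regs));
	model_ldrd_post8(2, &p);
	return regs[2] == 0x11111111u && regs[3] == 0x22222222u && p == src + 2;
}
```

Since only r2 and r4 are named in the instructions, the compiler cannot see that r3 and r5 are written too, which is why they have to appear in the clobber list.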

> 
>> > diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
>> > index 97972379f4..fa0be66082 100644
>> > --- a/arch/arm/mm/copypage-xscale.c
>> > +++ b/arch/arm/mm/copypage-xscale.c
>> > @@ -36,52 +36,50 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>> >   * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
>> >   * and merged as appropriate.
>> >   */
>> > -static void __naked
>> > -mc_copy_user_page(void *from, void *to)
>> > +static void mc_copy_user_page(void *from, void *to)
>> >  {
>> > +	int tmp;
>> >  	/*
>> >  	 * Strangely enough, best performance is achieved
>> >  	 * when prefetching destination as well.  (NP)
>> >  	 */
>> > -	asm volatile(
>> > -	"stmfd	sp!, {r4, r5, lr}		\n\
>> > -	mov	lr, %2				\n\
>> > -	pld	[r0, #0]			\n\
>> > -	pld	[r0, #32]			\n\
>> > -	pld	[r1, #0]			\n\
>> > -	pld	[r1, #32]			\n\
>> > -1:	pld	[r0, #64]			\n\
>> > -	pld	[r0, #96]			\n\
>> > -	pld	[r1, #64]			\n\
>> > -	pld	[r1, #96]			\n\
>> > -2:	ldrd	r2, [r0], #8			\n\
>> > -	ldrd	r4, [r0], #8			\n\
>> > -	mov	ip, r1				\n\
>> > -	strd	r2, [r1], #8			\n\
>> > -	ldrd	r2, [r0], #8			\n\
>> > -	strd	r4, [r1], #8			\n\
>> > -	ldrd	r4, [r0], #8			\n\
>> > -	strd	r2, [r1], #8			\n\
>> > -	strd	r4, [r1], #8			\n\
>> > +	asm volatile ("\
>> > +	pld	[%0, #0]			\n\
>> > +	pld	[%0, #32]			\n\
>> > +	pld	[%1, #0]			\n\
>> > +	pld	[%1, #32]			\n\
>> > +1:	pld	[%0, #64]			\n\
>> > +	pld	[%0, #96]			\n\
>> > +	pld	[%1, #64]			\n\
>> > +	pld	[%1, #96]			\n\
>> > +2:	ldrd	r2, [%0], #8			\n\
>> > +	ldrd	r4, [%0], #8			\n\
>> > +	mov	ip, %1				\n\
>> > +	strd	r2, [%1], #8			\n\
>> > +	ldrd	r2, [%0], #8			\n\
>> > +	strd	r4, [%1], #8			\n\
>> > +	ldrd	r4, [%0], #8			\n\
>> > +	strd	r2, [%1], #8			\n\
>> > +	strd	r4, [%1], #8			\n\
>> >  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
>>
>> How about using %1 here directly and skip the move to ip, as you did in
>> copypage-xsc3.c above?
> 
> No. The cache line that needs cleaning is the line that we just wrote 
> to. %1 is now pointing at the next cache line at this point. That is why 
> %1 needs to be preserved into ip before it is incremented.

Got it, in copypage-xsc3.c r0 got copied before the first strd.
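
To spell out the ordering in plain C, a rough sketch follows. The 32-byte line size, clean_dcache_line() and copy_line_and_clean() are assumptions made up for this illustration; the real code issues the mcr instruction instead of recording an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32	/* assumed cache line size for this sketch */

/* Hypothetical stand-in for "mcr p15, 0, ip, c7, c10, 1": records the address. */
static const void *last_cleaned;

static void clean_dcache_line(const void *addr)
{
	last_cleaned = addr;
}

/* Copy one line, then clean the line that was just written. */
static uint32_t *copy_line_and_clean(uint32_t *to, const uint32_t *from)
{
	uint32_t *line = to;	/* like "mov ip, %1": taken before any store */
	size_t i;

	for (i = 0; i < LINE_BYTES / sizeof(*to); i++)
		*to++ = *from++;	/* like "strd ..., [%1], #8": pointer advances */

	clean_dcache_line(line);	/* "to" already points at the next line */
	return to;
}
```

Passing "to" instead of "line" to the clean operation would act on the line after the one that was written, which is the mistake my suggestion would have introduced.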

> 
> So here's the revised patch. It now has full compile-test coverage for 
> real this time. Would you mind reviewing it again before I resubmit it 
> please?

Compile tested all copypage implementations with the revised patch using
Clang too, everything builds fine.

FWIW, I used these defconfigs:
copypage-fa.c: moxart_defconfig
copypage-feroceon.c: mvebu_v5_defconfig
copypage-v4mc.c: h3600_defconfig+CONFIG_AEABI
copypage-v4wb.c/v4wt.c: multi_v4t_defconfig
copypage-xsc3.c/xscale.c: pxa_defconfig-CONFIG_FTRACE

The changes look good to me:

Reviewed-by: Stefan Agner <stefan@agner.ch>

> 
> ----- >8
> Subject: [PATCH] remove unneeded naked function usage
> 
> Convert page copy functions not to rely on the naked function attribute.
> 
> This attribute is known to confuse some old gcc versions when function
> arguments aren't explicitly listed as inline assembly operands despite
> the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.").
> 
> Yet that commit has problems of its own by having assembly operand
> constraints completely wrong. If the generated code has been OK since
> then, it is due to luck rather than correctness. So this patch also
> provides proper assembly operand constraints, and removes two instances
> of redundant register usages in the implementation while at it.
> 
> Inspection of the generated code with this patch doesn't show any obvious
> quality degradation either, so not relying on __naked at all will make
> the code less fragile, and avoid some issues with clang.
> 
> The only remaining __naked instances (excluding the kprobes test cases)
> are exynos_pm_power_up_setup(), tc2_pm_power_up_setup() and
> cci_enable_port_for_self(). But in the first two cases, only the function
> address is used by the compiler with no chance of inlining it by 
> mistake, and the third case is called from assembly code only.
> And the fact that no stack is available when the corresponding code is
> executed does warrant the __naked usage in those cases.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
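
As a reading aid for the operand constraints used throughout the patch, here is a minimal sketch of the same pattern on a much smaller loop. copy_words() is a hypothetical helper, not part of the patch. The "cc" and "memory" clobbers are additions of this sketch that the patch itself does not list, and the plain C fallback is only there so the example builds on non-ARM hosts.

```c
#include <assert.h>

/*
 * Hypothetical word copy using the patch's operand layout:
 *   %0, %1: "+&r" read-write pointers, since the asm post-increments them
 *   %2:     "=&r" scratch counter, initialised through the matching
 *           input constraint "2"
 */
static void copy_words(unsigned int *to, const unsigned int *from, int n)
{
	assert(n > 0);	/* the asm loop below runs at least once */
#if defined(__arm__) && !defined(__thumb__)
	int tmp;

	__asm__ __volatile__(
	"1:	ldr	r3, [%1], #4\n"
	"	str	r3, [%0], #4\n"
	"	subs	%2, %2, #1\n"
	"	bne	1b"
	: "+&r" (to), "+&r" (from), "=&r" (tmp)
	: "2" (n)
	: "r3", "cc", "memory");
#else
	/* Portable fallback with the same effect. */
	while (n--)
		*to++ = *from++;
#endif
}
```

With the operands declared this way the compiler knows that both pointers and the counter are modified, so the function no longer needs __naked or a hand-written prologue and epilogue.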
> 
> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> index d130a5ece5..bf24690ec8 100644
> --- a/arch/arm/mm/copypage-fa.c
> +++ b/arch/arm/mm/copypage-fa.c
> @@ -17,26 +17,25 @@
>  /*
>   * Faraday optimised copy_user_page
>   */
> -static void __naked
> -fa_copy_user_page(void *kto, const void *kfrom)
> +static void fa_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %0				@ 1\n\
> -1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> -	add	r0, r0, #16			@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> -	add	r0, r0, #16			@ 1\n\
> -	subs	r2, r2, #1			@ 1\n\
> +	int tmp;
> +
> +	asm volatile ("\
> +1:	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> +	add	%0, %0, #16			@ 1\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ 1   clean and invalidate D line\n\
> +	add	%0, %0, #16			@ 1\n\
> +	subs	%2, %2, #1			@ 1\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "I" (PAGE_SIZE / 32));
> +	mcr	p15, 0, %2, c7, c10, 4		@ 1   drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 32)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void fa_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-feroceon.c b/arch/arm/mm/copypage-feroceon.c
> index 49ee0c1a72..cc819732d9 100644
> --- a/arch/arm/mm/copypage-feroceon.c
> +++ b/arch/arm/mm/copypage-feroceon.c
> @@ -13,58 +13,56 @@
>  #include <linux/init.h>
>  #include <linux/highmem.h>
>  
> -static void __naked
> -feroceon_copy_user_page(void *kto, const void *kfrom)
> +static void feroceon_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4-r9, lr}		\n\
> -	mov	ip, %2				\n\
> -1:	mov	lr, r1				\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	pld	[lr, #32]			\n\
> -	pld	[lr, #64]			\n\
> -	pld	[lr, #96]			\n\
> -	pld	[lr, #128]			\n\
> -	pld	[lr, #160]			\n\
> -	pld	[lr, #192]			\n\
> -	pld	[lr, #224]			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	ldmia	r1!, {r2 - r9}			\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> -	stmia	r0, {r2 - r9}			\n\
> -	subs	ip, ip, #(32 * 8)		\n\
> -	mcr	p15, 0, r0, c7, c14, 1		@ clean and invalidate D line\n\
> -	add	r0, r0, #32			\n\
> +	int tmp;
> +
> +	asm volatile ("\
> +1:	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
> +	pld	[%1, #128]			\n\
> +	pld	[%1, #160]			\n\
> +	pld	[%1, #192]			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	ldmia	%1!, {r2 - r7, ip, lr}		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
> +	stmia	%0, {r2 - r7, ip, lr}		\n\
> +	subs	%2, %2, #(32 * 8)		\n\
> +	mcr	p15, 0, %0, c7, c14, 1		@ clean and invalidate D line\n\
> +	add	%0, %0, #32			\n\
>  	bne	1b				\n\
> -	mcr	p15, 0, ip, c7, c10, 4		@ drain WB\n\
> -	ldmfd	sp!, {r4-r9, pc}"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE));
> +	mcr	p15, 0, %2, c7, c10, 4		@ drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE)
> +	: "r2", "r3", "r4", "r5", "r6", "r7", "ip", "lr");
>  }
>  
>  void feroceon_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
> index 0224416cba..b03202cddd 100644
> --- a/arch/arm/mm/copypage-v4mc.c
> +++ b/arch/arm/mm/copypage-v4mc.c
> @@ -40,12 +40,11 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>   * instruction.  If your processor does not supply this, you have to write your
>   * own copy_user_highpage that does the right thing.
>   */
> -static void __naked
> -mc_copy_user_page(void *from, void *to)
> +static void mc_copy_user_page(void *from, void *to)
>  {
> -	asm volatile(
> -	"stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r4, %2				@ 1\n\
> +	int tmp;
> +
> +	asm volatile ("\
>  	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
>  1:	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
> @@ -55,13 +54,13 @@ mc_copy_user_page(void *from, void *to)
>  	mcr	p15, 0, %1, c7, c6, 1		@ 1   invalidate D line\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
>  	ldmia	%0!, {r2, r3, ip, lr}		@ 4\n\
> -	subs	r4, r4, #1			@ 1\n\
> +	subs	%2, %2, #1			@ 1\n\
>  	stmia	%1!, {r2, r3, ip, lr}		@ 4\n\
>  	ldmneia	%0!, {r2, r3, ip, lr}		@ 4\n\
> -	bne	1b				@ 1\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64));
> +	bne	1b				@ "
> +	: "+&r" (from), "+&r" (to), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r2", "r3", "ip", "lr");
>  }
>  
>  void v4_mc_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4wb.c b/arch/arm/mm/copypage-v4wb.c
> index 067d0fdd63..cd3e165afe 100644
> --- a/arch/arm/mm/copypage-v4wb.c
> +++ b/arch/arm/mm/copypage-v4wb.c
> @@ -22,29 +22,28 @@
>   * instruction.  If your processor does not supply this, you have to write your
>   * own copy_user_highpage that does the right thing.
>   */
> -static void __naked
> -v4wb_copy_user_page(void *kto, const void *kfrom)
> +static void v4wb_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %2				@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -1:	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	mcr	p15, 0, r0, c7, c6, 1		@ 1   invalidate D line\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	subs	r2, r2, #1			@ 1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
> +	int tmp;
> +
> +	asm volatile ("\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +1:	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ 1   invalidate D line\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	subs	%2, %2, #1			@ 1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r1, c7, c10, 4		@ 1   drain WB\n\
> -	ldmfd	 sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> +	mcr	p15, 0, %1, c7, c10, 4		@ 1   drain WB"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void v4wb_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-v4wt.c b/arch/arm/mm/copypage-v4wt.c
> index b85c5da2e5..8614572e12 100644
> --- a/arch/arm/mm/copypage-v4wt.c
> +++ b/arch/arm/mm/copypage-v4wt.c
> @@ -20,27 +20,26 @@
>   * dirty data in the cache.  However, we do have to ensure that
>   * subsequent reads are up to date.
>   */
> -static void __naked
> -v4wt_copy_user_page(void *kto, const void *kfrom)
> +static void v4wt_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, lr}			@ 2\n\
> -	mov	r2, %2				@ 1\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -1:	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4+1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
> -	subs	r2, r2, #1			@ 1\n\
> -	stmia	r0!, {r3, r4, ip, lr}		@ 4\n\
> -	ldmneia	r1!, {r3, r4, ip, lr}		@ 4\n\
> +	int tmp;
> +
> +	asm volatile ("\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +1:	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4+1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmia	%1!, {r3, r4, ip, lr}		@ 4\n\
> +	subs	%2, %2, #1			@ 1\n\
> +	stmia	%0!, {r3, r4, ip, lr}		@ 4\n\
> +	ldmneia	%1!, {r3, r4, ip, lr}		@ 4\n\
>  	bne	1b				@ 1\n\
> -	mcr	p15, 0, r2, c7, c7, 0		@ flush ID cache\n\
> -	ldmfd	sp!, {r4, pc}			@ 3"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> +	mcr	p15, 0, %2, c7, c7, 0		@ flush ID cache"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64)
> +	: "r3", "r4", "ip", "lr");
>  }
>  
>  void v4wt_copy_user_highpage(struct page *to, struct page *from,
> diff --git a/arch/arm/mm/copypage-xsc3.c b/arch/arm/mm/copypage-xsc3.c
> index 03a2042ace..55cbc3a89d 100644
> --- a/arch/arm/mm/copypage-xsc3.c
> +++ b/arch/arm/mm/copypage-xsc3.c
> @@ -21,53 +21,46 @@
>  
>  /*
>   * XSC3 optimised copy_user_highpage
> - *  r0 = destination
> - *  r1 = source
>   *
>   * The source page may have some clean entries in the cache already, but we
>   * can safely ignore them - break_cow() will flush them out of the cache
>   * if we eventually end up using our copied page.
>   *
>   */
> -static void __naked
> -xsc3_mc_copy_user_page(void *kto, const void *kfrom)
> +static void xsc3_mc_copy_user_page(void *kto, const void *kfrom)
>  {
> -	asm("\
> -	stmfd	sp!, {r4, r5, lr}		\n\
> -	mov	lr, %2				\n\
> -						\n\
> -	pld	[r1, #0]			\n\
> -	pld	[r1, #32]			\n\
> -1:	pld	[r1, #64]			\n\
> -	pld	[r1, #96]			\n\
> +	int tmp;
> +
> +	asm volatile ("\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +1:	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
>  						\n\
> -2:	ldrd	r2, [r1], #8			\n\
> -	mov	ip, r0				\n\
> -	ldrd	r4, [r1], #8			\n\
> -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> -	strd	r2, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r4, [r1], #8			\n\
> -	strd	r2, [r0], #8			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	mov	ip, r0				\n\
> -	ldrd	r4, [r1], #8			\n\
> -	mcr	p15, 0, ip, c7, c6, 1		@ invalidate\n\
> -	strd	r2, [r0], #8			\n\
> -	ldrd	r2, [r1], #8			\n\
> -	subs	lr, lr, #1			\n\
> -	strd	r4, [r0], #8			\n\
> -	ldrd	r4, [r1], #8			\n\
> -	strd	r2, [r0], #8			\n\
> -	strd	r4, [r0], #8			\n\
> +2:	ldrd	r2, [%1], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> +	strd	r2, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	strd	r2, [%0], #8			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	mcr	p15, 0, %0, c7, c6, 1		@ invalidate\n\
> +	strd	r2, [%0], #8			\n\
> +	ldrd	r2, [%1], #8			\n\
> +	subs	%2, %2, #1			\n\
> +	strd	r4, [%0], #8			\n\
> +	ldrd	r4, [%1], #8			\n\
> +	strd	r2, [%0], #8			\n\
> +	strd	r4, [%0], #8			\n\
>  	bgt	1b				\n\
> -	beq	2b				\n\
> -						\n\
> -	ldmfd	sp!, {r4, r5, pc}"
> -	:
> -	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1));
> +	beq	2b				"
> +	: "+&r" (kto), "+&r" (kfrom), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64 - 1)
> +	: "r2", "r3", "r4", "r5");
>  }
>  
>  void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
> @@ -85,8 +78,6 @@ void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
>  
>  /*
>   * XScale optimised clear_user_page
> - *  r0 = destination
> - *  r1 = virtual user address of ultimate destination page
>   */
>  void xsc3_mc_clear_user_highpage(struct page *page, unsigned long vaddr)
>  {
> diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
> index 97972379f4..b0ae8c7acb 100644
> --- a/arch/arm/mm/copypage-xscale.c
> +++ b/arch/arm/mm/copypage-xscale.c
> @@ -36,52 +36,51 @@ static DEFINE_RAW_SPINLOCK(minicache_lock);
>   * Dcache aliasing issue.  The writes will be forwarded to the write buffer,
>   * and merged as appropriate.
>   */
> -static void __naked
> -mc_copy_user_page(void *from, void *to)
> +static void mc_copy_user_page(void *from, void *to)
>  {
> +	int tmp;
> +
>  	/*
>  	 * Strangely enough, best performance is achieved
>  	 * when prefetching destination as well.  (NP)
>  	 */
> -	asm volatile(
> -	"stmfd	sp!, {r4, r5, lr}		\n\
> -	mov	lr, %2				\n\
> -	pld	[r0, #0]			\n\
> -	pld	[r0, #32]			\n\
> -	pld	[r1, #0]			\n\
> -	pld	[r1, #32]			\n\
> -1:	pld	[r0, #64]			\n\
> -	pld	[r0, #96]			\n\
> -	pld	[r1, #64]			\n\
> -	pld	[r1, #96]			\n\
> -2:	ldrd	r2, [r0], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	mov	ip, r1				\n\
> -	strd	r2, [r1], #8			\n\
> -	ldrd	r2, [r0], #8			\n\
> -	strd	r4, [r1], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	strd	r2, [r1], #8			\n\
> -	strd	r4, [r1], #8			\n\
> +	asm volatile ("\
> +	pld	[%0, #0]			\n\
> +	pld	[%0, #32]			\n\
> +	pld	[%1, #0]			\n\
> +	pld	[%1, #32]			\n\
> +1:	pld	[%0, #64]			\n\
> +	pld	[%0, #96]			\n\
> +	pld	[%1, #64]			\n\
> +	pld	[%1, #96]			\n\
> +2:	ldrd	r2, [%0], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	mov	ip, %1				\n\
> +	strd	r2, [%1], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
> +	strd	r4, [%1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	strd	r2, [%1], #8			\n\
> +	strd	r4, [%1], #8			\n\
>  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
> -	ldrd	r2, [r0], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
>  	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
> -	ldrd	r4, [r0], #8			\n\
> -	mov	ip, r1				\n\
> -	strd	r2, [r1], #8			\n\
> -	ldrd	r2, [r0], #8			\n\
> -	strd	r4, [r1], #8			\n\
> -	ldrd	r4, [r0], #8			\n\
> -	strd	r2, [r1], #8			\n\
> -	strd	r4, [r1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	mov	ip, %1				\n\
> +	strd	r2, [%1], #8			\n\
> +	ldrd	r2, [%0], #8			\n\
> +	strd	r4, [%1], #8			\n\
> +	ldrd	r4, [%0], #8			\n\
> +	strd	r2, [%1], #8			\n\
> +	strd	r4, [%1], #8			\n\
>  	mcr	p15, 0, ip, c7, c10, 1		@ clean D line\n\
> -	subs	lr, lr, #1			\n\
> +	subs	%2, %2, #1			\n\
>  	mcr	p15, 0, ip, c7, c6, 1		@ invalidate D line\n\
>  	bgt	1b				\n\
> -	beq	2b				\n\
> -	ldmfd	sp!, {r4, r5, pc}		"
> -	:
> -	: "r" (from), "r" (to), "I" (PAGE_SIZE / 64 - 1));
> +	beq	2b				"
> +	: "+&r" (from), "+&r" (to), "=&r" (tmp)
> +	: "2" (PAGE_SIZE / 64 - 1)
> +	: "r2", "r3", "r4", "r5", "ip");
>  }
>  
>  void xscale_mc_copy_user_highpage(struct page *to, struct page *from,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
  2018-11-07 16:27           ` Stefan Agner
@ 2018-11-07 16:58             ` Nicolas Pitre
  0 siblings, 0 replies; 21+ messages in thread
From: Nicolas Pitre @ 2018-11-07 16:58 UTC (permalink / raw)
  To: Stefan Agner
  Cc: Russell King - ARM Linux, Linus Walleij, Hans Ulli Kroll,
	Joel Stanley, Arnd Bergmann, Linux ARM, linux-kernel,
	Roman Yeryomin

On Wed, 7 Nov 2018, Stefan Agner wrote:

> On 06.11.2018 05:49, Nicolas Pitre wrote:
> > So here's the revised patch. It now has full compile-test coverage for 
> > real this time. Would you mind reviewing it again before I resubmit it 
> > please?
> 
> Compile tested all copypage implementations with the revised patch using
> Clang too, everything builds fine.
> 
> FWIW, I used these defconfigs:
> copypage-fa.c: moxart_defconfig
> copypage-feroceon.c: mvebu_v5_defconfig
> copypage-v4mc.c: h3600_defconfig+CONFIG_AEABI
> copypage-v4wb.c/v4wt.c: multi_v4t_defconfig
> copypage-xsc3.c/xscale.c: pxa_defconfig-CONFIG_FTRACE
> 
> The changes look good to me:
> 
> Reviewed-by: Stefan Agner <stefan@agner.ch>

Thanks.  Submitted here:

http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8805/2


Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-11-07 16:58 UTC | newest]

Thread overview: 21+ messages
2018-10-15 22:16 [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list Stefan Agner
2018-10-15 22:23 ` Russell King - ARM Linux
2018-10-15 22:39   ` Stefan Agner
2018-10-15 22:46     ` Russell King - ARM Linux
2018-10-15 22:52       ` Stefan Agner
2018-10-15 23:03         ` Russell King - ARM Linux
2018-10-16  8:00 ` Linus Walleij
2018-10-16  8:44   ` Russell King - ARM Linux
2018-10-16 11:35     ` Linus Walleij
2018-10-16 20:43     ` Nicolas Pitre
2018-10-16 21:59       ` Stefan Agner
2018-10-17  8:58       ` Arnd Bergmann
2018-10-17  9:04         ` [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang Arnd Bergmann
2018-10-17  9:35           ` Russell King - ARM Linux
2018-10-17 14:23         ` [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list Nicolas Pitre
2018-11-05 23:00       ` Stefan Agner
2018-11-06  4:49         ` Nicolas Pitre
2018-11-06 13:16           ` Robin Murphy
2018-11-06 13:25             ` Nicolas Pitre
2018-11-07 16:27           ` Stefan Agner
2018-11-07 16:58             ` Nicolas Pitre
