* [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
@ 2020-09-15  9:46 Ard Biesheuvel
  2020-09-15 18:50 ` Nick Desaulniers
From: Ard Biesheuvel @ 2020-09-15  9:46 UTC
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Nick Desaulniers, Stefan Agner, Peter Smith

The ADRL pseudo instruction is not an architectural construct, but a
convenience macro that was supported by the ARM proprietary assembler
and adopted by binutils GAS as well, but only when assembling in 32-bit
ARM mode. Therefore, it can only be used in assembler code that is known
to assemble in ARM mode only, but as it turns out, the Clang assembler
does not implement ADRL at all, and so it is better to get rid of it
entirely.

So replace the ADRL instruction with an ADR instruction that refers to
a nearer symbol, and apply the delta explicitly using an additional
instruction.
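
To illustrate: in ARM mode, GAS expands the ADRL pseudo instruction
into a pair of PC-relative ADDs that each carry part of the offset,
which is what gives it its long reach. Roughly (a sketch, not literal
assembler output):

	adrl	r14, K256		@ pseudo instruction; expands to:
	@ add	r14, pc, #hi		@ upper part of the offset
	@ add	r14, r14, #lo		@ lower part of the offset

The replacement uses only real instructions, which both ARM and Thumb-2
can encode as long as the delta fits a modified-immediate constant:

	adr	r14, .Lsha256_block_data_order		  @ nearby symbol, in ADR range
	add	r14, r14, #K256-.Lsha256_block_data_order @ apply the known delta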

Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Peter Smith <Peter.Smith@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
I will leave it to the Clang folks to decide whether this needs to be
backported and how far, but a Cc stable seems reasonable here.

 arch/arm/crypto/sha256-armv4.pl       | 4 ++--
 arch/arm/crypto/sha256-core.S_shipped | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
index 9f96ff48e4a8..8aeb2e82f915 100644
--- a/arch/arm/crypto/sha256-armv4.pl
+++ b/arch/arm/crypto/sha256-armv4.pl
@@ -175,7 +175,6 @@ $code=<<___;
 #else
 .syntax unified
 # ifdef __thumb2__
-#  define adrl adr
 .thumb
 # else
 .code   32
@@ -471,7 +470,8 @@ sha256_block_data_order_neon:
 	stmdb	sp!,{r4-r12,lr}
 
 	sub	$H,sp,#16*4+16
-	adrl	$Ktbl,K256
+	adr	$Ktbl,.Lsha256_block_data_order
+	add	$Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
 	bic	$H,$H,#15		@ align for 128-bit stores
 	mov	$t2,sp
 	mov	sp,$H			@ alloca
diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
index ea04b2ab0c33..1861c4e8a5ba 100644
--- a/arch/arm/crypto/sha256-core.S_shipped
+++ b/arch/arm/crypto/sha256-core.S_shipped
@@ -56,7 +56,6 @@
 #else
 .syntax unified
 # ifdef __thumb2__
-#  define adrl adr
 .thumb
 # else
 .code   32
@@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
 	stmdb	sp!,{r4-r12,lr}
 
 	sub	r11,sp,#16*4+16
-	adrl	r14,K256
+	adr	r14,.Lsha256_block_data_order
+	add	r14,r14,#K256-.Lsha256_block_data_order
 	bic	r11,r11,#15		@ align for 128-bit stores
 	mov	r12,sp
 	mov	sp,r11			@ alloca
-- 
2.17.1



* Re: [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
  2020-09-15  9:46 [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction Ard Biesheuvel
@ 2020-09-15 18:50 ` Nick Desaulniers
  2020-09-15 21:31   ` Ard Biesheuvel
From: Nick Desaulniers @ 2020-09-15 18:50 UTC
  To: Ard Biesheuvel
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	Stefan Agner, Peter Smith

On Tue, Sep 15, 2020 at 2:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> The ADRL pseudo instruction is not an architectural construct, but a
> convenience macro that was supported by the ARM proprietary assembler
> and adopted by binutils GAS as well, but only when assembling in 32-bit
> ARM mode. Therefore, it can only be used in assembler code that is known
> to assemble in ARM mode only, but as it turns out, the Clang assembler
> does not implement ADRL at all, and so it is better to get rid of it
> entirely.
>
> So replace the ADRL instruction with an ADR instruction that refers to
> a nearer symbol, and apply the delta explicitly using an additional
> instruction.
>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Stefan Agner <stefan@agner.ch>
> Cc: Peter Smith <Peter.Smith@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> I will leave it to the Clang folks to decide whether this needs to be
> backported and how far, but a Cc stable seems reasonable here.
>
>  arch/arm/crypto/sha256-armv4.pl       | 4 ++--
>  arch/arm/crypto/sha256-core.S_shipped | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
> index 9f96ff48e4a8..8aeb2e82f915 100644
> --- a/arch/arm/crypto/sha256-armv4.pl
> +++ b/arch/arm/crypto/sha256-armv4.pl
> @@ -175,7 +175,6 @@ $code=<<___;
>  #else
>  .syntax unified
>  # ifdef __thumb2__
> -#  define adrl adr
>  .thumb
>  # else
>  .code   32
> @@ -471,7 +470,8 @@ sha256_block_data_order_neon:
>         stmdb   sp!,{r4-r12,lr}
>
>         sub     $H,sp,#16*4+16
> -       adrl    $Ktbl,K256
> +       adr     $Ktbl,.Lsha256_block_data_order
> +       add     $Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
>         bic     $H,$H,#15               @ align for 128-bit stores
>         mov     $t2,sp
>         mov     sp,$H                   @ alloca
> diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
> index ea04b2ab0c33..1861c4e8a5ba 100644
> --- a/arch/arm/crypto/sha256-core.S_shipped
> +++ b/arch/arm/crypto/sha256-core.S_shipped
> @@ -56,7 +56,6 @@
>  #else
>  .syntax unified
>  # ifdef __thumb2__
> -#  define adrl adr
>  .thumb
>  # else
>  .code   32
> @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
>         stmdb   sp!,{r4-r12,lr}
>
>         sub     r11,sp,#16*4+16
> -       adrl    r14,K256
> +       adr     r14,.Lsha256_block_data_order
> +       add     r14,r14,#K256-.Lsha256_block_data_order

Hi Ard,
Thanks for the patch.  With this patch applied:

$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1
-j71 defconfig
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1 -j71
...
arch/arm/crypto/sha256-core.S:2038:2: error: out of range immediate fixup value
 add r14,r14,#K256-.Lsha256_block_data_order
 ^

:(

Would the adr_l macro you wrote in
https://lore.kernel.org/linux-arm-kernel/nycvar.YSQ.7.78.906.2009141003360.4095746@knanqh.ubzr/T/#t
be helpful here?
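
That macro builds the full 32-bit PC-relative offset with a MOVW/MOVT
pair and then adds it to PC, roughly along these lines (a simplified
sketch for ARMv7+ in ARM mode only, not the actual code from that
thread, which also handles Thumb-2's different PC bias and pre-v7
cores without MOVW/MOVT):

	.macro	adr_l, dst, sym
	movw	\dst, #:lower16:(\sym - .Lpc\@)	@ low half of the offset
	movt	\dst, #:upper16:(\sym - .Lpc\@)	@ high half of the offset
	.set	.Lpc\@, . + 8			@ . + 8 is the ARM-mode PC bias
	add	\dst, \dst, pc			@ \dst = \sym, at any distance
	.endm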

>         bic     r11,r11,#15             @ align for 128-bit stores
>         mov     r12,sp
>         mov     sp,r11                  @ alloca
> --
> 2.17.1
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
  2020-09-15 18:50 ` Nick Desaulniers
@ 2020-09-15 21:31   ` Ard Biesheuvel
  2020-09-15 23:55     ` Nick Desaulniers
  2020-09-16  7:35     ` Stefan Agner
From: Ard Biesheuvel @ 2020-09-15 21:31 UTC
  To: Nick Desaulniers
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	Stefan Agner, Peter Smith

On Tue, 15 Sep 2020 at 21:50, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Tue, Sep 15, 2020 at 2:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > The ADRL pseudo instruction is not an architectural construct, but a
> > convenience macro that was supported by the ARM proprietary assembler
> > and adopted by binutils GAS as well, but only when assembling in 32-bit
> > ARM mode. Therefore, it can only be used in assembler code that is known
> > to assemble in ARM mode only, but as it turns out, the Clang assembler
> > does not implement ADRL at all, and so it is better to get rid of it
> > entirely.
> >
> > So replace the ADRL instruction with an ADR instruction that refers to
> > a nearer symbol, and apply the delta explicitly using an additional
> > instruction.
> >
> > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > Cc: Stefan Agner <stefan@agner.ch>
> > Cc: Peter Smith <Peter.Smith@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > I will leave it to the Clang folks to decide whether this needs to be
> > backported and how far, but a Cc stable seems reasonable here.
> >
> >  arch/arm/crypto/sha256-armv4.pl       | 4 ++--
> >  arch/arm/crypto/sha256-core.S_shipped | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
> > index 9f96ff48e4a8..8aeb2e82f915 100644
> > --- a/arch/arm/crypto/sha256-armv4.pl
> > +++ b/arch/arm/crypto/sha256-armv4.pl
> > @@ -175,7 +175,6 @@ $code=<<___;
> >  #else
> >  .syntax unified
> >  # ifdef __thumb2__
> > -#  define adrl adr
> >  .thumb
> >  # else
> >  .code   32
> > @@ -471,7 +470,8 @@ sha256_block_data_order_neon:
> >         stmdb   sp!,{r4-r12,lr}
> >
> >         sub     $H,sp,#16*4+16
> > -       adrl    $Ktbl,K256
> > +       adr     $Ktbl,.Lsha256_block_data_order
> > +       add     $Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
> >         bic     $H,$H,#15               @ align for 128-bit stores
> >         mov     $t2,sp
> >         mov     sp,$H                   @ alloca
> > diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
> > index ea04b2ab0c33..1861c4e8a5ba 100644
> > --- a/arch/arm/crypto/sha256-core.S_shipped
> > +++ b/arch/arm/crypto/sha256-core.S_shipped
> > @@ -56,7 +56,6 @@
> >  #else
> >  .syntax unified
> >  # ifdef __thumb2__
> > -#  define adrl adr
> >  .thumb
> >  # else
> >  .code   32
> > @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
> >         stmdb   sp!,{r4-r12,lr}
> >
> >         sub     r11,sp,#16*4+16
> > -       adrl    r14,K256
> > +       adr     r14,.Lsha256_block_data_order
> > +       add     r14,r14,#K256-.Lsha256_block_data_order
>
> Hi Ard,
> Thanks for the patch.  With this patch applied:
>
> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1
> -j71 defconfig
> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1 -j71
> ...
> arch/arm/crypto/sha256-core.S:2038:2: error: out of range immediate fixup value
>  add r14,r14,#K256-.Lsha256_block_data_order
>  ^
>
> :(
>

Strange. Could you change it to

sub r14,r14,#.Lsha256_block_data_order-K256

and try again?

If that does work, it means the Clang assembler does not update the
instruction type for negative addends (add to sub in this case), which
would be unfortunate, since it would be another functionality gap.
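
There is no ADD encoding with a negative immediate, so the assembler
is expected to rewrite the instruction as its complement, e.g.
(illustrative only; the resulting immediate must still be a valid
modified-immediate constant):

	add	r14, r14, #-64		@ no ADD encoding for this, so
	sub	r14, r14, #64		@ GAS silently emits this instead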



> Would the adr_l macro you wrote in
> https://lore.kernel.org/linux-arm-kernel/nycvar.YSQ.7.78.906.2009141003360.4095746@knanqh.ubzr/T/#t
> be helpful here?
>
> >         bic     r11,r11,#15             @ align for 128-bit stores
> >         mov     r12,sp
> >         mov     sp,r11                  @ alloca
> > --
> > 2.17.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers


* Re: [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
  2020-09-15 21:31   ` Ard Biesheuvel
@ 2020-09-15 23:55     ` Nick Desaulniers
  2020-09-16  5:39       ` Ard Biesheuvel
  2020-09-16  7:35     ` Stefan Agner
From: Nick Desaulniers @ 2020-09-15 23:55 UTC
  To: Ard Biesheuvel
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	Stefan Agner, Peter Smith, Jian Cai, clang-built-linux

On Tue, Sep 15, 2020 at 2:32 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 15 Sep 2020 at 21:50, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Tue, Sep 15, 2020 at 2:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > The ADRL pseudo instruction is not an architectural construct, but a
> > > convenience macro that was supported by the ARM proprietary assembler
> > > and adopted by binutils GAS as well, but only when assembling in 32-bit
> > > ARM mode. Therefore, it can only be used in assembler code that is known
> > > to assemble in ARM mode only, but as it turns out, the Clang assembler
> > > does not implement ADRL at all, and so it is better to get rid of it
> > > entirely.
> > >
> > > So replace the ADRL instruction with an ADR instruction that refers to
> > > a nearer symbol, and apply the delta explicitly using an additional
> > > instruction.
> > >
> > > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > > Cc: Stefan Agner <stefan@agner.ch>
> > > Cc: Peter Smith <Peter.Smith@arm.com>
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > > I will leave it to the Clang folks to decide whether this needs to be
> > > backported and how far, but a Cc stable seems reasonable here.
> > >
> > >  arch/arm/crypto/sha256-armv4.pl       | 4 ++--
> > >  arch/arm/crypto/sha256-core.S_shipped | 4 ++--
> > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
> > > index 9f96ff48e4a8..8aeb2e82f915 100644
> > > --- a/arch/arm/crypto/sha256-armv4.pl
> > > +++ b/arch/arm/crypto/sha256-armv4.pl
> > > @@ -175,7 +175,6 @@ $code=<<___;
> > >  #else
> > >  .syntax unified
> > >  # ifdef __thumb2__
> > > -#  define adrl adr
> > >  .thumb
> > >  # else
> > >  .code   32
> > > @@ -471,7 +470,8 @@ sha256_block_data_order_neon:
> > >         stmdb   sp!,{r4-r12,lr}
> > >
> > >         sub     $H,sp,#16*4+16
> > > -       adrl    $Ktbl,K256
> > > +       adr     $Ktbl,.Lsha256_block_data_order
> > > +       add     $Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
> > >         bic     $H,$H,#15               @ align for 128-bit stores
> > >         mov     $t2,sp
> > >         mov     sp,$H                   @ alloca
> > > diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
> > > index ea04b2ab0c33..1861c4e8a5ba 100644
> > > --- a/arch/arm/crypto/sha256-core.S_shipped
> > > +++ b/arch/arm/crypto/sha256-core.S_shipped
> > > @@ -56,7 +56,6 @@
> > >  #else
> > >  .syntax unified
> > >  # ifdef __thumb2__
> > > -#  define adrl adr
> > >  .thumb
> > >  # else
> > >  .code   32
> > > @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
> > >         stmdb   sp!,{r4-r12,lr}
> > >
> > >         sub     r11,sp,#16*4+16
> > > -       adrl    r14,K256
> > > +       adr     r14,.Lsha256_block_data_order
> > > +       add     r14,r14,#K256-.Lsha256_block_data_order
> >
> > Hi Ard,
> > Thanks for the patch.  With this patch applied:
> >
> > $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1
> > -j71 defconfig
> > $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1 -j71
> > ...
> > arch/arm/crypto/sha256-core.S:2038:2: error: out of range immediate fixup value
> >  add r14,r14,#K256-.Lsha256_block_data_order
> >  ^
> >
> > :(
> >
>
> Strange. Could you change it to
>
> sub r14,r14,#.Lsha256_block_data_order-K256
>
> and try again?
>
> If that does work, it means the Clang assembler does not update the
> instruction type for negative addends (add to sub in this case), which
> would be unfortunate, since it would be another functionality gap.

Works.  Can you describe the expected functionality a bit more, so we
can come up with a bug report/test case?  (an `add` with a negative
operand should be converted to a `sub` with a positive operand, IIUC?)

Also, there's a similar adrl in arch/arm/crypto/sha512-core.S, err, is
that generated?
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
  2020-09-15 23:55     ` Nick Desaulniers
@ 2020-09-16  5:39       ` Ard Biesheuvel
From: Ard Biesheuvel @ 2020-09-16  5:39 UTC
  To: Nick Desaulniers
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	Stefan Agner, Peter Smith, Jian Cai, clang-built-linux

On Wed, 16 Sep 2020 at 02:55, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Tue, Sep 15, 2020 at 2:32 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Tue, 15 Sep 2020 at 21:50, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > >
> > > On Tue, Sep 15, 2020 at 2:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > The ADRL pseudo instruction is not an architectural construct, but a
> > > > convenience macro that was supported by the ARM proprietary assembler
> > > > and adopted by binutils GAS as well, but only when assembling in 32-bit
> > > > ARM mode. Therefore, it can only be used in assembler code that is known
> > > > to assemble in ARM mode only, but as it turns out, the Clang assembler
> > > > does not implement ADRL at all, and so it is better to get rid of it
> > > > entirely.
> > > >
> > > > So replace the ADRL instruction with an ADR instruction that refers to
> > > > a nearer symbol, and apply the delta explicitly using an additional
> > > > instruction.
> > > >
> > > > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > > > Cc: Stefan Agner <stefan@agner.ch>
> > > > Cc: Peter Smith <Peter.Smith@arm.com>
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > > I will leave it to the Clang folks to decide whether this needs to be
> > > > backported and how far, but a Cc stable seems reasonable here.
> > > >
> > > >  arch/arm/crypto/sha256-armv4.pl       | 4 ++--
> > > >  arch/arm/crypto/sha256-core.S_shipped | 4 ++--
> > > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
> > > > index 9f96ff48e4a8..8aeb2e82f915 100644
> > > > --- a/arch/arm/crypto/sha256-armv4.pl
> > > > +++ b/arch/arm/crypto/sha256-armv4.pl
> > > > @@ -175,7 +175,6 @@ $code=<<___;
> > > >  #else
> > > >  .syntax unified
> > > >  # ifdef __thumb2__
> > > > -#  define adrl adr
> > > >  .thumb
> > > >  # else
> > > >  .code   32
> > > > @@ -471,7 +470,8 @@ sha256_block_data_order_neon:
> > > >         stmdb   sp!,{r4-r12,lr}
> > > >
> > > >         sub     $H,sp,#16*4+16
> > > > -       adrl    $Ktbl,K256
> > > > +       adr     $Ktbl,.Lsha256_block_data_order
> > > > +       add     $Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
> > > >         bic     $H,$H,#15               @ align for 128-bit stores
> > > >         mov     $t2,sp
> > > >         mov     sp,$H                   @ alloca
> > > > diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
> > > > index ea04b2ab0c33..1861c4e8a5ba 100644
> > > > --- a/arch/arm/crypto/sha256-core.S_shipped
> > > > +++ b/arch/arm/crypto/sha256-core.S_shipped
> > > > @@ -56,7 +56,6 @@
> > > >  #else
> > > >  .syntax unified
> > > >  # ifdef __thumb2__
> > > > -#  define adrl adr
> > > >  .thumb
> > > >  # else
> > > >  .code   32
> > > > @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
> > > >         stmdb   sp!,{r4-r12,lr}
> > > >
> > > >         sub     r11,sp,#16*4+16
> > > > -       adrl    r14,K256
> > > > +       adr     r14,.Lsha256_block_data_order
> > > > +       add     r14,r14,#K256-.Lsha256_block_data_order
> > >
> > > Hi Ard,
> > > Thanks for the patch.  With this patch applied:
> > >
> > > $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1
> > > -j71 defconfig
> > > $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1 -j71
> > > ...
> > > arch/arm/crypto/sha256-core.S:2038:2: error: out of range immediate fixup value
> > >  add r14,r14,#K256-.Lsha256_block_data_order
> > >  ^
> > >
> > > :(
> > >
> >
> > Strange. Could you change it to
> >
> > sub r14,r14,#.Lsha256_block_data_order-K256
> >
> > and try again?
> >
> > If that does work, it means the Clang assembler does not update the
> > instruction type for negative addends (add to sub in this case), which
> > would be unfortunate, since it would be another functionality gap.
>
> Works.  Can you describe the expected functionality a bit more, so we
> can come up with a bug report/test case?  (an `add` with a negative
> operand should be converted to a `sub` with a positive operand, IIUC?)
>

That is it, really. Not sure if this is laid out in a spec anywhere,
although the ELF psABI for ARM covers some similar territory when it
comes to turning add into sub instructions and vice versa, as well as
manipulating the U bit of LDR instructions.
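
A minimal reproducer might look something like this (an untested
sketch; assemble it with both GAS and Clang's integrated assembler
and compare the results):

	.syntax	unified
	.arm
	.text
sym1:	.long	0
sym2:	add	r0, r0, #sym1 - sym2	@ delta is -4; GAS encodes this
					@ as sub r0, r0, #4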

> Also, there's a similar adrl in arch/arm/crypto/sha512-core.S, err, is
> that generated?

Indeed. I missed that one as it has been removed from the upstream
OpenSSL version, but I'll add a fix there as well.


* Re: [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
  2020-09-15 21:31   ` Ard Biesheuvel
  2020-09-15 23:55     ` Nick Desaulniers
@ 2020-09-16  7:35     ` Stefan Agner
  2020-09-16  9:58       ` Ard Biesheuvel
From: Stefan Agner @ 2020-09-16  7:35 UTC
  To: Ard Biesheuvel
  Cc: Nick Desaulniers,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	Peter Smith

On 2020-09-15 23:31, Ard Biesheuvel wrote:
> On Tue, 15 Sep 2020 at 21:50, Nick Desaulniers <ndesaulniers@google.com> wrote:
>>
>> On Tue, Sep 15, 2020 at 2:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>> >
>> > The ADRL pseudo instruction is not an architectural construct, but a
>> > convenience macro that was supported by the ARM proprietary assembler
>> > and adopted by binutils GAS as well, but only when assembling in 32-bit
>> > ARM mode. Therefore, it can only be used in assembler code that is known
>> > to assemble in ARM mode only, but as it turns out, the Clang assembler
>> > does not implement ADRL at all, and so it is better to get rid of it
>> > entirely.
>> >
>> > So replace the ADRL instruction with an ADR instruction that refers to
>> > a nearer symbol, and apply the delta explicitly using an additional
>> > instruction.
>> >
>> > Cc: Nick Desaulniers <ndesaulniers@google.com>
>> > Cc: Stefan Agner <stefan@agner.ch>
>> > Cc: Peter Smith <Peter.Smith@arm.com>
>> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> > ---
>> > I will leave it to the Clang folks to decide whether this needs to be
>> > backported and how far, but a Cc stable seems reasonable here.
>> >
>> >  arch/arm/crypto/sha256-armv4.pl       | 4 ++--
>> >  arch/arm/crypto/sha256-core.S_shipped | 4 ++--
>> >  2 files changed, 4 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
>> > index 9f96ff48e4a8..8aeb2e82f915 100644
>> > --- a/arch/arm/crypto/sha256-armv4.pl
>> > +++ b/arch/arm/crypto/sha256-armv4.pl
>> > @@ -175,7 +175,6 @@ $code=<<___;
>> >  #else
>> >  .syntax unified
>> >  # ifdef __thumb2__
>> > -#  define adrl adr
>> >  .thumb
>> >  # else
>> >  .code   32
>> > @@ -471,7 +470,8 @@ sha256_block_data_order_neon:
>> >         stmdb   sp!,{r4-r12,lr}
>> >
>> >         sub     $H,sp,#16*4+16
>> > -       adrl    $Ktbl,K256
>> > +       adr     $Ktbl,.Lsha256_block_data_order
>> > +       add     $Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
>> >         bic     $H,$H,#15               @ align for 128-bit stores
>> >         mov     $t2,sp
>> >         mov     sp,$H                   @ alloca
>> > diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
>> > index ea04b2ab0c33..1861c4e8a5ba 100644
>> > --- a/arch/arm/crypto/sha256-core.S_shipped
>> > +++ b/arch/arm/crypto/sha256-core.S_shipped
>> > @@ -56,7 +56,6 @@
>> >  #else
>> >  .syntax unified
>> >  # ifdef __thumb2__
>> > -#  define adrl adr
>> >  .thumb
>> >  # else
>> >  .code   32
>> > @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
>> >         stmdb   sp!,{r4-r12,lr}
>> >
>> >         sub     r11,sp,#16*4+16
>> > -       adrl    r14,K256
>> > +       adr     r14,.Lsha256_block_data_order
>> > +       add     r14,r14,#K256-.Lsha256_block_data_order
>>
>> Hi Ard,
>> Thanks for the patch.  With this patch applied:
>>
>> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1
>> -j71 defconfig
>> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1 -j71
>> ...
>> arch/arm/crypto/sha256-core.S:2038:2: error: out of range immediate fixup value
>>  add r14,r14,#K256-.Lsha256_block_data_order
>>  ^
>>
>> :(
>>
> 
> Strange. Could you change it to
> 
> sub r14,r14,#.Lsha256_block_data_order-K256
> 
> and try again?
> 
> If that does work, it means the Clang assembler does not update the
> instruction type for negative addends (add to sub in this case), which
> would be unfortunate, since it would be another functionality gap.

Hm, interesting. I did not come across another instance where this was a
problem.

In this particular case, is it guaranteed to be a subtraction? I guess
then using sub for now would be fine...?

FWIW, we also discussed possible solutions in this issue
(mach-omap2/sleep34xx.S case is handled already):
https://github.com/ClangBuiltLinux/linux/issues/430

--
Stefan

> 
> 
> 
>> Would the adr_l macro you wrote in
>> https://lore.kernel.org/linux-arm-kernel/nycvar.YSQ.7.78.906.2009141003360.4095746@knanqh.ubzr/T/#t
>> be helpful here?
>>
>> >         bic     r11,r11,#15             @ align for 128-bit stores
>> >         mov     r12,sp
>> >         mov     sp,r11                  @ alloca
>> > --
>> > 2.17.1
>> >
>>
>>
>> --
>> Thanks,
>> ~Nick Desaulniers


* Re: [PATCH] crypto: arm/sha256-neon - avoid ADRL pseudo instruction
  2020-09-16  7:35     ` Stefan Agner
@ 2020-09-16  9:58       ` Ard Biesheuvel
From: Ard Biesheuvel @ 2020-09-16  9:58 UTC
  To: Stefan Agner
  Cc: Nick Desaulniers,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	Peter Smith

On Wed, 16 Sep 2020 at 10:45, Stefan Agner <stefan@agner.ch> wrote:
>
> On 2020-09-15 23:31, Ard Biesheuvel wrote:
> > On Tue, 15 Sep 2020 at 21:50, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >>
> >> On Tue, Sep 15, 2020 at 2:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >> >
> >> > The ADRL pseudo instruction is not an architectural construct, but a
> >> > convenience macro that was supported by the ARM proprietary assembler
> >> > and adopted by binutils GAS as well, but only when assembling in 32-bit
> >> > ARM mode. Therefore, it can only be used in assembler code that is known
> >> > to assemble in ARM mode only, but as it turns out, the Clang assembler
> >> > does not implement ADRL at all, and so it is better to get rid of it
> >> > entirely.
> >> >
> >> > So replace the ADRL instruction with an ADR instruction that refers to
> >> > a nearer symbol, and apply the delta explicitly using an additional
> >> > instruction.
> >> >
> >> > Cc: Nick Desaulniers <ndesaulniers@google.com>
> >> > Cc: Stefan Agner <stefan@agner.ch>
> >> > Cc: Peter Smith <Peter.Smith@arm.com>
> >> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >> > ---
> >> > I will leave it to the Clang folks to decide whether this needs to be
> >> > backported and how far, but a Cc stable seems reasonable here.
> >> >
> >> >  arch/arm/crypto/sha256-armv4.pl       | 4 ++--
> >> >  arch/arm/crypto/sha256-core.S_shipped | 4 ++--
> >> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl
> >> > index 9f96ff48e4a8..8aeb2e82f915 100644
> >> > --- a/arch/arm/crypto/sha256-armv4.pl
> >> > +++ b/arch/arm/crypto/sha256-armv4.pl
> >> > @@ -175,7 +175,6 @@ $code=<<___;
> >> >  #else
> >> >  .syntax unified
> >> >  # ifdef __thumb2__
> >> > -#  define adrl adr
> >> >  .thumb
> >> >  # else
> >> >  .code   32
> >> > @@ -471,7 +470,8 @@ sha256_block_data_order_neon:
> >> >         stmdb   sp!,{r4-r12,lr}
> >> >
> >> >         sub     $H,sp,#16*4+16
> >> > -       adrl    $Ktbl,K256
> >> > +       adr     $Ktbl,.Lsha256_block_data_order
> >> > +       add     $Ktbl,$Ktbl,#K256-.Lsha256_block_data_order
> >> >         bic     $H,$H,#15               @ align for 128-bit stores
> >> >         mov     $t2,sp
> >> >         mov     sp,$H                   @ alloca
> >> > diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped
> >> > index ea04b2ab0c33..1861c4e8a5ba 100644
> >> > --- a/arch/arm/crypto/sha256-core.S_shipped
> >> > +++ b/arch/arm/crypto/sha256-core.S_shipped
> >> > @@ -56,7 +56,6 @@
> >> >  #else
> >> >  .syntax unified
> >> >  # ifdef __thumb2__
> >> > -#  define adrl adr
> >> >  .thumb
> >> >  # else
> >> >  .code   32
> >> > @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon:
> >> >         stmdb   sp!,{r4-r12,lr}
> >> >
> >> >         sub     r11,sp,#16*4+16
> >> > -       adrl    r14,K256
> >> > +       adr     r14,.Lsha256_block_data_order
> >> > +       add     r14,r14,#K256-.Lsha256_block_data_order
> >>
> >> Hi Ard,
> >> Thanks for the patch.  With this patch applied:
> >>
> >> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1
> >> -j71 defconfig
> >> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 LLVM_IAS=1 -j71
> >> ...
> >> arch/arm/crypto/sha256-core.S:2038:2: error: out of range immediate fixup value
> >>  add r14,r14,#K256-.Lsha256_block_data_order
> >>  ^
> >>
> >> :(
> >>
> >
> > Strange. Could you change it to
> >
> > sub r14,r14,#.Lsha256_block_data_order-K256
> >
> > and try again?
> >
> > If that does work, it means the Clang assembler does not update the
> > instruction type for negative addends (add to sub in this case), which
> > would be unfortunate, since it would be another functionality gap.
>
> Hm, interesting. I did not come across another instance where this was a
> problem.
>
> In this particular case, is it guaranteed to be a subtraction? I guess
> then using sub for now would be fine...?
>

Yes, for this code it is fine: K256 lives before
.Lsha256_block_data_order in this file, so the delta is always
negative.

> FWIW, we also discussed possible solutions in this issue
> (mach-omap2/sleep34xx.S case is handled already):
> https://github.com/ClangBuiltLinux/linux/issues/430
>
> --
> Stefan
>
> >
> >
> >
> >> Would the adr_l macro you wrote in
> >> https://lore.kernel.org/linux-arm-kernel/nycvar.YSQ.7.78.906.2009141003360.4095746@knanqh.ubzr/T/#t
> >> be helpful here?
> >>
> >> >         bic     r11,r11,#15             @ align for 128-bit stores
> >> >         mov     r12,sp
> >> >         mov     sp,r11                  @ alloca
> >> > --
> >> > 2.17.1
> >> >
> >>
> >>
> >> --
> >> Thanks,
> >> ~Nick Desaulniers

