git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block-sha1: remove use of assembly
@ 2022-03-07 23:25 brian m. carlson
  2022-03-07 23:32 ` Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: brian m. carlson @ 2022-03-07 23:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Taylor Blau

In the block SHA-1 code, we have special assembly code for i386 and
amd64 to perform rotations with assembly.  This is supposed to help pick
the correct rotation operation depending on which rotation is smaller,
which can help some systems perform slightly better, since any circular
rotation can be specified as either a rotate left or a rotate right.
However, this isn't needed, so we should remove it.

First, SHA-1, like SHA-2, uses fixed constant rotates.  Thus, all
rotation amounts are known at compile time and are in fact baked into
the code.  Fortunately, peephole optimizers recognize rotations
specified in the normal way and automatically emit the correct code,
including a preference for choosing a rotate left versus a rotate right.
This has been the case for well over a decade, and is a standard example
of the utility of a peephole optimizer.

Moreover, all modern CPUs, with the exception of extremely limited
embedded CPUs such as some Cortex-M processors, provide a barrel
shifter, which lets the CPU perform rotates of any bit amount in
constant time.  This is valuable for many cryptographic algorithms to
improve performance, and is required to prevent timing attacks in
algorithms which use data-dependent rotations (which don't include the
hash algorithms we use).  As a result, even though the compiler does the
correct optimization, it isn't even needed here and either a left or a
right rotate is equally acceptable.

In fact, the SHA-256 code already takes this into account and just
writes the simple code using an inline function to let the compiler
optimize it for us.

The downside of using this code, however, is that it uses a GCC
extension, which makes the compiler complain when using -pedantic unless
it's prefixed with __extension__.  We could fix that, but since it's
not needed, let's just remove it.  We haven't noticed this because
almost everyone uses the SHA1DC code instead, but it still shows up for
some people.
---
Normally here, I would say, just use the SHA1DC code.  However, at
GitHub, this is wired up to a special piece of code in our Git
implementation that doesn't process attacker controlled data but needs
to persist its internal hash state (to avoid needing to rehash a
potentially large file, a log of operations performed, to which data is
only appended).

Thus, I'm sending in a fix here to avoid the warning, even though it's
my ultimate intention to remove it in favor of the SHA1DC code in the
future.

 block-sha1/sha1.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 1bb6e7c069..5974cd7dd3 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -11,27 +11,10 @@
 
 #include "sha1.h"
 
-#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
-
-/*
- * Force usage of rol or ror by selecting the one with the smaller constant.
- * It _can_ generate slightly smaller code (a constant of 1 is special), but
- * perhaps more importantly it's possibly faster on any uarch that does a
- * rotate with a loop.
- */
-
-#define SHA_ASM(op, x, n) ({ unsigned int __res; __asm__(op " %1,%0":"=r" (__res):"i" (n), "0" (x)); __res; })
-#define SHA_ROL(x,n)	SHA_ASM("rol", x, n)
-#define SHA_ROR(x,n)	SHA_ASM("ror", x, n)
-
-#else
-
 #define SHA_ROT(X,l,r)	(((X) << (l)) | ((X) >> (r)))
 #define SHA_ROL(X,n)	SHA_ROT(X,n,32-(n))
 #define SHA_ROR(X,n)	SHA_ROT(X,32-(n),n)
 
-#endif
-
 /*
  * If you have 32 registers or more, the compiler can (and should)
  * try to change the array[] accesses into registers. However, on

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] block-sha1: remove use of assembly
  2022-03-07 23:25 [PATCH] block-sha1: remove use of assembly brian m. carlson
@ 2022-03-07 23:32 ` Taylor Blau
  2022-03-08  2:20   ` brian m. carlson
  2022-03-08  2:22 ` [PATCH v2] " brian m. carlson
  2022-03-10 17:47 ` [PATCH v3] block-sha1: remove use of obsolete x86 assembly brian m. carlson
  2 siblings, 1 reply; 12+ messages in thread
From: Taylor Blau @ 2022-03-07 23:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano, Taylor Blau

On Mon, Mar 07, 2022 at 11:25:52PM +0000, brian m. carlson wrote:
> In the block SHA-1 code, we have special assembly code for i386 and
> amd64 to perform rotations with assembly.  This is supposed to help pick
> the correct rotation operation depending on which rotation is smaller,
> which can help some systems perform slightly better, since any circular
> rotation can be specified as either a rotate left or a rotate right.
> However, this isn't needed, so we should remove it.

At -O1 or higher (at least on GCC) this optimization is indeed
performed. Here's a Godbolt example that shows this:

    https://godbolt.org/z/9zMP93hr1

so I agree that this code isn't helping us at all. And in the
meantime...

> The downside of using this code, however, is that it uses a GCC
> extension, which makes the compiler complain when using -pedantic unless
> it's prefixed with __extension__.  We could fix that, but since it's
> not needed, let's just remove it.  We haven't noticed this because
> almost everyone uses the SHA1DC code instead, but it still shows up for
> some people.

...it makes it impossible to compile git if you have
`BLK_SHA1=YesPlease` and `DEVELOPER=1` in your environment. So I am
happy to see this go.

On another note: missing Signed-off-by?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block-sha1: remove use of assembly
  2022-03-07 23:32 ` Taylor Blau
@ 2022-03-08  2:20   ` brian m. carlson
  0 siblings, 0 replies; 12+ messages in thread
From: brian m. carlson @ 2022-03-08  2:20 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1866 bytes --]

On 2022-03-07 at 23:32:42, Taylor Blau wrote:
> On Mon, Mar 07, 2022 at 11:25:52PM +0000, brian m. carlson wrote:
> > In the block SHA-1 code, we have special assembly code for i386 and
> > amd64 to perform rotations with assembly.  This is supposed to help pick
> > the correct rotation operation depending on which rotation is smaller,
> > which can help some systems perform slightly better, since any circular
> > rotation can be specified as either a rotate left or a rotate right.
> > However, this isn't needed, so we should remove it.
> 
> At -O1 or higher (at least on GCC) this optimization is indeed
> performed. Here's a Godbolt example that shows this:
> 
>     https://godbolt.org/z/9zMP93hr1
> 
> so I agree that this code isn't helping us at all. And in the
> meantime...

Thanks for providing a link.  I also did a similar test there with
slightly different code (and unfortunately closed the window before
saving the link) but it demonstrated the same thing: that the compiler
can optimize this case adequately.  My (substantially) past experience
with testing GCC in this case has shown the same thing.

> > The downside of using this code, however, is that it uses a GCC
> > extension, which makes the compiler complain when using -pedantic unless
> > it's prefixed with __extension__.  We could fix that, but since it's
> > not needed, let's just remove it.  We haven't noticed this because
> > almost everyone uses the SHA1DC code instead, but it still shows up for
> > some people.
> 
> ...it makes it impossible to compile git if you have
> `BLK_SHA1=YesPlease` and `DEVELOPER=1` in your environment. So I am
> happy to see this go.
> 
> On another note: missing Signed-off-by?

I'll send an otherwise unchanged v2 with that in a second.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] block-sha1: remove use of assembly
  2022-03-07 23:25 [PATCH] block-sha1: remove use of assembly brian m. carlson
  2022-03-07 23:32 ` Taylor Blau
@ 2022-03-08  2:22 ` brian m. carlson
  2022-03-08 13:38   ` Ævar Arnfjörð Bjarmason
  2022-03-10 17:47 ` [PATCH v3] block-sha1: remove use of obsolete x86 assembly brian m. carlson
  2 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2022-03-08  2:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Taylor Blau

In the block SHA-1 code, we have special assembly code for i386 and
amd64 to perform rotations with assembly.  This is supposed to help pick
the correct rotation operation depending on which rotation is smaller,
which can help some systems perform slightly better, since any circular
rotation can be specified as either a rotate left or a rotate right.
However, this isn't needed, so we should remove it.

First, SHA-1, like SHA-2, uses fixed constant rotates.  Thus, all
rotation amounts are known at compile time and are in fact baked into
the code.  Fortunately, peephole optimizers recognize rotations
specified in the normal way and automatically emit the correct code,
including a preference for choosing a rotate left versus a rotate right.
This has been the case for well over a decade, and is a standard example
of the utility of a peephole optimizer.

Moreover, all modern CPUs, with the exception of extremely limited
embedded CPUs such as some Cortex-M processors, provide a barrel
shifter, which lets the CPU perform rotates of any bit amount in
constant time.  This is valuable for many cryptographic algorithms to
improve performance, and is required to prevent timing attacks in
algorithms which use data-dependent rotations (which don't include the
hash algorithms we use).  As a result, even though the compiler does the
correct optimization, it isn't even needed here and either a left or a
right rotate is equally acceptable.

In fact, the SHA-256 code already takes this into account and just
writes the simple code using an inline function to let the compiler
optimize it for us.

The downside of using this code, however, is that it uses a GCC
extension, which makes the compiler complain when using -pedantic unless
it's prefixed with __extension__.  We could fix that, but since it's
not needed, let's just remove it.  We haven't noticed this because
almost everyone uses the SHA1DC code instead, but it still shows up for
some people.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Changes from v1:
* Add sign-off.

 block-sha1/sha1.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 1bb6e7c069..5974cd7dd3 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -11,27 +11,10 @@
 
 #include "sha1.h"
 
-#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
-
-/*
- * Force usage of rol or ror by selecting the one with the smaller constant.
- * It _can_ generate slightly smaller code (a constant of 1 is special), but
- * perhaps more importantly it's possibly faster on any uarch that does a
- * rotate with a loop.
- */
-
-#define SHA_ASM(op, x, n) ({ unsigned int __res; __asm__(op " %1,%0":"=r" (__res):"i" (n), "0" (x)); __res; })
-#define SHA_ROL(x,n)	SHA_ASM("rol", x, n)
-#define SHA_ROR(x,n)	SHA_ASM("ror", x, n)
-
-#else
-
 #define SHA_ROT(X,l,r)	(((X) << (l)) | ((X) >> (r)))
 #define SHA_ROL(X,n)	SHA_ROT(X,n,32-(n))
 #define SHA_ROR(X,n)	SHA_ROT(X,32-(n),n)
 
-#endif
-
 /*
  * If you have 32 registers or more, the compiler can (and should)
  * try to change the array[] accesses into registers. However, on

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] block-sha1: remove use of assembly
  2022-03-08  2:22 ` [PATCH v2] " brian m. carlson
@ 2022-03-08 13:38   ` Ævar Arnfjörð Bjarmason
  2022-03-09 22:10     ` brian m. carlson
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-08 13:38 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano, Taylor Blau


On Tue, Mar 08 2022, brian m. carlson wrote:

I think the $subject of the patch needs updating. It's not removing all
the assemply from the file, after this patch we still have the
ARM-specific assembly.

I don't have a box to test that on, but I wonder if that also triggers
the pedantic mode?

Perhaps:

    block-sha1: remove superfluous i386 and x86-64 assembly

?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] block-sha1: remove use of assembly
  2022-03-08 13:38   ` Ævar Arnfjörð Bjarmason
@ 2022-03-09 22:10     ` brian m. carlson
  2022-03-09 22:32       ` Taylor Blau
  0 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2022-03-09 22:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 1073 bytes --]

On 2022-03-08 at 13:38:06, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Mar 08 2022, brian m. carlson wrote:
> 
> I think the $subject of the patch needs updating. It's not removing all
> the assemply from the file, after this patch we still have the
> ARM-specific assembly.
> 
> I don't have a box to test that on, but I wonder if that also triggers
> the pedantic mode?
> 
> Perhaps:
> 
>     block-sha1: remove superfluous i386 and x86-64 assembly

I suspect it has the same problem.  My inclination is to just remove it,
because my guess is that the compiler has gotten smarter between 2009
and now.

I honestly intend to just remove this code in a future version because
everyone not using SHA1DC has a security problem and we shouldn't offer
insecure options.

However, I think for now, I'm just going to reroll this with the new
title and then I can remove it in a future version unless somebody with
an ARM system can relatively quickly tell me whether it's necessary.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] block-sha1: remove use of assembly
  2022-03-09 22:10     ` brian m. carlson
@ 2022-03-09 22:32       ` Taylor Blau
  2022-03-09 23:52         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: Taylor Blau @ 2022-03-09 22:32 UTC (permalink / raw)
  To: brian m. carlson, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano

On Wed, Mar 09, 2022 at 10:10:33PM +0000, brian m. carlson wrote:
> On 2022-03-08 at 13:38:06, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Tue, Mar 08 2022, brian m. carlson wrote:
> >
> > I think the $subject of the patch needs updating. It's not removing all
> > the assemply from the file, after this patch we still have the
> > ARM-specific assembly.
> >
> > I don't have a box to test that on, but I wonder if that also triggers
> > the pedantic mode?
> >
> > Perhaps:
> >
> >     block-sha1: remove superfluous i386 and x86-64 assembly
>
> I suspect it has the same problem.  My inclination is to just remove it,
> because my guess is that the compiler has gotten smarter between 2009
> and now.

Almost certainly. I don't have a machine to test it on, either, but I
would be shocked if `make BLK_SHA=YesPlease DEVELOPER=1` worked on
master today on an arm machine.

> I honestly intend to just remove this code in a future version because
> everyone not using SHA1DC has a security problem and we shouldn't offer
> insecure options.
>
> However, I think for now, I'm just going to reroll this with the new
> title and then I can remove it in a future version unless somebody with
> an ARM system can relatively quickly tell me whether it's necessary.

I wonder if a good stop-gap for arm systems might be to do something
like:

--- 8< ---

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 1bb6e7c069..7402d02875 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -57,7 +57,7 @@
 #if defined(__i386__) || defined(__x86_64__)
   #define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))
 #elif defined(__GNUC__) && defined(__arm__)
-  #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
+  #define setW(x, val) do { W(x) = (val); __extension__ __asm__("":::"memory"); } while (0)
 #else
   #define setW(x, val) (W(x) = (val))
 #endif

--- >8 ---

in the meantime in a separate patch. There it seems like the memory
barrier is useful for machines with fewer than 25-ish registers. Though
obviously moot if your ultimate goal is to get rid of the block sha1
code.

But in the meantime, a stop-gap patch may be useful. If you use that
diff, feel free to forge my Signed-off-by.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] block-sha1: remove use of assembly
  2022-03-09 22:32       ` Taylor Blau
@ 2022-03-09 23:52         ` Ævar Arnfjörð Bjarmason
  2022-03-10  1:59           ` Carlo Marcelo Arenas Belón
  2022-03-10  2:34           ` Taylor Blau
  0 siblings, 2 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-09 23:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: brian m. carlson, git, Junio C Hamano


On Wed, Mar 09 2022, Taylor Blau wrote:

> On Wed, Mar 09, 2022 at 10:10:33PM +0000, brian m. carlson wrote:
>> On 2022-03-08 at 13:38:06, Ævar Arnfjörð Bjarmason wrote:
>> >
>> > On Tue, Mar 08 2022, brian m. carlson wrote:
>> >
>> > I think the $subject of the patch needs updating. It's not removing all
>> > the assemply from the file, after this patch we still have the
>> > ARM-specific assembly.
>> >
>> > I don't have a box to test that on, but I wonder if that also triggers
>> > the pedantic mode?
>> >
>> > Perhaps:
>> >
>> >     block-sha1: remove superfluous i386 and x86-64 assembly
>>
>> I suspect it has the same problem.  My inclination is to just remove it,
>> because my guess is that the compiler has gotten smarter between 2009
>> and now.
>
> Almost certainly. I don't have a machine to test it on, either, but I
> would be shocked if `make BLK_SHA=YesPlease DEVELOPER=1` worked on
> master today on an arm machine.

Why is that? The -pedantic error is specifically about
"gnu-statement-expression", i.e. the bracket syntax, not the inline
assembly per-se.

The ARM assembly isn't using that, and we have other code __asm__ code
compiled with -pedantic. E.g. I get the __asm__ in "compat/bswap.h" by
default, and it passes -pedantic (the code starting around line 38).

>> I honestly intend to just remove this code in a future version because
>> everyone not using SHA1DC has a security problem and we shouldn't offer
>> insecure options.
>>
>> However, I think for now, I'm just going to reroll this with the new
>> title and then I can remove it in a future version unless somebody with
>> an ARM system can relatively quickly tell me whether it's necessary.
>
> I wonder if a good stop-gap for arm systems might be to do something
> like:
>
> --- 8< ---
>
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 1bb6e7c069..7402d02875 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -57,7 +57,7 @@
>  #if defined(__i386__) || defined(__x86_64__)
>    #define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))
>  #elif defined(__GNUC__) && defined(__arm__)
> -  #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
> +  #define setW(x, val) do { W(x) = (val); __extension__ __asm__("":::"memory"); } while (0)
>  #else
>    #define setW(x, val) (W(x) = (val))
>  #endif
>
> --- >8 ---

Isn't that __extension__ only needed *if* it warns under -pedantic,
which AFAICT doesn't apply to all uses of __asm__ (but your compiler
version etc. may be different...).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] block-sha1: remove use of assembly
  2022-03-09 23:52         ` Ævar Arnfjörð Bjarmason
@ 2022-03-10  1:59           ` Carlo Marcelo Arenas Belón
  2022-03-10  2:34           ` Taylor Blau
  1 sibling, 0 replies; 12+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2022-03-10  1:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, brian m. carlson, git, Junio C Hamano

On Thu, Mar 10, 2022 at 12:52:31AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Mar 09 2022, Taylor Blau wrote:
> 
> > On Wed, Mar 09, 2022 at 10:10:33PM +0000, brian m. carlson wrote:
> >> On 2022-03-08 at 13:38:06, Ævar Arnfjörð Bjarmason wrote:
> >> >
> >> > On Tue, Mar 08 2022, brian m. carlson wrote:
> >> >
> >> > I think the $subject of the patch needs updating. It's not removing all
> >> > the assemply from the file, after this patch we still have the
> >> > ARM-specific assembly.
> >> >
> >> > I don't have a box to test that on, but I wonder if that also triggers
> >> > the pedantic mode?
> >> >
> >> > Perhaps:
> >> >
> >> >     block-sha1: remove superfluous i386 and x86-64 assembly
> >>
> >> I suspect it has the same problem.  My inclination is to just remove it,
> >> because my guess is that the compiler has gotten smarter between 2009
> >> and now.
> >
> > Almost certainly. I don't have a machine to test it on, either, but I
> > would be shocked if `make BLK_SHA=YesPlease DEVELOPER=1` worked on
> > master today on an arm machine.
> 
> Why is that? The -pedantic error is specifically about
> "gnu-statement-expression", i.e. the bracket syntax, not the inline
> assembly per-se.

not sure how gcc version (as mentioned elsewhere) might affect this, but
had built it successfully in aarch64 with gcc 4.8.4, and arm32v6 with
gcc 10.3.1.

Carlo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] block-sha1: remove use of assembly
  2022-03-09 23:52         ` Ævar Arnfjörð Bjarmason
  2022-03-10  1:59           ` Carlo Marcelo Arenas Belón
@ 2022-03-10  2:34           ` Taylor Blau
  1 sibling, 0 replies; 12+ messages in thread
From: Taylor Blau @ 2022-03-10  2:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, brian m. carlson, git, Junio C Hamano

On Thu, Mar 10, 2022 at 12:52:31AM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> I suspect it has the same problem.  My inclination is to just remove it,
> >> because my guess is that the compiler has gotten smarter between 2009
> >> and now.
> >
> > Almost certainly. I don't have a machine to test it on, either, but I
> > would be shocked if `make BLK_SHA=YesPlease DEVELOPER=1` worked on
> > master today on an arm machine.
>
> Why is that? The -pedantic error is specifically about
> "gnu-statement-expression", i.e. the bracket syntax, not the inline
> assembly per-se.
>
> The ARM assembly isn't using that, and we have other code __asm__ code
> compiled with -pedantic. E.g. I get the __asm__ in "compat/bswap.h" by
> default, and it passes -pedantic (the code starting around line 38).

You're right, I had this completely mixed up in my mind. In GitHub's
fork there is a spot I have been working near for the past couple of
days where there is inline assembly right below a statement expression.

The statement expression is what causes the -pedantic builds to fail,
not the inline __asm__. Indeed, if you just stick a memory barrier
anywhere in Git's codebase, we'll still compile under the DEVELOPER=1
builds.

> Isn't that __extension__ only needed *if* it warns under -pedantic,
> which AFAICT doesn't apply to all uses of __asm__ (but your compiler
> version etc. may be different...).

Yes, disregard that last suggestion :-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] block-sha1: remove use of obsolete x86 assembly
  2022-03-07 23:25 [PATCH] block-sha1: remove use of assembly brian m. carlson
  2022-03-07 23:32 ` Taylor Blau
  2022-03-08  2:22 ` [PATCH v2] " brian m. carlson
@ 2022-03-10 17:47 ` brian m. carlson
  2022-03-10 19:18   ` Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2022-03-10 17:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Taylor Blau, Ævar Arnfjörð Bjarmason

In the block SHA-1 code, we have special assembly code for i386 and
amd64 to perform rotations with assembly.  This is supposed to help pick
the correct rotation operation depending on which rotation is smaller,
which can help some systems perform slightly better, since any circular
rotation can be specified as either a rotate left or a rotate right.
However, this isn't needed, so we should remove it.

First, SHA-1, like SHA-2, uses fixed constant rotates.  Thus, all
rotation amounts are known at compile time and are in fact baked into
the code.  Fortunately, peephole optimizers recognize rotations
specified in the normal way and automatically emit the correct code,
including a preference for choosing a rotate left versus a rotate right.
This has been the case for well over a decade, and is a standard example
of the utility of a peephole optimizer.

Moreover, all modern CPUs, with the exception of extremely limited
embedded CPUs such as some Cortex-M processors, provide a barrel
shifter, which lets the CPU perform rotates of any bit amount in
constant time.  This is valuable for many cryptographic algorithms to
improve performance, and is required to prevent timing attacks in
algorithms which use data-dependent rotations (which don't include the
hash algorithms we use).  As a result, even though the compiler does the
correct optimization, it isn't even needed here and either a left or a
right rotate is equally acceptable.

In fact, the SHA-256 code already takes this into account and just
writes the simple code using an inline function to let the compiler
optimize it for us.

The downside of using this code, however, is that it uses a GCC
extension, which makes the compiler complain when using -pedantic unless
it's prefixed with __extension__.  We could fix that, but since it's
not needed, let's just remove it.  We haven't noticed this because
almost everyone uses the SHA1DC code instead, but it still shows up for
some people.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Changes from v2:
* Improve subject

 block-sha1/sha1.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 1bb6e7c069..5974cd7dd3 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -11,27 +11,10 @@
 
 #include "sha1.h"
 
-#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
-
-/*
- * Force usage of rol or ror by selecting the one with the smaller constant.
- * It _can_ generate slightly smaller code (a constant of 1 is special), but
- * perhaps more importantly it's possibly faster on any uarch that does a
- * rotate with a loop.
- */
-
-#define SHA_ASM(op, x, n) ({ unsigned int __res; __asm__(op " %1,%0":"=r" (__res):"i" (n), "0" (x)); __res; })
-#define SHA_ROL(x,n)	SHA_ASM("rol", x, n)
-#define SHA_ROR(x,n)	SHA_ASM("ror", x, n)
-
-#else
-
 #define SHA_ROT(X,l,r)	(((X) << (l)) | ((X) >> (r)))
 #define SHA_ROL(X,n)	SHA_ROT(X,n,32-(n))
 #define SHA_ROR(X,n)	SHA_ROT(X,32-(n),n)
 
-#endif
-
 /*
  * If you have 32 registers or more, the compiler can (and should)
  * try to change the array[] accesses into registers. However, on

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] block-sha1: remove use of obsolete x86 assembly
  2022-03-10 17:47 ` [PATCH v3] block-sha1: remove use of obsolete x86 assembly brian m. carlson
@ 2022-03-10 19:18   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2022-03-10 19:18 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Taylor Blau, Ævar Arnfjörð Bjarmason

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Changes from v2:
> * Improve subject

Will queue.  Let's merge it down to 'next'.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-03-10 19:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 23:25 [PATCH] block-sha1: remove use of assembly brian m. carlson
2022-03-07 23:32 ` Taylor Blau
2022-03-08  2:20   ` brian m. carlson
2022-03-08  2:22 ` [PATCH v2] " brian m. carlson
2022-03-08 13:38   ` Ævar Arnfjörð Bjarmason
2022-03-09 22:10     ` brian m. carlson
2022-03-09 22:32       ` Taylor Blau
2022-03-09 23:52         ` Ævar Arnfjörð Bjarmason
2022-03-10  1:59           ` Carlo Marcelo Arenas Belón
2022-03-10  2:34           ` Taylor Blau
2022-03-10 17:47 ` [PATCH v3] block-sha1: remove use of obsolete x86 assembly brian m. carlson
2022-03-10 19:18   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).