* SHA1dc on mac @ 2020-02-12 8:56 Mike Hommey 2020-02-12 16:46 ` Eric Sunshine 0 siblings, 1 reply; 17+ messages in thread From: Mike Hommey @ 2020-02-12 8:56 UTC (permalink / raw) To: git Hi, If I'm not mistaken in my reading of the various files involved, it looks like for some reason, building git on mac leads to using Apple Common Crypto for SHA1, rather than SHA1dc, which seems unfortunate. Is that really expected? More generally, at this point, should anything other than SHA1dc be supported as a build option at all? Mike ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: SHA1dc on mac 2020-02-12 8:56 SHA1dc on mac Mike Hommey @ 2020-02-12 16:46 ` Eric Sunshine 2020-02-12 22:31 ` Mike Hommey 0 siblings, 1 reply; 17+ messages in thread From: Eric Sunshine @ 2020-02-12 16:46 UTC (permalink / raw) To: Mike Hommey; +Cc: Git List On Wed, Feb 12, 2020 at 3:57 AM Mike Hommey <mh@glandium.org> wrote: > If I'm not mistaken in my reading of the various files involved, it > looks like for some reason, building git on mac leads to using Apple > Common Crypto for SHA1, rather than SHA1dc, which seems unfortunate. > Is that really expected? There was a discussion on this topic a while back[1], and it does seem that the behavior you describe is intentional[2]. > More generally, at this point, should anything > other than SHA1dc be supported as a build option at all? The conclusion [2,3] was that it likely would make sense to drop support for Apple's CommonCrypto altogether, although nobody has yet stepped up to do the work. [1]: https://lore.kernel.org/git/CAMYxyaVQyVRQb-b0nVv412tMZ3rEnOfUPRakg2dEREg5_Ba5Ag@mail.gmail.com/T/ [2]: https://lore.kernel.org/git/20160102234923.GA14424@gmail.com/ [3]: https://lore.kernel.org/git/CAPig+cQ5kKAt2_RQnqT7Rn=uGmHV9VvxpQ+UgDPOj=D=pq6arg@mail.gmail.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: SHA1dc on mac 2020-02-12 16:46 ` Eric Sunshine @ 2020-02-12 22:31 ` Mike Hommey 2020-02-12 22:40 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Mike Hommey @ 2020-02-12 22:31 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On Wed, Feb 12, 2020 at 11:46:31AM -0500, Eric Sunshine wrote: > On Wed, Feb 12, 2020 at 3:57 AM Mike Hommey <mh@glandium.org> wrote: > > If I'm not mistaken in my reading of the various files involved, it > > looks like for some reason, building git on mac leads to using Apple > > Common Crypto for SHA1, rather than SHA1dc, which seems unfortunate. > > Is that really expected? > > There was a discussion on this topic a while back[1], and it does seem > that the behavior you describe is intentional[2]. That discussion predates SHA1dc, though. > > More generally, at this point, should anything > > other than SHA1dc be supported as a build option at all? > > The conclusion [2,3] was that it likely would make sense to drop > support for Apple's CommonCrypto altogether, although nobody has yet > stepped up to do the work. I wasn't explicit in my question, but I meant more broadly than Apple Common Crypto. There is still opt-in support for openssl sha1 and PPC sha1. Mike ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: SHA1dc on mac 2020-02-12 22:31 ` Mike Hommey @ 2020-02-12 22:40 ` Junio C Hamano 2020-02-23 22:37 ` [PATCH] Remove non-SHA1dc sha1 implementations Mike Hommey 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2020-02-12 22:40 UTC (permalink / raw) To: Mike Hommey; +Cc: Eric Sunshine, Git List Mike Hommey <mh@glandium.org> writes: > On Wed, Feb 12, 2020 at 11:46:31AM -0500, Eric Sunshine wrote: >> On Wed, Feb 12, 2020 at 3:57 AM Mike Hommey <mh@glandium.org> wrote: >> > If I'm not mistaken in my reading of the various files involved, it >> > looks like for some reason, building git on mac leads to using Apple >> > Common Crypto for SHA1, rather than SHA1dc, which seems unfortunate. >> > Is that really expected? >> >> There was a discussion on this topic a while back[1], and it does seem >> that the behavior you describe is intentional[2]. > > That discussion predates SHA1dc, though. Yes, but the essense is the same. It was phrased as "is there a good reason to prefer CommonCrypto over block-sha1?" but it really was "is there a good reason to prefer CommonCrypto over the best we offer?" And the best we offer, which used to be block-sha1, is now sha1dc. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Remove non-SHA1dc sha1 implementations 2020-02-12 22:40 ` Junio C Hamano @ 2020-02-23 22:37 ` Mike Hommey 2020-02-24 4:47 ` Jeff King 2022-03-19 1:02 ` [PATCH] ppc: remove custom SHA-1 implementation Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 17+ messages in thread From: Mike Hommey @ 2020-02-23 22:37 UTC (permalink / raw) To: git; +Cc: gitster, Mike Hommey It is 2020, and with the weakening of SHA1 security-wise, there doesn't seem to be a reason to support anything else than SHA1dc, with collision detection. Signed-off-by: Mike Hommey <mh@glandium.org> --- Note: I only tested building on Linux. INSTALL | 5 - Makefile | 67 ++----------- block-sha1/sha1.c | 251 ---------------------------------------------- block-sha1/sha1.h | 22 ---- config.mak.uname | 1 - configure.ac | 3 - hash.h | 24 ----- ppc/sha1.c | 72 ------------- ppc/sha1.h | 25 ----- ppc/sha1ppc.S | 224 ----------------------------------------- 10 files changed, 6 insertions(+), 688 deletions(-) delete mode 100644 block-sha1/sha1.c delete mode 100644 block-sha1/sha1.h delete mode 100644 ppc/sha1.c delete mode 100644 ppc/sha1.h delete mode 100644 ppc/sha1ppc.S diff --git a/INSTALL b/INSTALL index 22c364f34f..91d649f99e 100644 --- a/INSTALL +++ b/INSTALL @@ -133,11 +133,6 @@ Issues of note: you are using libcurl older than 7.34.0. Otherwise you can use NO_OPENSSL without losing git-imap-send. - By default, git uses OpenSSL for SHA1 but it will use its own - library (inspired by Mozilla's) with either NO_OPENSSL or - BLK_SHA1. Also included is a version optimized for PowerPC - (PPC_SHA1). - - "libcurl" library is used by git-http-fetch, git-fetch, and, if the curl version >= 7.34.0, for git-imap-send. You might also want the "curl" executable for debugging purposes. If you do not diff --git a/Makefile b/Makefile index b7d7374dac..5b4307d332 100644 --- a/Makefile +++ b/Makefile @@ -149,37 +149,15 @@ all:: # specify your own (or DarwinPort's) include directories and # library directories by defining CFLAGS and LDFLAGS appropriately. # -# Define NO_APPLE_COMMON_CRYPTO if you are building on Darwin/Mac OS X -# and do not want to use Apple's CommonCrypto library. This allows you -# to provide your own OpenSSL library, for example from MacPorts. -# -# Define BLK_SHA1 environment variable to make use of the bundled -# optimized C SHA1 routine. -# -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# -# Define DC_SHA1 to unconditionally enable the collision-detecting sha1 -# algorithm. This is slower, but may detect attempted collision attacks. -# Takes priority over other *_SHA1 knobs. -# -# Define DC_SHA1_EXTERNAL in addition to DC_SHA1 if you want to build / link -# git with the external SHA1 collision-detect library. +# Define DC_SHA1_EXTERNAL if you want to build / link git with the +# external SHA1 collision-detect library. # Without this option, i.e. the default behavior is to build git with its # own built-in code (or submodule). # -# Define DC_SHA1_SUBMODULE in addition to DC_SHA1 to use the -# sha1collisiondetection shipped as a submodule instead of the -# non-submodule copy in sha1dc/. This is an experimental option used -# by the git project to migrate to using sha1collisiondetection as a -# submodule. -# -# Define OPENSSL_SHA1 environment variable when running make to link -# with the SHA1 routine from openssl library. -# -# Define SHA1_MAX_BLOCK_SIZE to limit the amount of data that will be hashed -# in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO -# wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined. +# Define DC_SHA1_SUBMODULE to use the sha1collisiondetection shipped +# as a submodule instead of the non-submodule copy in sha1dc/. This is +# an experimental option used by the git project to migrate to using +# sha1collisiondetection as a submodule. # # Define BLK_SHA256 to use the built-in SHA-256 routines. # @@ -1296,11 +1274,6 @@ ifeq ($(uname_S),Darwin) BASIC_LDFLAGS += -L/opt/local/lib endif endif - ifndef NO_APPLE_COMMON_CRYPTO - NO_OPENSSL = YesPlease - APPLE_COMMON_CRYPTO = YesPlease - COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO - endif NO_REGEX = YesPlease PTHREAD_LIBS = endif @@ -1430,9 +1403,6 @@ ifdef NEEDS_SSL_WITH_CRYPTO else LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto endif -ifdef APPLE_COMMON_CRYPTO - LIB_4_CRYPTO += -framework Security -framework CoreFoundation -endif endif ifndef NO_ICONV ifdef NEEDS_LIBICONV @@ -1647,27 +1617,6 @@ ifdef NO_POSIX_GOODIES BASIC_CFLAGS += -DNO_POSIX_GOODIES endif -ifdef APPLE_COMMON_CRYPTO - # Apple CommonCrypto requires chunking - SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L -endif - -ifdef OPENSSL_SHA1 - EXTLIBS += $(LIB_4_CRYPTO) - BASIC_CFLAGS += -DSHA1_OPENSSL -else -ifdef BLK_SHA1 - LIB_OBJS += block-sha1/sha1.o - BASIC_CFLAGS += -DSHA1_BLK -else -ifdef PPC_SHA1 - LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o - BASIC_CFLAGS += -DSHA1_PPC -else -ifdef APPLE_COMMON_CRYPTO - COMPAT_CFLAGS += -DCOMMON_DIGEST_FOR_OPENSSL - BASIC_CFLAGS += -DSHA1_APPLE -else DC_SHA1 := YesPlease BASIC_CFLAGS += -DSHA1_DC LIB_OBJS += sha1dc_git.o @@ -1694,10 +1643,6 @@ endif -DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" \ -DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\"" endif -endif -endif -endif -endif ifdef OPENSSL_SHA256 EXTLIBS += $(LIB_4_CRYPTO) diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c deleted file mode 100644 index 22b125cf8c..0000000000 --- a/block-sha1/sha1.c +++ /dev/null @@ -1,251 +0,0 @@ -/* - * SHA1 routine optimized to do word accesses rather than byte accesses, - * and to avoid unnecessary copies into the context array. - * - * This was initially based on the Mozilla SHA1 implementation, although - * none of the original Mozilla code remains. - */ - -/* this is only to get definitions for memcpy(), ntohl() and htonl() */ -#include "../git-compat-util.h" - -#include "sha1.h" - -#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)) - -/* - * Force usage of rol or ror by selecting the one with the smaller constant. - * It _can_ generate slightly smaller code (a constant of 1 is special), but - * perhaps more importantly it's possibly faster on any uarch that does a - * rotate with a loop. - */ - -#define SHA_ASM(op, x, n) ({ unsigned int __res; __asm__(op " %1,%0":"=r" (__res):"i" (n), "0" (x)); __res; }) -#define SHA_ROL(x,n) SHA_ASM("rol", x, n) -#define SHA_ROR(x,n) SHA_ASM("ror", x, n) - -#else - -#define SHA_ROT(X,l,r) (((X) << (l)) | ((X) >> (r))) -#define SHA_ROL(X,n) SHA_ROT(X,n,32-(n)) -#define SHA_ROR(X,n) SHA_ROT(X,32-(n),n) - -#endif - -/* - * If you have 32 registers or more, the compiler can (and should) - * try to change the array[] accesses into registers. However, on - * machines with less than ~25 registers, that won't really work, - * and at least gcc will make an unholy mess of it. - * - * So to avoid that mess which just slows things down, we force - * the stores to memory to actually happen (we might be better off - * with a 'W(t)=(val);asm("":"+m" (W(t))' there instead, as - * suggested by Artur Skawina - that will also make gcc unable to - * try to do the silly "optimize away loads" part because it won't - * see what the value will be). - * - * Ben Herrenschmidt reports that on PPC, the C version comes close - * to the optimized asm with this (ie on PPC you don't want that - * 'volatile', since there are lots of registers). - * - * On ARM we get the best code generation by forcing a full memory barrier - * between each SHA_ROUND, otherwise gcc happily get wild with spilling and - * the stack frame size simply explode and performance goes down the drain. - */ - -#if defined(__i386__) || defined(__x86_64__) - #define setW(x, val) (*(volatile unsigned int *)&W(x) = (val)) -#elif defined(__GNUC__) && defined(__arm__) - #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0) -#else - #define setW(x, val) (W(x) = (val)) -#endif - -/* This "rolls" over the 512-bit array */ -#define W(x) (array[(x)&15]) - -/* - * Where do we get the source from? The first 16 iterations get it from - * the input data, the next mix it from the 512-bit array. - */ -#define SHA_SRC(t) get_be32((unsigned char *) block + (t)*4) -#define SHA_MIX(t) SHA_ROL(W((t)+13) ^ W((t)+8) ^ W((t)+2) ^ W(t), 1); - -#define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \ - unsigned int TEMP = input(t); setW(t, TEMP); \ - E += TEMP + SHA_ROL(A,5) + (fn) + (constant); \ - B = SHA_ROR(B, 2); } while (0) - -#define T_0_15(t, A, B, C, D, E) SHA_ROUND(t, SHA_SRC, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E ) -#define T_16_19(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E ) -#define T_20_39(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) , 0x6ed9eba1, A, B, C, D, E ) -#define T_40_59(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, ((B&C)+(D&(B^C))) , 0x8f1bbcdc, A, B, C, D, E ) -#define T_60_79(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) , 0xca62c1d6, A, B, C, D, E ) - -static void blk_SHA1_Block(blk_SHA_CTX *ctx, const void *block) -{ - unsigned int A,B,C,D,E; - unsigned int array[16]; - - A = ctx->H[0]; - B = ctx->H[1]; - C = ctx->H[2]; - D = ctx->H[3]; - E = ctx->H[4]; - - /* Round 1 - iterations 0-16 take their input from 'block' */ - T_0_15( 0, A, B, C, D, E); - T_0_15( 1, E, A, B, C, D); - T_0_15( 2, D, E, A, B, C); - T_0_15( 3, C, D, E, A, B); - T_0_15( 4, B, C, D, E, A); - T_0_15( 5, A, B, C, D, E); - T_0_15( 6, E, A, B, C, D); - T_0_15( 7, D, E, A, B, C); - T_0_15( 8, C, D, E, A, B); - T_0_15( 9, B, C, D, E, A); - T_0_15(10, A, B, C, D, E); - T_0_15(11, E, A, B, C, D); - T_0_15(12, D, E, A, B, C); - T_0_15(13, C, D, E, A, B); - T_0_15(14, B, C, D, E, A); - T_0_15(15, A, B, C, D, E); - - /* Round 1 - tail. Input from 512-bit mixing array */ - T_16_19(16, E, A, B, C, D); - T_16_19(17, D, E, A, B, C); - T_16_19(18, C, D, E, A, B); - T_16_19(19, B, C, D, E, A); - - /* Round 2 */ - T_20_39(20, A, B, C, D, E); - T_20_39(21, E, A, B, C, D); - T_20_39(22, D, E, A, B, C); - T_20_39(23, C, D, E, A, B); - T_20_39(24, B, C, D, E, A); - T_20_39(25, A, B, C, D, E); - T_20_39(26, E, A, B, C, D); - T_20_39(27, D, E, A, B, C); - T_20_39(28, C, D, E, A, B); - T_20_39(29, B, C, D, E, A); - T_20_39(30, A, B, C, D, E); - T_20_39(31, E, A, B, C, D); - T_20_39(32, D, E, A, B, C); - T_20_39(33, C, D, E, A, B); - T_20_39(34, B, C, D, E, A); - T_20_39(35, A, B, C, D, E); - T_20_39(36, E, A, B, C, D); - T_20_39(37, D, E, A, B, C); - T_20_39(38, C, D, E, A, B); - T_20_39(39, B, C, D, E, A); - - /* Round 3 */ - T_40_59(40, A, B, C, D, E); - T_40_59(41, E, A, B, C, D); - T_40_59(42, D, E, A, B, C); - T_40_59(43, C, D, E, A, B); - T_40_59(44, B, C, D, E, A); - T_40_59(45, A, B, C, D, E); - T_40_59(46, E, A, B, C, D); - T_40_59(47, D, E, A, B, C); - T_40_59(48, C, D, E, A, B); - T_40_59(49, B, C, D, E, A); - T_40_59(50, A, B, C, D, E); - T_40_59(51, E, A, B, C, D); - T_40_59(52, D, E, A, B, C); - T_40_59(53, C, D, E, A, B); - T_40_59(54, B, C, D, E, A); - T_40_59(55, A, B, C, D, E); - T_40_59(56, E, A, B, C, D); - T_40_59(57, D, E, A, B, C); - T_40_59(58, C, D, E, A, B); - T_40_59(59, B, C, D, E, A); - - /* Round 4 */ - T_60_79(60, A, B, C, D, E); - T_60_79(61, E, A, B, C, D); - T_60_79(62, D, E, A, B, C); - T_60_79(63, C, D, E, A, B); - T_60_79(64, B, C, D, E, A); - T_60_79(65, A, B, C, D, E); - T_60_79(66, E, A, B, C, D); - T_60_79(67, D, E, A, B, C); - T_60_79(68, C, D, E, A, B); - T_60_79(69, B, C, D, E, A); - T_60_79(70, A, B, C, D, E); - T_60_79(71, E, A, B, C, D); - T_60_79(72, D, E, A, B, C); - T_60_79(73, C, D, E, A, B); - T_60_79(74, B, C, D, E, A); - T_60_79(75, A, B, C, D, E); - T_60_79(76, E, A, B, C, D); - T_60_79(77, D, E, A, B, C); - T_60_79(78, C, D, E, A, B); - T_60_79(79, B, C, D, E, A); - - ctx->H[0] += A; - ctx->H[1] += B; - ctx->H[2] += C; - ctx->H[3] += D; - ctx->H[4] += E; -} - -void blk_SHA1_Init(blk_SHA_CTX *ctx) -{ - ctx->size = 0; - - /* Initialize H with the magic constants (see FIPS180 for constants) */ - ctx->H[0] = 0x67452301; - ctx->H[1] = 0xefcdab89; - ctx->H[2] = 0x98badcfe; - ctx->H[3] = 0x10325476; - ctx->H[4] = 0xc3d2e1f0; -} - -void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len) -{ - unsigned int lenW = ctx->size & 63; - - ctx->size += len; - - /* Read the data into W and process blocks as they get full */ - if (lenW) { - unsigned int left = 64 - lenW; - if (len < left) - left = len; - memcpy(lenW + (char *)ctx->W, data, left); - lenW = (lenW + left) & 63; - len -= left; - data = ((const char *)data + left); - if (lenW) - return; - blk_SHA1_Block(ctx, ctx->W); - } - while (len >= 64) { - blk_SHA1_Block(ctx, data); - data = ((const char *)data + 64); - len -= 64; - } - if (len) - memcpy(ctx->W, data, len); -} - -void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx) -{ - static const unsigned char pad[64] = { 0x80 }; - unsigned int padlen[2]; - int i; - - /* Pad with a binary 1 (ie 0x80), then zeroes, then length */ - padlen[0] = htonl((uint32_t)(ctx->size >> 29)); - padlen[1] = htonl((uint32_t)(ctx->size << 3)); - - i = ctx->size & 63; - blk_SHA1_Update(ctx, pad, 1 + (63 & (55 - i))); - blk_SHA1_Update(ctx, padlen, 8); - - /* Output hash */ - for (i = 0; i < 5; i++) - put_be32(hashout + i * 4, ctx->H[i]); -} diff --git a/block-sha1/sha1.h b/block-sha1/sha1.h deleted file mode 100644 index 4df6747752..0000000000 --- a/block-sha1/sha1.h +++ /dev/null @@ -1,22 +0,0 @@ -/* - * SHA1 routine optimized to do word accesses rather than byte accesses, - * and to avoid unnecessary copies into the context array. - * - * This was initially based on the Mozilla SHA1 implementation, although - * none of the original Mozilla code remains. - */ - -typedef struct { - unsigned long long size; - unsigned int H[5]; - unsigned int W[16]; -} blk_SHA_CTX; - -void blk_SHA1_Init(blk_SHA_CTX *ctx); -void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, unsigned long len); -void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx); - -#define platform_SHA_CTX blk_SHA_CTX -#define platform_SHA1_Init blk_SHA1_Init -#define platform_SHA1_Update blk_SHA1_Update -#define platform_SHA1_Final blk_SHA1_Final diff --git a/config.mak.uname b/config.mak.uname index cc8efd95b1..785b9265c3 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -116,7 +116,6 @@ ifeq ($(uname_S),Darwin) # i.e. "begins with [15678] and a dot" means "10.4.* or older". ifeq ($(shell expr "$(uname_R)" : '[15678]\.'),2) OLD_ICONV = UnfortunatelyYes - NO_APPLE_COMMON_CRYPTO = YesPlease endif ifeq ($(shell expr "$(uname_R)" : '[15]\.'),2) NO_STRLCPY = YesPlease diff --git a/configure.ac b/configure.ac index 66aedb9288..dd39b7ecdb 100644 --- a/configure.ac +++ b/configure.ac @@ -237,9 +237,6 @@ AC_MSG_NOTICE([CHECKS for site configuration]) # tests. These tests take up a significant amount of the total test time # but are not needed unless you plan to talk to SVN repos. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define NO_OPENSSL environment variable if you do not have OpenSSL. # # Define OPENSSLDIR=/foo/bar if your openssl header and library files are in diff --git a/hash.h b/hash.h index 52a4f1a3f4..e1a3d00b13 100644 --- a/hash.h +++ b/hash.h @@ -3,17 +3,7 @@ #include "git-compat-util.h" -#if defined(SHA1_PPC) -#include "ppc/sha1.h" -#elif defined(SHA1_APPLE) -#include <CommonCrypto/CommonDigest.h> -#elif defined(SHA1_OPENSSL) -#include <openssl/sha.h> -#elif defined(SHA1_DC) #include "sha1dc_git.h" -#else /* SHA1_BLK */ -#include "block-sha1/sha1.h" -#endif #if defined(SHA256_GCRYPT) #include "sha256/gcrypt.h" @@ -23,20 +13,6 @@ #include "sha256/block/sha256.h" #endif -#ifndef platform_SHA_CTX -/* - * platform's underlying implementation of SHA-1; could be OpenSSL, - * blk_SHA, Apple CommonCrypto, etc... Note that the relevant - * SHA-1 header may have already defined platform_SHA_CTX for our - * own implementations like block-sha1 and ppc-sha1, so we list - * the default for OpenSSL compatible SHA-1 implementations here. - */ -#define platform_SHA_CTX SHA_CTX -#define platform_SHA1_Init SHA1_Init -#define platform_SHA1_Update SHA1_Update -#define platform_SHA1_Final SHA1_Final -#endif - #define git_SHA_CTX platform_SHA_CTX #define git_SHA1_Init platform_SHA1_Init #define git_SHA1_Update platform_SHA1_Update diff --git a/ppc/sha1.c b/ppc/sha1.c deleted file mode 100644 index 1b705cee1f..0000000000 --- a/ppc/sha1.c +++ /dev/null @@ -1,72 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - * - * This version assumes we are running on a big-endian machine. - * It calls an external sha1_core() to process blocks of 64 bytes. - */ -#include <stdio.h> -#include <string.h> -#include "sha1.h" - -void ppc_sha1_core(uint32_t *hash, const unsigned char *p, - unsigned int nblocks); - -int ppc_SHA1_Init(ppc_SHA_CTX *c) -{ - c->hash[0] = 0x67452301; - c->hash[1] = 0xEFCDAB89; - c->hash[2] = 0x98BADCFE; - c->hash[3] = 0x10325476; - c->hash[4] = 0xC3D2E1F0; - c->len = 0; - c->cnt = 0; - return 0; -} - -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, unsigned long n) -{ - unsigned long nb; - const unsigned char *p = ptr; - - c->len += (uint64_t) n << 3; - while (n != 0) { - if (c->cnt || n < 64) { - nb = 64 - c->cnt; - if (nb > n) - nb = n; - memcpy(&c->buf.b[c->cnt], p, nb); - if ((c->cnt += nb) == 64) { - ppc_sha1_core(c->hash, c->buf.b, 1); - c->cnt = 0; - } - } else { - nb = n >> 6; - ppc_sha1_core(c->hash, p, nb); - nb <<= 6; - } - n -= nb; - p += nb; - } - return 0; -} - -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c) -{ - unsigned int cnt = c->cnt; - - c->buf.b[cnt++] = 0x80; - if (cnt > 56) { - if (cnt < 64) - memset(&c->buf.b[cnt], 0, 64 - cnt); - ppc_sha1_core(c->hash, c->buf.b, 1); - cnt = 0; - } - if (cnt < 56) - memset(&c->buf.b[cnt], 0, 56 - cnt); - c->buf.l[7] = c->len; - ppc_sha1_core(c->hash, c->buf.b, 1); - memcpy(hash, c->hash, 20); - return 0; -} diff --git a/ppc/sha1.h b/ppc/sha1.h deleted file mode 100644 index 9b24b32615..0000000000 --- a/ppc/sha1.h +++ /dev/null @@ -1,25 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ -#include <stdint.h> - -typedef struct { - uint32_t hash[5]; - uint32_t cnt; - uint64_t len; - union { - unsigned char b[64]; - uint64_t l[8]; - } buf; -} ppc_SHA_CTX; - -int ppc_SHA1_Init(ppc_SHA_CTX *c); -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, unsigned long n); -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c); - -#define platform_SHA_CTX ppc_SHA_CTX -#define platform_SHA1_Init ppc_SHA1_Init -#define platform_SHA1_Update ppc_SHA1_Update -#define platform_SHA1_Final ppc_SHA1_Final diff --git a/ppc/sha1ppc.S b/ppc/sha1ppc.S deleted file mode 100644 index 1711eef6e7..0000000000 --- a/ppc/sha1ppc.S +++ /dev/null @@ -1,224 +0,0 @@ -/* - * SHA-1 implementation for PowerPC. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ - -/* - * PowerPC calling convention: - * %r0 - volatile temp - * %r1 - stack pointer. - * %r2 - reserved - * %r3-%r12 - Incoming arguments & return values; volatile. - * %r13-%r31 - Callee-save registers - * %lr - Return address, volatile - * %ctr - volatile - * - * Register usage in this routine: - * %r0 - temp - * %r3 - argument (pointer to 5 words of SHA state) - * %r4 - argument (pointer to data to hash) - * %r5 - Constant K in SHA round (initially number of blocks to hash) - * %r6-%r10 - Working copies of SHA variables A..E (actually E..A order) - * %r11-%r26 - Data being hashed W[]. - * %r27-%r31 - Previous copies of A..E, for final add back. - * %ctr - loop count - */ - - -/* - * We roll the registers for A, B, C, D, E around on each - * iteration; E on iteration t is D on iteration t+1, and so on. - * We use registers 6 - 10 for this. (Registers 27 - 31 hold - * the previous values.) - */ -#define RA(t) (((t)+4)%5+6) -#define RB(t) (((t)+3)%5+6) -#define RC(t) (((t)+2)%5+6) -#define RD(t) (((t)+1)%5+6) -#define RE(t) (((t)+0)%5+6) - -/* We use registers 11 - 26 for the W values */ -#define W(t) ((t)%16+11) - -/* Register 5 is used for the constant k */ - -/* - * The basic SHA-1 round function is: - * E += ROTL(A,5) + F(B,C,D) + W[i] + K; B = ROTL(B,30) - * Then the variables are renamed: (A,B,C,D,E) = (E,A,B,C,D). - * - * Every 20 rounds, the function F() and the constant K changes: - * - 20 rounds of f0(b,c,d) = "bit wise b ? c : d" = (^b & d) + (b & c) - * - 20 rounds of f1(b,c,d) = b^c^d = (b^d)^c - * - 20 rounds of f2(b,c,d) = majority(b,c,d) = (b&d) + ((b^d)&c) - * - 20 more rounds of f1(b,c,d) - * - * These are all scheduled for near-optimal performance on a G4. - * The G4 is a 3-issue out-of-order machine with 3 ALUs, but it can only - * *consider* starting the oldest 3 instructions per cycle. So to get - * maximum performance out of it, you have to treat it as an in-order - * machine. Which means interleaving the computation round t with the - * computation of W[t+4]. - * - * The first 16 rounds use W values loaded directly from memory, while the - * remaining 64 use values computed from those first 16. We preload - * 4 values before starting, so there are three kinds of rounds: - * - The first 12 (all f0) also load the W values from memory. - * - The next 64 compute W(i+4) in parallel. 8*f0, 20*f1, 20*f2, 16*f1. - * - The last 4 (all f1) do not do anything with W. - * - * Therefore, we have 6 different round functions: - * STEPD0_LOAD(t,s) - Perform round t and load W(s). s < 16 - * STEPD0_UPDATE(t,s) - Perform round t and compute W(s). s >= 16. - * STEPD1_UPDATE(t,s) - * STEPD2_UPDATE(t,s) - * STEPD1(t) - Perform round t with no load or update. - * - * The G5 is more fully out-of-order, and can find the parallelism - * by itself. The big limit is that it has a 2-cycle ALU latency, so - * even though it's 2-way, the code has to be scheduled as if it's - * 4-way, which can be a limit. To help it, we try to schedule the - * read of RA(t) as late as possible so it doesn't stall waiting for - * the previous round's RE(t-1), and we try to rotate RB(t) as early - * as possible while reading RC(t) (= RB(t-1)) as late as possible. - */ - -/* the initial loads. */ -#define LOADW(s) \ - lwz W(s),(s)*4(%r4) - -/* - * Perform a step with F0, and load W(s). Uses W(s) as a temporary - * before loading it. - * This is actually 10 instructions, which is an awkward fit. - * It can execute grouped as listed, or delayed one instruction. - * (If delayed two instructions, there is a stall before the start of the - * second line.) Thus, two iterations take 7 cycles, 3.5 cycles per round. - */ -#define STEPD0_LOAD(t,s) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); and W(s),RC(t),RB(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi RB(t),RB(t),30; \ -add RE(t),RE(t),W(s); add %r0,%r0,%r5; lwz W(s),(s)*4(%r4); \ -add RE(t),RE(t),%r0 - -/* - * This is likewise awkward, 13 instructions. However, it can also - * execute starting with 2 out of 3 possible moduli, so it does 2 rounds - * in 9 cycles, 4.5 cycles/round. - */ -#define STEPD0_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; and %r0,RC(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r5; loadk; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0 - -/* Nicely optimal. Conveniently, also the most common. */ -#define STEPD1_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r5; loadk; xor %r0,%r0,RC(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1 - -/* - * The naked version, no UPDATE, for the last 4 rounds. 3 cycles per. - * We could use W(s) as a temp register, but we don't need it. - */ -#define STEPD1(t) \ - add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); \ -rotlwi RB(t),RB(t),30; add RE(t),RE(t),%r5; xor %r0,%r0,RC(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; /* spare slot */ \ -add RE(t),RE(t),%r0 - -/* - * 14 instructions, 5 cycles per. The majority function is a bit - * awkward to compute. This can execute with a 1-instruction delay, - * but it causes a 2-instruction delay, which triggers a stall. - */ -#define STEPD2_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); and %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; xor %r0,RD(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r5; loadk; and %r0,%r0,RC(t); xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30 - -#define STEP0_LOAD4(t,s) \ - STEPD0_LOAD(t,s); \ - STEPD0_LOAD((t+1),(s)+1); \ - STEPD0_LOAD((t)+2,(s)+2); \ - STEPD0_LOAD((t)+3,(s)+3) - -#define STEPUP4(fn, t, s, loadk...) \ - STEP##fn##_UPDATE(t,s,); \ - STEP##fn##_UPDATE((t)+1,(s)+1,); \ - STEP##fn##_UPDATE((t)+2,(s)+2,); \ - STEP##fn##_UPDATE((t)+3,(s)+3,loadk) - -#define STEPUP20(fn, t, s, loadk...) \ - STEPUP4(fn, t, s,); \ - STEPUP4(fn, (t)+4, (s)+4,); \ - STEPUP4(fn, (t)+8, (s)+8,); \ - STEPUP4(fn, (t)+12, (s)+12,); \ - STEPUP4(fn, (t)+16, (s)+16, loadk) - - .globl ppc_sha1_core -ppc_sha1_core: - stwu %r1,-80(%r1) - stmw %r13,4(%r1) - - /* Load up A - E */ - lmw %r27,0(%r3) - - mtctr %r5 - -1: - LOADW(0) - lis %r5,0x5a82 - mr RE(0),%r31 - LOADW(1) - mr RD(0),%r30 - mr RC(0),%r29 - LOADW(2) - ori %r5,%r5,0x7999 /* K0-19 */ - mr RB(0),%r28 - LOADW(3) - mr RA(0),%r27 - - STEP0_LOAD4(0, 4) - STEP0_LOAD4(4, 8) - STEP0_LOAD4(8, 12) - STEPUP4(D0, 12, 16,) - STEPUP4(D0, 16, 20, lis %r5,0x6ed9) - - ori %r5,%r5,0xeba1 /* K20-39 */ - STEPUP20(D1, 20, 24, lis %r5,0x8f1b) - - ori %r5,%r5,0xbcdc /* K40-59 */ - STEPUP20(D2, 40, 44, lis %r5,0xca62) - - ori %r5,%r5,0xc1d6 /* K60-79 */ - STEPUP4(D1, 60, 64,) - STEPUP4(D1, 64, 68,) - STEPUP4(D1, 68, 72,) - STEPUP4(D1, 72, 76,) - addi %r4,%r4,64 - STEPD1(76) - STEPD1(77) - STEPD1(78) - STEPD1(79) - - /* Add results to original values */ - add %r31,%r31,RE(0) - add %r30,%r30,RD(0) - add %r29,%r29,RC(0) - add %r28,%r28,RB(0) - add %r27,%r27,RA(0) - - bdnz 1b - - /* Save final hash, restore registers, and return */ - stmw %r27,0(%r3) - lmw %r13,4(%r1) - addi %r1,%r1,80 - blr -- 2.24.0.424.g559c6fc317.dirty ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Remove non-SHA1dc sha1 implementations 2020-02-23 22:37 ` [PATCH] Remove non-SHA1dc sha1 implementations Mike Hommey @ 2020-02-24 4:47 ` Jeff King 2020-02-24 4:52 ` Jeff King 2022-03-19 1:02 ` [PATCH] ppc: remove custom SHA-1 implementation Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 17+ messages in thread From: Jeff King @ 2020-02-24 4:47 UTC (permalink / raw) To: Mike Hommey; +Cc: git, gitster On Mon, Feb 24, 2020 at 07:37:58AM +0900, Mike Hommey wrote: > It is 2020, and with the weakening of SHA1 security-wise, there doesn't > seem to be a reason to support anything else than SHA1dc, with collision > detection. One possible reason is that they're way faster than sha1dc (block-sha1 maybe only a little, but openssl's sha1 is over twice as fast). To be clear, I think the slowdown is worth the extra safety, but: - do we still want to care about people who prefer to make the tradeoff differently? - when we first switched the default to sha1dc, the idea was raised of continuing to use a faster implementation for non-security checksums (e.g., the checksums at the end of packfiles, index files, etc). I don't think anybody ever implemented that, but it's not a terrible idea. OTOH, if nobody noticed the bottleneck enough to care, maybe it's not worth worrying about. I'm not convinced the answer to those questions is "yes", but I think it's worth at least raising them (and arguing against them in the commit message). One thing that compels me is the recent report that we still build with common crypto by default on macOS, which was definitely _not_ intended. That's a bug that can be fixed, but it wouldn't have happened in the first place if we only supported sha1dc. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Remove non-SHA1dc sha1 implementations 2020-02-24 4:47 ` Jeff King @ 2020-02-24 4:52 ` Jeff King 0 siblings, 0 replies; 17+ messages in thread From: Jeff King @ 2020-02-24 4:52 UTC (permalink / raw) To: Mike Hommey; +Cc: git, gitster On Sun, Feb 23, 2020 at 11:47:32PM -0500, Jeff King wrote: > One thing that compels me is the recent report that we still build with > common crypto by default on macOS, which was definitely _not_ intended. > That's a bug that can be fixed, but it wouldn't have happened in the > first place if we only supported sha1dc. I just noticed you were the original reporter there, too. So I guess it compelled you, too. ;) If we do want to keep the other implementations around, another thing that might be worth doing is to teach t0013 to complain when the collision-detecting sha1 is not in use (i.e., rather than auto-skipping when built without DC_SHA1, require the user to set a special NO_REALLY_I_CHOOSE_NOT_TO_USE_DC_SHA1_AND_AM_AWARE_OF_THE_IMPLICATIONS variable). That would provide a cross-check on the build flags. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] ppc: remove custom SHA-1 implementation 2020-02-23 22:37 ` [PATCH] Remove non-SHA1dc sha1 implementations Mike Hommey 2020-02-24 4:47 ` Jeff King @ 2022-03-19 1:02 ` Ævar Arnfjörð Bjarmason 2022-03-21 16:39 ` Junio C Hamano 2022-03-21 17:06 ` [PATCH v2] " Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 17+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-03-19 1:02 UTC (permalink / raw) To: git Cc: Junio C Hamano, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau, Ævar Arnfjörð Bjarmason Remove the PPC_SHA1 implementation added in a6ef3518f9a ([PATCH] PPC assembly implementation of SHA1, 2005-04-22). When this was added Apple consumer hardware used the PPC architecture, and the implementation was intended to improve SHA-1 speed there. Since it was added we've moved to DC_SHA1 by default, and anyone wanting hard-rolled non-DC SHA-1 implementation can use OpenSSL's via the OPENSSL_SHA1 knob. I'm unsure if this was ever supposed to work on 64-bit PPC. It clearly originally targeted 32 bit PPC, but there's some mailing list references to this being tried on G5 (PPC 970). I can't get it to do anything but segfault on the BE POWER8 machine in the GCC compile farm. Anyone caring about speed on PPC these days is likely to be using IBM's POWER, not PPC 970. There have been proposals to entirely remove non-DC_SHA1 implementations from the tree[1]. I think per [2] that would be a bit overzealous. I.e. there are various set-ups git's speed is going to be more important than the relatively implausible SHA-1 collision attack, or where such attacks are entirely mitigated by other means (e.g. by incoming objects being checked with DC_SHA1). The main reason for doing so at this point is to simplify follow-up Makefile change. Since PPC_SHA1 included the only in-tree *.S assembly file we needed to keep around special support for building objects from it. By getting rid of it we know we'll always build *.o from *.c files, which makes the build process simpler. As an aside the code being removed here was also throwing warnings with the "-pedantic" flag, but let's remove it instead of fixing it, as 544d93bc3b4 (block-sha1: remove use of obsolete x86 assembly, 2022-03-10) did for block-sha1/*. 1. https://lore.kernel.org/git/20200223223758.120941-1-mh@glandium.org/ 2. https://lore.kernel.org/git/20200224044732.GK1018190@coredump.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- INSTALL | 3 +- Makefile | 35 ++++---- configure.ac | 3 - hash.h | 6 +- ppc/sha1.c | 72 ---------------- ppc/sha1.h | 25 ------ ppc/sha1ppc.S | 224 -------------------------------------------------- 7 files changed, 17 insertions(+), 351 deletions(-) delete mode 100644 ppc/sha1.c delete mode 100644 ppc/sha1.h delete mode 100644 ppc/sha1ppc.S diff --git a/INSTALL b/INSTALL index 4140a3f5c8b..89b15d71df5 100644 --- a/INSTALL +++ b/INSTALL @@ -135,8 +135,7 @@ Issues of note: By default, git uses OpenSSL for SHA1 but it will use its own library (inspired by Mozilla's) with either NO_OPENSSL or - BLK_SHA1. Also included is a version optimized for PowerPC - (PPC_SHA1). + BLK_SHA1. - "libcurl" library is used for fetching and pushing repositories over http:// or https://, as well as by diff --git a/Makefile b/Makefile index 70f0a004e75..965e51f773e 100644 --- a/Makefile +++ b/Makefile @@ -155,9 +155,6 @@ include shared.mak # Define BLK_SHA1 environment variable to make use of the bundled # optimized C SHA1 routine. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define DC_SHA1 to unconditionally enable the collision-detecting sha1 # algorithm. This is slower, but may detect attempted collision attacks. # Takes priority over other *_SHA1 knobs. @@ -1770,14 +1767,14 @@ ifdef OPENSSL_SHA1 EXTLIBS += $(LIB_4_CRYPTO) BASIC_CFLAGS += -DSHA1_OPENSSL else +ifdef PPC_SHA1 +$(error PPC_SHA1 has been removed! You should almost definitely remove that \ +knob and use the DC_SHA1 default! See INSTALL for more information) +endif ifdef BLK_SHA1 LIB_OBJS += block-sha1/sha1.o BASIC_CFLAGS += -DSHA1_BLK else -ifdef PPC_SHA1 - LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o - BASIC_CFLAGS += -DSHA1_PPC -else ifdef APPLE_COMMON_CRYPTO COMPAT_CFLAGS += -DCOMMON_DIGEST_FOR_OPENSSL BASIC_CFLAGS += -DSHA1_APPLE @@ -1811,7 +1808,6 @@ endif endif endif endif -endif ifdef OPENSSL_SHA256 EXTLIBS += $(LIB_4_CRYPTO) @@ -2509,6 +2505,11 @@ OBJECTS += $(SCALAR_OBJECTS) .PHONY: objects objects: $(OBJECTS) +# Derived from $(OBJECTS) +OBJECTS_C = $(OBJECTS:%.o=%.c) +OBJECTS_S = $(OBJECTS:%.o=%.s) +OBJECTS_SP = $(OBJECTS:%.o=%.sp) + dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d) dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS)))) @@ -2540,13 +2541,7 @@ missing_compdb_dir = compdb_args = endif -ASM_SRC := $(wildcard $(OBJECTS:o=S)) -ASM_OBJ := $(ASM_SRC:S=o) -C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) - -$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) - $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< -$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) +$(OBJECTS): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< %.s: %.c GIT-CFLAGS FORCE @@ -2692,7 +2687,7 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \ --keyword=gettextln --keyword=eval_gettextln XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \ --keyword=__ --keyword=N__ --keyword="__n:1,2" -LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H) +LOCALIZED_C = $(OBJECTS_C) $(LIB_H) $(GENERATED_H) LOCALIZED_SH = $(SCRIPT_SH) LOCALIZED_SH += git-sh-setup.sh LOCALIZED_PERL = $(SCRIPT_PERL) @@ -2939,16 +2934,14 @@ t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB) check-sha1:: t/helper/test-tool$X t/helper/test-sha1.sh -SP_OBJ = $(patsubst %.o,%.sp,$(C_OBJ)) - -$(SP_OBJ): %.sp: %.c %.o +$(OBJECTS_SP): %.sp: %.c %.o $(QUIET_SP)cgcc -no-compile $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) \ -Wsparse-error \ $(SPARSE_FLAGS) $(SP_EXTRA_FLAGS) $< && \ >$@ .PHONY: sparse -sparse: $(SP_OBJ) +sparse: $(OBJECTS_SP) EXCEPT_HDRS := $(GENERATED_H) unicode-width.h compat/% xdiff/% ifndef GCRYPT_SHA256 @@ -3272,7 +3265,7 @@ clean: profile-clean coverage-clean cocciclean $(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X $(RM) $(TEST_PROGRAMS) $(RM) $(FUZZ_PROGRAMS) - $(RM) $(SP_OBJ) + $(RM) $(OBJECTS_SP) $(RM) $(HCC) $(RM) -r bin-wrappers $(dep_dirs) $(compdb_dir) compile_commands.json $(RM) -r po/build/ diff --git a/configure.ac b/configure.ac index 5ee25ec95c8..9c75b00d3eb 100644 --- a/configure.ac +++ b/configure.ac @@ -237,9 +237,6 @@ AC_MSG_NOTICE([CHECKS for site configuration]) # tests. These tests take up a significant amount of the total test time # but are not needed unless you plan to talk to SVN repos. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define NO_OPENSSL environment variable if you do not have OpenSSL. # # Define OPENSSLDIR=/foo/bar if your openssl header and library files are in diff --git a/hash.h b/hash.h index 5d40368f18a..efc14c5f56d 100644 --- a/hash.h +++ b/hash.h @@ -4,9 +4,7 @@ #include "git-compat-util.h" #include "repository.h" -#if defined(SHA1_PPC) -#include "ppc/sha1.h" -#elif defined(SHA1_APPLE) +#if defined(SHA1_APPLE) #include <CommonCrypto/CommonDigest.h> #elif defined(SHA1_OPENSSL) #include <openssl/sha.h> @@ -30,7 +28,7 @@ * platform's underlying implementation of SHA-1; could be OpenSSL, * blk_SHA, Apple CommonCrypto, etc... Note that the relevant * SHA-1 header may have already defined platform_SHA_CTX for our - * own implementations like block-sha1 and ppc-sha1, so we list + * own implementations like block-sha1, so we list * the default for OpenSSL compatible SHA-1 implementations here. */ #define platform_SHA_CTX SHA_CTX diff --git a/ppc/sha1.c b/ppc/sha1.c deleted file mode 100644 index 1b705cee1fe..00000000000 --- a/ppc/sha1.c +++ /dev/null @@ -1,72 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - * - * This version assumes we are running on a big-endian machine. - * It calls an external sha1_core() to process blocks of 64 bytes. - */ -#include <stdio.h> -#include <string.h> -#include "sha1.h" - -void ppc_sha1_core(uint32_t *hash, const unsigned char *p, - unsigned int nblocks); - -int ppc_SHA1_Init(ppc_SHA_CTX *c) -{ - c->hash[0] = 0x67452301; - c->hash[1] = 0xEFCDAB89; - c->hash[2] = 0x98BADCFE; - c->hash[3] = 0x10325476; - c->hash[4] = 0xC3D2E1F0; - c->len = 0; - c->cnt = 0; - return 0; -} - -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, unsigned long n) -{ - unsigned long nb; - const unsigned char *p = ptr; - - c->len += (uint64_t) n << 3; - while (n != 0) { - if (c->cnt || n < 64) { - nb = 64 - c->cnt; - if (nb > n) - nb = n; - memcpy(&c->buf.b[c->cnt], p, nb); - if ((c->cnt += nb) == 64) { - ppc_sha1_core(c->hash, c->buf.b, 1); - c->cnt = 0; - } - } else { - nb = n >> 6; - ppc_sha1_core(c->hash, p, nb); - nb <<= 6; - } - n -= nb; - p += nb; - } - return 0; -} - -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c) -{ - unsigned int cnt = c->cnt; - - c->buf.b[cnt++] = 0x80; - if (cnt > 56) { - if (cnt < 64) - memset(&c->buf.b[cnt], 0, 64 - cnt); - ppc_sha1_core(c->hash, c->buf.b, 1); - cnt = 0; - } - if (cnt < 56) - memset(&c->buf.b[cnt], 0, 56 - cnt); - c->buf.l[7] = c->len; - ppc_sha1_core(c->hash, c->buf.b, 1); - memcpy(hash, c->hash, 20); - return 0; -} diff --git a/ppc/sha1.h b/ppc/sha1.h deleted file mode 100644 index 9b24b326159..00000000000 --- a/ppc/sha1.h +++ /dev/null @@ -1,25 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ -#include <stdint.h> - -typedef struct { - uint32_t hash[5]; - uint32_t cnt; - uint64_t len; - union { - unsigned char b[64]; - uint64_t l[8]; - } buf; -} ppc_SHA_CTX; - -int ppc_SHA1_Init(ppc_SHA_CTX *c); -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, unsigned long n); -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c); - -#define platform_SHA_CTX ppc_SHA_CTX -#define platform_SHA1_Init ppc_SHA1_Init -#define platform_SHA1_Update ppc_SHA1_Update -#define platform_SHA1_Final ppc_SHA1_Final diff --git a/ppc/sha1ppc.S b/ppc/sha1ppc.S deleted file mode 100644 index 1711eef6e71..00000000000 --- a/ppc/sha1ppc.S +++ /dev/null @@ -1,224 +0,0 @@ -/* - * SHA-1 implementation for PowerPC. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ - -/* - * PowerPC calling convention: - * %r0 - volatile temp - * %r1 - stack pointer. - * %r2 - reserved - * %r3-%r12 - Incoming arguments & return values; volatile. - * %r13-%r31 - Callee-save registers - * %lr - Return address, volatile - * %ctr - volatile - * - * Register usage in this routine: - * %r0 - temp - * %r3 - argument (pointer to 5 words of SHA state) - * %r4 - argument (pointer to data to hash) - * %r5 - Constant K in SHA round (initially number of blocks to hash) - * %r6-%r10 - Working copies of SHA variables A..E (actually E..A order) - * %r11-%r26 - Data being hashed W[]. - * %r27-%r31 - Previous copies of A..E, for final add back. - * %ctr - loop count - */ - - -/* - * We roll the registers for A, B, C, D, E around on each - * iteration; E on iteration t is D on iteration t+1, and so on. - * We use registers 6 - 10 for this. (Registers 27 - 31 hold - * the previous values.) - */ -#define RA(t) (((t)+4)%5+6) -#define RB(t) (((t)+3)%5+6) -#define RC(t) (((t)+2)%5+6) -#define RD(t) (((t)+1)%5+6) -#define RE(t) (((t)+0)%5+6) - -/* We use registers 11 - 26 for the W values */ -#define W(t) ((t)%16+11) - -/* Register 5 is used for the constant k */ - -/* - * The basic SHA-1 round function is: - * E += ROTL(A,5) + F(B,C,D) + W[i] + K; B = ROTL(B,30) - * Then the variables are renamed: (A,B,C,D,E) = (E,A,B,C,D). - * - * Every 20 rounds, the function F() and the constant K changes: - * - 20 rounds of f0(b,c,d) = "bit wise b ? c : d" = (^b & d) + (b & c) - * - 20 rounds of f1(b,c,d) = b^c^d = (b^d)^c - * - 20 rounds of f2(b,c,d) = majority(b,c,d) = (b&d) + ((b^d)&c) - * - 20 more rounds of f1(b,c,d) - * - * These are all scheduled for near-optimal performance on a G4. - * The G4 is a 3-issue out-of-order machine with 3 ALUs, but it can only - * *consider* starting the oldest 3 instructions per cycle. So to get - * maximum performance out of it, you have to treat it as an in-order - * machine. Which means interleaving the computation round t with the - * computation of W[t+4]. - * - * The first 16 rounds use W values loaded directly from memory, while the - * remaining 64 use values computed from those first 16. We preload - * 4 values before starting, so there are three kinds of rounds: - * - The first 12 (all f0) also load the W values from memory. - * - The next 64 compute W(i+4) in parallel. 8*f0, 20*f1, 20*f2, 16*f1. - * - The last 4 (all f1) do not do anything with W. - * - * Therefore, we have 6 different round functions: - * STEPD0_LOAD(t,s) - Perform round t and load W(s). s < 16 - * STEPD0_UPDATE(t,s) - Perform round t and compute W(s). s >= 16. - * STEPD1_UPDATE(t,s) - * STEPD2_UPDATE(t,s) - * STEPD1(t) - Perform round t with no load or update. - * - * The G5 is more fully out-of-order, and can find the parallelism - * by itself. The big limit is that it has a 2-cycle ALU latency, so - * even though it's 2-way, the code has to be scheduled as if it's - * 4-way, which can be a limit. To help it, we try to schedule the - * read of RA(t) as late as possible so it doesn't stall waiting for - * the previous round's RE(t-1), and we try to rotate RB(t) as early - * as possible while reading RC(t) (= RB(t-1)) as late as possible. - */ - -/* the initial loads. */ -#define LOADW(s) \ - lwz W(s),(s)*4(%r4) - -/* - * Perform a step with F0, and load W(s). Uses W(s) as a temporary - * before loading it. - * This is actually 10 instructions, which is an awkward fit. - * It can execute grouped as listed, or delayed one instruction. - * (If delayed two instructions, there is a stall before the start of the - * second line.) Thus, two iterations take 7 cycles, 3.5 cycles per round. - */ -#define STEPD0_LOAD(t,s) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); and W(s),RC(t),RB(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi RB(t),RB(t),30; \ -add RE(t),RE(t),W(s); add %r0,%r0,%r5; lwz W(s),(s)*4(%r4); \ -add RE(t),RE(t),%r0 - -/* - * This is likewise awkward, 13 instructions. However, it can also - * execute starting with 2 out of 3 possible moduli, so it does 2 rounds - * in 9 cycles, 4.5 cycles/round. - */ -#define STEPD0_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; and %r0,RC(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r5; loadk; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0 - -/* Nicely optimal. Conveniently, also the most common. */ -#define STEPD1_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r5; loadk; xor %r0,%r0,RC(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1 - -/* - * The naked version, no UPDATE, for the last 4 rounds. 3 cycles per. - * We could use W(s) as a temp register, but we don't need it. - */ -#define STEPD1(t) \ - add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); \ -rotlwi RB(t),RB(t),30; add RE(t),RE(t),%r5; xor %r0,%r0,RC(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; /* spare slot */ \ -add RE(t),RE(t),%r0 - -/* - * 14 instructions, 5 cycles per. The majority function is a bit - * awkward to compute. This can execute with a 1-instruction delay, - * but it causes a 2-instruction delay, which triggers a stall. - */ -#define STEPD2_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); and %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; xor %r0,RD(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r5; loadk; and %r0,%r0,RC(t); xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30 - -#define STEP0_LOAD4(t,s) \ - STEPD0_LOAD(t,s); \ - STEPD0_LOAD((t+1),(s)+1); \ - STEPD0_LOAD((t)+2,(s)+2); \ - STEPD0_LOAD((t)+3,(s)+3) - -#define STEPUP4(fn, t, s, loadk...) \ - STEP##fn##_UPDATE(t,s,); \ - STEP##fn##_UPDATE((t)+1,(s)+1,); \ - STEP##fn##_UPDATE((t)+2,(s)+2,); \ - STEP##fn##_UPDATE((t)+3,(s)+3,loadk) - -#define STEPUP20(fn, t, s, loadk...) \ - STEPUP4(fn, t, s,); \ - STEPUP4(fn, (t)+4, (s)+4,); \ - STEPUP4(fn, (t)+8, (s)+8,); \ - STEPUP4(fn, (t)+12, (s)+12,); \ - STEPUP4(fn, (t)+16, (s)+16, loadk) - - .globl ppc_sha1_core -ppc_sha1_core: - stwu %r1,-80(%r1) - stmw %r13,4(%r1) - - /* Load up A - E */ - lmw %r27,0(%r3) - - mtctr %r5 - -1: - LOADW(0) - lis %r5,0x5a82 - mr RE(0),%r31 - LOADW(1) - mr RD(0),%r30 - mr RC(0),%r29 - LOADW(2) - ori %r5,%r5,0x7999 /* K0-19 */ - mr RB(0),%r28 - LOADW(3) - mr RA(0),%r27 - - STEP0_LOAD4(0, 4) - STEP0_LOAD4(4, 8) - STEP0_LOAD4(8, 12) - STEPUP4(D0, 12, 16,) - STEPUP4(D0, 16, 20, lis %r5,0x6ed9) - - ori %r5,%r5,0xeba1 /* K20-39 */ - STEPUP20(D1, 20, 24, lis %r5,0x8f1b) - - ori %r5,%r5,0xbcdc /* K40-59 */ - STEPUP20(D2, 40, 44, lis %r5,0xca62) - - ori %r5,%r5,0xc1d6 /* K60-79 */ - STEPUP4(D1, 60, 64,) - STEPUP4(D1, 64, 68,) - STEPUP4(D1, 68, 72,) - STEPUP4(D1, 72, 76,) - addi %r4,%r4,64 - STEPD1(76) - STEPD1(77) - STEPD1(78) - STEPD1(79) - - /* Add results to original values */ - add %r31,%r31,RE(0) - add %r30,%r30,RD(0) - add %r29,%r29,RC(0) - add %r28,%r28,RB(0) - add %r27,%r27,RA(0) - - bdnz 1b - - /* Save final hash, restore registers, and return */ - stmw %r27,0(%r3) - lmw %r13,4(%r1) - addi %r1,%r1,80 - blr -- 2.35.1.1438.g8874c8eeb35 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] ppc: remove custom SHA-1 implementation 2022-03-19 1:02 ` [PATCH] ppc: remove custom SHA-1 implementation Ævar Arnfjörð Bjarmason @ 2022-03-21 16:39 ` Junio C Hamano 2022-03-21 17:06 ` [PATCH v2] " Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2022-03-21 16:39 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > The main reason for doing so at this point is to simplify follow-up > Makefile change. Since PPC_SHA1 included the only in-tree *.S assembly > file we needed to keep around special support for building objects > from it. By getting rid of it we know we'll always build *.o from *.c > files, which makes the build process simpler. Yuck. > diff --git a/Makefile b/Makefile > index 70f0a004e75..965e51f773e 100644 > --- a/Makefile > +++ b/Makefile > @@ -155,9 +155,6 @@ include shared.mak > # Define BLK_SHA1 environment variable to make use of the bundled > # optimized C SHA1 routine. > # > -# Define PPC_SHA1 environment variable when running make to make use of > -# a bundled SHA1 routine optimized for PowerPC. > -# > # Define DC_SHA1 to unconditionally enable the collision-detecting sha1 > # algorithm. This is slower, but may detect attempted collision attacks. > # Takes priority over other *_SHA1 knobs. > @@ -1770,14 +1767,14 @@ ifdef OPENSSL_SHA1 > EXTLIBS += $(LIB_4_CRYPTO) > BASIC_CFLAGS += -DSHA1_OPENSSL > else > +ifdef PPC_SHA1 > +$(error PPC_SHA1 has been removed! You should almost definitely remove that \ > +knob and use the DC_SHA1 default! See INSTALL for more information) > +endif "use the DC_SHA1 default"? PPC_SHA1 support is being removed. Use DC_SHA1 instead, which is the default. I am wondering if we can make only these four lines the first step of remova, without doing anything else. It would give us a good feel on how many users we may be inconveniencing (not necessarily hurting, as switching to DC_SHA1 would be a good move) by the removal. > @@ -2509,6 +2505,11 @@ OBJECTS += $(SCALAR_OBJECTS) > .PHONY: objects > objects: $(OBJECTS) > > +# Derived from $(OBJECTS) > +OBJECTS_C = $(OBJECTS:%.o=%.c) > +OBJECTS_S = $(OBJECTS:%.o=%.s) > +OBJECTS_SP = $(OBJECTS:%.o=%.sp) Usually we build objects from sources, so "derived from" is a puzzling way to call them. I understand we are deriving a list from another list, but it still feels confusing (see below for ASM_OBJ). This seems to have nothing to do with "we no longer have *.S files" and looks more like "now we have excuse to touch Makefile, make random changes that look subjectively good to me, burying them in the noise so that we can sneak them in without justifying them much in the proposed log message". > dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d) > dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS)))) > > @@ -2540,13 +2541,7 @@ missing_compdb_dir = > compdb_args = > endif > > -ASM_SRC := $(wildcard $(OBJECTS:o=S)) > -ASM_OBJ := $(ASM_SRC:S=o) > -C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) I tend to agree with this patch that these three lines are ugly in multiple ways. It's a confusing construct. The list of OBJECTS is used as the single source of truth and others are derived by filtering the list and futzing the suffix of the resulting subset of elements; it makes me wonder if it should be the other way around (i.e. we have a list of source files in various languages, all get turned into objects, rather than we have a list of object files, and we see if corresponding source file in possible languages, if any, exists on disk). It is pleasing that we can see them go. Unfortunately the new "Derived from $(OBJECTS) lists are in the same spirit as used by ASM_SRC we are removing here, aren't they? > -$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) > - $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< > -$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) > +$(OBJECTS): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) > $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< Now we deal only with *.c sources and all objects come from them, and the action is the same as the old C_OBJ rule, naturally. > %.s: %.c GIT-CFLAGS FORCE This is the only remaining rule regarding assembly and it is the target to generate for debugging, not even used as a source to create object files. Do we need OBJECTS_S defined above? I somehow doubt it. FWIW, I do not see the need for OBJECTS_SP or OBJECTS_C, either. E.g. > @@ -2692,7 +2687,7 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \ > --keyword=gettextln --keyword=eval_gettextln > XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \ > --keyword=__ --keyword=N__ --keyword="__n:1,2" > -LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H) > +LOCALIZED_C = $(OBJECTS_C) $(LIB_H) $(GENERATED_H) Shouldn't it be sufficient to use %(OBJECTS:o=c) here, i.e. equating the old C_OBJ with OBJECTS, now we know all objects come from C? The same question for SP_OBJ below, but I won't repeat. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2] ppc: remove custom SHA-1 implementation 2022-03-19 1:02 ` [PATCH] ppc: remove custom SHA-1 implementation Ævar Arnfjörð Bjarmason 2022-03-21 16:39 ` Junio C Hamano @ 2022-03-21 17:06 ` Ævar Arnfjörð Bjarmason 2022-03-21 21:19 ` brian m. carlson 2022-08-31 9:18 ` [PATCH v3 0/2] Makefile + hash.h: remove PPC_SHA1 implementation Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 17+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-03-21 17:06 UTC (permalink / raw) To: git Cc: Junio C Hamano, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau, Ævar Arnfjörð Bjarmason Remove the PPC_SHA1 implementation added in a6ef3518f9a ([PATCH] PPC assembly implementation of SHA1, 2005-04-22). When this was added Apple consumer hardware used the PPC architecture, and the implementation was intended to improve SHA-1 speed there. Since it was added we've moved to DC_SHA1 by default, and anyone wanting hard-rolled non-DC SHA-1 implementation can use OpenSSL's via the OPENSSL_SHA1 knob. I'm unsure if this was ever supposed to work on 64-bit PPC. It clearly originally targeted 32 bit PPC, but there's some mailing list references to this being tried on G5 (PPC 970). I can't get it to do anything but segfault on the BE POWER8 machine in the GCC compile farm. Anyone caring about speed on PPC these days is likely to be using IBM's POWER, not PPC 970. There have been proposals to entirely remove non-DC_SHA1 implementations from the tree[1]. I think per [2] that would be a bit overzealous. I.e. there are various set-ups git's speed is going to be more important than the relatively implausible SHA-1 collision attack, or where such attacks are entirely mitigated by other means (e.g. by incoming objects being checked with DC_SHA1). The main reason for doing so at this point is to simplify follow-up Makefile change. Since PPC_SHA1 included the only in-tree *.S assembly file we needed to keep around special support for building objects from it. By getting rid of it we know we'll always build *.o from *.c files, which makes the build process simpler. As an aside the code being removed here was also throwing warnings with the "-pedantic" flag, but let's remove it instead of fixing it, as 544d93bc3b4 (block-sha1: remove use of obsolete x86 assembly, 2022-03-10) did for block-sha1/*. 1. https://lore.kernel.org/git/20200223223758.120941-1-mh@glandium.org/ 2. https://lore.kernel.org/git/20200224044732.GK1018190@coredump.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- A more minimal change to the Makefile than that in v1. In v1 I thought it was a bit odd to leave in-place the C_OBJ variable that was only declared for now-removed code. But sure, we can keep it and keep defining SP_OBJ etc. as being derived from it. Range-diff against v1: 1: 05dcdca3877 ! 1: e77fd23a824 ppc: remove custom SHA-1 implementation @@ Makefile: ifdef OPENSSL_SHA1 BASIC_CFLAGS += -DSHA1_OPENSSL else +ifdef PPC_SHA1 -+$(error PPC_SHA1 has been removed! You should almost definitely remove that \ -+knob and use the DC_SHA1 default! See INSTALL for more information) ++$(error PPC_SHA1 has been removed! Use DC_SHA1 instead, which is the default) +endif ifdef BLK_SHA1 LIB_OBJS += block-sha1/sha1.o @@ Makefile: endif ifdef OPENSSL_SHA256 EXTLIBS += $(LIB_4_CRYPTO) -@@ Makefile: OBJECTS += $(SCALAR_OBJECTS) - .PHONY: objects - objects: $(OBJECTS) - -+# Derived from $(OBJECTS) -+OBJECTS_C = $(OBJECTS:%.o=%.c) -+OBJECTS_S = $(OBJECTS:%.o=%.s) -+OBJECTS_SP = $(OBJECTS:%.o=%.sp) -+ - dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d) - dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS)))) - @@ Makefile: missing_compdb_dir = compdb_args = endif @@ Makefile: missing_compdb_dir = -ASM_SRC := $(wildcard $(OBJECTS:o=S)) -ASM_OBJ := $(ASM_SRC:S=o) -C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) -- --$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) -- $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< --$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) -+$(OBJECTS): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) ++C_OBJ = $(OBJECTS) + + $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< +-$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) +- $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< %.s: %.c GIT-CFLAGS FORCE -@@ Makefile: XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \ - --keyword=gettextln --keyword=eval_gettextln - XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \ - --keyword=__ --keyword=N__ --keyword="__n:1,2" --LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H) -+LOCALIZED_C = $(OBJECTS_C) $(LIB_H) $(GENERATED_H) - LOCALIZED_SH = $(SCRIPT_SH) - LOCALIZED_SH += git-sh-setup.sh - LOCALIZED_PERL = $(SCRIPT_PERL) -@@ Makefile: t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB) - check-sha1:: t/helper/test-tool$X - t/helper/test-sha1.sh - --SP_OBJ = $(patsubst %.o,%.sp,$(C_OBJ)) -- --$(SP_OBJ): %.sp: %.c %.o -+$(OBJECTS_SP): %.sp: %.c %.o - $(QUIET_SP)cgcc -no-compile $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) \ - -Wsparse-error \ - $(SPARSE_FLAGS) $(SP_EXTRA_FLAGS) $< && \ - >$@ - - .PHONY: sparse --sparse: $(SP_OBJ) -+sparse: $(OBJECTS_SP) - - EXCEPT_HDRS := $(GENERATED_H) unicode-width.h compat/% xdiff/% - ifndef GCRYPT_SHA256 -@@ Makefile: clean: profile-clean coverage-clean cocciclean - $(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X - $(RM) $(TEST_PROGRAMS) - $(RM) $(FUZZ_PROGRAMS) -- $(RM) $(SP_OBJ) -+ $(RM) $(OBJECTS_SP) - $(RM) $(HCC) - $(RM) -r bin-wrappers $(dep_dirs) $(compdb_dir) compile_commands.json - $(RM) -r po/build/ + $(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< ## configure.ac ## @@ configure.ac: AC_MSG_NOTICE([CHECKS for site configuration]) INSTALL | 3 +- Makefile | 17 +--- configure.ac | 3 - hash.h | 6 +- ppc/sha1.c | 72 ---------------- ppc/sha1.h | 25 ------ ppc/sha1ppc.S | 224 -------------------------------------------------- 7 files changed, 7 insertions(+), 343 deletions(-) delete mode 100644 ppc/sha1.c delete mode 100644 ppc/sha1.h delete mode 100644 ppc/sha1ppc.S diff --git a/INSTALL b/INSTALL index 4140a3f5c8b..89b15d71df5 100644 --- a/INSTALL +++ b/INSTALL @@ -135,8 +135,7 @@ Issues of note: By default, git uses OpenSSL for SHA1 but it will use its own library (inspired by Mozilla's) with either NO_OPENSSL or - BLK_SHA1. Also included is a version optimized for PowerPC - (PPC_SHA1). + BLK_SHA1. - "libcurl" library is used for fetching and pushing repositories over http:// or https://, as well as by diff --git a/Makefile b/Makefile index 70f0a004e75..33c6db5e6c9 100644 --- a/Makefile +++ b/Makefile @@ -155,9 +155,6 @@ include shared.mak # Define BLK_SHA1 environment variable to make use of the bundled # optimized C SHA1 routine. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define DC_SHA1 to unconditionally enable the collision-detecting sha1 # algorithm. This is slower, but may detect attempted collision attacks. # Takes priority over other *_SHA1 knobs. @@ -1770,14 +1767,13 @@ ifdef OPENSSL_SHA1 EXTLIBS += $(LIB_4_CRYPTO) BASIC_CFLAGS += -DSHA1_OPENSSL else +ifdef PPC_SHA1 +$(error PPC_SHA1 has been removed! Use DC_SHA1 instead, which is the default) +endif ifdef BLK_SHA1 LIB_OBJS += block-sha1/sha1.o BASIC_CFLAGS += -DSHA1_BLK else -ifdef PPC_SHA1 - LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o - BASIC_CFLAGS += -DSHA1_PPC -else ifdef APPLE_COMMON_CRYPTO COMPAT_CFLAGS += -DCOMMON_DIGEST_FOR_OPENSSL BASIC_CFLAGS += -DSHA1_APPLE @@ -1811,7 +1807,6 @@ endif endif endif endif -endif ifdef OPENSSL_SHA256 EXTLIBS += $(LIB_4_CRYPTO) @@ -2540,14 +2535,10 @@ missing_compdb_dir = compdb_args = endif -ASM_SRC := $(wildcard $(OBJECTS:o=S)) -ASM_OBJ := $(ASM_SRC:S=o) -C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) +C_OBJ = $(OBJECTS) $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< -$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) - $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< %.s: %.c GIT-CFLAGS FORCE $(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< diff --git a/configure.ac b/configure.ac index 5ee25ec95c8..9c75b00d3eb 100644 --- a/configure.ac +++ b/configure.ac @@ -237,9 +237,6 @@ AC_MSG_NOTICE([CHECKS for site configuration]) # tests. These tests take up a significant amount of the total test time # but are not needed unless you plan to talk to SVN repos. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define NO_OPENSSL environment variable if you do not have OpenSSL. # # Define OPENSSLDIR=/foo/bar if your openssl header and library files are in diff --git a/hash.h b/hash.h index 5d40368f18a..efc14c5f56d 100644 --- a/hash.h +++ b/hash.h @@ -4,9 +4,7 @@ #include "git-compat-util.h" #include "repository.h" -#if defined(SHA1_PPC) -#include "ppc/sha1.h" -#elif defined(SHA1_APPLE) +#if defined(SHA1_APPLE) #include <CommonCrypto/CommonDigest.h> #elif defined(SHA1_OPENSSL) #include <openssl/sha.h> @@ -30,7 +28,7 @@ * platform's underlying implementation of SHA-1; could be OpenSSL, * blk_SHA, Apple CommonCrypto, etc... Note that the relevant * SHA-1 header may have already defined platform_SHA_CTX for our - * own implementations like block-sha1 and ppc-sha1, so we list + * own implementations like block-sha1, so we list * the default for OpenSSL compatible SHA-1 implementations here. */ #define platform_SHA_CTX SHA_CTX diff --git a/ppc/sha1.c b/ppc/sha1.c deleted file mode 100644 index 1b705cee1fe..00000000000 --- a/ppc/sha1.c +++ /dev/null @@ -1,72 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - * - * This version assumes we are running on a big-endian machine. - * It calls an external sha1_core() to process blocks of 64 bytes. - */ -#include <stdio.h> -#include <string.h> -#include "sha1.h" - -void ppc_sha1_core(uint32_t *hash, const unsigned char *p, - unsigned int nblocks); - -int ppc_SHA1_Init(ppc_SHA_CTX *c) -{ - c->hash[0] = 0x67452301; - c->hash[1] = 0xEFCDAB89; - c->hash[2] = 0x98BADCFE; - c->hash[3] = 0x10325476; - c->hash[4] = 0xC3D2E1F0; - c->len = 0; - c->cnt = 0; - return 0; -} - -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, unsigned long n) -{ - unsigned long nb; - const unsigned char *p = ptr; - - c->len += (uint64_t) n << 3; - while (n != 0) { - if (c->cnt || n < 64) { - nb = 64 - c->cnt; - if (nb > n) - nb = n; - memcpy(&c->buf.b[c->cnt], p, nb); - if ((c->cnt += nb) == 64) { - ppc_sha1_core(c->hash, c->buf.b, 1); - c->cnt = 0; - } - } else { - nb = n >> 6; - ppc_sha1_core(c->hash, p, nb); - nb <<= 6; - } - n -= nb; - p += nb; - } - return 0; -} - -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c) -{ - unsigned int cnt = c->cnt; - - c->buf.b[cnt++] = 0x80; - if (cnt > 56) { - if (cnt < 64) - memset(&c->buf.b[cnt], 0, 64 - cnt); - ppc_sha1_core(c->hash, c->buf.b, 1); - cnt = 0; - } - if (cnt < 56) - memset(&c->buf.b[cnt], 0, 56 - cnt); - c->buf.l[7] = c->len; - ppc_sha1_core(c->hash, c->buf.b, 1); - memcpy(hash, c->hash, 20); - return 0; -} diff --git a/ppc/sha1.h b/ppc/sha1.h deleted file mode 100644 index 9b24b326159..00000000000 --- a/ppc/sha1.h +++ /dev/null @@ -1,25 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ -#include <stdint.h> - -typedef struct { - uint32_t hash[5]; - uint32_t cnt; - uint64_t len; - union { - unsigned char b[64]; - uint64_t l[8]; - } buf; -} ppc_SHA_CTX; - -int ppc_SHA1_Init(ppc_SHA_CTX *c); -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, unsigned long n); -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c); - -#define platform_SHA_CTX ppc_SHA_CTX -#define platform_SHA1_Init ppc_SHA1_Init -#define platform_SHA1_Update ppc_SHA1_Update -#define platform_SHA1_Final ppc_SHA1_Final diff --git a/ppc/sha1ppc.S b/ppc/sha1ppc.S deleted file mode 100644 index 1711eef6e71..00000000000 --- a/ppc/sha1ppc.S +++ /dev/null @@ -1,224 +0,0 @@ -/* - * SHA-1 implementation for PowerPC. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ - -/* - * PowerPC calling convention: - * %r0 - volatile temp - * %r1 - stack pointer. - * %r2 - reserved - * %r3-%r12 - Incoming arguments & return values; volatile. - * %r13-%r31 - Callee-save registers - * %lr - Return address, volatile - * %ctr - volatile - * - * Register usage in this routine: - * %r0 - temp - * %r3 - argument (pointer to 5 words of SHA state) - * %r4 - argument (pointer to data to hash) - * %r5 - Constant K in SHA round (initially number of blocks to hash) - * %r6-%r10 - Working copies of SHA variables A..E (actually E..A order) - * %r11-%r26 - Data being hashed W[]. - * %r27-%r31 - Previous copies of A..E, for final add back. - * %ctr - loop count - */ - - -/* - * We roll the registers for A, B, C, D, E around on each - * iteration; E on iteration t is D on iteration t+1, and so on. - * We use registers 6 - 10 for this. (Registers 27 - 31 hold - * the previous values.) - */ -#define RA(t) (((t)+4)%5+6) -#define RB(t) (((t)+3)%5+6) -#define RC(t) (((t)+2)%5+6) -#define RD(t) (((t)+1)%5+6) -#define RE(t) (((t)+0)%5+6) - -/* We use registers 11 - 26 for the W values */ -#define W(t) ((t)%16+11) - -/* Register 5 is used for the constant k */ - -/* - * The basic SHA-1 round function is: - * E += ROTL(A,5) + F(B,C,D) + W[i] + K; B = ROTL(B,30) - * Then the variables are renamed: (A,B,C,D,E) = (E,A,B,C,D). - * - * Every 20 rounds, the function F() and the constant K changes: - * - 20 rounds of f0(b,c,d) = "bit wise b ? c : d" = (^b & d) + (b & c) - * - 20 rounds of f1(b,c,d) = b^c^d = (b^d)^c - * - 20 rounds of f2(b,c,d) = majority(b,c,d) = (b&d) + ((b^d)&c) - * - 20 more rounds of f1(b,c,d) - * - * These are all scheduled for near-optimal performance on a G4. - * The G4 is a 3-issue out-of-order machine with 3 ALUs, but it can only - * *consider* starting the oldest 3 instructions per cycle. So to get - * maximum performance out of it, you have to treat it as an in-order - * machine. Which means interleaving the computation round t with the - * computation of W[t+4]. - * - * The first 16 rounds use W values loaded directly from memory, while the - * remaining 64 use values computed from those first 16. We preload - * 4 values before starting, so there are three kinds of rounds: - * - The first 12 (all f0) also load the W values from memory. - * - The next 64 compute W(i+4) in parallel. 8*f0, 20*f1, 20*f2, 16*f1. - * - The last 4 (all f1) do not do anything with W. - * - * Therefore, we have 6 different round functions: - * STEPD0_LOAD(t,s) - Perform round t and load W(s). s < 16 - * STEPD0_UPDATE(t,s) - Perform round t and compute W(s). s >= 16. - * STEPD1_UPDATE(t,s) - * STEPD2_UPDATE(t,s) - * STEPD1(t) - Perform round t with no load or update. - * - * The G5 is more fully out-of-order, and can find the parallelism - * by itself. The big limit is that it has a 2-cycle ALU latency, so - * even though it's 2-way, the code has to be scheduled as if it's - * 4-way, which can be a limit. To help it, we try to schedule the - * read of RA(t) as late as possible so it doesn't stall waiting for - * the previous round's RE(t-1), and we try to rotate RB(t) as early - * as possible while reading RC(t) (= RB(t-1)) as late as possible. - */ - -/* the initial loads. */ -#define LOADW(s) \ - lwz W(s),(s)*4(%r4) - -/* - * Perform a step with F0, and load W(s). Uses W(s) as a temporary - * before loading it. - * This is actually 10 instructions, which is an awkward fit. - * It can execute grouped as listed, or delayed one instruction. - * (If delayed two instructions, there is a stall before the start of the - * second line.) Thus, two iterations take 7 cycles, 3.5 cycles per round. - */ -#define STEPD0_LOAD(t,s) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); and W(s),RC(t),RB(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi RB(t),RB(t),30; \ -add RE(t),RE(t),W(s); add %r0,%r0,%r5; lwz W(s),(s)*4(%r4); \ -add RE(t),RE(t),%r0 - -/* - * This is likewise awkward, 13 instructions. However, it can also - * execute starting with 2 out of 3 possible moduli, so it does 2 rounds - * in 9 cycles, 4.5 cycles/round. - */ -#define STEPD0_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; and %r0,RC(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r5; loadk; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0 - -/* Nicely optimal. Conveniently, also the most common. */ -#define STEPD1_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r5; loadk; xor %r0,%r0,RC(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1 - -/* - * The naked version, no UPDATE, for the last 4 rounds. 3 cycles per. - * We could use W(s) as a temp register, but we don't need it. - */ -#define STEPD1(t) \ - add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); \ -rotlwi RB(t),RB(t),30; add RE(t),RE(t),%r5; xor %r0,%r0,RC(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; /* spare slot */ \ -add RE(t),RE(t),%r0 - -/* - * 14 instructions, 5 cycles per. The majority function is a bit - * awkward to compute. This can execute with a 1-instruction delay, - * but it causes a 2-instruction delay, which triggers a stall. - */ -#define STEPD2_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); and %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; xor %r0,RD(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r5; loadk; and %r0,%r0,RC(t); xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30 - -#define STEP0_LOAD4(t,s) \ - STEPD0_LOAD(t,s); \ - STEPD0_LOAD((t+1),(s)+1); \ - STEPD0_LOAD((t)+2,(s)+2); \ - STEPD0_LOAD((t)+3,(s)+3) - -#define STEPUP4(fn, t, s, loadk...) \ - STEP##fn##_UPDATE(t,s,); \ - STEP##fn##_UPDATE((t)+1,(s)+1,); \ - STEP##fn##_UPDATE((t)+2,(s)+2,); \ - STEP##fn##_UPDATE((t)+3,(s)+3,loadk) - -#define STEPUP20(fn, t, s, loadk...) \ - STEPUP4(fn, t, s,); \ - STEPUP4(fn, (t)+4, (s)+4,); \ - STEPUP4(fn, (t)+8, (s)+8,); \ - STEPUP4(fn, (t)+12, (s)+12,); \ - STEPUP4(fn, (t)+16, (s)+16, loadk) - - .globl ppc_sha1_core -ppc_sha1_core: - stwu %r1,-80(%r1) - stmw %r13,4(%r1) - - /* Load up A - E */ - lmw %r27,0(%r3) - - mtctr %r5 - -1: - LOADW(0) - lis %r5,0x5a82 - mr RE(0),%r31 - LOADW(1) - mr RD(0),%r30 - mr RC(0),%r29 - LOADW(2) - ori %r5,%r5,0x7999 /* K0-19 */ - mr RB(0),%r28 - LOADW(3) - mr RA(0),%r27 - - STEP0_LOAD4(0, 4) - STEP0_LOAD4(4, 8) - STEP0_LOAD4(8, 12) - STEPUP4(D0, 12, 16,) - STEPUP4(D0, 16, 20, lis %r5,0x6ed9) - - ori %r5,%r5,0xeba1 /* K20-39 */ - STEPUP20(D1, 20, 24, lis %r5,0x8f1b) - - ori %r5,%r5,0xbcdc /* K40-59 */ - STEPUP20(D2, 40, 44, lis %r5,0xca62) - - ori %r5,%r5,0xc1d6 /* K60-79 */ - STEPUP4(D1, 60, 64,) - STEPUP4(D1, 64, 68,) - STEPUP4(D1, 68, 72,) - STEPUP4(D1, 72, 76,) - addi %r4,%r4,64 - STEPD1(76) - STEPD1(77) - STEPD1(78) - STEPD1(79) - - /* Add results to original values */ - add %r31,%r31,RE(0) - add %r30,%r30,RD(0) - add %r29,%r29,RC(0) - add %r28,%r28,RB(0) - add %r27,%r27,RA(0) - - bdnz 1b - - /* Save final hash, restore registers, and return */ - stmw %r27,0(%r3) - lmw %r13,4(%r1) - addi %r1,%r1,80 - blr -- 2.35.1.1441.g83331fcb493 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2] ppc: remove custom SHA-1 implementation 2022-03-21 17:06 ` [PATCH v2] " Ævar Arnfjörð Bjarmason @ 2022-03-21 21:19 ` brian m. carlson 2022-08-31 9:18 ` [PATCH v3 0/2] Makefile + hash.h: remove PPC_SHA1 implementation Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 17+ messages in thread From: brian m. carlson @ 2022-03-21 21:19 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Mike Hommey, Carlo Marcelo Arenas Belón, Taylor Blau [-- Attachment #1: Type: text/plain, Size: 2270 bytes --] On 2022-03-21 at 17:06:12, Ævar Arnfjörð Bjarmason wrote: > Remove the PPC_SHA1 implementation added in a6ef3518f9a ([PATCH] PPC > assembly implementation of SHA1, 2005-04-22). When this was added > Apple consumer hardware used the PPC architecture, and the > implementation was intended to improve SHA-1 speed there. > > Since it was added we've moved to DC_SHA1 by default, and anyone > wanting hard-rolled non-DC SHA-1 implementation can use OpenSSL's via > the OPENSSL_SHA1 knob. > > I'm unsure if this was ever supposed to work on 64-bit PPC. It clearly > originally targeted 32 bit PPC, but there's some mailing list > references to this being tried on G5 (PPC 970). I can't get it to do > anything but segfault on the BE POWER8 machine in the GCC compile > farm. Anyone caring about speed on PPC these days is likely to be > using IBM's POWER, not PPC 970. > > There have been proposals to entirely remove non-DC_SHA1 > implementations from the tree[1]. I think per [2] that would be a bit > overzealous. I.e. there are various set-ups git's speed is going to be > more important than the relatively implausible SHA-1 collision attack, > or where such attacks are entirely mitigated by other means (e.g. by > incoming objects being checked with DC_SHA1). > > The main reason for doing so at this point is to simplify follow-up > Makefile change. Since PPC_SHA1 included the only in-tree *.S assembly > file we needed to keep around special support for building objects > from it. By getting rid of it we know we'll always build *.o from *.c > files, which makes the build process simpler. > > As an aside the code being removed here was also throwing warnings > with the "-pedantic" flag, but let's remove it instead of fixing it, > as 544d93bc3b4 (block-sha1: remove use of obsolete x86 assembly, > 2022-03-10) did for block-sha1/*. While I don't agree that we shouldn't remove the other non-DC SHA-1 implementations, I do agree that we should remove this one. Given the testing you've done and the fact that almost everyone desiring speed is using a 64-bit machine these days, I think it's unlikely that anyone is using this in the real world. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 0/2] Makefile + hash.h: remove PPC_SHA1 implementation 2022-03-21 17:06 ` [PATCH v2] " Ævar Arnfjörð Bjarmason 2022-03-21 21:19 ` brian m. carlson @ 2022-08-31 9:18 ` Ævar Arnfjörð Bjarmason 2022-08-31 9:18 ` [PATCH v3 1/2] " Ævar Arnfjörð Bjarmason 2022-08-31 9:18 ` [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 17+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-08-31 9:18 UTC (permalink / raw) To: git Cc: Junio C Hamano, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau, Ævar Arnfjörð Bjarmason This is a re-roll of [1] sent back in March. It removes the PPC_SHA1 implementation, which as noted in in 1/2 is likely completely unused at this point, if anyone's using it we have other better tested (and just as fast) SHA-1 implementations. I then included this change in a larger series sent in late April[2], which stalled for lack of feedback from the OSX crowd[3]. But this part should be uncontroversial, and is an obvious win in terms of the diffstat, and in reducing our complexity surface in this area. Changes since v2: * Much rewritten commit message (see range-diff) * Rebased on upstream changes * More PPC_SHA1 removal of comments/docs that I missed the first time around. * The s/C_OBJS/OBJECTS/ Makefile change is now split into a new 2/2. * Added brian's Acked-by, per [4] (not explicitly an "Acked-by", but I thought adding amounted to a fair paraphrasing of the feedback there) 1. https://lore.kernel.org/git/patch-v2-1.1-e77fd23a824-20220321T170412Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/cover-0.5-00000000000-20220422T094624Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/xmqqv8v17xrl.fsf@gitster.g/ 4. https://lore.kernel.org/git/Yjjr9fkybVmB53M7@camp.crustytoothpaste.net/ Ævar Arnfjörð Bjarmason (2): Makefile + hash.h: remove PPC_SHA1 implementation Makefile: use $(OBJECTS) instead of $(C_OBJ) INSTALL | 3 +- Makefile | 22 ++--- block-sha1/sha1.c | 4 - configure.ac | 3 - hash.h | 6 +- ppc/sha1.c | 72 --------------- ppc/sha1.h | 25 ------ ppc/sha1ppc.S | 224 ---------------------------------------------- 8 files changed, 9 insertions(+), 350 deletions(-) delete mode 100644 ppc/sha1.c delete mode 100644 ppc/sha1.h delete mode 100644 ppc/sha1ppc.S Range-diff against v2: 1: e77fd23a824 ! 1: 87a204b8937 ppc: remove custom SHA-1 implementation @@ Metadata Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## Commit message ## - ppc: remove custom SHA-1 implementation + Makefile + hash.h: remove PPC_SHA1 implementation Remove the PPC_SHA1 implementation added in a6ef3518f9a ([PATCH] PPC assembly implementation of SHA1, 2005-04-22). When this was added Apple consumer hardware used the PPC architecture, and the implementation was intended to improve SHA-1 speed there. - Since it was added we've moved to DC_SHA1 by default, and anyone - wanting hard-rolled non-DC SHA-1 implementation can use OpenSSL's via - the OPENSSL_SHA1 knob. + Since it was added we've moved to using sha1collisiondetection by + default, and anyone wanting hard-rolled non-DC SHA-1 implementation + can use OpenSSL's via the OPENSSL_SHA1 knob. - I'm unsure if this was ever supposed to work on 64-bit PPC. It clearly - originally targeted 32 bit PPC, but there's some mailing list - references to this being tried on G5 (PPC 970). I can't get it to do - anything but segfault on the BE POWER8 machine in the GCC compile - farm. Anyone caring about speed on PPC these days is likely to be - using IBM's POWER, not PPC 970. + The PPC_SHA1 originally originally targeted 32 bit PPC, and later the + 64 bit PPC 970 (a.k.a. Apple PowerPC G5). See 926172c5e48 (block-sha1: + improve code on large-register-set machines, 2009-08-10) for a + reference about the performance on G5 (a comment in block-sha1/sha1.c + being removed here). - There have been proposals to entirely remove non-DC_SHA1 - implementations from the tree[1]. I think per [2] that would be a bit + I can't get it to do anything but segfault on both the BE and LE POWER + machines in the GCC compile farm[1]. Anyone who's concerned about + performance on PPC these days is likely to be using the IBM POWER + processors. + + There have been proposals to entirely remove non-sha1collisiondetection + implementations from the tree[2]. I think per [3] that would be a bit overzealous. I.e. there are various set-ups git's speed is going to be more important than the relatively implausible SHA-1 collision attack, or where such attacks are entirely mitigated by other means (e.g. by incoming objects being checked with DC_SHA1). - The main reason for doing so at this point is to simplify follow-up - Makefile change. Since PPC_SHA1 included the only in-tree *.S assembly - file we needed to keep around special support for building objects - from it. By getting rid of it we know we'll always build *.o from *.c - files, which makes the build process simpler. + But that really doesn't apply to PPC_SHA1 in particular, which seems + to have outlived its usefulness. + + As this gets rid of the only in-tree *.S assembly file we can remove + the small bits of logic from the Makefile needed to build objects + from *.S (as opposed to *.c) - As an aside the code being removed here was also throwing warnings - with the "-pedantic" flag, but let's remove it instead of fixing it, - as 544d93bc3b4 (block-sha1: remove use of obsolete x86 assembly, - 2022-03-10) did for block-sha1/*. + The code being removed here was also throwing warnings with the + "-pedantic" flag, it could have been fixed as 544d93bc3b4 (block-sha1: + remove use of obsolete x86 assembly, 2022-03-10) did for block-sha1/*, + but as noted above let's remove it instead. - 1. https://lore.kernel.org/git/20200223223758.120941-1-mh@glandium.org/ - 2. https://lore.kernel.org/git/20200224044732.GK1018190@coredump.intra.peff.net/ + 1. https://cfarm.tetaneutral.net/machines/list/ + Tested on gcc{110,112,135,203}, a mixture of POWER [789] ppc64 and + ppc64le. All segfault in anything needing object + hashing (e.g. t/t1007-hash-object.sh) when compiled with + PPC_SHA1=Y. + 2. https://lore.kernel.org/git/20200223223758.120941-1-mh@glandium.org/ + 3. https://lore.kernel.org/git/20200224044732.GK1018190@coredump.intra.peff.net/ + Acked-by: brian m. carlson" <sandals@crustytoothpaste.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## INSTALL ## @@ Makefile: include shared.mak # Define DC_SHA1 to unconditionally enable the collision-detecting sha1 # algorithm. This is slower, but may detect attempted collision attacks. # Takes priority over other *_SHA1 knobs. -@@ Makefile: ifdef OPENSSL_SHA1 - EXTLIBS += $(LIB_4_CRYPTO) - BASIC_CFLAGS += -DSHA1_OPENSSL - else +@@ Makefile: ifdef APPLE_COMMON_CRYPTO + SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L + endif + +ifdef PPC_SHA1 -+$(error PPC_SHA1 has been removed! Use DC_SHA1 instead, which is the default) ++$(error the PPC_SHA1 flag has been removed along with the PowerPC-specific SHA-1 implementation.) +endif - ifdef BLK_SHA1 ++ + ifdef OPENSSL_SHA1 + EXTLIBS += $(LIB_4_CRYPTO) + BASIC_CFLAGS += -DSHA1_OPENSSL +@@ Makefile: ifdef BLK_SHA1 LIB_OBJS += block-sha1/sha1.o BASIC_CFLAGS += -DSHA1_BLK else @@ Makefile: missing_compdb_dir = -ASM_SRC := $(wildcard $(OBJECTS:o=S)) -ASM_OBJ := $(ASM_SRC:S=o) -C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) -+C_OBJ = $(OBJECTS) ++C_OBJ := $(OBJECTS) $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< @@ Makefile: missing_compdb_dir = %.s: %.c GIT-CFLAGS FORCE $(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< + ## block-sha1/sha1.c ## +@@ + * try to do the silly "optimize away loads" part because it won't + * see what the value will be). + * +- * Ben Herrenschmidt reports that on PPC, the C version comes close +- * to the optimized asm with this (ie on PPC you don't want that +- * 'volatile', since there are lots of registers). +- * + * On ARM we get the best code generation by forcing a full memory barrier + * between each SHA_ROUND, otherwise gcc happily get wild with spilling and + * the stack frame size simply explode and performance goes down the drain. + ## configure.ac ## @@ configure.ac: AC_MSG_NOTICE([CHECKS for site configuration]) # tests. These tests take up a significant amount of the total test time -: ----------- > 2: cb3bc8b5029 Makefile: use $(OBJECTS) instead of $(C_OBJ) -- 2.37.3.1406.g184357183a6 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 1/2] Makefile + hash.h: remove PPC_SHA1 implementation 2022-08-31 9:18 ` [PATCH v3 0/2] Makefile + hash.h: remove PPC_SHA1 implementation Ævar Arnfjörð Bjarmason @ 2022-08-31 9:18 ` Ævar Arnfjörð Bjarmason 2022-08-31 9:18 ` [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 17+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-08-31 9:18 UTC (permalink / raw) To: git Cc: Junio C Hamano, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau, Ævar Arnfjörð Bjarmason Remove the PPC_SHA1 implementation added in a6ef3518f9a ([PATCH] PPC assembly implementation of SHA1, 2005-04-22). When this was added Apple consumer hardware used the PPC architecture, and the implementation was intended to improve SHA-1 speed there. Since it was added we've moved to using sha1collisiondetection by default, and anyone wanting hard-rolled non-DC SHA-1 implementation can use OpenSSL's via the OPENSSL_SHA1 knob. The PPC_SHA1 originally originally targeted 32 bit PPC, and later the 64 bit PPC 970 (a.k.a. Apple PowerPC G5). See 926172c5e48 (block-sha1: improve code on large-register-set machines, 2009-08-10) for a reference about the performance on G5 (a comment in block-sha1/sha1.c being removed here). I can't get it to do anything but segfault on both the BE and LE POWER machines in the GCC compile farm[1]. Anyone who's concerned about performance on PPC these days is likely to be using the IBM POWER processors. There have been proposals to entirely remove non-sha1collisiondetection implementations from the tree[2]. I think per [3] that would be a bit overzealous. I.e. there are various set-ups git's speed is going to be more important than the relatively implausible SHA-1 collision attack, or where such attacks are entirely mitigated by other means (e.g. by incoming objects being checked with DC_SHA1). But that really doesn't apply to PPC_SHA1 in particular, which seems to have outlived its usefulness. As this gets rid of the only in-tree *.S assembly file we can remove the small bits of logic from the Makefile needed to build objects from *.S (as opposed to *.c) The code being removed here was also throwing warnings with the "-pedantic" flag, it could have been fixed as 544d93bc3b4 (block-sha1: remove use of obsolete x86 assembly, 2022-03-10) did for block-sha1/*, but as noted above let's remove it instead. 1. https://cfarm.tetaneutral.net/machines/list/ Tested on gcc{110,112,135,203}, a mixture of POWER [789] ppc64 and ppc64le. All segfault in anything needing object hashing (e.g. t/t1007-hash-object.sh) when compiled with PPC_SHA1=Y. 2. https://lore.kernel.org/git/20200223223758.120941-1-mh@glandium.org/ 3. https://lore.kernel.org/git/20200224044732.GK1018190@coredump.intra.peff.net/ Acked-by: brian m. carlson" <sandals@crustytoothpaste.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- INSTALL | 3 +- Makefile | 18 ++-- block-sha1/sha1.c | 4 - configure.ac | 3 - hash.h | 6 +- ppc/sha1.c | 72 --------------- ppc/sha1.h | 25 ------ ppc/sha1ppc.S | 224 ---------------------------------------------- 8 files changed, 8 insertions(+), 347 deletions(-) delete mode 100644 ppc/sha1.c delete mode 100644 ppc/sha1.h delete mode 100644 ppc/sha1ppc.S diff --git a/INSTALL b/INSTALL index 4140a3f5c8b..89b15d71df5 100644 --- a/INSTALL +++ b/INSTALL @@ -135,8 +135,7 @@ Issues of note: By default, git uses OpenSSL for SHA1 but it will use its own library (inspired by Mozilla's) with either NO_OPENSSL or - BLK_SHA1. Also included is a version optimized for PowerPC - (PPC_SHA1). + BLK_SHA1. - "libcurl" library is used for fetching and pushing repositories over http:// or https://, as well as by diff --git a/Makefile b/Makefile index eac30126e29..7feda7e79be 100644 --- a/Makefile +++ b/Makefile @@ -155,9 +155,6 @@ include shared.mak # Define BLK_SHA1 environment variable to make use of the bundled # optimized C SHA1 routine. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define DC_SHA1 to unconditionally enable the collision-detecting sha1 # algorithm. This is slower, but may detect attempted collision attacks. # Takes priority over other *_SHA1 knobs. @@ -1802,6 +1799,10 @@ ifdef APPLE_COMMON_CRYPTO SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L endif +ifdef PPC_SHA1 +$(error the PPC_SHA1 flag has been removed along with the PowerPC-specific SHA-1 implementation.) +endif + ifdef OPENSSL_SHA1 EXTLIBS += $(LIB_4_CRYPTO) BASIC_CFLAGS += -DSHA1_OPENSSL @@ -1810,10 +1811,6 @@ ifdef BLK_SHA1 LIB_OBJS += block-sha1/sha1.o BASIC_CFLAGS += -DSHA1_BLK else -ifdef PPC_SHA1 - LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o - BASIC_CFLAGS += -DSHA1_PPC -else ifdef APPLE_COMMON_CRYPTO COMPAT_CFLAGS += -DCOMMON_DIGEST_FOR_OPENSSL BASIC_CFLAGS += -DSHA1_APPLE @@ -1847,7 +1844,6 @@ endif endif endif endif -endif ifdef OPENSSL_SHA256 EXTLIBS += $(LIB_4_CRYPTO) @@ -2594,14 +2590,10 @@ missing_compdb_dir = compdb_args = endif -ASM_SRC := $(wildcard $(OBJECTS:o=S)) -ASM_OBJ := $(ASM_SRC:S=o) -C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) +C_OBJ := $(OBJECTS) $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< -$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) - $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< %.s: %.c GIT-CFLAGS FORCE $(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c index 5974cd7dd3c..80cebd27564 100644 --- a/block-sha1/sha1.c +++ b/block-sha1/sha1.c @@ -28,10 +28,6 @@ * try to do the silly "optimize away loads" part because it won't * see what the value will be). * - * Ben Herrenschmidt reports that on PPC, the C version comes close - * to the optimized asm with this (ie on PPC you don't want that - * 'volatile', since there are lots of registers). - * * On ARM we get the best code generation by forcing a full memory barrier * between each SHA_ROUND, otherwise gcc happily get wild with spilling and * the stack frame size simply explode and performance goes down the drain. diff --git a/configure.ac b/configure.ac index 7dcd0482042..38ff86678a0 100644 --- a/configure.ac +++ b/configure.ac @@ -237,9 +237,6 @@ AC_MSG_NOTICE([CHECKS for site configuration]) # tests. These tests take up a significant amount of the total test time # but are not needed unless you plan to talk to SVN repos. # -# Define PPC_SHA1 environment variable when running make to make use of -# a bundled SHA1 routine optimized for PowerPC. -# # Define NO_OPENSSL environment variable if you do not have OpenSSL. # # Define OPENSSLDIR=/foo/bar if your openssl header and library files are in diff --git a/hash.h b/hash.h index ea87ae9d92f..36b64165fc9 100644 --- a/hash.h +++ b/hash.h @@ -4,9 +4,7 @@ #include "git-compat-util.h" #include "repository.h" -#if defined(SHA1_PPC) -#include "ppc/sha1.h" -#elif defined(SHA1_APPLE) +#if defined(SHA1_APPLE) #include <CommonCrypto/CommonDigest.h> #elif defined(SHA1_OPENSSL) #include <openssl/sha.h> @@ -32,7 +30,7 @@ * platform's underlying implementation of SHA-1; could be OpenSSL, * blk_SHA, Apple CommonCrypto, etc... Note that the relevant * SHA-1 header may have already defined platform_SHA_CTX for our - * own implementations like block-sha1 and ppc-sha1, so we list + * own implementations like block-sha1, so we list * the default for OpenSSL compatible SHA-1 implementations here. */ #define platform_SHA_CTX SHA_CTX diff --git a/ppc/sha1.c b/ppc/sha1.c deleted file mode 100644 index 1b705cee1fe..00000000000 --- a/ppc/sha1.c +++ /dev/null @@ -1,72 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - * - * This version assumes we are running on a big-endian machine. - * It calls an external sha1_core() to process blocks of 64 bytes. - */ -#include <stdio.h> -#include <string.h> -#include "sha1.h" - -void ppc_sha1_core(uint32_t *hash, const unsigned char *p, - unsigned int nblocks); - -int ppc_SHA1_Init(ppc_SHA_CTX *c) -{ - c->hash[0] = 0x67452301; - c->hash[1] = 0xEFCDAB89; - c->hash[2] = 0x98BADCFE; - c->hash[3] = 0x10325476; - c->hash[4] = 0xC3D2E1F0; - c->len = 0; - c->cnt = 0; - return 0; -} - -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, unsigned long n) -{ - unsigned long nb; - const unsigned char *p = ptr; - - c->len += (uint64_t) n << 3; - while (n != 0) { - if (c->cnt || n < 64) { - nb = 64 - c->cnt; - if (nb > n) - nb = n; - memcpy(&c->buf.b[c->cnt], p, nb); - if ((c->cnt += nb) == 64) { - ppc_sha1_core(c->hash, c->buf.b, 1); - c->cnt = 0; - } - } else { - nb = n >> 6; - ppc_sha1_core(c->hash, p, nb); - nb <<= 6; - } - n -= nb; - p += nb; - } - return 0; -} - -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c) -{ - unsigned int cnt = c->cnt; - - c->buf.b[cnt++] = 0x80; - if (cnt > 56) { - if (cnt < 64) - memset(&c->buf.b[cnt], 0, 64 - cnt); - ppc_sha1_core(c->hash, c->buf.b, 1); - cnt = 0; - } - if (cnt < 56) - memset(&c->buf.b[cnt], 0, 56 - cnt); - c->buf.l[7] = c->len; - ppc_sha1_core(c->hash, c->buf.b, 1); - memcpy(hash, c->hash, 20); - return 0; -} diff --git a/ppc/sha1.h b/ppc/sha1.h deleted file mode 100644 index 9b24b326159..00000000000 --- a/ppc/sha1.h +++ /dev/null @@ -1,25 +0,0 @@ -/* - * SHA-1 implementation. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ -#include <stdint.h> - -typedef struct { - uint32_t hash[5]; - uint32_t cnt; - uint64_t len; - union { - unsigned char b[64]; - uint64_t l[8]; - } buf; -} ppc_SHA_CTX; - -int ppc_SHA1_Init(ppc_SHA_CTX *c); -int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, unsigned long n); -int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c); - -#define platform_SHA_CTX ppc_SHA_CTX -#define platform_SHA1_Init ppc_SHA1_Init -#define platform_SHA1_Update ppc_SHA1_Update -#define platform_SHA1_Final ppc_SHA1_Final diff --git a/ppc/sha1ppc.S b/ppc/sha1ppc.S deleted file mode 100644 index 1711eef6e71..00000000000 --- a/ppc/sha1ppc.S +++ /dev/null @@ -1,224 +0,0 @@ -/* - * SHA-1 implementation for PowerPC. - * - * Copyright (C) 2005 Paul Mackerras <paulus@samba.org> - */ - -/* - * PowerPC calling convention: - * %r0 - volatile temp - * %r1 - stack pointer. - * %r2 - reserved - * %r3-%r12 - Incoming arguments & return values; volatile. - * %r13-%r31 - Callee-save registers - * %lr - Return address, volatile - * %ctr - volatile - * - * Register usage in this routine: - * %r0 - temp - * %r3 - argument (pointer to 5 words of SHA state) - * %r4 - argument (pointer to data to hash) - * %r5 - Constant K in SHA round (initially number of blocks to hash) - * %r6-%r10 - Working copies of SHA variables A..E (actually E..A order) - * %r11-%r26 - Data being hashed W[]. - * %r27-%r31 - Previous copies of A..E, for final add back. - * %ctr - loop count - */ - - -/* - * We roll the registers for A, B, C, D, E around on each - * iteration; E on iteration t is D on iteration t+1, and so on. - * We use registers 6 - 10 for this. (Registers 27 - 31 hold - * the previous values.) - */ -#define RA(t) (((t)+4)%5+6) -#define RB(t) (((t)+3)%5+6) -#define RC(t) (((t)+2)%5+6) -#define RD(t) (((t)+1)%5+6) -#define RE(t) (((t)+0)%5+6) - -/* We use registers 11 - 26 for the W values */ -#define W(t) ((t)%16+11) - -/* Register 5 is used for the constant k */ - -/* - * The basic SHA-1 round function is: - * E += ROTL(A,5) + F(B,C,D) + W[i] + K; B = ROTL(B,30) - * Then the variables are renamed: (A,B,C,D,E) = (E,A,B,C,D). - * - * Every 20 rounds, the function F() and the constant K changes: - * - 20 rounds of f0(b,c,d) = "bit wise b ? c : d" = (^b & d) + (b & c) - * - 20 rounds of f1(b,c,d) = b^c^d = (b^d)^c - * - 20 rounds of f2(b,c,d) = majority(b,c,d) = (b&d) + ((b^d)&c) - * - 20 more rounds of f1(b,c,d) - * - * These are all scheduled for near-optimal performance on a G4. - * The G4 is a 3-issue out-of-order machine with 3 ALUs, but it can only - * *consider* starting the oldest 3 instructions per cycle. So to get - * maximum performance out of it, you have to treat it as an in-order - * machine. Which means interleaving the computation round t with the - * computation of W[t+4]. - * - * The first 16 rounds use W values loaded directly from memory, while the - * remaining 64 use values computed from those first 16. We preload - * 4 values before starting, so there are three kinds of rounds: - * - The first 12 (all f0) also load the W values from memory. - * - The next 64 compute W(i+4) in parallel. 8*f0, 20*f1, 20*f2, 16*f1. - * - The last 4 (all f1) do not do anything with W. - * - * Therefore, we have 6 different round functions: - * STEPD0_LOAD(t,s) - Perform round t and load W(s). s < 16 - * STEPD0_UPDATE(t,s) - Perform round t and compute W(s). s >= 16. - * STEPD1_UPDATE(t,s) - * STEPD2_UPDATE(t,s) - * STEPD1(t) - Perform round t with no load or update. - * - * The G5 is more fully out-of-order, and can find the parallelism - * by itself. The big limit is that it has a 2-cycle ALU latency, so - * even though it's 2-way, the code has to be scheduled as if it's - * 4-way, which can be a limit. To help it, we try to schedule the - * read of RA(t) as late as possible so it doesn't stall waiting for - * the previous round's RE(t-1), and we try to rotate RB(t) as early - * as possible while reading RC(t) (= RB(t-1)) as late as possible. - */ - -/* the initial loads. */ -#define LOADW(s) \ - lwz W(s),(s)*4(%r4) - -/* - * Perform a step with F0, and load W(s). Uses W(s) as a temporary - * before loading it. - * This is actually 10 instructions, which is an awkward fit. - * It can execute grouped as listed, or delayed one instruction. - * (If delayed two instructions, there is a stall before the start of the - * second line.) Thus, two iterations take 7 cycles, 3.5 cycles per round. - */ -#define STEPD0_LOAD(t,s) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); and W(s),RC(t),RB(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi RB(t),RB(t),30; \ -add RE(t),RE(t),W(s); add %r0,%r0,%r5; lwz W(s),(s)*4(%r4); \ -add RE(t),RE(t),%r0 - -/* - * This is likewise awkward, 13 instructions. However, it can also - * execute starting with 2 out of 3 possible moduli, so it does 2 rounds - * in 9 cycles, 4.5 cycles/round. - */ -#define STEPD0_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; and %r0,RC(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r5; loadk; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0 - -/* Nicely optimal. Conveniently, also the most common. */ -#define STEPD1_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r5; loadk; xor %r0,%r0,RC(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1 - -/* - * The naked version, no UPDATE, for the last 4 rounds. 3 cycles per. - * We could use W(s) as a temp register, but we don't need it. - */ -#define STEPD1(t) \ - add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); \ -rotlwi RB(t),RB(t),30; add RE(t),RE(t),%r5; xor %r0,%r0,RC(t); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; /* spare slot */ \ -add RE(t),RE(t),%r0 - -/* - * 14 instructions, 5 cycles per. The majority function is a bit - * awkward to compute. This can execute with a 1-instruction delay, - * but it causes a 2-instruction delay, which triggers a stall. - */ -#define STEPD2_UPDATE(t,s,loadk...) \ -add RE(t),RE(t),W(t); and %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \ -add RE(t),RE(t),%r0; xor %r0,RD(t),RB(t); xor W(s),W(s),W((s)-8); \ -add RE(t),RE(t),%r5; loadk; and %r0,%r0,RC(t); xor W(s),W(s),W((s)-14); \ -add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi W(s),W(s),1; \ -add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30 - -#define STEP0_LOAD4(t,s) \ - STEPD0_LOAD(t,s); \ - STEPD0_LOAD((t+1),(s)+1); \ - STEPD0_LOAD((t)+2,(s)+2); \ - STEPD0_LOAD((t)+3,(s)+3) - -#define STEPUP4(fn, t, s, loadk...) \ - STEP##fn##_UPDATE(t,s,); \ - STEP##fn##_UPDATE((t)+1,(s)+1,); \ - STEP##fn##_UPDATE((t)+2,(s)+2,); \ - STEP##fn##_UPDATE((t)+3,(s)+3,loadk) - -#define STEPUP20(fn, t, s, loadk...) \ - STEPUP4(fn, t, s,); \ - STEPUP4(fn, (t)+4, (s)+4,); \ - STEPUP4(fn, (t)+8, (s)+8,); \ - STEPUP4(fn, (t)+12, (s)+12,); \ - STEPUP4(fn, (t)+16, (s)+16, loadk) - - .globl ppc_sha1_core -ppc_sha1_core: - stwu %r1,-80(%r1) - stmw %r13,4(%r1) - - /* Load up A - E */ - lmw %r27,0(%r3) - - mtctr %r5 - -1: - LOADW(0) - lis %r5,0x5a82 - mr RE(0),%r31 - LOADW(1) - mr RD(0),%r30 - mr RC(0),%r29 - LOADW(2) - ori %r5,%r5,0x7999 /* K0-19 */ - mr RB(0),%r28 - LOADW(3) - mr RA(0),%r27 - - STEP0_LOAD4(0, 4) - STEP0_LOAD4(4, 8) - STEP0_LOAD4(8, 12) - STEPUP4(D0, 12, 16,) - STEPUP4(D0, 16, 20, lis %r5,0x6ed9) - - ori %r5,%r5,0xeba1 /* K20-39 */ - STEPUP20(D1, 20, 24, lis %r5,0x8f1b) - - ori %r5,%r5,0xbcdc /* K40-59 */ - STEPUP20(D2, 40, 44, lis %r5,0xca62) - - ori %r5,%r5,0xc1d6 /* K60-79 */ - STEPUP4(D1, 60, 64,) - STEPUP4(D1, 64, 68,) - STEPUP4(D1, 68, 72,) - STEPUP4(D1, 72, 76,) - addi %r4,%r4,64 - STEPD1(76) - STEPD1(77) - STEPD1(78) - STEPD1(79) - - /* Add results to original values */ - add %r31,%r31,RE(0) - add %r30,%r30,RD(0) - add %r29,%r29,RC(0) - add %r28,%r28,RB(0) - add %r27,%r27,RA(0) - - bdnz 1b - - /* Save final hash, restore registers, and return */ - stmw %r27,0(%r3) - lmw %r13,4(%r1) - addi %r1,%r1,80 - blr -- 2.37.3.1406.g184357183a6 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) 2022-08-31 9:18 ` [PATCH v3 0/2] Makefile + hash.h: remove PPC_SHA1 implementation Ævar Arnfjörð Bjarmason 2022-08-31 9:18 ` [PATCH v3 1/2] " Ævar Arnfjörð Bjarmason @ 2022-08-31 9:18 ` Ævar Arnfjörð Bjarmason 2022-08-31 21:44 ` Junio C Hamano 1 sibling, 1 reply; 17+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-08-31 9:18 UTC (permalink / raw) To: git Cc: Junio C Hamano, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau, Ævar Arnfjörð Bjarmason In the preceding commit $(C_OBJ) added in c373991375a (Makefile: list generated object files in OBJECTS, 2010-01-26) became synonymous with $(OBJECTS). Let's avoid the indirection and use the $(OBJECTS) variable directly instead. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Makefile | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index 7feda7e79be..8956cace8eb 100644 --- a/Makefile +++ b/Makefile @@ -2590,9 +2590,7 @@ missing_compdb_dir = compdb_args = endif -C_OBJ := $(OBJECTS) - -$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) +$(OBJECTS): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< %.s: %.c GIT-CFLAGS FORCE @@ -3084,7 +3082,7 @@ t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB) check-sha1:: t/helper/test-tool$X t/helper/test-sha1.sh -SP_OBJ = $(patsubst %.o,%.sp,$(C_OBJ)) +SP_OBJ = $(patsubst %.o,%.sp,$(OBJECTS)) $(SP_OBJ): %.sp: %.c %.o $(QUIET_SP)cgcc -no-compile $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) \ -- 2.37.3.1406.g184357183a6 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) 2022-08-31 9:18 ` [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) Ævar Arnfjörð Bjarmason @ 2022-08-31 21:44 ` Junio C Hamano 2022-09-01 14:52 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2022-08-31 21:44 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > In the preceding commit $(C_OBJ) added in c373991375a (Makefile: list > generated object files in OBJECTS, 2010-01-26) became synonymous with > $(OBJECTS). Let's avoid the indirection and use the $(OBJECTS) > variable directly instead. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > Makefile | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) This is a declaration that we would never ever build .o files out of sources other than .c files. While it does make sense to have it outside the scope of [PATCH 1/2], I am not sure if it even belongs to the same series. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) 2022-08-31 21:44 ` Junio C Hamano @ 2022-09-01 14:52 ` Ævar Arnfjörð Bjarmason 2022-09-01 15:48 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-09-01 14:52 UTC (permalink / raw) To: Junio C Hamano Cc: git, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau On Wed, Aug 31 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> In the preceding commit $(C_OBJ) added in c373991375a (Makefile: list >> generated object files in OBJECTS, 2010-01-26) became synonymous with >> $(OBJECTS). Let's avoid the indirection and use the $(OBJECTS) >> variable directly instead. >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> Makefile | 6 ++---- >> 1 file changed, 2 insertions(+), 4 deletions(-) > > This is a declaration that we would never ever build .o files out of > sources other than .c files. While it does make sense to have it > outside the scope of [PATCH 1/2], I am not sure if it even belongs > to the same series. I think it does. Before this the C_OBJ would be: C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) but after 1/2 it's the same as $(OBJECTS). An earlier iteration of this did this cleanup "while we're at it" (which I do think makes sense as an atomic change), but I got the feedback that the cleanup wasn't strictly necessary. But as 1/2 has removed the ability to build those $(ASM_OBJ), as we had only one of those, I don't think keeping this particular bit of indirection makes sense. Of course it doesn't really matter at all, the real change is the removal of $(ASM_OBJ). If we do start building *.o files out of *.S files (or other non-*.c) again we'll need new rules anyway. I think we should just add any such variables back then, and not keep this small bit of dead husk around. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) 2022-09-01 14:52 ` Ævar Arnfjörð Bjarmason @ 2022-09-01 15:48 ` Junio C Hamano 0 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2022-09-01 15:48 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Mike Hommey, brian m . carlson, Carlo Marcelo Arenas Belón, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> This is a declaration that we would never ever build .o files out of >> sources other than .c files. While it does make sense to have it >> outside the scope of [PATCH 1/2], I am not sure if it even belongs >> to the same series. > > I think it does. Before this the C_OBJ would be: > > C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS)) > > but after 1/2 it's the same as $(OBJECTS). An earlier iteration of this > did this cleanup "while we're at it" (which I do think makes sense as an > atomic change), but I got the feedback that the cleanup wasn't strictly > necessary. > > But as 1/2 has removed the ability to build those $(ASM_OBJ), as we had > only one of those, I don't think keeping this particular bit of > indirection makes sense. You are not thinking for longer term to help project maintenance. This change removes distinction between C_OBJ and OBJECTS, only because the sources to the objects we HAPPEN TO have are only C files. It is premature and short sighted to declare that it has to stay that way forever. And such a declaration is not something we would casually make "while at it" in a topic like this. When we add a source written in another language, say xyzzy, to be compiled into an object file, we'd add $(XYZZY_OBJ), and they will become part of $(OBJECTS), but the current rule to create $(C_OBJ) will not apply to $(XYZZY_OBJ). But you do this: -$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) +$(OBJECTS): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir) $(QUIET_CC)$(CC) -o $*.o -c ... $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< Right now, we know where this patch affected the build procedure, because the patch highlights what is being changed. But when future developers need to produce some files that belong to $(OBJECTS) out of source files that are not .c, they first need to locate the above hunk and revert it. I do not see the benefit of being hostile to future developers with this patch. Not before we know that it is not likely that we would add any non-C sources in the future, by running with 1/2 alone for a year or two. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2022-09-01 15:48 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-02-12 8:56 SHA1dc on mac Mike Hommey 2020-02-12 16:46 ` Eric Sunshine 2020-02-12 22:31 ` Mike Hommey 2020-02-12 22:40 ` Junio C Hamano 2020-02-23 22:37 ` [PATCH] Remove non-SHA1dc sha1 implementations Mike Hommey 2020-02-24 4:47 ` Jeff King 2020-02-24 4:52 ` Jeff King 2022-03-19 1:02 ` [PATCH] ppc: remove custom SHA-1 implementation Ævar Arnfjörð Bjarmason 2022-03-21 16:39 ` Junio C Hamano 2022-03-21 17:06 ` [PATCH v2] " Ævar Arnfjörð Bjarmason 2022-03-21 21:19 ` brian m. carlson 2022-08-31 9:18 ` [PATCH v3 0/2] Makefile + hash.h: remove PPC_SHA1 implementation Ævar Arnfjörð Bjarmason 2022-08-31 9:18 ` [PATCH v3 1/2] " Ævar Arnfjörð Bjarmason 2022-08-31 9:18 ` [PATCH v3 2/2] Makefile: use $(OBJECTS) instead of $(C_OBJ) Ævar Arnfjörð Bjarmason 2022-08-31 21:44 ` Junio C Hamano 2022-09-01 14:52 ` Ævar Arnfjörð Bjarmason 2022-09-01 15:48 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).