From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Duy Nguyen" <pclouds@gmail.com>,
"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH v3 00/12] Base SHA-256 implementation
Date: Mon, 22 Oct 2018 02:43:30 +0000 [thread overview]
Message-ID: <20181022024342.489564-1-sandals@crustytoothpaste.net> (raw)
This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.
Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.
Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.
brian m. carlson (12):
sha1-file: rename algorithm to "sha1"
sha1-file: provide functions to look up hash algorithms
hex: introduce functions to print arbitrary hashes
cache: make hashcmp and hasheq work with larger hashes
t: add basic tests for our SHA-1 implementation
t: make the sha1 test-tool helper generic
sha1-file: add a constant for hash block size
t/helper: add a test helper to compute hash speed
commit-graph: convert to using the_hash_algo
Add a base implementation of SHA-256 support
sha256: add an SHA-256 implementation using libgcrypt
hash: add an SHA-256 implementation using OpenSSL
Makefile | 22 +++
cache.h | 51 ++++---
commit-graph.c | 33 ++---
hash.h | 41 +++++-
hex.c | 32 +++--
sha1-file.c | 70 +++++++++-
sha256/block/sha256.c | 186 ++++++++++++++++++++++++++
sha256/block/sha256.h | 26 ++++
sha256/gcrypt.h | 30 +++++
t/helper/test-hash-speed.c | 61 +++++++++
t/helper/{test-sha1.c => test-hash.c} | 19 +--
t/helper/test-sha1.c | 52 +------
t/helper/test-sha256.c | 7 +
t/helper/test-tool.c | 2 +
t/helper/test-tool.h | 4 +
t/t0015-hash.sh | 54 ++++++++
16 files changed, 586 insertions(+), 104 deletions(-)
create mode 100644 sha256/block/sha256.c
create mode 100644 sha256/block/sha256.h
create mode 100644 sha256/gcrypt.h
create mode 100644 t/helper/test-hash-speed.c
copy t/helper/{test-sha1.c => test-hash.c} (65%)
create mode 100644 t/helper/test-sha256.c
create mode 100755 t/t0015-hash.sh
Range-diff against v2:
1: b5845548ac < -: ---------- :hash-impl
2: 804ec2fd27 ! 1: cf9f7f5620 sha1-file: rename algorithm to "sha1"
@@ -5,14 +5,14 @@
The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation. Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax. Omitting
- the dash doesn't create any ambiguity, but it does make it shorter and
- easier to type, especially for touch typists. In addition, the
- transition plan already uses "sha1" in this context.
+ the dash doesn't create any ambiguity; however, it does make the syntax
+ shorter and easier to type, especially for touch typists. In addition,
+ the transition plan already uses "sha1" in this context.
Rename the name of SHA-1 implementation to "sha1".
Note that this change creates no backwards compatibility concerns, since
- we haven't yet used this field in any serialized data formats.
+ we haven't yet used this field in any configuration settings.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
3: 5196e00b26 ! 2: 0144deaebe sha1-file: provide functions to look up hash algorithms
@@ -27,7 +27,7 @@
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
-+inline int hash_algo_by_ptr(const struct git_hash_algo *p)
++static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+ return p - hash_algos;
+}
4: 5873510d0a ! 3: b74858fb03 hex: introduce functions to print arbitrary hashes
@@ -45,13 +45,13 @@
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1); /* static buffer result! */
-extern char *oid_to_hex(const struct object_id *oid); /* same static buffer as sha1_to_hex */
-+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
++char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
-+char *hash_to_hex_algo(const unsigned char *hash, int algo); /* static buffer result! */
-+char *sha1_to_hex(const unsigned char *sha1); /* same static buffer */
-+char *hash_to_hex(const unsigned char *hash); /* same static buffer */
-+char *oid_to_hex(const struct object_id *oid); /* same static buffer */
++char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *); /* static buffer result! */
++char *sha1_to_hex(const unsigned char *sha1); /* same static buffer */
++char *hash_to_hex(const unsigned char *hash); /* same static buffer */
++char *oid_to_hex(const struct object_id *oid); /* same static buffer */
/*
* Parse a 40-character hexadecimal object ID starting from hex, updating the
@@ -64,7 +64,7 @@
}
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
-+static inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
++inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+ const struct git_hash_algo *algop)
{
static const char hex[] = "0123456789abcdef";
@@ -83,44 +83,39 @@
}
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
-+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo)
++char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
{
- return sha1_to_hex_r(buffer, oid->hash);
-+ return hash_to_hex_algop_r(buffer, hash, &hash_algos[algo]);
++ return hash_to_hex_algop_r(buffer, sha1, &hash_algos[GIT_HASH_SHA1]);
}
-char *sha1_to_hex(const unsigned char *sha1)
-+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
-+{
-+ return hash_to_hex_algo_r(buffer, sha1, GIT_HASH_SHA1);
-+}
-+
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+ return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
-+char *hash_to_hex_algo(const unsigned char *hash, int algo)
++char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
{
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
- return sha1_to_hex_r(hexbuffer[bufno], sha1);
-+ return hash_to_hex_algo_r(hexbuffer[bufno], hash, algo);
++ return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
-+ return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
++ return hash_to_hex_algop(sha1, &hash_algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
-+ return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
++ return hash_to_hex_algop(hash, the_hash_algo);
}
char *oid_to_hex(const struct object_id *oid)
{
- return sha1_to_hex(oid->hash);
-+ return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
++ return hash_to_hex_algop(oid->hash, the_hash_algo);
}
5: e22bf99c93 = 4: e9703017a4 cache: make hashcmp and hasheq work with larger hashes
6: 02cbcae270 ! 5: ab85a834fd t: add basic tests for our SHA-1 implementation
@@ -11,10 +11,10 @@
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
- diff --git a/t/t0014-hash.sh b/t/t0014-hash.sh
+ diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
--- /dev/null
- +++ b/t/t0014-hash.sh
+ +++ b/t/t0015-hash.sh
@@
+#!/bin/sh
+
7: e9ac5f265a ! 6: 962f6d8903 t: make the sha1 test-tool helper generic
@@ -23,7 +23,7 @@
TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
- similarity index 66%
+ similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
--- a/t/helper/test-sha1.c
@@ -78,7 +78,7 @@
+ fwrite(hash, 1, algop->rawsz, stdout);
else
- puts(sha1_to_hex(sha1));
-+ puts(hash_to_hex_algo(hash, algo));
++ puts(hash_to_hex_algop(hash, algop));
exit(0);
}
8: 44c9f27f9d = 7: 53addf4d58 sha1-file: add a constant for hash block size
9: 59be7f1f37 = 8: 9ace10faa2 t/helper: add a test helper to compute hash speed
10: 09f1286769 ! 9: 9adc56d01e commit-graph: convert to using the_hash_algo
@@ -4,7 +4,9 @@
Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up. In addition, use a function call to look
- up the object ID version and produce the correct value.
+ up the object ID version and produce the correct value. For now, we use
+ version 1, which means to use the default algorithm used in the rest of
+ the repository.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
@@ -78,13 +80,13 @@
(uint32_t)chunk_offset);
goto cleanup_fail;
@@
- int num_chunks;
int num_extra_edges;
struct commit_list *parent;
+ struct progress *progress = NULL;
+ const unsigned hashsz = the_hash_algo->rawsz;
- oids.nr = 0;
- oids.alloc = approximate_object_count() / 4;
+ if (!commit_graph_compatible(the_repository))
+ return;
@@
hashwrite_be32(f, GRAPH_SIGNATURE);
@@ -115,4 +117,4 @@
+ write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
- close_commit_graph();
+ close_commit_graph(the_repository);
11: cdec8c1b49 ! 10: 8e82cb0dfb Add a base implementation of SHA-256 support
@@ -8,9 +8,10 @@
Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain. Optimize it and restructure it to meet our coding
- standards. Place it in a directory called "sha256" where it and any
- future implementations can live so as to avoid a proliferation of
- implementation directories.
+ standards. Pull in the update and final functions from the SHA-1 block
+ implementation, as we know these function correctly with all compilers.
+ This implementation is slower than SHA-1, but more performant
+ implementations will be introduced in future commits.
Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.
@@ -211,7 +212,7 @@
+void blk_SHA256_Init(blk_SHA256_CTX *ctx)
+{
+ ctx->offset = 0;
-+ ctx->length = 0;
++ ctx->size = 0;
+ ctx->state[0] = 0x6A09E667UL;
+ ctx->state[1] = 0xBB67AE85UL;
+ ctx->state[2] = 0x3C6EF372UL;
@@ -329,55 +330,61 @@
+ RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
+
+#undef RND
++#undef Ch
++#undef Maj
++#undef S
++#undef R
++#undef Sigma0
++#undef Sigma1
++#undef Gamma0
++#undef Gamma1
+
+ for (i = 0; i < 8; i++) {
+ ctx->state[i] = ctx->state[i] + S[i];
+ }
+}
+
-+#define MIN(x, y) ((x) < (y) ? (x) : (y))
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
+{
-+ const unsigned char *in = data;
-+ size_t n;
-+ ctx->length += len;
-+ while (len > 0) {
-+ if (!ctx->offset && len >= BLKSIZE) {
-+ blk_SHA256_Transform(ctx, in);
-+ in += BLKSIZE;
-+ len -= BLKSIZE;
-+ } else {
-+ n = MIN(len, (BLKSIZE - ctx->offset));
-+ memcpy(ctx->buf + ctx->offset, in, n);
-+ ctx->offset += n;
-+ in += n;
-+ len -= n;
-+ if (ctx->offset == BLKSIZE) {
-+ blk_SHA256_Transform(ctx, ctx->buf);
-+ ctx->offset = 0;
-+ }
-+ }
++ unsigned int len_buf = ctx->size & 63;
++
++ ctx->size += len;
++
++ /* Read the data into buf and process blocks as they get full */
++ if (len_buf) {
++ unsigned int left = 64 - len_buf;
++ if (len < left)
++ left = len;
++ memcpy(len_buf + ctx->buf, data, left);
++ len_buf = (len_buf + left) & 63;
++ len -= left;
++ data = ((const char *)data + left);
++ if (len_buf)
++ return;
++ blk_SHA256_Transform(ctx, ctx->buf);
+ }
++ while (len >= 64) {
++ blk_SHA256_Transform(ctx, data);
++ data = ((const char *)data + 64);
++ len -= 64;
++ }
++ if (len)
++ memcpy(ctx->buf, data, len);
+}
+
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx)
+{
-+ const unsigned trip = BLKSIZE - sizeof(ctx->length);
++ static const unsigned char pad[64] = { 0x80 };
++ unsigned int padlen[2];
+ int i;
+
-+ ctx->length <<= 3;
-+ ctx->buf[ctx->offset++] = 0x80;
++ /* Pad with a binary 1 (ie 0x80), then zeroes, then length */
++ padlen[0] = htonl((uint32_t)(ctx->size >> 29));
++ padlen[1] = htonl((uint32_t)(ctx->size << 3));
+
-+ if (ctx->offset > trip) {
-+ memset(ctx->buf + ctx->offset, 0, BLKSIZE - ctx->offset);
-+ blk_SHA256_Transform(ctx, ctx->buf);
-+ ctx->offset = 0;
-+ }
-+
-+ memset(ctx->buf + ctx->offset, 0, BLKSIZE - ctx->offset - sizeof(ctx->length));
-+
-+ put_be64(ctx->buf + trip, ctx->length);
-+ blk_SHA256_Transform(ctx, ctx->buf);
++ i = ctx->size & 63;
++ blk_SHA256_Update(ctx, pad, 1 + (63 & (55 - i)));
++ blk_SHA256_Update(ctx, padlen, 8);
+
+ /* copy output */
+ for (i = 0; i < 8; i++, digest += sizeof(uint32_t))
@@ -398,7 +405,7 @@
+
+struct blk_SHA256_CTX {
+ uint32_t state[8];
-+ uint64_t length;
++ uint64_t size;
+ uint32_t offset;
+ uint8_t buf[blk_SHA256_BLKSIZE];
+};
@@ -453,9 +460,9 @@
int cmd__strcmp_offset(int argc, const char **argv);
int cmd__string_list(int argc, const char **argv);
- diff --git a/t/t0014-hash.sh b/t/t0014-hash.sh
- --- a/t/t0014-hash.sh
- +++ b/t/t0014-hash.sh
+ diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
+ --- a/t/t0015-hash.sh
+ +++ b/t/t0015-hash.sh
@@
grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
'
12: d085802a89 ! 11: 9e0061bd74 sha256: add an SHA-256 implementation using libgcrypt
@@ -9,10 +9,15 @@
Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG. libgcrypt is also faster than the SHA1DC
- implementation for messages of a few KiB and larger. It is licensed
- under the LGPL 2.1, which is compatible with the GPL.
+ implementation for messages of a few KiB and larger.
- Add an implementation of SHA-256 that uses libgcrypt.
+ For comparison, on a Core i7-6600U, this implementation processes 16 KiB
+ chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
+ MiB/s.
+
+ In addition, libgcrypt is licensed under the LGPL 2.1, which is
+ compatible with the GPL. Add an implementation of SHA-256 that uses
+ libgcrypt.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
13: 4e4daf4000 ! 12: 128d6b8150 hash: add an SHA-256 implementation using OpenSSL
@@ -5,6 +5,12 @@
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.
+ On a Core i7-6600U, this SHA-256 implementation compares favorably to
+ the SHA1DC SHA-1 implementation:
+
+ SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
+ SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)
+
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
diff --git a/Makefile b/Makefile
14: f206f45426 < -: ---------- commit-graph: specify OID version for SHA-256
next reply other threads:[~2018-10-22 2:44 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-22 2:43 brian m. carlson [this message]
2018-10-22 2:43 ` [PATCH v3 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
2018-10-22 2:43 ` [PATCH v3 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
2018-10-22 2:43 ` [PATCH v3 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
2018-10-22 2:43 ` [PATCH v3 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
2018-10-22 2:43 ` [PATCH v3 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
2018-10-22 2:43 ` [PATCH v3 06/12] t: make the sha1 test-tool helper generic brian m. carlson
2018-10-22 2:43 ` [PATCH v3 07/12] sha1-file: add a constant for hash block size brian m. carlson
2018-10-22 2:43 ` [PATCH v3 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
2018-10-22 2:43 ` [PATCH v3 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
2018-10-24 13:22 ` Derrick Stolee
2018-10-22 2:43 ` [PATCH v3 10/12] Add a base implementation of SHA-256 support brian m. carlson
2018-10-22 9:44 ` SZEDER Gábor
2018-10-23 0:43 ` brian m. carlson
2018-10-27 14:34 ` Jakub Narebski
2018-10-22 2:43 ` [PATCH v3 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
2018-10-22 2:43 ` [PATCH v3 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181022024342.489564-1-sandals@crustytoothpaste.net \
--to=sandals@crustytoothpaste.net \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
--cc=stolee@gmail.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).