git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] Base SHA-256 implementation
@ 2018-10-15  2:18 brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 01/13] sha1-file: rename algorithm to "sha1" brian m. carlson
                   ` (15 more replies)
  0 siblings, 16 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

This series provides an actual SHA-256 implementation and wires it up,
along with some housekeeping patches to make it suitable for testing.

New in this version is a patch which reverts the change to limit hashcmp
to 20 bytes.  I've taken care to permit the compiler to inline as much
as possible for efficiency, but it would be helpful to see what the
performance impact of these changes is on real-life workflows, or with
MSVC and other non-GCC and non-clang compilers.  The resulting change is
uglier and more duplicative than I wanted, but oh, well.

I didn't attempt to pull in the full complement of code from libtomcrypt
to try to show the changes made, since that would have involved
importing a significant quantity of code in order to make things work.

I realize the increase to GIT_MAX_HEXSZ will result in an increase in
memory usage, but we're going to need to do it at some point, and the
sooner the code is in the codebase, the sooner people can play around
with it and test it.

This piece should be the final piece of preparatory work required for a
fully functioning SHA-256-only Git.  Additional work should be able to
come into the testsuite and codebase without needing to build on work in
any series after this one.

v1, which was RFC, can be seen at
https://public-inbox.org/git/20180829005857.980820-1-sandals@crustytoothpaste.net/.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (13):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL
  commit-graph: specify OID version for SHA-256

 Makefile                              |  22 ++++
 cache.h                               |  51 +++++---
 commit-graph.c                        |  40 +++---
 hash.h                                |  41 +++++-
 hex.c                                 |  37 ++++--
 sha1-file.c                           |  70 +++++++++-
 sha256/block/sha256.c                 | 180 ++++++++++++++++++++++++++
 sha256/block/sha256.h                 |  26 ++++
 sha256/gcrypt.h                       |  30 +++++
 t/helper/test-hash-speed.c            |  61 +++++++++
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c                  |  52 +-------
 t/helper/test-sha256.c                |   7 +
 t/helper/test-tool.c                  |   2 +
 t/helper/test-tool.h                  |   4 +
 t/t0014-hash.sh                       |  54 ++++++++
 16 files changed, 592 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (66%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0014-hash.sh

Range-diff against v1:
 1:  0a96c59452 =  1:  804ec2fd27 sha1-file: rename algorithm to "sha1"
 2:  65f9feba41 =  2:  5196e00b26 sha1-file: provide functions to look up hash algorithms
 3:  b6d960121d !  3:  5873510d0a hex: introduce functions to print arbitrary hashes
    @@ -13,9 +13,12 @@
         it.
     
         We use the variant taking the algorithm structure pointer as the
    -    internal variant, since in the future we'll want to replace sha1_to_hex
    -    with a hash_to_hex that handles the_hash_algo, and taking an algorithm
    -    pointer is the easiest way to handle all of the variants in use.
    +    internal variant, since taking an algorithm pointer is the easiest way
    +    to handle all of the variants in use.
    +
    +    Note that we maintain these functions because there are hashes which
    +    must change based on the hash algorithm in use but are not object IDs
    +    (such as pack checksums).
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
    @@ -47,6 +50,7 @@
     +char *oid_to_hex_r(char *out, const struct object_id *oid);
     +char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
     +char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
    ++char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
     +char *oid_to_hex(const struct object_id *oid);			/* same static buffer */
      
      /*
    @@ -108,6 +112,11 @@
     +char *sha1_to_hex(const unsigned char *sha1)
     +{
     +	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
    ++}
    ++
    ++char *hash_to_hex(const unsigned char *hash)
    ++{
    ++	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
      }
      
      char *oid_to_hex(const struct object_id *oid)
 -:  ---------- >  4:  e22bf99c93 cache: make hashcmp and hasheq work with larger hashes
 4:  984698c42a =  5:  02cbcae270 t: add basic tests for our SHA-1 implementation
 5:  61632644bd !  6:  e9ac5f265a t: make the sha1 test-tool helper generic
    @@ -14,7 +14,7 @@
      --- a/Makefile
      +++ b/Makefile
     @@
    - TEST_BUILTINS_OBJS += test-dump-split-index.o
    + TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
      TEST_BUILTINS_OBJS += test-example-decorate.o
      TEST_BUILTINS_OBJS += test-genrandom.o
     +TEST_BUILTINS_OBJS += test-hash.o
    @@ -147,7 +147,7 @@
      --- a/t/helper/test-tool.h
      +++ b/t/helper/test-tool.h
     @@
    - int cmd__wildmatch(int argc, const char **argv);
    + #endif
      int cmd__write_cache(int argc, const char **argv);
      
     +int cmd_hash_impl(int ac, const char **av, int algo);
 6:  79d5246af6 =  7:  44c9f27f9d sha1-file: add a constant for hash block size
 7:  a83ceb35a8 =  8:  59be7f1f37 t/helper: add a test helper to compute hash speed
 8:  ef3c5f3370 !  9:  09f1286769  commit-graph: convert to using the_hash_algo
    @@ -1,6 +1,6 @@
     Author: brian m. carlson <sandals@crustytoothpaste.net>
     
    -     commit-graph: convert to using the_hash_algo
    +    commit-graph: convert to using the_hash_algo
     
         Instead of using hard-coded constants for object sizes, use
         the_hash_algo to look them up.  In addition, use a function call to look
    @@ -12,6 +12,12 @@
      --- a/commit-graph.c
      +++ b/commit-graph.c
     @@
    + #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
    + #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
    + 
    +-#define GRAPH_DATA_WIDTH 36
    ++#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
    + 
      #define GRAPH_VERSION_1 0x1
      #define GRAPH_VERSION GRAPH_VERSION_1
      
 9:  298b15b577 ! 10:  cdec8c1b49 Add a base implementation of SHA-256 support
    @@ -3,39 +3,14 @@
         Add a base implementation of SHA-256 support
     
         SHA-1 is weak and we need to transition to a new hash function.  For
    -    some time, we have referred to this new function as NewHash.
    -
    -    The selection criteria for NewHash specify that it should (a) be 256
    -    bits in length, (b) have high quality implementations available, (c)
    -    should match Git's needs in terms of security, and (d) ideally, be fast
    -    to compute.
    -
    -    SHA-256 has a variety of high quality implementations across various
    -    libraries.  It is implemented by every cryptographic library we support
    -    and is available on every platform and in almost every programming
    -    language.  It is often highly optimized, since it is commonly used in
    -    TLS and elsewhere.  Additionally, there are various command line
    -    utilities that implement it, which is useful for educational and testing
    -    purposes.
    -
    -    SHA-256 is presently considered secure and has received a reasonable
    -    amount of cryptanalysis in the literature.  It is, admittedly, not
    -    resistant to length extension attacks, but Git object storage is immune
    -    to those due to the length field at the beginning.
    -
    -    SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
    -    since our default SHA-1 implementation is collision-detecting, a
    -    reasonable cryptographic library implementation of SHA-256 will actually
    -    be faster than SHA-256.  In addition, modern ARM and AMD processors (and
    -    some Intel processors) contain instructions for implementing SHA-256 in
    -    hardware, making it the fastest possible option.
    -
    -    There are other reasons to select SHA-256.  With signed commits and
    -    tags, it's possible to use SHA-256 for signatures and therefore have to
    -    rely on only one hash algorithm for security.
    +    some time, we have referred to this new function as NewHash.  Recently,
    +    we decided to pick SHA-256 as NewHash.
     
         Add a basic implementation of SHA-256 based off libtomcrypt, which is in
    -    the public domain.  Optimize it and tidy it somewhat.
    +    the public domain.  Optimize it and restructure it to meet our coding
    +    standards.  Place it in a directory called "sha256" where it and any
    +    future implementations can live so as to avoid a proliferation of
    +    implementation directories.
     
         Wire up SHA-256 in the list of hash algorithms, and add a test that the
         algorithm works correctly.
    @@ -51,8 +26,8 @@
      +++ b/Makefile
     @@
      TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
    - TEST_BUILTINS_OBJS += test-sha1-array.o
      TEST_BUILTINS_OBJS += test-sha1.o
    + TEST_BUILTINS_OBJS += test-sha1-array.o
     +TEST_BUILTINS_OBJS += test-sha256.o
      TEST_BUILTINS_OBJS += test-sigchain.o
      TEST_BUILTINS_OBJS += test-strcmp-offset.o
    @@ -459,8 +434,8 @@
      +++ b/t/helper/test-tool.c
     @@
      	{ "scrap-cache-tree", cmd__scrap_cache_tree },
    - 	{ "sha1-array", cmd__sha1_array },
      	{ "sha1", cmd__sha1 },
    + 	{ "sha1-array", cmd__sha1_array },
     +	{ "sha256", cmd__sha256 },
      	{ "sigchain", cmd__sigchain },
      	{ "strcmp-offset", cmd__strcmp_offset },
    @@ -471,8 +446,8 @@
      +++ b/t/helper/test-tool.h
     @@
      int cmd__scrap_cache_tree(int argc, const char **argv);
    - int cmd__sha1_array(int argc, const char **argv);
      int cmd__sha1(int argc, const char **argv);
    + int cmd__sha1_array(int argc, const char **argv);
     +int cmd__sha256(int argc, const char **argv);
      int cmd__sigchain(int argc, const char **argv);
      int cmd__strcmp_offset(int argc, const char **argv);
10:  c52b4be13c = 11:  d085802a89 sha256: add an SHA-256 implementation using libgcrypt
11:  225ec075d1 = 12:  4e4daf4000 hash: add an SHA-256 implementation using OpenSSL
12:  04a3ac6f29 = 13:  f206f45426 commit-graph: specify OID version for SHA-256

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 01/13] sha1-file: rename algorithm to "sha1"
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-16 15:17   ` Duy Nguyen
  2018-10-15  2:18 ` [PATCH v2 02/13] sha1-file: provide functions to look up hash algorithms brian m. carlson
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity, but it does make it shorter and
easier to type, especially for touch typists.  In addition, the
transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any serialized data formats.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index a4367b8f04..e29825f259 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		NULL,
 	},
 	{
-		"sha-1",
+		"sha1",
 		/* "sha1", big-endian */
 		0x73686131,
 		GIT_SHA1_RAWSZ,

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 02/13] sha1-file: provide functions to look up hash algorithms
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 01/13] sha1-file: rename algorithm to "sha1" brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-17 13:32   ` SZEDER Gábor
  2018-10-15  2:18 ` [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes brian m. carlson
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h      | 13 +++++++++++++
 sha1-file.c | 21 +++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..90f4344619 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+	return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index e29825f259..3a75d515eb 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
 	return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+	int i;
+	if (!name)
+		return GIT_HASH_UNKNOWN;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (!strcmp(name, hash_algos[i].name))
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+	int i;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (format_id == hash_algos[i].format_id)
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 01/13] sha1-file: rename algorithm to "sha1" brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 02/13] sha1-file: provide functions to look up hash algorithms brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-16  1:54   ` Junio C Hamano
  2018-10-15  2:18 ` [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 15 +++++++++------
 hex.c   | 37 +++++++++++++++++++++++++++++--------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index d508f3d4f8..a13d14ce0a 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,9 +1361,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1371,10 +1371,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
-extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
+char *oid_to_hex(const struct object_id *oid);			/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..080597ad3f 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 	return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+static inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+					const struct git_hash_algo *algop)
 {
 	static const char hex[] = "0123456789abcdef";
 	char *buf = buffer;
 	int i;
 
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
-		unsigned int val = *sha1++;
+	for (i = 0; i < algop->rawsz; i++) {
+		unsigned int val = *hash++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
 	}
@@ -89,20 +90,40 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 	return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo)
 {
-	return sha1_to_hex_r(buffer, oid->hash);
+	return hash_to_hex_algop_r(buffer, hash, &hash_algos[algo]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+{
+	return hash_to_hex_algo_r(buffer, sha1, GIT_HASH_SHA1);
+}
+
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algo(const unsigned char *hash, int algo)
 {
 	static int bufno;
 	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
 	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return sha1_to_hex_r(hexbuffer[bufno], sha1);
+	return hash_to_hex_algo_r(hexbuffer[bufno], hash, algo);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return sha1_to_hex(oid->hash);
+	return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
 }

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (2 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-16 15:44   ` Duy Nguyen
  2018-10-15  2:18 ` [PATCH v2 05/13] t: add basic tests for our SHA-1 implementation brian m. carlson
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index a13d14ce0a..0b88c3a344 100644
--- a/cache.h
+++ b/cache.h
@@ -1024,16 +1024,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
 	/*
-	 * This is a temporary optimization hack. By asserting the size here,
-	 * we let the compiler know that it's always going to be 20, which lets
-	 * it turn this fixed-size memcmp into a few inline instructions.
-	 *
-	 * This will need to be extended or ripped out when we learn about
-	 * hashes of different sizes.
+	 * Teach the compiler that there are only two possibilities of hash size
+	 * here, so that it can optimize for this case as much as possible.
 	 */
-	if (the_hash_algo->rawsz != 20)
-		BUG("hash size not yet supported by hashcmp");
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
@@ -1043,7 +1039,13 @@ static inline int oidcmp(const struct object_id *oid1, const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return !hashcmp(sha1, sha2);
+	/*
+	 * We write this here instead of deferring to hashcmp so that the
+	 * compiler can properly inline it and avoid calling memcmp.
+	 */
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 05/13] t: add basic tests for our SHA-1 implementation
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (3 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 06/13] t: make the sha1 test-tool helper generic brian m. carlson
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t0014-hash.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0014-hash.sh

diff --git a/t/t0014-hash.sh b/t/t0014-hash.sh
new file mode 100755
index 0000000000..8e763c2c3d
--- /dev/null
+++ b/t/t0014-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+	test-tool sha1 </dev/null >actual &&
+	grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+	printf "a" | test-tool sha1 >actual &&
+	grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+	printf "abc" | test-tool sha1 >actual &&
+	grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+	printf "message digest" | test-tool sha1 >actual &&
+	grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+	grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
+		test-tool sha1 >actual &&
+	grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+	printf "blob 0\0" | test-tool sha1 >actual &&
+	grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+	printf "blob 3\0abc" | test-tool sha1 >actual &&
+	grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+	printf "tree 0\0" | test-tool sha1 >actual &&
+	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 06/13] t: make the sha1 test-tool helper generic
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (4 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 05/13] t: add basic tests for our SHA-1 implementation brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 07/13] sha1-file: add a constant for hash block size brian m. carlson
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                              |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +++++-----
 t/helper/test-sha1.c                  | 52 +--------------------------
 t/helper/test-tool.h                  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (66%)

diff --git a/Makefile b/Makefile
index 5c8307b7c4..324967410d 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 66%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..9992de2212 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_HEXSZ];
 	unsigned bufsz = 8192;
 	int binary = 0;
 	char *buffer;
+	const struct git_hash_algo *algop = &hash_algos[algo];
 
 	if (ac == 2) {
 		if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
 			die("OOPS");
 	}
 
-	git_SHA1_Init(&ctx);
+	algop->init_fn(&ctx);
 
 	while (1) {
 		ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
 			if (sz == 0)
 				break;
 			if (sz < 0)
-				die_errno("test-sha1");
+				die_errno("test-hash");
 			this_sz += sz;
 			cp += sz;
 			room -= sz;
 		}
 		if (this_sz == 0)
 			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
+		algop->update_fn(&ctx, buffer, this_sz);
 	}
-	git_SHA1_Final(sha1, &ctx);
+	algop->final_fn(hash, &ctx);
 
 	if (binary)
-		fwrite(sha1, 1, 20, stdout);
+		fwrite(hash, 1, algop->rawsz, stdout);
 	else
-		puts(sha1_to_hex(sha1));
+		puts(hash_to_hex_algo(hash, algo));
 	exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
-	unsigned bufsz = 8192;
-	int binary = 0;
-	char *buffer;
-
-	if (ac == 2) {
-		if (!strcmp(av[1], "-b"))
-			binary = 1;
-		else
-			bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-	}
-
-	if (!bufsz)
-		bufsz = 8192;
-
-	while ((buffer = malloc(bufsz)) == NULL) {
-		fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-		bufsz /= 2;
-		if (bufsz < 1024)
-			die("OOPS");
-	}
-
-	git_SHA1_Init(&ctx);
-
-	while (1) {
-		ssize_t sz, this_sz;
-		char *cp = buffer;
-		unsigned room = bufsz;
-		this_sz = 0;
-		while (room) {
-			sz = xread(0, cp, room);
-			if (sz == 0)
-				break;
-			if (sz < 0)
-				die_errno("test-sha1");
-			this_sz += sz;
-			cp += sz;
-			room -= sz;
-		}
-		if (this_sz == 0)
-			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
-	}
-	git_SHA1_Final(sha1, &ctx);
-
-	if (binary)
-		fwrite(sha1, 1, 20, stdout);
-	else
-		puts(sha1_to_hex(sha1));
-	exit(0);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e4890566da..29ac7b0b0d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -50,4 +50,6 @@ int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);
 
+int cmd_hash_impl(int ac, const char **av, int algo);
+
 #endif

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 07/13] sha1-file: add a constant for hash block size
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (5 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 06/13] t: make the sha1 test-tool helper generic brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 08/13] t/helper: add a test helper to compute hash speed brian m. carlson
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h     | 4 ++++
 hash.h      | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index 0b88c3a344..48e736b0d5 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 90f4344619..46dff69eb3 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
 	/* The length of the hash in hex characters. */
 	size_t hexsz;
 
+	/* The block size of the hash. */
+	size_t blksz;
+
 	/* The hash initialization function. */
 	git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 3a75d515eb..9aadaf0c8d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x00000000,
 		0,
 		0,
+		0,
 		git_hash_unknown_init,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x73686131,
 		GIT_SHA1_RAWSZ,
 		GIT_SHA1_HEXSZ,
+		GIT_SHA1_BLKSZ,
 		git_hash_sha1_init,
 		git_hash_sha1_update,
 		git_hash_sha1_final,

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 08/13] t/helper: add a test helper to compute hash speed
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (6 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 07/13] sha1-file: add a constant for hash block size brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 09/13] commit-graph: convert to using the_hash_algo brian m. carlson
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                   |  1 +
 t/helper/test-hash-speed.c | 61 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index 324967410d..1c43bf9aa8 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 0000000000..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx *ctx, uint8_t *final, const void *p, size_t len)
+{
+	algo->init_fn(ctx);
+	algo->update_fn(ctx, p, len);
+	algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_RAWSZ];
+	clock_t initial, start, end;
+	unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+	int i;
+	void *p;
+	const struct git_hash_algo *algo = NULL;
+
+	if (ac == 2) {
+		for (i = 1; i < GIT_HASH_NALGOS; i++) {
+			if (!strcmp(av[1], hash_algos[i].name)) {
+				algo = &hash_algos[i];
+				break;
+			}
+		}
+	}
+	if (!algo)
+		die("usage: test-tool hash-speed algo_name");
+
+	/* Use this as an offset to make overflow less likely. */
+	initial = clock();
+
+	printf("algo: %s\n", algo->name);
+
+	for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+		unsigned long j, kb;
+		double kb_per_sec;
+		p = xcalloc(1, bufsizes[i]);
+		start = end = clock() - initial;
+		for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; j++) {
+			compute_hash(algo, &ctx, hash, p, bufsizes[i]);
+
+			/*
+			 * Only check elapsed time every 128 iterations to avoid
+			 * dominating the runtime with system calls.
+			 */
+			if (!(j & 127))
+				end = clock() - initial;
+		}
+		kb = j * bufsizes[i];
+		kb_per_sec = kb / (1024 * ((double)end - start) / CLOCKS_PER_SEC);
+		printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", bufsizes[i], j, kb, kb_per_sec);
+		free(p);
+	}
+
+	exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 6b5836dc1b..e009c8186d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
 	{ "example-decorate", cmd__example_decorate },
 	{ "genrandom", cmd__genrandom },
 	{ "hashmap", cmd__hashmap },
+	{ "hash-speed", cmd__hash_speed },
 	{ "index-version", cmd__index_version },
 	{ "json-writer", cmd__json_writer },
 	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 29ac7b0b0d..19a7e8332a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 09/13] commit-graph: convert to using the_hash_algo
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (7 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 08/13] t/helper: add a test helper to compute hash speed brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15 15:10   ` Derrick Stolee
  2018-10-15  2:18 ` [PATCH v2 10/13] Add a base implementation of SHA-256 support brian m. carlson
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit-graph.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7f4519ec3b..7a28fbb03f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -20,16 +20,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
 #define GRAPH_PARENT_MISSING 0x7fffffff
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
@@ -41,13 +36,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+	return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -100,15 +100,15 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != GRAPH_OID_VERSION) {
+	if (hash_version != oid_version()) {
 		error(_("hash version %X does not match version %X"),
-		      hash_version, GRAPH_OID_VERSION);
+		      hash_version, oid_version());
 		goto cleanup_fail;
 	}
 
 	graph = alloc_commit_graph();
 
-	graph->hash_len = GRAPH_OID_LEN;
+	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
@@ -124,7 +124,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
 			error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
 			      (uint32_t)chunk_offset);
 			goto cleanup_fail;
@@ -711,6 +711,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	const unsigned hashsz = the_hash_algo->rawsz;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
@@ -831,7 +832,7 @@ void write_commit_graph(const char *obj_dir,
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
 	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, oid_version());
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -846,8 +847,8 @@ void write_commit_graph(const char *obj_dir,
 
 	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
 	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
@@ -860,8 +861,8 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+	write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph();

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 10/13] Add a base implementation of SHA-256 support
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (8 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 09/13] commit-graph: convert to using the_hash_algo brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15 14:59   ` Duy Nguyen
  2018-10-17 16:12   ` SZEDER Gábor
  2018-10-15  2:18 ` [PATCH v2 11/13] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Place it in a directory called "sha256" where it and any
future implementations can live so as to avoid a proliferation of
implementation directories.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile               |   4 +
 cache.h                |  12 ++-
 hash.h                 |  19 ++++-
 sha1-file.c            |  45 +++++++++++
 sha256/block/sha256.c  | 180 +++++++++++++++++++++++++++++++++++++++++
 sha256/block/sha256.h  |  26 ++++++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0014-hash.sh        |  25 ++++++
 10 files changed, 316 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index 1c43bf9aa8..76d378c7ba 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1633,6 +1634,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
 	BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 48e736b0d5..81ece0360e 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 46dff69eb3..88d18896d7 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Update		platform_SHA1_Update
 #define git_SHA1_Final		platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTX	SHA256_CTX
+#define platform_SHA256_Init	SHA256_Init
+#define platform_SHA256_Update	SHA256_Update
+#define platform_SHA256_Final	SHA256_Final
+#endif
+
+#define git_SHA256_CTX		platform_SHA256_CTX
+#define git_SHA256_Init		platform_SHA256_Init
+#define git_SHA256_Update	platform_SHA256_Update
+#define git_SHA256_Final	platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
 	git_SHA_CTX sha1;
+	git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index 9aadaf0c8d..66ba3dadb9 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 	 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 	 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_TREE_SHA256_BIN_LITERAL \
+	"\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
+	"\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
+	"\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
+	"\x53\x21"
 
 #define EMPTY_BLOB_SHA1_BIN_LITERAL \
 	"\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b" \
 	"\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91"
+#define EMPTY_BLOB_SHA256_BIN_LITERAL \
+	"\x47\x3a\x0f\x4c\x3b\xe8\xa9\x36\x81\xa2" \
+	"\x67\xe3\xb1\xe9\xa7\xdc\xda\x11\x85\x43" \
+	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
+	"\x18\x13"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -53,6 +63,12 @@ static const struct object_id empty_tree_oid = {
 static const struct object_id empty_blob_oid = {
 	EMPTY_BLOB_SHA1_BIN_LITERAL
 };
+static const struct object_id empty_tree_oid_sha256 = {
+	EMPTY_TREE_SHA256_BIN_LITERAL
+};
+static const struct object_id empty_blob_oid_sha256 = {
+	EMPTY_BLOB_SHA256_BIN_LITERAL
+};
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
 {
@@ -69,6 +85,22 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+
+static void git_hash_sha256_init(git_hash_ctx *ctx)
+{
+	git_SHA256_Init(&ctx->sha256);
+}
+
+static void git_hash_sha256_update(git_hash_ctx *ctx, const void *data, size_t len)
+{
+	git_SHA256_Update(&ctx->sha256, data, len);
+}
+
+static void git_hash_sha256_final(unsigned char *hash, git_hash_ctx *ctx)
+{
+	git_SHA256_Final(hash, &ctx->sha256);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
@@ -110,6 +142,19 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		&empty_tree_oid,
 		&empty_blob_oid,
 	},
+	{
+		"sha256",
+		/* "s256", big-endian */
+		0x73323536,
+		GIT_SHA256_RAWSZ,
+		GIT_SHA256_HEXSZ,
+		GIT_SHA256_BLKSZ,
+		git_hash_sha256_init,
+		git_hash_sha256_update,
+		git_hash_sha256_final,
+		&empty_tree_oid_sha256,
+		&empty_blob_oid_sha256,
+	}
 };
 
 const char *empty_tree_oid_hex(void)
diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
new file mode 100644
index 0000000000..18350c161a
--- /dev/null
+++ b/sha256/block/sha256.c
@@ -0,0 +1,180 @@
+#include "git-compat-util.h"
+#include "./sha256.h"
+
+#define BLKSIZE blk_SHA256_BLKSIZE
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx)
+{
+	ctx->offset = 0;
+	ctx->length = 0;
+	ctx->state[0] = 0x6A09E667UL;
+	ctx->state[1] = 0xBB67AE85UL;
+	ctx->state[2] = 0x3C6EF372UL;
+	ctx->state[3] = 0xA54FF53AUL;
+	ctx->state[4] = 0x510E527FUL;
+	ctx->state[5] = 0x9B05688CUL;
+	ctx->state[6] = 0x1F83D9ABUL;
+	ctx->state[7] = 0x5BE0CD19UL;
+}
+
+static inline uint32_t ror(uint32_t x, unsigned n)
+{
+	return (x >> n) | (x << (32 - n));
+}
+
+#define Ch(x,y,z)       (z ^ (x & (y ^ z)))
+#define Maj(x,y,z)      (((x | y) & z) | (x & y))
+#define S(x, n)         ror((x),(n))
+#define R(x, n)         ((x)>>(n))
+#define Sigma0(x)       (S(x, 2) ^ S(x, 13) ^ S(x, 22))
+#define Sigma1(x)       (S(x, 6) ^ S(x, 11) ^ S(x, 25))
+#define Gamma0(x)       (S(x, 7) ^ S(x, 18) ^ R(x, 3))
+#define Gamma1(x)       (S(x, 17) ^ S(x, 19) ^ R(x, 10))
+
+static void blk_SHA256_Transform(blk_SHA256_CTX *ctx, const unsigned char *buf)
+{
+
+	uint32_t S[8], W[64], t0, t1;
+	int i;
+
+	/* copy state into S */
+	for (i = 0; i < 8; i++) {
+		S[i] = ctx->state[i];
+	}
+
+	/* copy the state into 512-bits into W[0..15] */
+	for (i = 0; i < 16; i++, buf += sizeof(uint32_t)) {
+		W[i] = get_be32(buf);
+	}
+
+	/* fill W[16..63] */
+	for (i = 16; i < 64; i++) {
+		W[i] = Gamma1(W[i - 2]) + W[i - 7] + Gamma0(W[i - 15]) + W[i - 16];
+	}
+
+#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
+	t0 = h + Sigma1(e) + Ch(e, f, g) + ki + W[i];   \
+	t1 = Sigma0(a) + Maj(a, b, c);                  \
+	d += t0;                                        \
+	h  = t0 + t1;
+
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],0,0x428a2f98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],1,0x71374491);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],2,0xb5c0fbcf);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],3,0xe9b5dba5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],4,0x3956c25b);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],5,0x59f111f1);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],6,0x923f82a4);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],7,0xab1c5ed5);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],8,0xd807aa98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],9,0x12835b01);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],10,0x243185be);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],11,0x550c7dc3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],12,0x72be5d74);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],13,0x80deb1fe);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],14,0x9bdc06a7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],15,0xc19bf174);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],16,0xe49b69c1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],17,0xefbe4786);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],18,0x0fc19dc6);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],19,0x240ca1cc);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],20,0x2de92c6f);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],21,0x4a7484aa);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],22,0x5cb0a9dc);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],23,0x76f988da);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],24,0x983e5152);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],25,0xa831c66d);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],26,0xb00327c8);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],27,0xbf597fc7);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],28,0xc6e00bf3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],29,0xd5a79147);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],30,0x06ca6351);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],31,0x14292967);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],32,0x27b70a85);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],33,0x2e1b2138);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],34,0x4d2c6dfc);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],35,0x53380d13);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],36,0x650a7354);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],37,0x766a0abb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],38,0x81c2c92e);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],39,0x92722c85);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],40,0xa2bfe8a1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],41,0xa81a664b);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],42,0xc24b8b70);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],43,0xc76c51a3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],44,0xd192e819);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],45,0xd6990624);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],46,0xf40e3585);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],47,0x106aa070);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],48,0x19a4c116);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],49,0x1e376c08);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],50,0x2748774c);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],51,0x34b0bcb5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],52,0x391c0cb3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],53,0x4ed8aa4a);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],54,0x5b9cca4f);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],55,0x682e6ff3);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],56,0x748f82ee);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],57,0x78a5636f);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],58,0x84c87814);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],59,0x8cc70208);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],60,0x90befffa);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],61,0xa4506ceb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
+
+#undef RND
+
+	for (i = 0; i < 8; i++) {
+		ctx->state[i] = ctx->state[i] + S[i];
+	}
+}
+
+#define MIN(x, y) ((x) < (y) ? (x) : (y))
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	const unsigned char *in = data;
+	size_t n;
+	ctx->length += len;
+	while (len > 0) {
+		if (!ctx->offset && len >= BLKSIZE) {
+			blk_SHA256_Transform(ctx, in);
+			in += BLKSIZE;
+			len -= BLKSIZE;
+		} else {
+			n = MIN(len, (BLKSIZE - ctx->offset));
+			memcpy(ctx->buf + ctx->offset, in, n);
+			ctx->offset += n;
+			in += n;
+			len -= n;
+			if (ctx->offset == BLKSIZE) {
+				blk_SHA256_Transform(ctx, ctx->buf);
+				ctx->offset = 0;
+			}
+		}
+	}
+}
+
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx)
+{
+	const unsigned trip = BLKSIZE - sizeof(ctx->length);
+	int i;
+
+	ctx->length <<= 3;
+	ctx->buf[ctx->offset++] = 0x80;
+
+	if (ctx->offset > trip) {
+		memset(ctx->buf + ctx->offset, 0, BLKSIZE - ctx->offset);
+		blk_SHA256_Transform(ctx, ctx->buf);
+		ctx->offset = 0;
+	}
+
+	memset(ctx->buf + ctx->offset, 0, BLKSIZE - ctx->offset - sizeof(ctx->length));
+
+	put_be64(ctx->buf + trip, ctx->length);
+	blk_SHA256_Transform(ctx, ctx->buf);
+
+	/* copy output */
+	for (i = 0; i < 8; i++, digest += sizeof(uint32_t))
+		put_be32(digest, ctx->state[i]);
+}
diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
new file mode 100644
index 0000000000..ad8b178ad3
--- /dev/null
+++ b/sha256/block/sha256.h
@@ -0,0 +1,26 @@
+#ifndef SHA256_BLOCK_SHA256_H
+#define SHA256_BLOCK_SHA256_H
+
+#include "git-compat-util.h"
+
+#define blk_SHA256_BLKSIZE 64
+
+struct blk_SHA256_CTX {
+	uint32_t state[8];
+	uint64_t length;
+	uint32_t offset;
+	uint8_t buf[blk_SHA256_BLKSIZE];
+};
+
+typedef struct blk_SHA256_CTX blk_SHA256_CTX;
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx);
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len);
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx);
+
+#define platform_SHA256_CTX blk_SHA256_CTX
+#define platform_SHA256_Init blk_SHA256_Init
+#define platform_SHA256_Update blk_SHA256_Update
+#define platform_SHA256_Final blk_SHA256_Final
+
+#endif
diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c
new file mode 100644
index 0000000000..0ac6a99d5f
--- /dev/null
+++ b/t/helper/test-sha256.c
@@ -0,0 +1,7 @@
+#include "test-tool.h"
+#include "cache.h"
+
+int cmd__sha256(int ac, const char **av)
+{
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index e009c8186d..2a65193514 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -43,6 +43,7 @@ static struct test_cmd cmds[] = {
 	{ "scrap-cache-tree", cmd__scrap_cache_tree },
 	{ "sha1", cmd__sha1 },
 	{ "sha1-array", cmd__sha1_array },
+	{ "sha256", cmd__sha256 },
 	{ "sigchain", cmd__sigchain },
 	{ "strcmp-offset", cmd__strcmp_offset },
 	{ "string-list", cmd__string_list },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 19a7e8332a..2e66a8e47b 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -39,6 +39,7 @@ int cmd__run_command(int argc, const char **argv);
 int cmd__scrap_cache_tree(int argc, const char **argv);
 int cmd__sha1(int argc, const char **argv);
 int cmd__sha1_array(int argc, const char **argv);
+int cmd__sha256(int argc, const char **argv);
 int cmd__sigchain(int argc, const char **argv);
 int cmd__strcmp_offset(int argc, const char **argv);
 int cmd__string_list(int argc, const char **argv);
diff --git a/t/t0014-hash.sh b/t/t0014-hash.sh
index 8e763c2c3d..f8e639743f 100755
--- a/t/t0014-hash.sh
+++ b/t/t0014-hash.sh
@@ -26,4 +26,29 @@ test_expect_success 'test basic SHA-1 hash values' '
 	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
 '
 
+test_expect_success 'test basic SHA-256 hash values' '
+	test-tool sha256 </dev/null >actual &&
+	grep e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 actual &&
+	printf "a" | test-tool sha256 >actual &&
+	grep ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb actual &&
+	printf "abc" | test-tool sha256 >actual &&
+	grep ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad actual &&
+	printf "message digest" | test-tool sha256 >actual &&
+	grep f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha256 >actual &&
+	grep 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 actual &&
+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
+		test-tool sha256 >actual &&
+	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
+	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
+		test-tool sha256 >actual &&
+	grep e406ba321ca712ad35a698bf0af8d61fc4dc40eca6bdcea4697962724ccbde35 actual &&
+	printf "blob 0\0" | test-tool sha256 >actual &&
+	grep 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 actual &&
+	printf "blob 3\0abc" | test-tool sha256 >actual &&
+	grep c1cf6e465077930e88dc5136641d402f72a229ddd996f627d60e9639eaba35a6 actual &&
+	printf "tree 0\0" | test-tool sha256 >actual &&
+	grep 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321 actual
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 11/13] sha256: add an SHA-256 implementation using libgcrypt
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (9 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 10/13] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-15  2:18 ` [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger. It is licensed
under the LGPL 2.1, which is compatible with the GPL.

Add an implementation of SHA-256 that uses libgcrypt.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile        | 13 +++++++++++--
 hash.h          |  4 ++++
 sha256/gcrypt.h | 30 ++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index 76d378c7ba..3d91555a81 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1634,8 +1638,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+	BASIC_CFLAGS += -DSHA256_GCRYPT
+	EXTLIBS += -lgcrypt
+else
+	LIB_OBJS += sha256/block/sha256.o
+	BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 88d18896d7..9df562f2f6 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 0000000000..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include <gcrypt.h>
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+	gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+	memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (10 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 11/13] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
@ 2018-10-15  2:18 ` brian m. carlson
  2018-10-16 15:36   ` Duy Nguyen
  2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile | 7 +++++++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 3d91555a81..3164e2aeee 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1638,6 +1640,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+	EXTLIBS += $(LIB_4_CRYPTO)
+	BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
 	BASIC_CFLAGS += -DSHA256_GCRYPT
 	EXTLIBS += -lgcrypt
@@ -1645,6 +1651,7 @@ else
 	LIB_OBJS += sha256/block/sha256.o
 	BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 9df562f2f6..9df06d56b4 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include <openssl/sha.h>
 #else
 #include "sha256/block/sha256.h"
 #endif

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (11 preceding siblings ...)
  2018-10-15  2:18 ` [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
@ 2018-10-15  2:19 ` brian m. carlson
  2018-10-15 15:11   ` Derrick Stolee
                     ` (3 more replies)
  2018-10-16  2:00 ` [PATCH v2 00/13] Base SHA-256 implementation Junio C Hamano
                   ` (2 subsequent siblings)
  15 siblings, 4 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-15  2:19 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

Since the commit-graph code wants to serialize the hash algorithm into
the data store, specify a version number for each supported algorithm.
Note that we don't use the values of the constants themselves, as they
are internal and could change in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit-graph.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7a28fbb03f..e587c21bb6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
 
 static uint8_t oid_version(void)
 {
-	return 1;
+	switch (hash_algo_by_ptr(the_hash_algo)) {
+		case GIT_HASH_SHA1:
+			return 1;
+		case GIT_HASH_SHA256:
+			return 2;
+		default:
+			BUG("unknown hash algorithm");
+	}
 }
 
 static struct commit_graph *alloc_commit_graph(void)

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/13] Add a base implementation of SHA-256 support
  2018-10-15  2:18 ` [PATCH v2 10/13] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-10-15 14:59   ` Duy Nguyen
  2018-10-15 23:30     ` brian m. carlson
  2018-10-17 16:12   ` SZEDER Gábor
  1 sibling, 1 reply; 44+ messages in thread
From: Duy Nguyen @ 2018-10-15 14:59 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

 On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> SHA-1 is weak and we need to transition to a new hash function.  For
> some time, we have referred to this new function as NewHash.  Recently,
> we decided to pick SHA-256 as NewHash.
>
> Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> the public domain.  Optimize it and restructure it to meet our coding
> standards.  Place it in a directory called "sha256" where it and any
> future implementations can live so as to avoid a proliferation of
> implementation directories.
>
> Wire up SHA-256 in the list of hash algorithms, and add a test that the
> algorithm works correctly.
>
> Note that with this patch, it is still not possible to switch to using
> SHA-256 in Git.  Additional patches are needed to prepare the code to
> handle a larger hash algorithm and further test fixes are needed.

At some point I assume SHA-256 will become functional and be part of a
git release without all file formats updated to support multiple
hashes. Should we somehow discourage the user from using it because it
will break when all file formats are finally updated?

The simplest way is to just not register "sha256" in hash_algos unless
some developer flag is set. Or rename sha256 to sha256-experimental or
something to make it more obvious (but then we may need to fix up the
test suite after renaming it back to sha256, not great)
-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 09/13] commit-graph: convert to using the_hash_algo
  2018-10-15  2:18 ` [PATCH v2 09/13] commit-graph: convert to using the_hash_algo brian m. carlson
@ 2018-10-15 15:10   ` Derrick Stolee
  0 siblings, 0 replies; 44+ messages in thread
From: Derrick Stolee @ 2018-10-15 15:10 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Jeff King, Ævar Arnfjörð Bjarmason

On 10/14/2018 10:18 PM, brian m. carlson wrote:
> Instead of using hard-coded constants for object sizes, use
> the_hash_algo to look them up.  In addition, use a function call to look
> up the object ID version and produce the correct value.
This looks good and I can see already how this new organization will 
help make the hash size more flexible.

Thanks!
-Stolee


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
@ 2018-10-15 15:11   ` Derrick Stolee
  2018-10-16  2:00   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Derrick Stolee @ 2018-10-15 15:11 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Jeff King, Ævar Arnfjörð Bjarmason

On 10/14/2018 10:19 PM, brian m. carlson wrote:
> Since the commit-graph code wants to serialize the hash algorithm into
> the data store, specify a version number for each supported algorithm.
> Note that we don't use the values of the constants themselves, as they
> are internal and could change in the future.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>   commit-graph.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 7a28fbb03f..e587c21bb6 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
>   
>   static uint8_t oid_version(void)
>   {
> -	return 1;
> +	switch (hash_algo_by_ptr(the_hash_algo)) {
> +		case GIT_HASH_SHA1:
> +			return 1;
> +		case GIT_HASH_SHA256:
> +			return 2;
> +		default:
> +			BUG("unknown hash algorithm");
> +	}
>   }
>   
>   static struct commit_graph *alloc_commit_graph(void)
Simple and easy!

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/13] Add a base implementation of SHA-256 support
  2018-10-15 14:59   ` Duy Nguyen
@ 2018-10-15 23:30     ` brian m. carlson
  2018-10-16 14:59       ` Duy Nguyen
  0 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-15 23:30 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 5426 bytes --]

On Mon, Oct 15, 2018 at 04:59:12PM +0200, Duy Nguyen wrote:
>  On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > SHA-1 is weak and we need to transition to a new hash function.  For
> > some time, we have referred to this new function as NewHash.  Recently,
> > we decided to pick SHA-256 as NewHash.
> >
> > Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> > the public domain.  Optimize it and restructure it to meet our coding
> > standards.  Place it in a directory called "sha256" where it and any
> > future implementations can live so as to avoid a proliferation of
> > implementation directories.
> >
> > Wire up SHA-256 in the list of hash algorithms, and add a test that the
> > algorithm works correctly.
> >
> > Note that with this patch, it is still not possible to switch to using
> > SHA-256 in Git.  Additional patches are needed to prepare the code to
> > handle a larger hash algorithm and further test fixes are needed.
> 
> At some point I assume SHA-256 will become functional and be part of a
> git release without all file formats updated to support multiple
> hashes. Should we somehow discourage the user from using it because it
> will break when all file formats are finally updated?

In order to activate SHA-256 in the codebase, currently you need a patch
to force it on.  Otherwise, the code is simply inert and does nothing
(other than in the test-tool).  I've included the patch below so you can
see what it does (or if you want to play around with it).

Without this patch, Git remains fully SHA-1 and can't access any of the
SHA-256 code.  I have some very preliminary patches that do wire up
extensions.objectFormat (branch object-id-part15 [sic]) but I haven't
picked them up in a while.  (I need to finish test fixes first.)

Making the codebase run in SHA-256 mode with a Git that supports both
algorithms but defaults to SHA-1 is currently seriously broken, even
with the object-id-part15 branch.  When those patches go in, they will
be behind a developer flag to prevent wholesale chaos and segfaults.

So in summary, no, I don't think a developer flag is necessary at this
point.

-- >% --
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: "brian m. carlson" <sandals@crustytoothpaste.net>
Date: Wed, 25 Jul 2018 23:48:30 +0000
Subject: [PATCH] Switch default hash algorithm to SHA-256 for testing

Set the default hash algorithm to SHA-256 for testing purposes.

This is a test commit and should not be used in production.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 repository.c            |  2 +-
 setup.c                 |  2 +-
 t/test-lib-functions.sh |  4 +---
 t/test-lib.sh           | 13 ++++++++-----
 4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/repository.c b/repository.c
index 5dd1486718..4ce50ea670 100644
--- a/repository.c
+++ b/repository.c
@@ -17,7 +17,7 @@ void initialize_the_repository(void)
 	the_repo.objects = raw_object_store_new();
 	the_repo.parsed_objects = parsed_object_pool_new();
 
-	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
+	repo_set_hash_algo(&the_repo, GIT_HASH_SHA256);
 }
 
 static void expand_base_dir(char **out, const char *in,
diff --git a/setup.c b/setup.c
index b24c811c1c..2b4cc4e5a4 100644
--- a/setup.c
+++ b/setup.c
@@ -490,7 +490,7 @@ int read_repository_format(struct repository_format *format, const char *path)
 	memset(format, 0, sizeof(*format));
 	format->version = -1;
 	format->is_bare = -1;
-	format->hash_algo = GIT_HASH_SHA1;
+	format->hash_algo = GIT_HASH_SHA256;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
 	return format->version;
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index dfa1bebb46..91cf2b9d40 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1165,9 +1165,7 @@ test_set_hash () {
 
 # Detect the hash algorithm in use.
 test_detect_hash () {
-	# Currently we only support SHA-1, but in the future this function will
-	# actually detect the algorithm in use.
-	test_hash_algo='sha1'
+	test_hash_algo='sha256'
 }
 
 # Load common hash metadata and common placeholder object IDs for use with
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 3f95bfda60..bb507f2d4f 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -176,18 +176,21 @@ esac
 
 # Convenience
 #
-# A regexp to match 5, 35 and 40 hexdigits
+# A regexp to match 4, 5, 35, 40, and 64 hexdigits
+_x04='[0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
 _x05='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
 _x35="$_x05$_x05$_x05$_x05$_x05$_x05$_x05"
 _x40="$_x35$_x05"
+_x64="$_x40$_x05$_x05$_x05$_x05$_x04"
 
 # Zero SHA-1
 _z40=0000000000000000000000000000000000000000
+_z64=0000000000000000000000000000000000000000000000000000000000000000
 
-OID_REGEX="$_x40"
-ZERO_OID=$_z40
-EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904
-EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
+OID_REGEX="$_x64"
+ZERO_OID=$_z64
+EMPTY_TREE=6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321
+EMPTY_BLOB=473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813
 
 # Line feed
 LF='
-- >% --
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes
  2018-10-15  2:18 ` [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes brian m. carlson
@ 2018-10-16  1:54   ` Junio C Hamano
  2018-10-17 23:49     ` brian m. carlson
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2018-10-16  1:54 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> diff --git a/cache.h b/cache.h
> index d508f3d4f8..a13d14ce0a 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1361,9 +1361,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
>  extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
>  
>  /*
> - * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
> + * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
>   * and writes the NUL-terminated output to the buffer `out`, which must be at
> - * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
> + * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
>   * convenience.
>   *
>   * The non-`_r` variant returns a static buffer, but uses a ring of 4
> @@ -1371,10 +1371,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
>   *
>   *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
>   */
> -extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
> -extern char *oid_to_hex_r(char *out, const struct object_id *oid);
> -extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
> -extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
> +char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
> +char *sha1_to_hex_r(char *out, const unsigned char *sha1);
> +char *oid_to_hex_r(char *out, const struct object_id *oid);
> +char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
> +char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
> +char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
> +char *oid_to_hex(const struct object_id *oid);			/* same static buffer */

Even though in hex.c I see mixture of *_algo and *_algop helper
functions, I see only "algo" variants above.  Is it our official
stance to use primarily the integer index into the algo array when
specifying the hash, and when a caller into 'multi-hash' API happens
to have other things, it should use functions in 2/13 to convert it
to the canonical "int algo" beforehand?

I am not saying it is bad or good to choose the index into the algo
array as the primary way to specify the algorithm.  I think it is a
good idea to pick one and stick to it, and I wanted to make sure
that the choice we made here is clearly communicated to any future
developer who read this code.

> +char *sha1_to_hex(const unsigned char *sha1)
> +{
> +	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
> +}
> +
> +char *hash_to_hex(const unsigned char *hash)
> +{
> +	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
>  }
>  
>  char *oid_to_hex(const struct object_id *oid)
>  {
> -	return sha1_to_hex(oid->hash);
> +	return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
>  }

Having said the above, seeing the use of hash_algo_by_ptr() here
makes me suspect if it makes more sense to use the algop as the
primary way to specify which algorithm the caller wants to use.
IOW, making the set of helpers in 02/13 to allow quering by name,
format-id, or the integer index and have them all return a pointer
to "const struct git_hash_algo".  Two immediate downsides I can see
is that it exposes the actual structure to the callers (but is it
really a problem?  Outside callers learn hash sizes etc. by accessing
its fields anyway without treating the algo struct as opaque.), and
passing an 8-byte pointer may be more costly than passing a small
integer index that ranges between 0 and 1 at most (assuming that
we'd only use SHA-1 and "the current NewHash" in the code).



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
  2018-10-15 15:11   ` Derrick Stolee
@ 2018-10-16  2:00   ` Junio C Hamano
  2018-10-16 22:39     ` brian m. carlson
  2018-10-16 15:35   ` Duy Nguyen
  2018-10-17 12:21   ` Derrick Stolee
  3 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2018-10-16  2:00 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Since the commit-graph code wants to serialize the hash algorithm into
> the data store, specify a version number for each supported algorithm.
> Note that we don't use the values of the constants themselves, as they
> are internal and could change in the future.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  commit-graph.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 7a28fbb03f..e587c21bb6 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
>  
>  static uint8_t oid_version(void)
>  {
> -	return 1;
> +	switch (hash_algo_by_ptr(the_hash_algo)) {
> +		case GIT_HASH_SHA1:
> +			return 1;
> +		case GIT_HASH_SHA256:
> +			return 2;
> +		default:
> +			BUG("unknown hash algorithm");
> +	}

Style: align switch/case.


Shouldn't this be just using GIT_HASH_* constants instead?  Are we
expecting that it would be benefitial to allow more than one "oid
version" even within a same GIT_HASH_*?

>  }
>  
>  static struct commit_graph *alloc_commit_graph(void)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/13] Base SHA-256 implementation
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (12 preceding siblings ...)
  2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
@ 2018-10-16  2:00 ` Junio C Hamano
  2018-10-16  4:01 ` Junio C Hamano
  2018-10-16 15:39 ` Duy Nguyen
  15 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2018-10-16  2:00 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I realize the increase to GIT_MAX_HEXSZ will result in an increase in
> memory usage, but we're going to need to do it at some point, and the
> sooner the code is in the codebase, the sooner people can play around
> with it and test it.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/13] Base SHA-256 implementation
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (13 preceding siblings ...)
  2018-10-16  2:00 ` [PATCH v2 00/13] Base SHA-256 implementation Junio C Hamano
@ 2018-10-16  4:01 ` Junio C Hamano
  2018-10-16 22:45   ` brian m. carlson
  2018-10-16 15:39 ` Duy Nguyen
  15 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2018-10-16  4:01 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>  t/t0014-hash.sh                       |  54 ++++++++
>  create mode 100755 t/t0014-hash.sh

If I am not mistaken, 0014 is already used by another topic in
flight, and will cause test-lint failure on 'pu'.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/13] Add a base implementation of SHA-256 support
  2018-10-15 23:30     ` brian m. carlson
@ 2018-10-16 14:59       ` Duy Nguyen
  0 siblings, 0 replies; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 14:59 UTC (permalink / raw)
  To: brian m. carlson, Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Tue, Oct 16, 2018 at 1:31 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On Mon, Oct 15, 2018 at 04:59:12PM +0200, Duy Nguyen wrote:
> >  On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> > >
> > > SHA-1 is weak and we need to transition to a new hash function.  For
> > > some time, we have referred to this new function as NewHash.  Recently,
> > > we decided to pick SHA-256 as NewHash.
> > >
> > > Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> > > the public domain.  Optimize it and restructure it to meet our coding
> > > standards.  Place it in a directory called "sha256" where it and any
> > > future implementations can live so as to avoid a proliferation of
> > > implementation directories.
> > >
> > > Wire up SHA-256 in the list of hash algorithms, and add a test that the
> > > algorithm works correctly.
> > >
> > > Note that with this patch, it is still not possible to switch to using
> > > SHA-256 in Git.  Additional patches are needed to prepare the code to
> > > handle a larger hash algorithm and further test fixes are needed.
> >
> > At some point I assume SHA-256 will become functional and be part of a
> > git release without all file formats updated to support multiple
> > hashes. Should we somehow discourage the user from using it because it
> > will break when all file formats are finally updated?
>
> In order to activate SHA-256 in the codebase, currently you need a patch
> to force it on.  Otherwise, the code is simply inert and does nothing
> (other than in the test-tool).  I've included the patch below so you can
> see what it does (or if you want to play around with it).
>
> Without this patch, Git remains fully SHA-1 and can't access any of the
> SHA-256 code.  I have some very preliminary patches that do wire up
> extensions.objectFormat (branch object-id-part15 [sic]) but I haven't
> picked them up in a while.  (I need to finish test fixes first.)

Ah, I thought that extensions.objectFormat and setup changes already
landed (I think I saw that series on this list). Sorry for the noise.

-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 01/13] sha1-file: rename algorithm to "sha1"
  2018-10-15  2:18 ` [PATCH v2 01/13] sha1-file: rename algorithm to "sha1" brian m. carlson
@ 2018-10-16 15:17   ` Duy Nguyen
  2018-10-17 22:53     ` brian m. carlson
  0 siblings, 1 reply; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 15:17 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> The transition plan anticipates us using a syntax such as "^{sha1}" for
> disambiguation.  Since this is a syntax some people will be typing a
> lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
> the dash doesn't create any ambiguity, but it does make it shorter and

"but" or "and"? I think both clauses are on the same side ... or did
you mean omitting the dash does create ambiguity?

> easier to type, especially for touch typists.  In addition, the
> transition plan already uses "sha1" in this context.
>
> Rename the name of SHA-1 implementation to "sha1".
>
> Note that this change creates no backwards compatibility concerns, since
> we haven't yet used this field in any serialized data formats.

But we're not going to use this _string_ in any data format either
because we'll stick to format_id field anyway, right?
-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
  2018-10-15 15:11   ` Derrick Stolee
  2018-10-16  2:00   ` Junio C Hamano
@ 2018-10-16 15:35   ` Duy Nguyen
  2018-10-16 16:01     ` Derrick Stolee
  2018-10-17 12:21   ` Derrick Stolee
  3 siblings, 1 reply; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 15:35 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Since the commit-graph code wants to serialize the hash algorithm into
> the data store, specify a version number for each supported algorithm.
> Note that we don't use the values of the constants themselves, as they
> are internal and could change in the future.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  commit-graph.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 7a28fbb03f..e587c21bb6 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
>
>  static uint8_t oid_version(void)
>  {
> -       return 1;
> +       switch (hash_algo_by_ptr(the_hash_algo)) {
> +               case GIT_HASH_SHA1:
> +                       return 1;
> +               case GIT_HASH_SHA256:
> +                       return 2;

Should we just increase this field to uint32_t and store format_id
instead? That will keep oid version unique in all data formats.

> +               default:
> +                       BUG("unknown hash algorithm");
> +       }
>  }
>
>  static struct commit_graph *alloc_commit_graph(void)
-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL
  2018-10-15  2:18 ` [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
@ 2018-10-16 15:36   ` Duy Nguyen
  0 siblings, 0 replies; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 15:36 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 4:22 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> We already have OpenSSL routines available for SHA-1, so add routines
> for SHA-256 as well.

Since we have "hash-speed" tool now, it would be great to keep some
numbers of these hash implementations in the commit message (and maybe
sha1 as well just for comparison).

>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Makefile | 7 +++++++
>  hash.h   | 2 ++
>  2 files changed, 9 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 3d91555a81..3164e2aeee 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -183,6 +183,8 @@ all::
>  #
>  # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
>  #
> +# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
> +#
>  # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
>  #
>  # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
> @@ -1638,6 +1640,10 @@ endif
>  endif
>  endif
>
> +ifdef OPENSSL_SHA256
> +       EXTLIBS += $(LIB_4_CRYPTO)
> +       BASIC_CFLAGS += -DSHA256_OPENSSL
> +else
>  ifdef GCRYPT_SHA256
>         BASIC_CFLAGS += -DSHA256_GCRYPT
>         EXTLIBS += -lgcrypt
> @@ -1645,6 +1651,7 @@ else
>         LIB_OBJS += sha256/block/sha256.o
>         BASIC_CFLAGS += -DSHA256_BLK
>  endif
> +endif
>
>  ifdef SHA1_MAX_BLOCK_SIZE
>         LIB_OBJS += compat/sha1-chunked.o
> diff --git a/hash.h b/hash.h
> index 9df562f2f6..9df06d56b4 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -17,6 +17,8 @@
>
>  #if defined(SHA256_GCRYPT)
>  #include "sha256/gcrypt.h"
> +#elif defined(SHA256_OPENSSL)
> +#include <openssl/sha.h>
>  #else
>  #include "sha256/block/sha256.h"
>  #endif



-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/13] Base SHA-256 implementation
  2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
                   ` (14 preceding siblings ...)
  2018-10-16  4:01 ` Junio C Hamano
@ 2018-10-16 15:39 ` Duy Nguyen
  15 siblings, 0 replies; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 15:39 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> This series provides an actual SHA-256 implementation and wires it up,
> along with some housekeeping patches to make it suitable for testing.
>
> New in this version is a patch which reverts the change to limit hashcmp
> to 20 bytes.  I've taken care to permit the compiler to inline as much
> as possible for efficiency, but it would be helpful to see what the
> performance impact of these changes is on real-life workflows, or with
> MSVC and other non-GCC and non-clang compilers.  The resulting change is
> uglier and more duplicative than I wanted, but oh, well.
>
> I didn't attempt to pull in the full complement of code from libtomcrypt
> to try to show the changes made, since that would have involved
> importing a significant quantity of code in order to make things work.
>
> I realize the increase to GIT_MAX_HEXSZ will result in an increase in
> memory usage, but we're going to need to do it at some point, and the
> sooner the code is in the codebase, the sooner people can play around
> with it and test it.
>
> This piece should be the final piece of preparatory work required for a
> fully functioning SHA-256-only Git.  Additional work should be able to
> come into the testsuite and codebase without needing to build on work in
> any series after this one.

Thanks again for keeping working on this. I'm still sick and can't
really do a deep review. With that in mind, the patches look good.
-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes
  2018-10-15  2:18 ` [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
@ 2018-10-16 15:44   ` Duy Nguyen
  0 siblings, 0 replies; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 15:44 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> diff --git a/cache.h b/cache.h
> index a13d14ce0a..0b88c3a344 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1024,16 +1024,12 @@ extern const struct object_id null_oid;
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
>         /*
> -        * This is a temporary optimization hack. By asserting the size here,
> -        * we let the compiler know that it's always going to be 20, which lets
> -        * it turn this fixed-size memcmp into a few inline instructions.
> -        *
> -        * This will need to be extended or ripped out when we learn about
> -        * hashes of different sizes.
> +        * Teach the compiler that there are only two possibilities of hash size
> +        * here, so that it can optimize for this case as much as possible.
>          */
> -       if (the_hash_algo->rawsz != 20)
> -               BUG("hash size not yet supported by hashcmp");
> -       return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +       if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)

It's tangent. But performance is probably another good reason to
detach the_hash_algo from the_repository so we have one less
dereference to do. (the other good reason is these hash operations
should work in "no-repo" commands as well, where the_repository does
not really make sense).

> +               return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
> +       return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> @@ -1043,7 +1039,13 @@ static inline int oidcmp(const struct object_id *oid1, const struct object_id *o
>
>  static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -       return !hashcmp(sha1, sha2);
> +       /*
> +        * We write this here instead of deferring to hashcmp so that the
> +        * compiler can properly inline it and avoid calling memcmp.
> +        */
> +       if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
> +               return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
> +       return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
>  }
>
>  static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)



-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-16 15:35   ` Duy Nguyen
@ 2018-10-16 16:01     ` Derrick Stolee
  2018-10-16 16:09       ` Duy Nguyen
  0 siblings, 1 reply; 44+ messages in thread
From: Derrick Stolee @ 2018-10-16 16:01 UTC (permalink / raw)
  To: Duy Nguyen, brian m. carlson
  Cc: Git Mailing List, Jeff King, Ævar Arnfjörð Bjarmason

On 10/16/2018 11:35 AM, Duy Nguyen wrote:
> On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
>> Since the commit-graph code wants to serialize the hash algorithm into
>> the data store, specify a version number for each supported algorithm.
>> Note that we don't use the values of the constants themselves, as they
>> are internal and could change in the future.
>>
>> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
>> ---
>>   commit-graph.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 7a28fbb03f..e587c21bb6 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
>>
>>   static uint8_t oid_version(void)
>>   {
>> -       return 1;
>> +       switch (hash_algo_by_ptr(the_hash_algo)) {
>> +               case GIT_HASH_SHA1:
>> +                       return 1;
>> +               case GIT_HASH_SHA256:
>> +                       return 2;
> Should we just increase this field to uint32_t and store format_id
> instead? That will keep oid version unique in all data formats.
Both the commit-graph and multi-pack-index store a single byte for the 
hash version, so that ship has sailed (without incrementing the full 
file version number in each format).

It may be good to make this method accessible to both formats. I'm not 
sure if Brian's branch is built on top of the multi-pack-index code. 
Probably best to see if ds/multi-pack-verify is in the history.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-16 16:01     ` Derrick Stolee
@ 2018-10-16 16:09       ` Duy Nguyen
  2018-10-16 22:44         ` brian m. carlson
  0 siblings, 1 reply; 44+ messages in thread
From: Duy Nguyen @ 2018-10-16 16:09 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Git Mailing List, Jeff King,
	Ævar Arnfjörð Bjarmason

On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 10/16/2018 11:35 AM, Duy Nguyen wrote:
> > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> >> Since the commit-graph code wants to serialize the hash algorithm into
> >> the data store, specify a version number for each supported algorithm.
> >> Note that we don't use the values of the constants themselves, as they
> >> are internal and could change in the future.
> >>
> >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> >> ---
> >>   commit-graph.c | 9 ++++++++-
> >>   1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/commit-graph.c b/commit-graph.c
> >> index 7a28fbb03f..e587c21bb6 100644
> >> --- a/commit-graph.c
> >> +++ b/commit-graph.c
> >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> >>
> >>   static uint8_t oid_version(void)
> >>   {
> >> -       return 1;
> >> +       switch (hash_algo_by_ptr(the_hash_algo)) {
> >> +               case GIT_HASH_SHA1:
> >> +                       return 1;
> >> +               case GIT_HASH_SHA256:
> >> +                       return 2;
> > Should we just increase this field to uint32_t and store format_id
> > instead? That will keep oid version unique in all data formats.
> Both the commit-graph and multi-pack-index store a single byte for the
> hash version, so that ship has sailed (without incrementing the full
> file version number in each format).

And it's probably premature to add the oid version field when multiple
hash support has not been fully realized. Now we have different ways
of storing hash id and need separate mappings.

I would go for incrementing file version. Otherwise maybe we just
update format_id to be one byte instead, and the way of storing hash
version in commit-graph will be used everywhere.

> It may be good to make this method accessible to both formats. I'm not
> sure if Brian's branch is built on top of the multi-pack-index code.
> Probably best to see if ds/multi-pack-verify is in the history.
>
> Thanks,
> -Stolee



-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-16  2:00   ` Junio C Hamano
@ 2018-10-16 22:39     ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-16 22:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 1744 bytes --]

On Tue, Oct 16, 2018 at 11:00:19AM +0900, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > Since the commit-graph code wants to serialize the hash algorithm into
> > the data store, specify a version number for each supported algorithm.
> > Note that we don't use the values of the constants themselves, as they
> > are internal and could change in the future.
> >
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  commit-graph.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 7a28fbb03f..e587c21bb6 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> >  
> >  static uint8_t oid_version(void)
> >  {
> > -	return 1;
> > +	switch (hash_algo_by_ptr(the_hash_algo)) {
> > +		case GIT_HASH_SHA1:
> > +			return 1;
> > +		case GIT_HASH_SHA256:
> > +			return 2;
> > +		default:
> > +			BUG("unknown hash algorithm");
> > +	}
> 
> Style: align switch/case.

Will fix.

> Shouldn't this be just using GIT_HASH_* constants instead?  Are we
> expecting that it would be benefitial to allow more than one "oid
> version" even within a same GIT_HASH_*?

I really would like to have us rely not rely explicitly on those values.
We don't serialize them anywhere else, and they're explicitly documented
as not being suitable for serialization.  If we were writing this fresh
today, we'd probably use the format_version field, which is designed for
this purpose, or simply omit the field altogether.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-16 16:09       ` Duy Nguyen
@ 2018-10-16 22:44         ` brian m. carlson
  2018-10-17 14:31           ` Duy Nguyen
  0 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-16 22:44 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Derrick Stolee, Git Mailing List, Jeff King,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 2615 bytes --]

On Tue, Oct 16, 2018 at 06:09:41PM +0200, Duy Nguyen wrote:
> On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 10/16/2018 11:35 AM, Duy Nguyen wrote:
> > > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> > > <sandals@crustytoothpaste.net> wrote:
> > >> Since the commit-graph code wants to serialize the hash algorithm into
> > >> the data store, specify a version number for each supported algorithm.
> > >> Note that we don't use the values of the constants themselves, as they
> > >> are internal and could change in the future.
> > >>
> > >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > >> ---
> > >>   commit-graph.c | 9 ++++++++-
> > >>   1 file changed, 8 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/commit-graph.c b/commit-graph.c
> > >> index 7a28fbb03f..e587c21bb6 100644
> > >> --- a/commit-graph.c
> > >> +++ b/commit-graph.c
> > >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> > >>
> > >>   static uint8_t oid_version(void)
> > >>   {
> > >> -       return 1;
> > >> +       switch (hash_algo_by_ptr(the_hash_algo)) {
> > >> +               case GIT_HASH_SHA1:
> > >> +                       return 1;
> > >> +               case GIT_HASH_SHA256:
> > >> +                       return 2;
> > > Should we just increase this field to uint32_t and store format_id
> > > instead? That will keep oid version unique in all data formats.
> > Both the commit-graph and multi-pack-index store a single byte for the
> > hash version, so that ship has sailed (without incrementing the full
> > file version number in each format).
> 
> And it's probably premature to add the oid version field when multiple
> hash support has not been fully realized. Now we have different ways
> of storing hash id and need separate mappings.

Honestly, anything in the .git directory that is not the v3 pack indexes
or the loose object file should be in exactly one hash algorithm.  We
could simply just leave this value at 1 all the time and ignore the
field, since we already know what algorithm it will use.

> I would go for incrementing file version. Otherwise maybe we just
> update format_id to be one byte instead, and the way of storing hash
> version in commit-graph will be used everywhere.

It needs to be four bytes for pack files so that it's four-byte aligned.
Otherwise accessing pack index fields will cause alignment issues if we
don't massively rewrite the pack handling code.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/13] Base SHA-256 implementation
  2018-10-16  4:01 ` Junio C Hamano
@ 2018-10-16 22:45   ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-16 22:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

On Tue, Oct 16, 2018 at 01:01:07PM +0900, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> >  t/t0014-hash.sh                       |  54 ++++++++
> >  create mode 100755 t/t0014-hash.sh
> 
> If I am not mistaken, 0014 is already used by another topic in
> flight, and will cause test-lint failure on 'pu'.

This series was written a while back.  I'll rename it.  Feel free to
wait for v3 before picking it up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
                     ` (2 preceding siblings ...)
  2018-10-16 15:35   ` Duy Nguyen
@ 2018-10-17 12:21   ` Derrick Stolee
  2018-10-17 22:38     ` brian m. carlson
  3 siblings, 1 reply; 44+ messages in thread
From: Derrick Stolee @ 2018-10-17 12:21 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Jeff King, Ævar Arnfjörð Bjarmason

On 10/14/2018 10:19 PM, brian m. carlson wrote:
> Since the commit-graph code wants to serialize the hash algorithm into
> the data store, specify a version number for each supported algorithm.
> Note that we don't use the values of the constants themselves, as they
> are internal and could change in the future.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>   commit-graph.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 7a28fbb03f..e587c21bb6 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
>   
>   static uint8_t oid_version(void)
>   {
> -	return 1;
> +	switch (hash_algo_by_ptr(the_hash_algo)) {
hash_algo_by_ptr is specified as an inline function in hash.h, so this 
leads to a linking error in jch and pu right now.

I think the fix is simply to add '#include "hash.h"' to the list of 
includes.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/13] sha1-file: provide functions to look up hash algorithms
  2018-10-15  2:18 ` [PATCH v2 02/13] sha1-file: provide functions to look up hash algorithms brian m. carlson
@ 2018-10-17 13:32   ` SZEDER Gábor
  0 siblings, 0 replies; 44+ messages in thread
From: SZEDER Gábor @ 2018-10-17 13:32 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 02:18:49AM +0000, brian m. carlson wrote:
> There are several ways we might refer to a hash algorithm: by name, such
> as in the config file; by format ID, such as in a pack; or internally,
> by a pointer to the hash_algos array.  Provide functions to look up hash
> algorithms based on these various forms and return the internal constant
> used for them.  If conversion to another form is necessary, this
> internal constant can be used to look up the proper data in the
> hash_algos array.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  hash.h      | 13 +++++++++++++
>  sha1-file.c | 21 +++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/hash.h b/hash.h
> index 7c8238bc2e..90f4344619 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -98,4 +98,17 @@ struct git_hash_algo {
>  };
>  extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
>  
> +/*
> + * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
> + * the name doesn't match a known algorithm.
> + */
> +int hash_algo_by_name(const char *name);
> +/* Identical, except based on the format ID. */
> +int hash_algo_by_id(uint32_t format_id);
> +/* Identical, except for a pointer to struct git_hash_algo. */
> +inline int hash_algo_by_ptr(const struct git_hash_algo *p)

This has to be declared as static, otherwise the linker will error out
when building without optimization:

    LINK git
libgit.a(commit-graph.o): In function `oid_version':
/home/szeder/src/git/commit-graph.c:48: undefined reference to `hash_algo_by_ptr'
libgit.a(hex.o): In function `hash_to_hex':
/home/szeder/src/git/hex.c:123: undefined reference to `hash_algo_by_ptr'
libgit.a(hex.o): In function `oid_to_hex':
/home/szeder/src/git/hex.c:128: undefined reference to `hash_algo_by_ptr'
collect2: error: ld returned 1 exit status
Makefile:2055: recipe for target 'git' failed
make: *** [git] Error 1


> +{
> +	return p - hash_algos;
> +}
> +
>  #endif
> diff --git a/sha1-file.c b/sha1-file.c
> index e29825f259..3a75d515eb 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
>  	return oid_to_hex_r(buf, the_hash_algo->empty_blob);
>  }
>  
> +int hash_algo_by_name(const char *name)
> +{
> +	int i;
> +	if (!name)
> +		return GIT_HASH_UNKNOWN;
> +	for (i = 1; i < GIT_HASH_NALGOS; i++)
> +		if (!strcmp(name, hash_algos[i].name))
> +			return i;
> +	return GIT_HASH_UNKNOWN;
> +}
> +
> +int hash_algo_by_id(uint32_t format_id)
> +{
> +	int i;
> +	for (i = 1; i < GIT_HASH_NALGOS; i++)
> +		if (format_id == hash_algos[i].format_id)
> +			return i;
> +	return GIT_HASH_UNKNOWN;
> +}
> +
> +
>  /*
>   * This is meant to hold a *small* number of objects that you would
>   * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-16 22:44         ` brian m. carlson
@ 2018-10-17 14:31           ` Duy Nguyen
  2018-10-18  0:06             ` brian m. carlson
  0 siblings, 1 reply; 44+ messages in thread
From: Duy Nguyen @ 2018-10-17 14:31 UTC (permalink / raw)
  To: brian m. carlson, Derrick Stolee, Git Mailing List, Jeff King,
	Ævar Arnfjörð Bjarmason

On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> > > >>   static uint8_t oid_version(void)
> > > >>   {
> > > >> -       return 1;
> > > >> +       switch (hash_algo_by_ptr(the_hash_algo)) {
> > > >> +               case GIT_HASH_SHA1:
> > > >> +                       return 1;
> > > >> +               case GIT_HASH_SHA256:
> > > >> +                       return 2;
> > > > Should we just increase this field to uint32_t and store format_id
> > > > instead? That will keep oid version unique in all data formats.
> > > Both the commit-graph and multi-pack-index store a single byte for the
> > > hash version, so that ship has sailed (without incrementing the full
> > > file version number in each format).
> >
> > And it's probably premature to add the oid version field when multiple
> > hash support has not been fully realized. Now we have different ways
> > of storing hash id and need separate mappings.
>
> Honestly, anything in the .git directory that is not the v3 pack indexes
> or the loose object file should be in exactly one hash algorithm.  We
> could simply just leave this value at 1 all the time and ignore the
> field, since we already know what algorithm it will use.

In this particular case, I agree, but not as a general principle. It's
nice to have independence for fsck-like tools. I don't know if we have
a tool that simply validates commit-graph file format (and not trying
to access any real object). But for such a tool, I guess we can just
pass the hash algorithm from command line. The user would have to
guess a bit.
-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/13] Add a base implementation of SHA-256 support
  2018-10-15  2:18 ` [PATCH v2 10/13] Add a base implementation of SHA-256 support brian m. carlson
  2018-10-15 14:59   ` Duy Nguyen
@ 2018-10-17 16:12   ` SZEDER Gábor
  2018-10-17 23:04     ` brian m. carlson
  1 sibling, 1 reply; 44+ messages in thread
From: SZEDER Gábor @ 2018-10-17 16:12 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

On Mon, Oct 15, 2018 at 02:18:57AM +0000, brian m. carlson wrote:
> diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
> new file mode 100644
> index 0000000000..18350c161a
> --- /dev/null
> +++ b/sha256/block/sha256.c
> @@ -0,0 +1,180 @@
> +#include "git-compat-util.h"
> +#include "./sha256.h"
> +
> +#define BLKSIZE blk_SHA256_BLKSIZE
> +
> +void blk_SHA256_Init(blk_SHA256_CTX *ctx)
> +{
> +	ctx->offset = 0;
> +	ctx->length = 0;
> +	ctx->state[0] = 0x6A09E667UL;
> +	ctx->state[1] = 0xBB67AE85UL;
> +	ctx->state[2] = 0x3C6EF372UL;
> +	ctx->state[3] = 0xA54FF53AUL;
> +	ctx->state[4] = 0x510E527FUL;
> +	ctx->state[5] = 0x9B05688CUL;
> +	ctx->state[6] = 0x1F83D9ABUL;
> +	ctx->state[7] = 0x5BE0CD19UL;
> +}
> +
> +static inline uint32_t ror(uint32_t x, unsigned n)
> +{
> +	return (x >> n) | (x << (32 - n));
> +}
> +
> +#define Ch(x,y,z)       (z ^ (x & (y ^ z)))
> +#define Maj(x,y,z)      (((x | y) & z) | (x & y))
> +#define S(x, n)         ror((x),(n))
> +#define R(x, n)         ((x)>>(n))
> +#define Sigma0(x)       (S(x, 2) ^ S(x, 13) ^ S(x, 22))
> +#define Sigma1(x)       (S(x, 6) ^ S(x, 11) ^ S(x, 25))
> +#define Gamma0(x)       (S(x, 7) ^ S(x, 18) ^ R(x, 3))
> +#define Gamma1(x)       (S(x, 17) ^ S(x, 19) ^ R(x, 10))

[...]

> +#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
> +	t0 = h + Sigma1(e) + Ch(e, f, g) + ki + W[i];   \
> +	t1 = Sigma0(a) + Maj(a, b, c);                  \
> +	d += t0;                                        \
> +	h  = t0 + t1;
> +
> +	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],0,0x428a2f98);

[...]

> +#undef RND
> +
> +	for (i = 0; i < 8; i++) {
> +		ctx->state[i] = ctx->state[i] + S[i];
> +	}
> +}
> +
> +#define MIN(x, y) ((x) < (y) ? (x) : (y))

On macOS there is a MIN macro already defined in the system headers,
resulting in the following error:

      CC sha256/block/sha256.o
  sha256/block/sha256.c:133:9: error: 'MIN' macro redefined [-Werror,-Wmacro-redefined]
  #define MIN(x, y) ((x) < (y) ? (x) : (y))
          ^
  /usr/include/sys/param.h:215:9: note: previous definition is here
  #define MIN(a,b) (((a)<(b))?(a):(b))
          ^
  1 error generated.
  make: *** [sha256/block/sha256.o] Error 1

A simple "#undef MIN" solves this issue.  However, I wonder whether we
should #undef the other #define directives as well, just to be sure
(and perhaps overly cautious).

> +void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
> +{
> +	const unsigned char *in = data;
> +	size_t n;
> +	ctx->length += len;
> +	while (len > 0) {
> +		if (!ctx->offset && len >= BLKSIZE) {
> +			blk_SHA256_Transform(ctx, in);
> +			in += BLKSIZE;
> +			len -= BLKSIZE;
> +		} else {
> +			n = MIN(len, (BLKSIZE - ctx->offset));
> +			memcpy(ctx->buf + ctx->offset, in, n);
> +			ctx->offset += n;
> +			in += n;
> +			len -= n;
> +			if (ctx->offset == BLKSIZE) {
> +				blk_SHA256_Transform(ctx, ctx->buf);
> +				ctx->offset = 0;
> +			}
> +		}
> +	}
> +}
> +
> +void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx)
> +{
> +	const unsigned trip = BLKSIZE - sizeof(ctx->length);
> +	int i;
> +
> +	ctx->length <<= 3;
> +	ctx->buf[ctx->offset++] = 0x80;
> +
> +	if (ctx->offset > trip) {
> +		memset(ctx->buf + ctx->offset, 0, BLKSIZE - ctx->offset);
> +		blk_SHA256_Transform(ctx, ctx->buf);
> +		ctx->offset = 0;
> +	}
> +
> +	memset(ctx->buf + ctx->offset, 0, BLKSIZE - ctx->offset - sizeof(ctx->length));
> +
> +	put_be64(ctx->buf + trip, ctx->length);

Some GCC versions (e.g. gcc-4.8 with -O2 -Wall -Werror) complain about
the above line:

      CC sha256/block/sha256.o
  sha256/block/sha256.c: In function ‘blk_SHA256_Final’:
  sha256/block/sha256.c:174:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
    put_be64(ctx->buf + trip, ctx->length);
    ^
  cc1: all warnings being treated as errors
  make: *** [sha256/block/sha256.o] Error 1

Something like this makes it compile:

  void *ptr = ctx->buf + trip;
  put_be64(ptr, ctx->length);

However, it's not immediately obvious to me why the compiler
complains, or why that intermediate void* variable makes any
difference, but now it's not the time to put on my language lawyer
hat.

Perhaps an old compiler bug?  Clang in general, newer GCC versions, or
gcc-4.8 with -Wall -Werror but without -O2 don't seem to be affected.


> +	blk_SHA256_Transform(ctx, ctx->buf);
> +
> +	/* copy output */
> +	for (i = 0; i < 8; i++, digest += sizeof(uint32_t))
> +		put_be32(digest, ctx->state[i]);
> +}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-17 12:21   ` Derrick Stolee
@ 2018-10-17 22:38     ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-17 22:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

On Wed, Oct 17, 2018 at 08:21:42AM -0400, Derrick Stolee wrote:
> On 10/14/2018 10:19 PM, brian m. carlson wrote:
> > Since the commit-graph code wants to serialize the hash algorithm into
> > the data store, specify a version number for each supported algorithm.
> > Note that we don't use the values of the constants themselves, as they
> > are internal and could change in the future.
> > 
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >   commit-graph.c | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 7a28fbb03f..e587c21bb6 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> >   static uint8_t oid_version(void)
> >   {
> > -	return 1;
> > +	switch (hash_algo_by_ptr(the_hash_algo)) {
> hash_algo_by_ptr is specified as an inline function in hash.h, so this leads
> to a linking error in jch and pu right now.
> 
> I think the fix is simply to add '#include "hash.h"' to the list of
> includes.

hash.h is already included by cache.h, so it should be present.  We
probably need to make it static.  I'll make that change; can you see if
it fixes the problem for you?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 01/13] sha1-file: rename algorithm to "sha1"
  2018-10-16 15:17   ` Duy Nguyen
@ 2018-10-17 22:53     ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-17 22:53 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 1783 bytes --]

On Tue, Oct 16, 2018 at 05:17:53PM +0200, Duy Nguyen wrote:
> On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > The transition plan anticipates us using a syntax such as "^{sha1}" for
> > disambiguation.  Since this is a syntax some people will be typing a
> > lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
> > the dash doesn't create any ambiguity, but it does make it shorter and
> 
> "but" or "and"? I think both clauses are on the same side ... or did
> you mean omitting the dash does create ambiguity?

I think "but" is correct here.  This is a standard "This doesn't make it
worse, but it does make it better" phrase.  The "but" creates a contrast
between what it doesn't do and what it does.

I'm trying to come up with a different way to say this that may be
easier to understand, but I'm failing to do so in a natural-sounding
way.  Does the following sound better?

  Omitting the dash doesn't introduce any ambiguity; however, it does
  make the text shorter and easier to type, especially for touch
  typists.

> > easier to type, especially for touch typists.  In addition, the
> > transition plan already uses "sha1" in this context.
> >
> > Rename the name of SHA-1 implementation to "sha1".
> >
> > Note that this change creates no backwards compatibility concerns, since
> > we haven't yet used this field in any serialized data formats.
> 
> But we're not going to use this _string_ in any data format either
> because we'll stick to format_id field anyway, right?

We'll use it in extensions.objectFormat and other config files.  But in
anything binary, we'll use format_id.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/13] Add a base implementation of SHA-256 support
  2018-10-17 16:12   ` SZEDER Gábor
@ 2018-10-17 23:04     ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-17 23:04 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 2085 bytes --]

On Wed, Oct 17, 2018 at 06:12:41PM +0200, SZEDER Gábor wrote:
> On macOS there is a MIN macro already defined in the system headers,
> resulting in the following error:
> 
>       CC sha256/block/sha256.o
>   sha256/block/sha256.c:133:9: error: 'MIN' macro redefined [-Werror,-Wmacro-redefined]
>   #define MIN(x, y) ((x) < (y) ? (x) : (y))
>           ^
>   /usr/include/sys/param.h:215:9: note: previous definition is here
>   #define MIN(a,b) (((a)<(b))?(a):(b))
>           ^
>   1 error generated.
>   make: *** [sha256/block/sha256.o] Error 1
> 
> A simple "#undef MIN" solves this issue.  However, I wonder whether we
> should #undef the other #define directives as well, just to be sure
> (and perhaps overly cautious).

I'll undefine the macros at the top.

For MIN, I'm going to go with the simple approach and pull pieces from
the block-sha1 implementation, which doesn't require this code path...

> Some GCC versions (e.g. gcc-4.8 with -O2 -Wall -Werror) complain about
> the above line:
> 
>       CC sha256/block/sha256.o
>   sha256/block/sha256.c: In function ‘blk_SHA256_Final’:
>   sha256/block/sha256.c:174:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
>     put_be64(ctx->buf + trip, ctx->length);
>     ^
>   cc1: all warnings being treated as errors
>   make: *** [sha256/block/sha256.o] Error 1
> 
> Something like this makes it compile:
> 
>   void *ptr = ctx->buf + trip;
>   put_be64(ptr, ctx->length);
> 
> However, it's not immediately obvious to me why the compiler
> complains, or why that intermediate void* variable makes any
> difference, but now it's not the time to put on my language lawyer
> hat.
> 
> Perhaps an old compiler bug?  Clang in general, newer GCC versions, or
> gcc-4.8 with -Wall -Werror but without -O2 don't seem to be affected.

...or this code path.  Presumably GCC 4.8 and macOS are happy with the
code that we already have in block-sha1.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes
  2018-10-16  1:54   ` Junio C Hamano
@ 2018-10-17 23:49     ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-17 23:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Derrick Stolee, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 2207 bytes --]

On Tue, Oct 16, 2018 at 10:54:23AM +0900, Junio C Hamano wrote:
> Even though in hex.c I see mixture of *_algo and *_algop helper
> functions, I see only "algo" variants above.  Is it our official
> stance to use primarily the integer index into the algo array when
> specifying the hash, and when a caller into 'multi-hash' API happens
> to have other things, it should use functions in 2/13 to convert it
> to the canonical "int algo" beforehand?

That was my intention, yes.

> I am not saying it is bad or good to choose the index into the algo
> array as the primary way to specify the algorithm.  I think it is a
> good idea to pick one and stick to it, and I wanted to make sure
> that the choice we made here is clearly communicated to any future
> developer who read this code.

Yeah, that was my feeling as well.  I wanted to pick something fixed and
stick to it.

> Having said the above, seeing the use of hash_algo_by_ptr() here
> makes me suspect if it makes more sense to use the algop as the
> primary way to specify which algorithm the caller wants to use.
> IOW, making the set of helpers in 02/13 to allow quering by name,
> format-id, or the integer index and have them all return a pointer
> to "const struct git_hash_algo".  Two immediate downsides I can see
> is that it exposes the actual structure to the callers (but is it
> really a problem?  Outside callers learn hash sizes etc. by accessing
> its fields anyway without treating the algo struct as opaque.), and
> passing an 8-byte pointer may be more costly than passing a small
> integer index that ranges between 0 and 1 at most (assuming that
> we'd only use SHA-1 and "the current NewHash" in the code).

I thought about this.  The one downside to this is that we can't use
those values anywhere we need an integer constant expression, like a
switch.  I suppose that just means we need hash_algo_by_ptr in those
cases, and not everywhere else, which would make the code cleaner.

Let me reroll with that change, and we'll see if we like it better.  If
we don't, I can pull the old version out of history.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-17 14:31           ` Duy Nguyen
@ 2018-10-18  0:06             ` brian m. carlson
  2018-10-18 13:03               ` Derrick Stolee
  0 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2018-10-18  0:06 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Derrick Stolee, Git Mailing List, Jeff King,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 1232 bytes --]

On Wed, Oct 17, 2018 at 04:31:19PM +0200, Duy Nguyen wrote:
> On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > Honestly, anything in the .git directory that is not the v3 pack indexes
> > or the loose object file should be in exactly one hash algorithm.  We
> > could simply just leave this value at 1 all the time and ignore the
> > field, since we already know what algorithm it will use.
> 
> In this particular case, I agree, but not as a general principle. It's
> nice to have independence for fsck-like tools. I don't know if we have
> a tool that simply validates commit-graph file format (and not trying
> to access any real object). But for such a tool, I guess we can just
> pass the hash algorithm from command line. The user would have to
> guess a bit.

I'm going to drop this patch for now.  I'll send a follow-up series
later which bumps the format version for this and the multi-pack index
and serializes them with the four-byte value.  I probably should have
caught this earlier, but unfortunately I don't always have the time to
look at every series that hits the list.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-18  0:06             ` brian m. carlson
@ 2018-10-18 13:03               ` Derrick Stolee
  2018-10-19 22:21                 ` brian m. carlson
  0 siblings, 1 reply; 44+ messages in thread
From: Derrick Stolee @ 2018-10-18 13:03 UTC (permalink / raw)
  To: brian m. carlson, Duy Nguyen, Git Mailing List, Jeff King,
	Ævar Arnfjörð Bjarmason

On 10/17/2018 8:06 PM, brian m. carlson wrote:
> On Wed, Oct 17, 2018 at 04:31:19PM +0200, Duy Nguyen wrote:
>> On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson
>> <sandals@crustytoothpaste.net> wrote:
>>> Honestly, anything in the .git directory that is not the v3 pack indexes
>>> or the loose object file should be in exactly one hash algorithm.  We
>>> could simply just leave this value at 1 all the time and ignore the
>>> field, since we already know what algorithm it will use.
>> In this particular case, I agree, but not as a general principle. It's
>> nice to have independence for fsck-like tools. I don't know if we have
>> a tool that simply validates commit-graph file format (and not trying
>> to access any real object). But for such a tool, I guess we can just
>> pass the hash algorithm from command line. The user would have to
>> guess a bit.
> I'm going to drop this patch for now.  I'll send a follow-up series
> later which bumps the format version for this and the multi-pack index
> and serializes them with the four-byte value.  I probably should have
> caught this earlier, but unfortunately I don't always have the time to
> look at every series that hits the list.
We should coordinate before incrementing the version number. I was 
working on making the file formats incremental (think split-index) but 
couldn't come up with a way to do it without incrementing the file 
format. It would be best to combine these features into v2 of each format.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256
  2018-10-18 13:03               ` Derrick Stolee
@ 2018-10-19 22:21                 ` brian m. carlson
  0 siblings, 0 replies; 44+ messages in thread
From: brian m. carlson @ 2018-10-19 22:21 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Duy Nguyen, Git Mailing List, Jeff King,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

On Thu, Oct 18, 2018 at 09:03:22AM -0400, Derrick Stolee wrote:
> We should coordinate before incrementing the version number. I was working
> on making the file formats incremental (think split-index) but couldn't come
> up with a way to do it without incrementing the file format. It would be
> best to combine these features into v2 of each format.

For the moment, I'm happy to leave things as they are, and I'll
interpret version 1 of the hash version field as whatever hash is in use
elsewhere in .git.  If you're going to bump the version in the future,
feel free to CC me on the patches, and we'll make sure that we serialize
the format_version field in the file then.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2018-10-19 22:21 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-15  2:18 [PATCH v2 00/13] Base SHA-256 implementation brian m. carlson
2018-10-15  2:18 ` [PATCH v2 01/13] sha1-file: rename algorithm to "sha1" brian m. carlson
2018-10-16 15:17   ` Duy Nguyen
2018-10-17 22:53     ` brian m. carlson
2018-10-15  2:18 ` [PATCH v2 02/13] sha1-file: provide functions to look up hash algorithms brian m. carlson
2018-10-17 13:32   ` SZEDER Gábor
2018-10-15  2:18 ` [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes brian m. carlson
2018-10-16  1:54   ` Junio C Hamano
2018-10-17 23:49     ` brian m. carlson
2018-10-15  2:18 ` [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
2018-10-16 15:44   ` Duy Nguyen
2018-10-15  2:18 ` [PATCH v2 05/13] t: add basic tests for our SHA-1 implementation brian m. carlson
2018-10-15  2:18 ` [PATCH v2 06/13] t: make the sha1 test-tool helper generic brian m. carlson
2018-10-15  2:18 ` [PATCH v2 07/13] sha1-file: add a constant for hash block size brian m. carlson
2018-10-15  2:18 ` [PATCH v2 08/13] t/helper: add a test helper to compute hash speed brian m. carlson
2018-10-15  2:18 ` [PATCH v2 09/13] commit-graph: convert to using the_hash_algo brian m. carlson
2018-10-15 15:10   ` Derrick Stolee
2018-10-15  2:18 ` [PATCH v2 10/13] Add a base implementation of SHA-256 support brian m. carlson
2018-10-15 14:59   ` Duy Nguyen
2018-10-15 23:30     ` brian m. carlson
2018-10-16 14:59       ` Duy Nguyen
2018-10-17 16:12   ` SZEDER Gábor
2018-10-17 23:04     ` brian m. carlson
2018-10-15  2:18 ` [PATCH v2 11/13] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
2018-10-15  2:18 ` [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
2018-10-16 15:36   ` Duy Nguyen
2018-10-15  2:19 ` [PATCH v2 13/13] commit-graph: specify OID version for SHA-256 brian m. carlson
2018-10-15 15:11   ` Derrick Stolee
2018-10-16  2:00   ` Junio C Hamano
2018-10-16 22:39     ` brian m. carlson
2018-10-16 15:35   ` Duy Nguyen
2018-10-16 16:01     ` Derrick Stolee
2018-10-16 16:09       ` Duy Nguyen
2018-10-16 22:44         ` brian m. carlson
2018-10-17 14:31           ` Duy Nguyen
2018-10-18  0:06             ` brian m. carlson
2018-10-18 13:03               ` Derrick Stolee
2018-10-19 22:21                 ` brian m. carlson
2018-10-17 12:21   ` Derrick Stolee
2018-10-17 22:38     ` brian m. carlson
2018-10-16  2:00 ` [PATCH v2 00/13] Base SHA-256 implementation Junio C Hamano
2018-10-16  4:01 ` Junio C Hamano
2018-10-16 22:45   ` brian m. carlson
2018-10-16 15:39 ` Duy Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).