All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/10] Use a structure for object IDs.
@ 2015-03-07 23:23 brian m. carlson
  2015-03-07 23:23 ` [PATCH v2 01/10] Define " brian m. carlson
                   ` (12 more replies)
  0 siblings, 13 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

This is a patch series to convert some of the relevant uses of unsigned
char [20] to struct object_id.

The goal of this series to improve type-checking in the codebase and to
make it easier to move to a different hash function if the project
decides to do that.  This series does not convert all of the codebase,
but only parts.  I've dropped some of the patches from earlier (which no
longer apply) and added others.

Certain parts of the code have to be converted before others to keep the
patch sizes small, maintainable, and bisectable, so functions and
structures that are used across the codebase (e.g. struct object) will
be converted later.  Conversion has been done in a somewhat haphazard
manner by converting modules with leaf functions and less commonly used
structs first.

Since part of the goal is to ease a move to a different hash function if
the project decides to do that, I've adopted Michael Haggerty's
suggestion of using variables named "oid", naming the structure member
"sha1", and using GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ as the constants.

I've been holding on to this series for a long time in hopes of
converting more of the code before submitting, but realized that this
deprived others of the ability to use the new structure and required me
to rebase extremely frequently.

brian m. carlson (10):
  Define a structure for object IDs.
  Define utility functions for object IDs.
  bisect.c: convert leaf functions to use struct object_id
  archive.c: convert to use struct object_id
  zip: use GIT_SHA1_HEXSZ for trailers
  bulk-checkin.c: convert to use struct object_id
  diff: convert struct combine_diff_path to object_id
  commit: convert parts to struct object_id
  patch-id: convert to use struct object_id
  apply: convert threeway_stage to object_id

 archive-zip.c      |  4 ++--
 archive.c          | 18 +++++++++---------
 bisect.c           | 40 +++++++++++++++++++-------------------
 builtin/apply.c    | 14 +++++++-------
 builtin/patch-id.c | 34 ++++++++++++++++-----------------
 bulk-checkin.c     | 12 ++++++------
 cache.h            | 37 ++++++++++++++++++++++++++++++++----
 combine-diff.c     | 56 +++++++++++++++++++++++++++---------------------------
 commit.c           | 32 +++++++++++++++----------------
 commit.h           |  4 ++--
 diff-lib.c         | 10 +++++-----
 diff.h             |  5 +++--
 hex.c              | 16 +++++++++++++---
 log-tree.c         |  2 +-
 object.h           |  8 ++++++++
 send-pack.c        |  2 +-
 shallow.c          |  8 ++++----
 tree-diff.c        | 10 +++++-----
 upload-pack.c      |  2 +-
 19 files changed, 181 insertions(+), 133 deletions(-)

-- 
2.2.1.209.g41e5f3a

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
@ 2015-03-07 23:23 ` brian m. carlson
       [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
  2015-03-07 23:23 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Many places throughout the code use "unsigned char [20]" to store object IDs
(SHA-1 values).  This leads to lots of hardcoded numbers throughout the
codebase.  It also leads to confusion about the purposes of a buffer.

Introduce a structure for object IDs.  This allows us to obtain the benefits
of compile-time checking for misuse.  The structure is expected to remain
the same size and have the same alignment requirements on all known
platforms, compared to the array of unsigned char, although this is not
required for correctness.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 object.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/object.h b/object.h
index 6416247..f99b502 100644
--- a/object.h
+++ b/object.h
@@ -1,6 +1,14 @@
 #ifndef OBJECT_H
 #define OBJECT_H
 
+/* The length in bytes and in hex digits of an object name (SHA-1 value). */
+#define GIT_SHA1_RAWSZ 20
+#define GIT_SHA1_HEXSZ 40
+
+struct object_id {
+	unsigned char sha1[GIT_SHA1_RAWSZ];
+};
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 02/10] Define utility functions for object IDs.
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  2015-03-07 23:23 ` [PATCH v2 01/10] Define " brian m. carlson
@ 2015-03-07 23:23 ` brian m. carlson
  2015-03-08  9:57   ` Duy Nguyen
  2015-03-11 12:44   ` Michael Haggerty
  2015-03-07 23:23 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

There are several utility functions (hashcmp and friends) that are used
for comparing object IDs (SHA-1 values).  Using these functions, which
take pointers to unsigned char, with struct object_id requires tiresome
access to the sha1 member, which bloats code and violates the desired
encapsulation.  Provide wrappers around these functions for struct
object_id for neater, more maintainable code.  Use the new constants to
avoid the hard-coded 20s and 40s throughout the original functions.

These functions simply call the underlying pointer-to-unsigned-char
versions to ensure that any performance improvements will be passed
through to the new functions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
I'm not very excited about having to put the #include in the middle of
cache.h.  The alternative, of course, is to move enum object_type up,
which I can do if necessary.  Another alternative is to simply move the
struct object_id definitions to cache.h instead of object.h, which might
be cleaner.

I'm interested in hearing opinions about the best way to go forward.

 cache.h | 37 +++++++++++++++++++++++++++++++++----
 hex.c   | 16 +++++++++++++---
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 761c570..f9addcc 100644
--- a/cache.h
+++ b/cache.h
@@ -368,6 +368,11 @@ static inline enum object_type object_type(unsigned int mode)
 		OBJ_BLOB;
 }
 
+/* This must go before the oid* functions, but after the declaration of enum
+ * object_type.
+ */
+#include "object.h"
+
 /* Double-check local_repo_env below if you add to this list. */
 #define GIT_DIR_ENVIRONMENT "GIT_DIR"
 #define GIT_NAMESPACE_ENVIRONMENT "GIT_NAMESPACE"
@@ -710,13 +715,13 @@ extern char *sha1_pack_name(const unsigned char *sha1);
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
 extern const char *find_unique_abbrev(const unsigned char *sha1, int);
-extern const unsigned char null_sha1[20];
+extern const unsigned char null_sha1[GIT_SHA1_RAWSZ];
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
 	int i;
 
-	for (i = 0; i < 20; i++, sha1++, sha2++) {
+	for (i = 0; i < GIT_SHA1_RAWSZ; i++, sha1++, sha2++) {
 		if (*sha1 != *sha2)
 			return *sha1 - *sha2;
 	}
@@ -724,20 +729,42 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 	return 0;
 }
 
+static inline int oidcmp(const struct object_id *o1, const struct object_id *o2)
+{
+	return hashcmp(o1->sha1, o2->sha1);
+}
+
 static inline int is_null_sha1(const unsigned char *sha1)
 {
 	return !hashcmp(sha1, null_sha1);
 }
 
+static inline int is_null_oid(const struct object_id *oid)
+{
+	return !hashcmp(oid->sha1, null_sha1);
+}
+
 static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 {
-	memcpy(sha_dst, sha_src, 20);
+	memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ);
 }
+
+static inline void oidcpy(struct object_id *dst, const struct object_id *src)
+{
+	hashcpy(dst->sha1, src->sha1);
+}
+
 static inline void hashclr(unsigned char *hash)
 {
-	memset(hash, 0, 20);
+	memset(hash, 0, GIT_SHA1_RAWSZ);
 }
 
+static inline void oidclr(struct object_id *oid)
+{
+	hashclr(oid->sha1);
+}
+
+
 #define EMPTY_TREE_SHA1_HEX \
 	"4b825dc642cb6eb9a060e54bf8d69288fbee4904"
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
@@ -944,8 +971,10 @@ extern int for_each_abbrev(const char *prefix, each_abbrev_fn, void *);
  * null-terminated string.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
+extern int get_oid_hex(const char *hex, struct object_id *sha1);
 
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
 extern int read_ref_full(const char *refname, int resolve_flags,
 			 unsigned char *sha1, int *flags);
 extern int read_ref(const char *refname, unsigned char *sha1);
diff --git a/hex.c b/hex.c
index cfd9d72..7e25a88 100644
--- a/hex.c
+++ b/hex.c
@@ -38,7 +38,7 @@ const signed char hexval_table[256] = {
 int get_sha1_hex(const char *hex, unsigned char *sha1)
 {
 	int i;
-	for (i = 0; i < 20; i++) {
+	for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
 		unsigned int val;
 		/*
 		 * hex[1]=='\0' is caught when val is checked below,
@@ -56,15 +56,20 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 	return 0;
 }
 
+int get_oid_hex(const char *hex, struct object_id *oid)
+{
+	return get_sha1_hex(hex, oid->sha1);
+}
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
 	static int bufno;
-	static char hexbuffer[4][41];
+	static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
 	static const char hex[] = "0123456789abcdef";
 	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
 	int i;
 
-	for (i = 0; i < 20; i++) {
+	for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
 		unsigned int val = *sha1++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
@@ -73,3 +78,8 @@ char *sha1_to_hex(const unsigned char *sha1)
 
 	return buffer;
 }
+
+char *oid_to_hex(const struct object_id *oid)
+{
+	return sha1_to_hex(oid->sha1);
+}
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  2015-03-07 23:23 ` [PATCH v2 01/10] Define " brian m. carlson
  2015-03-07 23:23 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
@ 2015-03-07 23:23 ` brian m. carlson
  2015-03-07 23:23 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Convert some constants to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bisect.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/bisect.c b/bisect.c
index 8c6d843..2692d54 100644
--- a/bisect.c
+++ b/bisect.c
@@ -15,7 +15,7 @@
 static struct sha1_array good_revs;
 static struct sha1_array skipped_revs;
 
-static unsigned char *current_bad_sha1;
+static struct object_id *current_bad_oid;
 
 static const char *argv_checkout[] = {"checkout", "-q", NULL, "--", NULL};
 static const char *argv_show_branch[] = {"show-branch", NULL, NULL};
@@ -404,8 +404,8 @@ static int register_ref(const char *refname, const unsigned char *sha1,
 			int flags, void *cb_data)
 {
 	if (!strcmp(refname, "bad")) {
-		current_bad_sha1 = xmalloc(20);
-		hashcpy(current_bad_sha1, sha1);
+		current_bad_oid = xmalloc(sizeof(*current_bad_oid));
+		hashcpy(current_bad_oid->sha1, sha1);
 	} else if (starts_with(refname, "good-")) {
 		sha1_array_append(&good_revs, sha1);
 	} else if (starts_with(refname, "skip-")) {
@@ -564,7 +564,7 @@ static struct commit_list *skip_away(struct commit_list *list, int count)
 
 	for (i = 0; cur; cur = cur->next, i++) {
 		if (i == index) {
-			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
+			if (hashcmp(cur->item->object.sha1, current_bad_oid->sha1))
 				return cur;
 			if (previous)
 				return previous;
@@ -607,7 +607,7 @@ static void bisect_rev_setup(struct rev_info *revs, const char *prefix,
 
 	/* rev_argv.argv[0] will be ignored by setup_revisions */
 	argv_array_push(&rev_argv, "bisect_rev_setup");
-	argv_array_pushf(&rev_argv, bad_format, sha1_to_hex(current_bad_sha1));
+	argv_array_pushf(&rev_argv, bad_format, sha1_to_hex(current_bad_oid->sha1));
 	for (i = 0; i < good_revs.nr; i++)
 		argv_array_pushf(&rev_argv, good_format,
 				 sha1_to_hex(good_revs.sha1[i]));
@@ -628,7 +628,7 @@ static void bisect_common(struct rev_info *revs)
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,
-				    const unsigned char *bad)
+				    const struct object_id *bad)
 {
 	if (!tried)
 		return;
@@ -637,12 +637,12 @@ static void exit_if_skipped_commits(struct commit_list *tried,
 	       "The first bad commit could be any of:\n");
 	print_commit_list(tried, "%s\n", "%s\n");
 	if (bad)
-		printf("%s\n", sha1_to_hex(bad));
+		printf("%s\n", oid_to_hex(bad));
 	printf("We cannot bisect more!\n");
 	exit(2);
 }
 
-static int is_expected_rev(const unsigned char *sha1)
+static int is_expected_rev(const struct object_id *oid)
 {
 	const char *filename = git_path("BISECT_EXPECTED_REV");
 	struct stat st;
@@ -658,7 +658,7 @@ static int is_expected_rev(const unsigned char *sha1)
 		return 0;
 
 	if (strbuf_getline(&str, fp, '\n') != EOF)
-		res = !strcmp(str.buf, sha1_to_hex(sha1));
+		res = !strcmp(str.buf, oid_to_hex(oid));
 
 	strbuf_release(&str);
 	fclose(fp);
@@ -719,7 +719,7 @@ static struct commit **get_bad_and_good_commits(int *rev_nr)
 	struct commit **rev = xmalloc(len * sizeof(*rev));
 	int i, n = 0;
 
-	rev[n++] = get_commit_reference(current_bad_sha1);
+	rev[n++] = get_commit_reference(current_bad_oid->sha1);
 	for (i = 0; i < good_revs.nr; i++)
 		rev[n++] = get_commit_reference(good_revs.sha1[i]);
 	*rev_nr = n;
@@ -729,8 +729,8 @@ static struct commit **get_bad_and_good_commits(int *rev_nr)
 
 static void handle_bad_merge_base(void)
 {
-	if (is_expected_rev(current_bad_sha1)) {
-		char *bad_hex = sha1_to_hex(current_bad_sha1);
+	if (is_expected_rev(current_bad_oid)) {
+		char *bad_hex = oid_to_hex(current_bad_oid);
 		char *good_hex = join_sha1_array_hex(&good_revs, ' ');
 
 		fprintf(stderr, "The merge base %s is bad.\n"
@@ -750,7 +750,7 @@ static void handle_bad_merge_base(void)
 static void handle_skipped_merge_base(const unsigned char *mb)
 {
 	char *mb_hex = sha1_to_hex(mb);
-	char *bad_hex = sha1_to_hex(current_bad_sha1);
+	char *bad_hex = sha1_to_hex(current_bad_oid->sha1);
 	char *good_hex = join_sha1_array_hex(&good_revs, ' ');
 
 	warning("the merge base between %s and [%s] "
@@ -781,7 +781,7 @@ static void check_merge_bases(int no_checkout)
 
 	for (; result; result = result->next) {
 		const unsigned char *mb = result->item->object.sha1;
-		if (!hashcmp(mb, current_bad_sha1)) {
+		if (!hashcmp(mb, current_bad_oid->sha1)) {
 			handle_bad_merge_base();
 		} else if (0 <= sha1_array_lookup(&good_revs, mb)) {
 			continue;
@@ -838,7 +838,7 @@ static void check_good_are_ancestors_of_bad(const char *prefix, int no_checkout)
 	struct stat st;
 	int fd;
 
-	if (!current_bad_sha1)
+	if (!current_bad_oid)
 		die("a bad revision is needed");
 
 	/* Check if file BISECT_ANCESTORS_OK exists. */
@@ -903,7 +903,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
 	const unsigned char *bisect_rev;
-	char bisect_rev_hex[41];
+	char bisect_rev_hex[GIT_SHA1_HEXSZ + 1];
 
 	if (read_bisect_refs())
 		die("reading bisect refs failed");
@@ -927,7 +927,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 		exit_if_skipped_commits(tried, NULL);
 
 		printf("%s was both good and bad\n",
-		       sha1_to_hex(current_bad_sha1));
+		       oid_to_hex(current_bad_oid));
 		exit(1);
 	}
 
@@ -938,10 +938,10 @@ int bisect_next_all(const char *prefix, int no_checkout)
 	}
 
 	bisect_rev = revs.commits->item->object.sha1;
-	memcpy(bisect_rev_hex, sha1_to_hex(bisect_rev), 41);
+	memcpy(bisect_rev_hex, sha1_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
 
-	if (!hashcmp(bisect_rev, current_bad_sha1)) {
-		exit_if_skipped_commits(tried, current_bad_sha1);
+	if (!hashcmp(bisect_rev, current_bad_oid->sha1)) {
+		exit_if_skipped_commits(tried, current_bad_oid);
 		printf("%s is the first bad commit\n", bisect_rev_hex);
 		show_diff_tree(prefix, revs.commits->item);
 		/* This means the bisection process succeeded. */
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 04/10] archive.c: convert to use struct object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (2 preceding siblings ...)
  2015-03-07 23:23 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
@ 2015-03-07 23:23 ` brian m. carlson
  2015-03-11 14:20   ` Michael Haggerty
  2015-03-07 23:24 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Convert a hard-coded 20 as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/archive.c b/archive.c
index 96057ed..46d9025 100644
--- a/archive.c
+++ b/archive.c
@@ -101,7 +101,7 @@ static void setup_archive_check(struct git_attr_check *check)
 
 struct directory {
 	struct directory *up;
-	unsigned char sha1[20];
+	unsigned char sha1[GIT_SHA1_RAWSZ];
 	int baselen, len;
 	unsigned mode;
 	int stage;
@@ -354,7 +354,7 @@ static void parse_treeish_arg(const char **argv,
 	time_t archive_time;
 	struct tree *tree;
 	const struct commit *commit;
-	unsigned char sha1[20];
+	struct object_id oid;
 
 	/* Remotes are only allowed to fetch actual refs */
 	if (remote && !remote_allow_unreachable) {
@@ -362,15 +362,15 @@ static void parse_treeish_arg(const char **argv,
 		const char *colon = strchrnul(name, ':');
 		int refnamelen = colon - name;
 
-		if (!dwim_ref(name, refnamelen, sha1, &ref))
+		if (!dwim_ref(name, refnamelen, oid.sha1, &ref))
 			die("no such ref: %.*s", refnamelen, name);
 		free(ref);
 	}
 
-	if (get_sha1(name, sha1))
+	if (get_sha1(name, oid.sha1))
 		die("Not a valid object name");
 
-	commit = lookup_commit_reference_gently(sha1, 1);
+	commit = lookup_commit_reference_gently(oid.sha1, 1);
 	if (commit) {
 		commit_sha1 = commit->object.sha1;
 		archive_time = commit->date;
@@ -379,21 +379,21 @@ static void parse_treeish_arg(const char **argv,
 		archive_time = time(NULL);
 	}
 
-	tree = parse_tree_indirect(sha1);
+	tree = parse_tree_indirect(oid.sha1);
 	if (tree == NULL)
 		die("not a tree object");
 
 	if (prefix) {
-		unsigned char tree_sha1[20];
+		struct object_id tree_oid;
 		unsigned int mode;
 		int err;
 
 		err = get_tree_entry(tree->object.sha1, prefix,
-				     tree_sha1, &mode);
+				     tree_oid.sha1, &mode);
 		if (err || !S_ISDIR(mode))
 			die("current working directory is untracked");
 
-		tree = parse_tree_indirect(tree_sha1);
+		tree = parse_tree_indirect(tree_oid.sha1);
 	}
 	ar_args->tree = tree;
 	ar_args->commit_sha1 = commit_sha1;
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (3 preceding siblings ...)
  2015-03-07 23:23 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
@ 2015-03-07 23:24 ` brian m. carlson
  2015-03-07 23:24 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive-zip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 4bde019..b669e50 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -427,12 +427,12 @@ static void write_zip_trailer(const unsigned char *sha1)
 	copy_le16(trailer.entries, zip_dir_entries);
 	copy_le32(trailer.size, zip_dir_offset);
 	copy_le32(trailer.offset, zip_offset);
-	copy_le16(trailer.comment_length, sha1 ? 40 : 0);
+	copy_le16(trailer.comment_length, sha1 ? GIT_SHA1_HEXSZ : 0);
 
 	write_or_die(1, zip_dir, zip_dir_offset);
 	write_or_die(1, &trailer, ZIP_DIR_TRAILER_SIZE);
 	if (sha1)
-		write_or_die(1, sha1_to_hex(sha1), 40);
+		write_or_die(1, sha1_to_hex(sha1), GIT_SHA1_HEXSZ);
 }
 
 static void dos_time(time_t *time, int *dos_date, int *dos_time)
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (4 preceding siblings ...)
  2015-03-07 23:24 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
@ 2015-03-07 23:24 ` brian m. carlson
  2015-03-07 23:24 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bulk-checkin.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 0c4b8a7..e50f60e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -24,7 +24,7 @@ static struct bulk_checkin_state {
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct strbuf packname = STRBUF_INIT;
 	int i;
 
@@ -36,11 +36,11 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		sha1close(state->f, sha1, CSUM_FSYNC);
+		sha1close(state->f, oid.sha1, CSUM_FSYNC);
 	} else {
-		int fd = sha1close(state->f, sha1, 0);
-		fixup_pack_header_footer(fd, sha1, state->pack_tmp_name,
-					 state->nr_written, sha1,
+		int fd = sha1close(state->f, oid.sha1, 0);
+		fixup_pack_header_footer(fd, oid.sha1, state->pack_tmp_name,
+					 state->nr_written, oid.sha1,
 					 state->offset);
 		close(fd);
 	}
@@ -48,7 +48,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 	strbuf_addf(&packname, "%s/pack/pack-", get_object_directory());
 	finish_tmp_packfile(&packname, state->pack_tmp_name,
 			    state->written, state->nr_written,
-			    &state->pack_idx_opts, sha1);
+			    &state->pack_idx_opts, oid.sha1);
 	for (i = 0; i < state->nr_written; i++)
 		free(state->written[i]);
 
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (5 preceding siblings ...)
  2015-03-07 23:24 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
@ 2015-03-07 23:24 ` brian m. carlson
  2015-03-07 23:24 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Also, convert a constant to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 combine-diff.c | 56 ++++++++++++++++++++++++++++----------------------------
 diff-lib.c     | 10 +++++-----
 diff.h         |  5 +++--
 tree-diff.c    | 10 +++++-----
 4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 91edce5..ec25202 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -44,9 +44,9 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
 			memset(p->parent, 0,
 			       sizeof(p->parent[0]) * num_parent);
 
-			hashcpy(p->sha1, q->queue[i]->two->sha1);
+			hashcpy(p->oid.sha1, q->queue[i]->two->sha1);
 			p->mode = q->queue[i]->two->mode;
-			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
+			hashcpy(p->parent[n].oid.sha1, q->queue[i]->one->sha1);
 			p->parent[n].mode = q->queue[i]->one->mode;
 			p->parent[n].status = q->queue[i]->status;
 			*tail = p;
@@ -77,7 +77,7 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
 			continue;
 		}
 
-		hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
+		hashcpy(p->parent[n].oid.sha1, q->queue[i]->one->sha1);
 		p->parent[n].mode = q->queue[i]->one->mode;
 		p->parent[n].status = q->queue[i]->status;
 
@@ -284,7 +284,7 @@ static struct lline *coalesce_lines(struct lline *base, int *lenbase,
 	return base;
 }
 
-static char *grab_blob(const unsigned char *sha1, unsigned int mode,
+static char *grab_blob(const struct object_id *oid, unsigned int mode,
 		       unsigned long *size, struct userdiff_driver *textconv,
 		       const char *path)
 {
@@ -294,20 +294,20 @@ static char *grab_blob(const unsigned char *sha1, unsigned int mode,
 	if (S_ISGITLINK(mode)) {
 		blob = xmalloc(100);
 		*size = snprintf(blob, 100,
-				 "Subproject commit %s\n", sha1_to_hex(sha1));
-	} else if (is_null_sha1(sha1)) {
+				 "Subproject commit %s\n", oid_to_hex(oid));
+	} else if (is_null_oid(oid)) {
 		/* deleted blob */
 		*size = 0;
 		return xcalloc(1, 1);
 	} else if (textconv) {
 		struct diff_filespec *df = alloc_filespec(path);
-		fill_filespec(df, sha1, 1, mode);
+		fill_filespec(df, oid->sha1, 1, mode);
 		*size = fill_textconv(textconv, df, &blob);
 		free_filespec(df);
 	} else {
-		blob = read_sha1_file(sha1, &type, size);
+		blob = read_sha1_file(oid->sha1, &type, size);
 		if (type != OBJ_BLOB)
-			die("object '%s' is not a blob!", sha1_to_hex(sha1));
+			die("object '%s' is not a blob!", oid_to_hex(oid));
 	}
 	return blob;
 }
@@ -389,7 +389,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
 	}
 }
 
-static void combine_diff(const unsigned char *parent, unsigned int mode,
+static void combine_diff(const struct object_id *parent, unsigned int mode,
 			 mmfile_t *result_file,
 			 struct sline *sline, unsigned int cnt, int n,
 			 int num_parent, int result_deleted,
@@ -897,7 +897,7 @@ static void show_combined_header(struct combine_diff_path *elem,
 				 int show_file_header)
 {
 	struct diff_options *opt = &rev->diffopt;
-	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? 40 : DEFAULT_ABBREV;
+	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? GIT_SHA1_HEXSZ : DEFAULT_ABBREV;
 	const char *a_prefix = opt->a_prefix ? opt->a_prefix : "a/";
 	const char *b_prefix = opt->b_prefix ? opt->b_prefix : "b/";
 	const char *c_meta = diff_get_color_opt(opt, DIFF_METAINFO);
@@ -914,11 +914,11 @@ static void show_combined_header(struct combine_diff_path *elem,
 			 "", elem->path, line_prefix, c_meta, c_reset);
 	printf("%s%sindex ", line_prefix, c_meta);
 	for (i = 0; i < num_parent; i++) {
-		abb = find_unique_abbrev(elem->parent[i].sha1,
+		abb = find_unique_abbrev(elem->parent[i].oid.sha1,
 					 abbrev);
 		printf("%s%s", i ? "," : "", abb);
 	}
-	abb = find_unique_abbrev(elem->sha1, abbrev);
+	abb = find_unique_abbrev(elem->oid.sha1, abbrev);
 	printf("..%s%s\n", abb, c_reset);
 
 	if (mode_differs) {
@@ -991,7 +991,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 
 	/* Read the result of merge first */
 	if (!working_tree_file)
-		result = grab_blob(elem->sha1, elem->mode, &result_size,
+		result = grab_blob(&elem->oid, elem->mode, &result_size,
 				   textconv, elem->path);
 	else {
 		/* Used by diff-tree to read from the working tree */
@@ -1013,12 +1013,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			result = strbuf_detach(&buf, NULL);
 			elem->mode = canon_mode(st.st_mode);
 		} else if (S_ISDIR(st.st_mode)) {
-			unsigned char sha1[20];
-			if (resolve_gitlink_ref(elem->path, "HEAD", sha1) < 0)
-				result = grab_blob(elem->sha1, elem->mode,
+			struct object_id oid;
+			if (resolve_gitlink_ref(elem->path, "HEAD", oid.sha1) < 0)
+				result = grab_blob(&elem->oid, elem->mode,
 						   &result_size, NULL, NULL);
 			else
-				result = grab_blob(sha1, elem->mode,
+				result = grab_blob(&oid, elem->mode,
 						   &result_size, NULL, NULL);
 		} else if (textconv) {
 			struct diff_filespec *df = alloc_filespec(elem->path);
@@ -1090,7 +1090,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 		for (i = 0; !is_binary && i < num_parent; i++) {
 			char *buf;
 			unsigned long size;
-			buf = grab_blob(elem->parent[i].sha1,
+			buf = grab_blob(&elem->parent[i].oid,
 					elem->parent[i].mode,
 					&size, NULL, NULL);
 			if (buffer_is_binary(buf, size))
@@ -1139,14 +1139,14 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	for (i = 0; i < num_parent; i++) {
 		int j;
 		for (j = 0; j < i; j++) {
-			if (!hashcmp(elem->parent[i].sha1,
-				     elem->parent[j].sha1)) {
+			if (!hashcmp(elem->parent[i].oid.sha1,
+				     elem->parent[j].oid.sha1)) {
 				reuse_combine_diff(sline, cnt, i, j);
 				break;
 			}
 		}
 		if (i <= j)
-			combine_diff(elem->parent[i].sha1,
+			combine_diff(&elem->parent[i].oid,
 				     elem->parent[i].mode,
 				     &result_file, sline,
 				     cnt, i, num_parent, result_deleted,
@@ -1206,9 +1206,9 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
 
 		/* Show sha1's */
 		for (i = 0; i < num_parent; i++)
-			printf(" %s", diff_unique_abbrev(p->parent[i].sha1,
+			printf(" %s", diff_unique_abbrev(p->parent[i].oid.sha1,
 							 opt->abbrev));
-		printf(" %s ", diff_unique_abbrev(p->sha1, opt->abbrev));
+		printf(" %s ", diff_unique_abbrev(p->oid.sha1, opt->abbrev));
 	}
 
 	if (opt->output_format & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS)) {
@@ -1271,16 +1271,16 @@ static struct diff_filepair *combined_pair(struct combine_diff_path *p,
 	for (i = 0; i < num_parent; i++) {
 		pair->one[i].path = p->path;
 		pair->one[i].mode = p->parent[i].mode;
-		hashcpy(pair->one[i].sha1, p->parent[i].sha1);
-		pair->one[i].sha1_valid = !is_null_sha1(p->parent[i].sha1);
+		hashcpy(pair->one[i].sha1, p->parent[i].oid.sha1);
+		pair->one[i].sha1_valid = !is_null_oid(&p->parent[i].oid);
 		pair->one[i].has_more_entries = 1;
 	}
 	pair->one[num_parent - 1].has_more_entries = 0;
 
 	pair->two->path = p->path;
 	pair->two->mode = p->mode;
-	hashcpy(pair->two->sha1, p->sha1);
-	pair->two->sha1_valid = !is_null_sha1(p->sha1);
+	hashcpy(pair->two->sha1, p->oid.sha1);
+	pair->two->sha1_valid = !is_null_oid(&p->oid);
 	return pair;
 }
 
diff --git a/diff-lib.c b/diff-lib.c
index a85c497..60a7d6f 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -125,7 +125,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			dpath->next = NULL;
 			memcpy(dpath->path, ce->name, path_len);
 			dpath->path[path_len] = '\0';
-			hashclr(dpath->sha1);
+			oidclr(&dpath->oid);
 			memset(&(dpath->parent[0]), 0,
 			       sizeof(struct combine_diff_parent)*5);
 
@@ -155,7 +155,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				if (2 <= stage) {
 					int mode = nce->ce_mode;
 					num_compare_stages++;
-					hashcpy(dpath->parent[stage-2].sha1, nce->sha1);
+					hashcpy(dpath->parent[stage-2].oid.sha1, nce->sha1);
 					dpath->parent[stage-2].mode = ce_mode_from_stat(nce, mode);
 					dpath->parent[stage-2].status =
 						DIFF_STATUS_MODIFIED;
@@ -339,14 +339,14 @@ static int show_modified(struct rev_info *revs,
 		memcpy(p->path, new->name, pathlen);
 		p->path[pathlen] = 0;
 		p->mode = mode;
-		hashclr(p->sha1);
+		oidclr(&p->oid);
 		memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
 		p->parent[0].status = DIFF_STATUS_MODIFIED;
 		p->parent[0].mode = new->ce_mode;
-		hashcpy(p->parent[0].sha1, new->sha1);
+		hashcpy(p->parent[0].oid.sha1, new->sha1);
 		p->parent[1].status = DIFF_STATUS_MODIFIED;
 		p->parent[1].mode = old->ce_mode;
-		hashcpy(p->parent[1].sha1, old->sha1);
+		hashcpy(p->parent[1].oid.sha1, old->sha1);
 		show_combined_diff(p, 2, revs->dense_combined_merges, revs);
 		free(p);
 		return 0;
diff --git a/diff.h b/diff.h
index b4a624d..f6fdf49 100644
--- a/diff.h
+++ b/diff.h
@@ -6,6 +6,7 @@
 
 #include "tree-walk.h"
 #include "pathspec.h"
+#include "object.h"
 
 struct rev_info;
 struct diff_options;
@@ -207,11 +208,11 @@ struct combine_diff_path {
 	struct combine_diff_path *next;
 	char *path;
 	unsigned int mode;
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct combine_diff_parent {
 		char status;
 		unsigned int mode;
-		unsigned char sha1[20];
+		struct object_id oid;
 	} parent[FLEX_ARRAY];
 };
 #define combine_diff_path_size(n, l) \
diff --git a/tree-diff.c b/tree-diff.c
index e7b378c..8631c28 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -64,7 +64,7 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
 {
 	struct combine_diff_parent *p0 = &p->parent[0];
 	if (p->mode && p0->mode) {
-		opt->change(opt, p0->mode, p->mode, p0->sha1, p->sha1,
+		opt->change(opt, p0->mode, p->mode, p0->oid.sha1, p->oid.sha1,
 			1, 1, p->path, 0, 0);
 	}
 	else {
@@ -74,11 +74,11 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
 
 		if (p->mode) {
 			addremove = '+';
-			sha1 = p->sha1;
+			sha1 = p->oid.sha1;
 			mode = p->mode;
 		} else {
 			addremove = '-';
-			sha1 = p0->sha1;
+			sha1 = p0->oid.sha1;
 			mode = p0->mode;
 		}
 
@@ -151,7 +151,7 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
 	memcpy(p->path + base->len, path, pathlen);
 	p->path[len] = 0;
 	p->mode = mode;
-	hashcpy(p->sha1, sha1 ? sha1 : null_sha1);
+	hashcpy(p->oid.sha1, sha1 ? sha1 : null_sha1);
 
 	return p;
 }
@@ -238,7 +238,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 			}
 
 			p->parent[i].mode = mode_i;
-			hashcpy(p->parent[i].sha1, sha1_i ? sha1_i : null_sha1);
+			hashcpy(p->parent[i].oid.sha1, sha1_i ? sha1_i : null_sha1);
 		}
 
 		keep = 1;
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 08/10] commit: convert parts to struct object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (6 preceding siblings ...)
  2015-03-07 23:24 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
@ 2015-03-07 23:24 ` brian m. carlson
  2015-03-11 14:46   ` Michael Haggerty
  2015-03-07 23:24 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Convert struct commit_graft and necessary local parts of commit.c.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit.c      | 32 ++++++++++++++++----------------
 commit.h      |  4 ++--
 log-tree.c    |  2 +-
 send-pack.c   |  2 +-
 shallow.c     |  8 ++++----
 upload-pack.c |  2 +-
 6 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/commit.c b/commit.c
index a8c7577..89c207e 100644
--- a/commit.c
+++ b/commit.c
@@ -55,12 +55,12 @@ struct commit *lookup_commit(const unsigned char *sha1)
 
 struct commit *lookup_commit_reference_by_name(const char *name)
 {
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct commit *commit;
 
-	if (get_sha1_committish(name, sha1))
+	if (get_sha1_committish(name, oid.sha1))
 		return NULL;
-	commit = lookup_commit_reference(sha1);
+	commit = lookup_commit_reference(oid.sha1);
 	if (parse_commit(commit))
 		return NULL;
 	return commit;
@@ -99,7 +99,7 @@ static int commit_graft_alloc, commit_graft_nr;
 static const unsigned char *commit_graft_sha1_access(size_t index, void *table)
 {
 	struct commit_graft **commit_graft_table = table;
-	return commit_graft_table[index]->sha1;
+	return commit_graft_table[index]->oid.sha1;
 }
 
 static int commit_graft_pos(const unsigned char *sha1)
@@ -110,7 +110,7 @@ static int commit_graft_pos(const unsigned char *sha1)
 
 int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 {
-	int pos = commit_graft_pos(graft->sha1);
+	int pos = commit_graft_pos(graft->oid.sha1);
 
 	if (0 <= pos) {
 		if (ignore_dups)
@@ -148,12 +148,12 @@ struct commit_graft *read_graft_line(char *buf, int len)
 	i = (len + 1) / 41 - 1;
 	graft = xmalloc(sizeof(*graft) + 20 * i);
 	graft->nr_parent = i;
-	if (get_sha1_hex(buf, graft->sha1))
+	if (get_oid_hex(buf, &graft->oid))
 		goto bad_graft_data;
 	for (i = 40; i < len; i += 41) {
 		if (buf[i] != ' ')
 			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/41]))
+		if (get_sha1_hex(buf + i + 1, graft->parent[i/41].sha1))
 			goto bad_graft_data;
 	}
 	return graft;
@@ -302,7 +302,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 {
 	const char *tail = buffer;
 	const char *bufptr = buffer;
-	unsigned char parent[20];
+	struct object_id parent;
 	struct commit_list **pptr;
 	struct commit_graft *graft;
 
@@ -312,10 +312,10 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	tail += size;
 	if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
 		return error("bogus commit object %s", sha1_to_hex(item->object.sha1));
-	if (get_sha1_hex(bufptr + 5, parent) < 0)
+	if (get_sha1_hex(bufptr + 5, parent.sha1) < 0)
 		return error("bad tree pointer in commit %s",
 			     sha1_to_hex(item->object.sha1));
-	item->tree = lookup_tree(parent);
+	item->tree = lookup_tree(parent.sha1);
 	bufptr += 46; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
@@ -324,7 +324,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 		struct commit *new_parent;
 
 		if (tail <= bufptr + 48 ||
-		    get_sha1_hex(bufptr + 7, parent) ||
+		    get_sha1_hex(bufptr + 7, parent.sha1) ||
 		    bufptr[47] != '\n')
 			return error("bad parents in commit %s", sha1_to_hex(item->object.sha1));
 		bufptr += 48;
@@ -334,7 +334,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 		 */
 		if (graft && (graft->nr_parent < 0 || grafts_replace_parents))
 			continue;
-		new_parent = lookup_commit(parent);
+		new_parent = lookup_commit(parent.sha1);
 		if (new_parent)
 			pptr = &commit_list_insert(new_parent, pptr)->next;
 	}
@@ -342,7 +342,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 		int i;
 		struct commit *new_parent;
 		for (i = 0; i < graft->nr_parent; i++) {
-			new_parent = lookup_commit(graft->parent[i]);
+			new_parent = lookup_commit(graft->parent[i].sha1);
 			if (!new_parent)
 				continue;
 			pptr = &commit_list_insert(new_parent, pptr)->next;
@@ -1580,10 +1580,10 @@ struct commit *get_merge_parent(const char *name)
 {
 	struct object *obj;
 	struct commit *commit;
-	unsigned char sha1[20];
-	if (get_sha1(name, sha1))
+	struct object_id oid;
+	if (get_sha1(name, oid.sha1))
 		return NULL;
-	obj = parse_object(sha1);
+	obj = parse_object(oid.sha1);
 	commit = (struct commit *)peel_to_type(name, 0, obj, OBJ_COMMIT);
 	if (commit && !commit->util) {
 		struct merge_remote_desc *desc;
diff --git a/commit.h b/commit.h
index 9f189cb..ed3a1d5 100644
--- a/commit.h
+++ b/commit.h
@@ -226,9 +226,9 @@ enum rev_sort_order {
 void sort_in_topological_order(struct commit_list **, enum rev_sort_order);
 
 struct commit_graft {
-	unsigned char sha1[20];
+	struct object_id oid;
 	int nr_parent; /* < 0 if shallow commit */
-	unsigned char parent[FLEX_ARRAY][20]; /* more */
+	struct object_id parent[FLEX_ARRAY]; /* more */
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
diff --git a/log-tree.c b/log-tree.c
index 7f0890e..f6b7801 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -137,7 +137,7 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
 
 static int add_graft_decoration(const struct commit_graft *graft, void *cb_data)
 {
-	struct commit *commit = lookup_commit(graft->sha1);
+	struct commit *commit = lookup_commit(graft->oid.sha1);
 	if (!commit)
 		return 0;
 	add_name_decoration(DECORATION_GRAFTED, "grafted", &commit->object);
diff --git a/send-pack.c b/send-pack.c
index 9d2b0c5..e8389a4 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -182,7 +182,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	struct strbuf *sb = cb;
 	if (graft->nr_parent == -1)
-		packet_buf_write(sb, "shallow %s\n", sha1_to_hex(graft->sha1));
+		packet_buf_write(sb, "shallow %s\n", sha1_to_hex(graft->oid.sha1));
 	return 0;
 }
 
diff --git a/shallow.c b/shallow.c
index d8bf40a..9ff2d4c 100644
--- a/shallow.c
+++ b/shallow.c
@@ -31,7 +31,7 @@ int register_shallow(const unsigned char *sha1)
 		xmalloc(sizeof(struct commit_graft));
 	struct commit *commit = lookup_commit(sha1);
 
-	hashcpy(graft->sha1, sha1);
+	hashcpy(graft->oid.sha1, sha1);
 	graft->nr_parent = -1;
 	if (commit && commit->object.parsed)
 		commit->parents = NULL;
@@ -159,11 +159,11 @@ struct write_shallow_data {
 static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 {
 	struct write_shallow_data *data = cb_data;
-	const char *hex = sha1_to_hex(graft->sha1);
+	const char *hex = oid_to_hex(&graft->oid);
 	if (graft->nr_parent != -1)
 		return 0;
 	if (data->flags & SEEN_ONLY) {
-		struct commit *c = lookup_commit(graft->sha1);
+		struct commit *c = lookup_commit(graft->oid.sha1);
 		if (!c || !(c->object.flags & SEEN)) {
 			if (data->flags & VERBOSE)
 				printf("Removing %s from .git/shallow\n",
@@ -282,7 +282,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	int fd = *(int *)cb;
 	if (graft->nr_parent == -1)
-		packet_write(fd, "shallow %s\n", sha1_to_hex(graft->sha1));
+		packet_write(fd, "shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
diff --git a/upload-pack.c b/upload-pack.c
index b531a32..0566ce0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -74,7 +74,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 {
 	FILE *fp = cb_data;
 	if (graft->nr_parent == -1)
-		fprintf(fp, "--shallow %s\n", sha1_to_hex(graft->sha1));
+		fprintf(fp, "--shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 09/10] patch-id: convert to use struct object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (7 preceding siblings ...)
  2015-03-07 23:24 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
@ 2015-03-07 23:24 ` brian m. carlson
  2015-03-07 23:24 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/patch-id.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 77db873..c208e7e 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -1,14 +1,14 @@
 #include "builtin.h"
 
-static void flush_current_id(int patchlen, unsigned char *id, unsigned char *result)
+static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
 	char name[50];
 
 	if (!patchlen)
 		return;
 
-	memcpy(name, sha1_to_hex(id), 41);
-	printf("%s %s\n", sha1_to_hex(result), name);
+	memcpy(name, oid_to_hex(id), GIT_SHA1_HEXSZ + 1);
+	printf("%s %s\n", oid_to_hex(result), name);
 }
 
 static int remove_space(char *line)
@@ -53,23 +53,23 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 	return 1;
 }
 
-static void flush_one_hunk(unsigned char *result, git_SHA_CTX *ctx)
+static void flush_one_hunk(struct object_id *result, git_SHA_CTX *ctx)
 {
-	unsigned char hash[20];
+	unsigned char hash[GIT_SHA1_RAWSZ];
 	unsigned short carry = 0;
 	int i;
 
 	git_SHA1_Final(hash, ctx);
 	git_SHA1_Init(ctx);
 	/* 20-byte sum, with carry */
-	for (i = 0; i < 20; ++i) {
-		carry += result[i] + hash[i];
-		result[i] = carry;
+	for (i = 0; i < GIT_SHA1_RAWSZ; ++i) {
+		carry += result->sha1[i] + hash[i];
+		result->sha1[i] = carry;
 		carry >>= 8;
 	}
 }
 
-static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
+static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 			   struct strbuf *line_buf, int stable)
 {
 	int patchlen = 0, found_next = 0;
@@ -77,7 +77,7 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 	git_SHA_CTX ctx;
 
 	git_SHA1_Init(&ctx);
-	hashclr(result);
+	oidclr(result);
 
 	while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) {
 		char *line = line_buf->buf;
@@ -93,7 +93,7 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 		else if (!memcmp(line, "\\ ", 2) && 12 < strlen(line))
 			continue;
 
-		if (!get_sha1_hex(p, next_sha1)) {
+		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
 			break;
 		}
@@ -143,7 +143,7 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 	}
 
 	if (!found_next)
-		hashclr(next_sha1);
+		oidclr(next_oid);
 
 	flush_one_hunk(result, &ctx);
 
@@ -152,15 +152,15 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 
 static void generate_id_list(int stable)
 {
-	unsigned char sha1[20], n[20], result[20];
+	struct object_id oid, n, result;
 	int patchlen;
 	struct strbuf line_buf = STRBUF_INIT;
 
-	hashclr(sha1);
+	oidclr(&oid);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(n, result, &line_buf, stable);
-		flush_current_id(patchlen, sha1, result);
-		hashcpy(sha1, n);
+		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+		flush_current_id(patchlen, &oid, &result);
+		oidcpy(&oid, &n);
 	}
 	strbuf_release(&line_buf);
 }
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 10/10] apply: convert threeway_stage to object_id
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (8 preceding siblings ...)
  2015-03-07 23:24 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
@ 2015-03-07 23:24 ` brian m. carlson
  2015-03-08  7:43 ` [PATCH v2 00/10] Use a structure for object IDs Junio C Hamano
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-07 23:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/apply.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 65b97ee..75c5342 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -208,7 +208,7 @@ struct patch {
 	struct patch *next;
 
 	/* three-way fallback result */
-	unsigned char threeway_stage[3][20];
+	struct object_id threeway_stage[3];
 };
 
 static void free_fragment_list(struct fragment *list)
@@ -3424,11 +3424,11 @@ static int try_threeway(struct image *image, struct patch *patch,
 	if (status) {
 		patch->conflicted_threeway = 1;
 		if (patch->is_new)
-			hashclr(patch->threeway_stage[0]);
+			hashclr(patch->threeway_stage[0].sha1);
 		else
-			hashcpy(patch->threeway_stage[0], pre_sha1);
-		hashcpy(patch->threeway_stage[1], our_sha1);
-		hashcpy(patch->threeway_stage[2], post_sha1);
+			hashcpy(patch->threeway_stage[0].sha1, pre_sha1);
+		hashcpy(patch->threeway_stage[1].sha1, our_sha1);
+		hashcpy(patch->threeway_stage[2].sha1, post_sha1);
 		fprintf(stderr, "Applied patch to '%s' with conflicts.\n", patch->new_name);
 	} else {
 		fprintf(stderr, "Applied patch to '%s' cleanly.\n", patch->new_name);
@@ -4184,14 +4184,14 @@ static void add_conflicted_stages_file(struct patch *patch)
 
 	remove_file_from_cache(patch->new_name);
 	for (stage = 1; stage < 4; stage++) {
-		if (is_null_sha1(patch->threeway_stage[stage - 1]))
+		if (is_null_oid(&patch->threeway_stage[stage - 1]))
 			continue;
 		ce = xcalloc(1, ce_size);
 		memcpy(ce->name, patch->new_name, namelen);
 		ce->ce_mode = create_ce_mode(mode);
 		ce->ce_flags = create_ce_flags(stage);
 		ce->ce_namelen = namelen;
-		hashcpy(ce->sha1, patch->threeway_stage[stage - 1]);
+		hashcpy(ce->sha1, patch->threeway_stage[stage - 1].sha1);
 		if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0)
 			die(_("unable to add cache entry for %s"), patch->new_name);
 	}
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/10] Use a structure for object IDs.
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (9 preceding siblings ...)
  2015-03-07 23:24 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
@ 2015-03-08  7:43 ` Junio C Hamano
  2015-03-11  2:38 ` Kyle J. McKay
  2015-03-11 16:08 ` Michael Haggerty
  12 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2015-03-08  7:43 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Michael Haggerty, Andreas Schwab

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Certain parts of the code have to be converted before others to keep the
> patch sizes small, maintainable, and bisectable, so functions and
> structures that are used across the codebase (e.g. struct object) will
> be converted later.  Conversion has been done in a somewhat haphazard
> manner by converting modules with leaf functions and less commonly used
> structs first.

That "leaf-first" approach sounds very sensible.

In the medium term, I wonder if the changes can progress faster and
in a less error prone way if you first used GIT_SHA1_RAWSZ in places
that cannot be immediately converted to the struct yet.  That way,
we will be easily tell by "git grep GIT_SHA1_RAWSZ" how many more
places need treatment.  I do not know if that is all that useful
offhand, though.  Those places need to be touched in the second pass
to use the struct again, after the "s/\[20\]/[GIT_SHA1_RAWSZ]/"
first pass.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/10] Define utility functions for object IDs.
  2015-03-07 23:23 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
@ 2015-03-08  9:57   ` Duy Nguyen
  2015-03-08 14:48     ` brian m. carlson
  2015-03-11 12:44   ` Michael Haggerty
  1 sibling, 1 reply; 37+ messages in thread
From: Duy Nguyen @ 2015-03-08  9:57 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Michael Haggerty, Andreas Schwab

On Sun, Mar 8, 2015 at 6:23 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> There are several utility functions (hashcmp and friends) that are used
> for comparing object IDs (SHA-1 values).  Using these functions, which
> take pointers to unsigned char, with struct object_id requires tiresome
> access to the sha1 member, which bloats code and violates the desired
> encapsulation.  Provide wrappers around these functions for struct
> object_id for neater, more maintainable code.  Use the new constants to
> avoid the hard-coded 20s and 40s throughout the original functions.
>
> These functions simply call the underlying pointer-to-unsigned-char
> versions to ensure that any performance improvements will be passed
> through to the new functions.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> I'm not very excited about having to put the #include in the middle of
> cache.h.  The alternative, of course, is to move enum object_type up,
> which I can do if necessary.  Another alternative is to simply move the
> struct object_id definitions to cache.h instead of object.h, which might
> be cleaner.
>
> I'm interested in hearing opinions about the best way to go forward.

I think declaring struct object_id in cache.h would be simplest.
Another alternative is do it in git-compat-util.h. This is not the
first abuse of git-compat-util.h (because it's included everywhere).
starts_with(), ends_with() and friends are already in
git-compat-util.h
-- 
Duy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/10] Define utility functions for object IDs.
  2015-03-08  9:57   ` Duy Nguyen
@ 2015-03-08 14:48     ` brian m. carlson
  0 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-08 14:48 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Michael Haggerty, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]

On Sun, Mar 08, 2015 at 04:57:36PM +0700, Duy Nguyen wrote:
>On Sun, Mar 8, 2015 at 6:23 AM, brian m. carlson
><sandals@crustytoothpaste.net> wrote:
>> I'm not very excited about having to put the #include in the middle of
>> cache.h.  The alternative, of course, is to move enum object_type up,
>> which I can do if necessary.  Another alternative is to simply move the
>> struct object_id definitions to cache.h instead of object.h, which might
>> be cleaner.
>>
>> I'm interested in hearing opinions about the best way to go forward.
>
>I think declaring struct object_id in cache.h would be simplest.
>Another alternative is do it in git-compat-util.h. This is not the
>first abuse of git-compat-util.h (because it's included everywhere).
>starts_with(), ends_with() and friends are already in
>git-compat-util.h

cache.h is so ubiquitous due to the hash* functions that I didn't need 
to include it anywhere, so I don't see a reason to abuse 
git-compat-util.h for that purpose.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/10] Use a structure for object IDs.
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (10 preceding siblings ...)
  2015-03-08  7:43 ` [PATCH v2 00/10] Use a structure for object IDs Junio C Hamano
@ 2015-03-11  2:38 ` Kyle J. McKay
  2015-03-11 16:08 ` Michael Haggerty
  12 siblings, 0 replies; 37+ messages in thread
From: Kyle J. McKay @ 2015-03-11  2:38 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano
  Cc: repo.or.cz admin team, Git mailing list, Michael Haggerty,
	Andreas Schwab

On Mar 7, 2015, at 15:23, brian m. carlson wrote:
> This is a patch series to convert some of the relevant uses of  
> unsigned
> char [20] to struct object_id.
>
> brian m. carlson (10):

All patches applied for me (to master) and all tests pass.

Tested-by: Kyle J. McKay


>  Define a structure for object IDs.

Comments in reply to the patch.


>  Define utility functions for object IDs.
>  bisect.c: convert leaf functions to use struct object_id
>  archive.c: convert to use struct object_id
>  zip: use GIT_SHA1_HEXSZ for trailers
>  bulk-checkin.c: convert to use struct object_id
>  diff: convert struct combine_diff_path to object_id
>  commit: convert parts to struct object_id
>  patch-id: convert to use struct object_id
>  apply: convert threeway_stage to object_id

These all look good, the conversions are simple and easy to follow.


On Mar 7, 2015, at 23:43, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> Certain parts of the code have to be converted before others to  
>> keep the
>> patch sizes small, maintainable, and bisectable, so functions and
>> structures that are used across the codebase (e.g. struct object)  
>> will
>> be converted later.  Conversion has been done in a somewhat haphazard
>> manner by converting modules with leaf functions and less commonly  
>> used
>> structs first.
>
> That "leaf-first" approach sounds very sensible.
>
> In the medium term, I wonder if the changes can progress faster and
> in a less error prone way if you first used GIT_SHA1_RAWSZ in places
> that cannot be immediately converted to the struct yet.  That way,
> we will be easily tell by "git grep GIT_SHA1_RAWSZ" how many more
> places need treatment.  I do not know if that is all that useful
> offhand, though.  Those places need to be touched in the second pass
> to use the struct again, after the "s/\[20\]/[GIT_SHA1_RAWSZ]/"
> first pass.

I definitely noticed the leaf-first approach as I was looking through  
the patches where, for example (03/10), this prototype was left  
unchanged:

static int register_ref(const char *refname, const unsigned char *sha1,
                         int flags, void *cb_data)

but its contents got the update leaving it half converted.  As  
mentioned above this makes the patches more manageable, maintainable  
and bisectable.  However, these functions could be converted to take a  
typedef (a quick grep of 'CodingGuidelines' does not mention typedef)  
and perhaps, as Junio mentions above, help the changes progress faster  
by making it easier to find the affected code (e.g. changing or  
removing the typedef would make the compiler find them for you).

For example, if we added this to object.h:

     typedef unsigned char sha1raw_t[GIT_SHA1_RAWSZ];

then the above prototype could be immediately converted to (and this  
does compile and pass all the tests):

static int register_ref(const char *refname, const sha1raw_t sha,
                         int flags, void *cb_data)

So that together with Junio's suggestion above (and perhaps also a  
sha1hex_t type) would help mark everything in the first pass that  
needs to be touched again in the second pass.  (I'm just throwing out  
some typedef names as an example, there may be more preferable names  
to "sha1raw_t" and "sha1hex_t", but those names would end up being  
replaced eventually anyway.)

-Kyle 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/10] Define utility functions for object IDs.
  2015-03-07 23:23 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
  2015-03-08  9:57   ` Duy Nguyen
@ 2015-03-11 12:44   ` Michael Haggerty
  1 sibling, 0 replies; 37+ messages in thread
From: Michael Haggerty @ 2015-03-11 12:44 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Andreas Schwab

On 03/08/2015 12:23 AM, brian m. carlson wrote:
> There are several utility functions (hashcmp and friends) that are used
> for comparing object IDs (SHA-1 values).  Using these functions, which
> take pointers to unsigned char, with struct object_id requires tiresome
> access to the sha1 member, which bloats code and violates the desired
> encapsulation.  Provide wrappers around these functions for struct
> object_id for neater, more maintainable code.  Use the new constants to
> avoid the hard-coded 20s and 40s throughout the original functions.
> 
> These functions simply call the underlying pointer-to-unsigned-char
> versions to ensure that any performance improvements will be passed
> through to the new functions.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> I'm not very excited about having to put the #include in the middle of
> cache.h.  The alternative, of course, is to move enum object_type up,
> which I can do if necessary.  Another alternative is to simply move the
> struct object_id definitions to cache.h instead of object.h, which might
> be cleaner.
> 
> I'm interested in hearing opinions about the best way to go forward.
> 
>  cache.h | 37 +++++++++++++++++++++++++++++++++----
>  hex.c   | 16 +++++++++++++---
>  2 files changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 761c570..f9addcc 100644
> --- a/cache.h
> +++ b/cache.h
> [...]
> @@ -724,20 +729,42 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  	return 0;
>  }
>  
> +static inline int oidcmp(const struct object_id *o1, const struct object_id *o2)
> +{
> +	return hashcmp(o1->sha1, o2->sha1);
> +}
> +

Maybe rename o1 -> oid1 and o2 -> oid2 just to make things blindingly
obvious?

> [...]

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 04/10] archive.c: convert to use struct object_id
  2015-03-07 23:23 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
@ 2015-03-11 14:20   ` Michael Haggerty
  2015-03-11 22:12     ` brian m. carlson
  0 siblings, 1 reply; 37+ messages in thread
From: Michael Haggerty @ 2015-03-11 14:20 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Andreas Schwab

On 03/08/2015 12:23 AM, brian m. carlson wrote:
> Convert a hard-coded 20 as well.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  archive.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/archive.c b/archive.c
> index 96057ed..46d9025 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -101,7 +101,7 @@ static void setup_archive_check(struct git_attr_check *check)
>  
>  struct directory {
>  	struct directory *up;
> -	unsigned char sha1[20];
> +	unsigned char sha1[GIT_SHA1_RAWSZ];

This is a local struct, and I think it is only a three-line change to
change the sha1 member to

        struct object_id oid;

Though perhaps you are delaying that change for some concrete reason.

> [...]

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/10] commit: convert parts to struct object_id
  2015-03-07 23:24 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
@ 2015-03-11 14:46   ` Michael Haggerty
  0 siblings, 0 replies; 37+ messages in thread
From: Michael Haggerty @ 2015-03-11 14:46 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Andreas Schwab

On 03/08/2015 12:24 AM, brian m. carlson wrote:
> Convert struct commit_graft and necessary local parts of commit.c.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  commit.c      | 32 ++++++++++++++++----------------
>  commit.h      |  4 ++--
>  log-tree.c    |  2 +-
>  send-pack.c   |  2 +-
>  shallow.c     |  8 ++++----
>  upload-pack.c |  2 +-
>  6 files changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/commit.c b/commit.c
> index a8c7577..89c207e 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -55,12 +55,12 @@ struct commit *lookup_commit(const unsigned char *sha1)
>  
>  struct commit *lookup_commit_reference_by_name(const char *name)
>  {
> -	unsigned char sha1[20];
> +	struct object_id oid;
>  	struct commit *commit;
>  
> -	if (get_sha1_committish(name, sha1))
> +	if (get_sha1_committish(name, oid.sha1))
>  		return NULL;
> -	commit = lookup_commit_reference(sha1);
> +	commit = lookup_commit_reference(oid.sha1);
>  	if (parse_commit(commit))
>  		return NULL;
>  	return commit;
> @@ -99,7 +99,7 @@ static int commit_graft_alloc, commit_graft_nr;
>  static const unsigned char *commit_graft_sha1_access(size_t index, void *table)
>  {
>  	struct commit_graft **commit_graft_table = table;
> -	return commit_graft_table[index]->sha1;
> +	return commit_graft_table[index]->oid.sha1;
>  }
>  
>  static int commit_graft_pos(const unsigned char *sha1)
> @@ -110,7 +110,7 @@ static int commit_graft_pos(const unsigned char *sha1)
>  
>  int register_commit_graft(struct commit_graft *graft, int ignore_dups)
>  {
> -	int pos = commit_graft_pos(graft->sha1);
> +	int pos = commit_graft_pos(graft->oid.sha1);
>  
>  	if (0 <= pos) {
>  		if (ignore_dups)
> @@ -148,12 +148,12 @@ struct commit_graft *read_graft_line(char *buf, int len)
>  	i = (len + 1) / 41 - 1;
>  	graft = xmalloc(sizeof(*graft) + 20 * i);
>  	graft->nr_parent = i;
> -	if (get_sha1_hex(buf, graft->sha1))
> +	if (get_oid_hex(buf, &graft->oid))
>  		goto bad_graft_data;
>  	for (i = 40; i < len; i += 41) {
>  		if (buf[i] != ' ')
>  			goto bad_graft_data;
> -		if (get_sha1_hex(buf + i + 1, graft->parent[i/41]))
> +		if (get_sha1_hex(buf + i + 1, graft->parent[i/41].sha1))
>  			goto bad_graft_data;
>  	}

There are a bunch of constants in this function related to
GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ.

>  	return graft;
> @@ -302,7 +302,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  {
>  	const char *tail = buffer;
>  	const char *bufptr = buffer;
> -	unsigned char parent[20];
> +	struct object_id parent;
>  	struct commit_list **pptr;
>  	struct commit_graft *graft;
>  
> @@ -312,10 +312,10 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  	tail += size;
>  	if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
>  		return error("bogus commit object %s", sha1_to_hex(item->object.sha1));
> -	if (get_sha1_hex(bufptr + 5, parent) < 0)
> +	if (get_sha1_hex(bufptr + 5, parent.sha1) < 0)
>  		return error("bad tree pointer in commit %s",
>  			     sha1_to_hex(item->object.sha1));
> -	item->tree = lookup_tree(parent);
> +	item->tree = lookup_tree(parent.sha1);
>  	bufptr += 46; /* "tree " + "hex sha1" + "\n" */
>  	pptr = &item->parents;
>  

Here, too; e.g.,

* "45" -> GIT_SHA1_HEXSZ + 5
* "46" -> GIT_SHA1_HEXSZ + 5 + 1

and below,

* "47" -> GIT_SHA1_HEXSZ + 7
* "48" -> GIT_SHA1_HEXSZ + 7 + 1


> @@ -324,7 +324,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  		struct commit *new_parent;
>  
>  		if (tail <= bufptr + 48 ||
> -		    get_sha1_hex(bufptr + 7, parent) ||
> +		    get_sha1_hex(bufptr + 7, parent.sha1) ||
>  		    bufptr[47] != '\n')
>  			return error("bad parents in commit %s", sha1_to_hex(item->object.sha1));
>  		bufptr += 48;
> @@ -334,7 +334,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  		 */
>  		if (graft && (graft->nr_parent < 0 || grafts_replace_parents))
>  			continue;
> -		new_parent = lookup_commit(parent);
> +		new_parent = lookup_commit(parent.sha1);
>  		if (new_parent)
>  			pptr = &commit_list_insert(new_parent, pptr)->next;
>  	}
> @@ -342,7 +342,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  		int i;
>  		struct commit *new_parent;
>  		for (i = 0; i < graft->nr_parent; i++) {
> -			new_parent = lookup_commit(graft->parent[i]);
> +			new_parent = lookup_commit(graft->parent[i].sha1);
>  			if (!new_parent)
>  				continue;
>  			pptr = &commit_list_insert(new_parent, pptr)->next;
> @@ -1580,10 +1580,10 @@ struct commit *get_merge_parent(const char *name)
>  {
>  	struct object *obj;
>  	struct commit *commit;
> -	unsigned char sha1[20];
> -	if (get_sha1(name, sha1))
> +	struct object_id oid;
> +	if (get_sha1(name, oid.sha1))
>  		return NULL;
> -	obj = parse_object(sha1);
> +	obj = parse_object(oid.sha1);
>  	commit = (struct commit *)peel_to_type(name, 0, obj, OBJ_COMMIT);
>  	if (commit && !commit->util) {
>  		struct merge_remote_desc *desc;
> diff --git a/commit.h b/commit.h
> index 9f189cb..ed3a1d5 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -226,9 +226,9 @@ enum rev_sort_order {
>  void sort_in_topological_order(struct commit_list **, enum rev_sort_order);
>  
>  struct commit_graft {
> -	unsigned char sha1[20];
> +	struct object_id oid;
>  	int nr_parent; /* < 0 if shallow commit */
> -	unsigned char parent[FLEX_ARRAY][20]; /* more */
> +	struct object_id parent[FLEX_ARRAY]; /* more */
>  };
>  typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
>  
> diff --git a/log-tree.c b/log-tree.c
> index 7f0890e..f6b7801 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -137,7 +137,7 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
>  
>  static int add_graft_decoration(const struct commit_graft *graft, void *cb_data)
>  {
> -	struct commit *commit = lookup_commit(graft->sha1);
> +	struct commit *commit = lookup_commit(graft->oid.sha1);
>  	if (!commit)
>  		return 0;
>  	add_name_decoration(DECORATION_GRAFTED, "grafted", &commit->object);
> diff --git a/send-pack.c b/send-pack.c
> index 9d2b0c5..e8389a4 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -182,7 +182,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
>  {
>  	struct strbuf *sb = cb;
>  	if (graft->nr_parent == -1)
> -		packet_buf_write(sb, "shallow %s\n", sha1_to_hex(graft->sha1));
> +		packet_buf_write(sb, "shallow %s\n", sha1_to_hex(graft->oid.sha1));
>  	return 0;
>  }
>  
> diff --git a/shallow.c b/shallow.c
> index d8bf40a..9ff2d4c 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -31,7 +31,7 @@ int register_shallow(const unsigned char *sha1)
>  		xmalloc(sizeof(struct commit_graft));
>  	struct commit *commit = lookup_commit(sha1);
>  
> -	hashcpy(graft->sha1, sha1);
> +	hashcpy(graft->oid.sha1, sha1);
>  	graft->nr_parent = -1;
>  	if (commit && commit->object.parsed)
>  		commit->parents = NULL;
> @@ -159,11 +159,11 @@ struct write_shallow_data {
>  static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
>  {
>  	struct write_shallow_data *data = cb_data;
> -	const char *hex = sha1_to_hex(graft->sha1);
> +	const char *hex = oid_to_hex(&graft->oid);
>  	if (graft->nr_parent != -1)
>  		return 0;
>  	if (data->flags & SEEN_ONLY) {
> -		struct commit *c = lookup_commit(graft->sha1);
> +		struct commit *c = lookup_commit(graft->oid.sha1);
>  		if (!c || !(c->object.flags & SEEN)) {
>  			if (data->flags & VERBOSE)
>  				printf("Removing %s from .git/shallow\n",
> @@ -282,7 +282,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
>  {
>  	int fd = *(int *)cb;
>  	if (graft->nr_parent == -1)
> -		packet_write(fd, "shallow %s\n", sha1_to_hex(graft->sha1));
> +		packet_write(fd, "shallow %s\n", oid_to_hex(&graft->oid));
>  	return 0;
>  }
>  
> diff --git a/upload-pack.c b/upload-pack.c
> index b531a32..0566ce0 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -74,7 +74,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
>  {
>  	FILE *fp = cb_data;
>  	if (graft->nr_parent == -1)
> -		fprintf(fp, "--shallow %s\n", sha1_to_hex(graft->sha1));
> +		fprintf(fp, "--shallow %s\n", oid_to_hex(&graft->oid));
>  	return 0;
>  }
>  
> 


-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/10] Use a structure for object IDs.
  2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (11 preceding siblings ...)
  2015-03-11  2:38 ` Kyle J. McKay
@ 2015-03-11 16:08 ` Michael Haggerty
  2015-03-11 20:35   ` Junio C Hamano
  12 siblings, 1 reply; 37+ messages in thread
From: Michael Haggerty @ 2015-03-11 16:08 UTC (permalink / raw)
  To: brian m. carlson, git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

On 03/08/2015 12:23 AM, brian m. carlson wrote:
> This is a patch series to convert some of the relevant uses of unsigned
> char [20] to struct object_id.
> 
> The goal of this series to improve type-checking in the codebase and to
> make it easier to move to a different hash function if the project
> decides to do that.  This series does not convert all of the codebase,
> but only parts.  I've dropped some of the patches from earlier (which no
> longer apply) and added others.
> 
> Certain parts of the code have to be converted before others to keep the
> patch sizes small, maintainable, and bisectable, so functions and
> structures that are used across the codebase (e.g. struct object) will
> be converted later.  Conversion has been done in a somewhat haphazard
> manner by converting modules with leaf functions and less commonly used
> structs first.
> 
> Since part of the goal is to ease a move to a different hash function if
> the project decides to do that, I've adopted Michael Haggerty's
> suggestion of using variables named "oid", naming the structure member
> "sha1", and using GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ as the constants.
> 
> I've been holding on to this series for a long time in hopes of
> converting more of the code before submitting, but realized that this
> deprived others of the ability to use the new structure and required me
> to rebase extremely frequently.
> [...]

I've added CC to several people who commented on v1.

I think this is a really interesting project and I hope that it works out.

In my opinion, the biggest risk (aside from the sheer amount of work
required) is the issue that was brought up on the mailing list when you
submitted v1 [1]: Converting an arbitrary (unsigned char *) to a (struct
object_id *) is not allowed, because the alignment requirements of the
latter might be stricter than those of the former. This means that if,
for example, we are reading some data from disk or from the network, and
we expect the 20 bytes starting with byte number 17 to be a SHA-1 in
binary format, we used to be able to do

    unsigned char *sha1 = buf + 17;

and use sha1 like any other SHA-1, without the need for any copying. But
we can't do the analogous

    struct object_id *oid = (struct object_id *)(buf + 17);

because the alignment is not necessarily correct. So in a pure "struct
object_id" world, I think we would be forced to change such code to

    struct object_id oid;
    hashcpy(oid.sha1, buf + 17);

This uses additional memory and introduces copying overhead. Also, the
lifetime of oid might exceed the scope of a function, so oid might have
to be allocated on the heap and later freed. This adds more
computational overhead, more memory overhead, and more programming
effort to get it all right.

So as much as I like the idea of wrapping SHA-1s in objects, I think it
would be a good idea to first make sure that we can all agree on a plan
for dealing with situations like this. I can think of the following
possibilities:

1. Maybe code that needs to handle SHA-1s with screwy alignment does not
exist, or maybe it does the copying anyway. I haven't actually checked.

2. Maybe somebody can prove that

       struct object_id *oid = (struct object_id *)(buf + 17);

   somehow *is* allowed by the C standard, or at least that it is
sufficiently portable for our purposes.

3. We accept the additional runtime costs and development effort for the
extra copies. To accept this approach, we would need some idea of which
areas of the code will be affected, and some estimate of how much
additional memory it would cost.

4. We continue to support working with SHA-1s declared to be (unsigned
char *) in some performance-critical code, even as we migrate most other
code to using SHA-1s embedded within a (struct object_id). This will
cost some duplication of code. To accept this approach, we would need an
idea of *how much* code duplication would be needed. E.g., how many
functions will need both (unsigned char *) versions and (struct
object_id *) versions?

5. We only make the change opportunistically. Whenever we find a
function that needs to work with non-struct-aligned SHA-1s, we leave the
function as-is rather than converting it or creating a second version
that works with object_id objects. This approach would leave the
codebase somewhat schizophrenic.

I'm not trying to dissuade you from this project, but I think that for
your project to have a chance of success, we need an answer to this
question. So let's confront it now rather than after you have invested
lots more time and/or gotten patches merged.

Michael

[1] http://thread.gmane.org/gmane.comp.version-control.git/248054

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/10] Use a structure for object IDs.
  2015-03-11 16:08 ` Michael Haggerty
@ 2015-03-11 20:35   ` Junio C Hamano
  2015-03-13 22:45     ` brian m. carlson
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2015-03-11 20:35 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: brian m. carlson, git, Andreas Schwab, Kyle J. McKay,
	Nguyen Thai Ngoc Duy, Johannes Sixt, David Kastrup,
	James Denholm

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I think this is a really interesting project and I hope that it works out.

Count me in ;-)

> In my opinion, the biggest risk (aside from the sheer amount of work
> required) is the issue that was brought up on the mailing list when you
> submitted v1 [1]: Converting an arbitrary (unsigned char *) to a (struct
> object_id *) is not allowed, because the alignment requirements of the
> latter might be stricter than those of the former. This means that if,
> for example, we are reading some data from disk or from the network, and
> we expect the 20 bytes starting with byte number 17 to be a SHA-1 in
> binary format, we used to be able to do
>
>     unsigned char *sha1 = buf + 17;
>
> and use sha1 like any other SHA-1, without the need for any copying. But
> we can't do the analogous
>
>     struct object_id *oid = (struct object_id *)(buf + 17);
>
> because the alignment is not necessarily correct. So in a pure "struct
> object_id" world, I think we would be forced to change such code to
>
>     struct object_id oid;
>     hashcpy(oid.sha1, buf + 17);
>
> This uses additional memory and introduces copying overhead. Also, the
> lifetime of oid might exceed the scope of a function, so oid might have
> to be allocated on the heap and later freed. This adds more
> computational overhead, more memory overhead, and more programming
> effort to get it all right.

Because we use abstraction to reduce burden on programmers, the last
point is really what defeats this approach.

I wonder if there is any way to make the new "oid wrapped in a
struct" result in identical binary---that would be a reasonable
criterion to judge the goodness of a change line this.  Any
difference could be (1) compiler being (a) stupid and not taking
optimization opportunity it notices for a bare byte array but not
for a byte array in struct or (b) clever and taking optimization
opportunity the other way around, or (2) breakage in the conversion
causing new bugs, perhaps coming from the "alignment" worries you
cited above.  (1-a) may or may not be an acceptable price to pay for
a cleaner abstraction, but that depends on the extent of damage.
(1-b) may be unlikely but if there is such a gain, that's nice ;-).
And (2) is obviously a show-stopper X-<.

> So as much as I like the idea of wrapping SHA-1s in objects, I think it
> would be a good idea to first make sure that we can all agree on a plan
> for dealing with situations like this. I can think of the following
> possibilities:

These all look sensible to me.

> 4. We continue to support working with SHA-1s declared to be (unsigned
> char *) in some performance-critical code, even as we migrate most other
> code to using SHA-1s embedded within a (struct object_id). This will
> cost some duplication of code. To accept this approach, we would need an
> idea of *how much* code duplication would be needed. E.g., how many
> functions will need both (unsigned char *) versions and (struct
> object_id *) versions?

Ideally, only the ones that knows the underlying hash function is
SHA-1 (i.e. anybody who mentions git_SHA_CTX), moves bytes from/to
in-core object name field and raw range of bytes (e.g. sha1hash());
everybody else like hashcpy() and hashcmp() should be able to do its
thing only on structs and a generic-looking constant that denotes
how many bytes there are in the hash (or even sizeof(struct oid)).

I do not know what kind of code duplication you are worried about,
though.  If a callee needs "unsigned char *", the caller that has a
"struct object_id *o" should pass o->hash to the callee.  We should
not add another variant of the same callee that takes a pointer to
"struct object_id"---I think that leads to insanty, e.g.

    int hashcmp_o_o(struct object_id *, struct object_id *);
    int hashcmp_o_b(struct object_id *, unsigned char *);
    int hashcmp_b_o(unsigned char *, struct object_id *);
    int hashcmp_b_b(unsigned char *, unsigned char *);

And please do not suggest switching to C++; all it would do to
overload these into a single name is to make the callers harder to
read.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
       [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
@ 2015-03-11 22:08     ` brian m. carlson
  2015-03-12  0:26       ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2015-03-11 22:08 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git, Michael Haggerty, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

On Tue, Mar 10, 2015 at 07:38:42PM -0700, Kyle J. McKay wrote:
>GIT_SHA1_HEXSZ will always be exactly 2 * GIT_SHA1_RAWSZ, right?  In
>fact, if it's not things will almost certainly break, yes?
>
>Does it make more sense then to reflect this requirement by using:
>
>  #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
>
>instead?

Yes.  I'll make that change in the next version.

>I don't see anything wrong with this.  However, in part 02/10 the
>utility functions all use "oid" in their names, so I'm thinking that
>it may make more sense to just go with:
>
>struct object_id {
>	unsigned char oid[GIT_SHA1_RAWSZ];
>};
>
>to match?

Michael Haggerty recommended that I call the structure element sha1
instead of oid in case we want to turn this into a union if we decide to
go the additional hash route.

I think it can also improve readability if we use "oid" only for the
instances of the struct itself, especially since it makes it more
obvious what code has been converted already.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 04/10] archive.c: convert to use struct object_id
  2015-03-11 14:20   ` Michael Haggerty
@ 2015-03-11 22:12     ` brian m. carlson
  0 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-11 22:12 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 726 bytes --]

On Wed, Mar 11, 2015 at 03:20:08PM +0100, Michael Haggerty wrote:
>On 03/08/2015 12:23 AM, brian m. carlson wrote:
>>  struct directory {
>>  	struct directory *up;
>> -	unsigned char sha1[20];
>> +	unsigned char sha1[GIT_SHA1_RAWSZ];
>
>This is a local struct, and I think it is only a three-line change to
>change the sha1 member to
>
>        struct object_id oid;
>
>Though perhaps you are delaying that change for some concrete reason.

No, I just overlooked it.  I'll fix that in the re-roll.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-11 22:08     ` brian m. carlson
@ 2015-03-12  0:26       ` Junio C Hamano
  2015-03-12  9:34         ` brian m. carlson
  2015-03-12 10:28         ` Michael Haggerty
  0 siblings, 2 replies; 37+ messages in thread
From: Junio C Hamano @ 2015-03-12  0:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Kyle J. McKay, git, Michael Haggerty, Andreas Schwab

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Michael Haggerty recommended that I call the structure element sha1
> instead of oid in case we want to turn this into a union if we decide to
> go the additional hash route.

I'd advise against it.

As I wrote in $gmane/265337 in response to Michael:

    > 4. We continue to support working with SHA-1s declared to be (unsigned
    > char *) in some performance-critical code, even as we migrate most other
    > code to using SHA-1s embedded within a (struct object_id). This will
    > cost some duplication of code. To accept this approach, we would need an
    > idea of *how much* code duplication would be needed. E.g., how many
    > functions will need both (unsigned char *) versions and (struct
    > object_id *) versions?

    ...

    I do not know what kind of code duplication you are worried about,
    though.  If a callee needs "unsigned char *", the caller that has a
    "struct object_id *o" should pass o->hash to the callee.

And that would break the abstraction effort if you start calling the
field with a name that is specific to the underlying hash function.
The caller has to change o->sha1 to o->sha256 instead of keeping
that as o->oid and letting the callee handle the implementation
details when calling

        if (!hashcmp(o1->oid, o2->oid))
                ; /* they are the same */
        else
                ; /* they are different */

The only folks that need to _know_ what hash function is used or
how long the field is are the ones that have raw bytes of the hash
obtained from files (e.g. from the index) and they would do
something like this to implement a function that checks the file
records an object name that is expected by the caller:

        void check_oid(int fd, struct object_id *expected)
        {
                unsigned char object_name[GIT_HASH_RAWSZ];

                ...
                read(fd, object_name, GIT_HASH_RAWSZ);
                if (hashcmp(object_name, expected->oid))
                        die("object name mismatch???");
        }

or when they do know they are using a specific function, do:

        void compute_object_name(struct object_id *id,
                                unsignd char*data,
                                size_t len)
        {
                git_SHA_CTX c;
                unsigned char *sha1 = &(id->oid);

                /* if we are paranoid... */
                assert(sizeof(id->oid) >= 20);

                ...
                git_SHA1_Init(&c);
                git_SHA1_Update(&c, data, len);
                ...
                git_SHA1_Final(sha1, &c);
        }

Even the latter would not be helped if the field to store the hash
were named id->sha1 very much, I would think.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12  0:26       ` Junio C Hamano
@ 2015-03-12  9:34         ` brian m. carlson
  2015-03-12 10:28         ` Michael Haggerty
  1 sibling, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-12  9:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle J. McKay, git, Michael Haggerty, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

On Wed, Mar 11, 2015 at 05:26:56PM -0700, Junio C Hamano wrote:
>"brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> Michael Haggerty recommended that I call the structure element sha1
>> instead of oid in case we want to turn this into a union if we decide to
>> go the additional hash route.
>
>I'd advise against it.

Yeah, after re-reading this, I think "hash" is the best choice for the
underlying element name.  That way, it's different from "oid", which
would be the instances of the struct itself, and it's sufficiently
different from unconverted places in the code, which don't typically use
that term.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12  0:26       ` Junio C Hamano
  2015-03-12  9:34         ` brian m. carlson
@ 2015-03-12 10:28         ` Michael Haggerty
  2015-03-12 10:46           ` brian m. carlson
  1 sibling, 1 reply; 37+ messages in thread
From: Michael Haggerty @ 2015-03-12 10:28 UTC (permalink / raw)
  To: Junio C Hamano, brian m. carlson; +Cc: Kyle J. McKay, git, Andreas Schwab

On 03/12/2015 01:26 AM, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
>> Michael Haggerty recommended that I call the structure element sha1
>> instead of oid in case we want to turn this into a union if we decide to
>> go the additional hash route.
> 
> I'd advise against it.
> 
> As I wrote in $gmane/265337 in response to Michael:
> 
>     > 4. We continue to support working with SHA-1s declared to be (unsigned
>     > char *) in some performance-critical code, even as we migrate most other
>     > code to using SHA-1s embedded within a (struct object_id). This will
>     > cost some duplication of code. To accept this approach, we would need an
>     > idea of *how much* code duplication would be needed. E.g., how many
>     > functions will need both (unsigned char *) versions and (struct
>     > object_id *) versions?
> 
>     ...
> 
>     I do not know what kind of code duplication you are worried about,
>     though.  If a callee needs "unsigned char *", the caller that has a
>     "struct object_id *o" should pass o->hash to the callee.
> 
> And that would break the abstraction effort if you start calling the
> field with a name that is specific to the underlying hash function.
> The caller has to change o->sha1 to o->sha256 instead of keeping
> that as o->oid and letting the callee handle the implementation
> details when calling
> 
>         if (!hashcmp(o1->oid, o2->oid))
>                 ; /* they are the same */
>         else
>                 ; /* they are different */
> [...]

Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
SHA-256s (or whatever) in the "oid" field, which would be dimensioned
large enough for either one (with, say, SHA-1s padded with zeros).

I was imagining that this would evolve into a union (or maybe struct) of
different hash types, like

    struct object_id {
        unsigned char hash_type;
        union {
            unsigned char sha1[GIT_SHA1_RAWSZ];
            unsigned char sha256[GIT_SHA256_RAWSZ];
        } hash;
    };

BTW in either case, any hopes of mapping object_id objects directly on
top of buffer memory would disappear.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 10:28         ` Michael Haggerty
@ 2015-03-12 10:46           ` brian m. carlson
  2015-03-12 11:16             ` Duy Nguyen
  0 siblings, 1 reply; 37+ messages in thread
From: brian m. carlson @ 2015-03-12 10:46 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kyle J. McKay, git, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 2405 bytes --]

On Thu, Mar 12, 2015 at 11:28:10AM +0100, Michael Haggerty wrote:
>On 03/12/2015 01:26 AM, Junio C Hamano wrote:
>> And that would break the abstraction effort if you start calling the
>> field with a name that is specific to the underlying hash function.
>> The caller has to change o->sha1 to o->sha256 instead of keeping
>> that as o->oid and letting the callee handle the implementation
>> details when calling
>>
>>         if (!hashcmp(o1->oid, o2->oid))
>>                 ; /* they are the same */
>>         else
>>                 ; /* they are different */
>> [...]
>
>Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
>SHA-256s (or whatever) in the "oid" field, which would be dimensioned
>large enough for either one (with, say, SHA-1s padded with zeros).
>
>I was imagining that this would evolve into a union (or maybe struct) of
>different hash types, like
>
>    struct object_id {
>        unsigned char hash_type;
>        union {
>            unsigned char sha1[GIT_SHA1_RAWSZ];
>            unsigned char sha256[GIT_SHA256_RAWSZ];
>        } hash;
>    };
>
>BTW in either case, any hopes of mapping object_id objects directly on
>top of buffer memory would disappear.

What I think might be more beneficial is to make the hash function
specific to a particular repository, and therefore you could maintain
something like this:

  struct object_id {
      unsigned char hash[GIT_MAX_RAWSZ];
  };

and make hash_type (or hash_length) a global[0].  I don't think it's
very worthwhile to try to mix two different hash functions in the same
repository, so we could still map directly onto buffer memory if we
decide that's portable enough.  I expect the cases where we need to do
that will be relatively limited.

Regardless, it seems that this solution has the most support (including
Junio's) and it's more self-documenting than my current set of patches,
so I'm going to go with it for now.  It should be easy to change if the
consensus goes back the other way.

[0] I personally think globals are a bit gross, but they don't seem to
have the problems that they would if git were a shared library.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 10:46           ` brian m. carlson
@ 2015-03-12 11:16             ` Duy Nguyen
  2015-03-12 18:24               ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Duy Nguyen @ 2015-03-12 11:16 UTC (permalink / raw)
  To: brian m. carlson, Michael Haggerty, Junio C Hamano,
	Kyle J. McKay, Git Mailing List, Andreas Schwab

On Thu, Mar 12, 2015 at 5:46 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Thu, Mar 12, 2015 at 11:28:10AM +0100, Michael Haggerty wrote:
>>
>> On 03/12/2015 01:26 AM, Junio C Hamano wrote:
>>>
>>> And that would break the abstraction effort if you start calling the
>>> field with a name that is specific to the underlying hash function.
>>> The caller has to change o->sha1 to o->sha256 instead of keeping
>>> that as o->oid and letting the callee handle the implementation
>>> details when calling
>>>
>>>         if (!hashcmp(o1->oid, o2->oid))
>>>                 ; /* they are the same */
>>>         else
>>>                 ; /* they are different */
>>> [...]
>>
>>
>> Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
>> SHA-256s (or whatever) in the "oid" field, which would be dimensioned
>> large enough for either one (with, say, SHA-1s padded with zeros).
>>
>> I was imagining that this would evolve into a union (or maybe struct) of
>> different hash types, like
>>
>>    struct object_id {
>>        unsigned char hash_type;
>>        union {
>>            unsigned char sha1[GIT_SHA1_RAWSZ];
>>            unsigned char sha256[GIT_SHA256_RAWSZ];
>>        } hash;
>>    };
>>
>> BTW in either case, any hopes of mapping object_id objects directly on
>> top of buffer memory would disappear.
>
>
> What I think might be more beneficial is to make the hash function
> specific to a particular repository, and therefore you could maintain
> something like this:
>
>  struct object_id {
>      unsigned char hash[GIT_MAX_RAWSZ];
>  };
>
> and make hash_type (or hash_length) a global[0].  I don't think it's
> very worthwhile to try to mix two different hash functions in the same
> repository,

This may or may not fall into the "mix different hash functions"
category. In pack files version 4, trees are encoded to point to other
trees or blobs by a (pack, offset) tuple. It would be great if the new
object_id could support carrying this kind of object id around because
it could help reduce object lookup cost a lot. (pack, offset) can be
converted back to SHA-1 so no info is lost and hashcmp() can compare
(pack, tuple) against an SHA-1 just fine.

> so we could still map directly onto buffer memory if we
> decide that's portable enough.  I expect the cases where we need to do
> that will be relatively limited.
>
> Regardless, it seems that this solution has the most support (including
> Junio's) and it's more self-documenting than my current set of patches,
> so I'm going to go with it for now.  It should be easy to change if the
> consensus goes back the other way.
>
> [0] I personally think globals are a bit gross, but they don't seem to
> have the problems that they would if git were a shared library.
>
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



-- 
Duy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 11:16             ` Duy Nguyen
@ 2015-03-12 18:24               ` Junio C Hamano
  2015-03-12 18:44                 ` Junio C Hamano
  2015-03-13  0:58                 ` Duy Nguyen
  0 siblings, 2 replies; 37+ messages in thread
From: Junio C Hamano @ 2015-03-12 18:24 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

> This may or may not fall into the "mix different hash functions"
> category. In pack files version 4, trees are encoded to point to other
> trees or blobs by a (pack, offset) tuple. It would be great if the new
> object_id could support carrying this kind of object id around because
> it could help reduce object lookup cost a lot. (pack, offset) can be
> converted back to SHA-1 so no info is lost and hashcmp() can compare
> (pack, tuple) against an SHA-1 just fine.

You mean "if it came in <pack, offset> format, convert it down to
<sha1> until the last second that it is needed (e.g. need to put
that in a tree object in order to compute the object name of the
containing tree object)"?

After converting an object name originally represented as <pack,
offset>, if we are doing the "union in struct" thing, to <sha1>
representation, you would have to look it up from .idx in order to
read the contents the usual way.  If that happens often enough, then
it may not be worth adding complexity to the code to carry the
<pack, offset> pair around.

Unless you fix that "union in struct" assumption, that is.

To me, <pack, offset> information smells to belong more to a "struct
object" (or its subclass) as an optional annotation---when a caller
is asked to parse_object(), you would bypass the sha1_read_file()
that goes and looks the object name up from the list of pack .idx
and instead go there straight using that annotation.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 18:24               ` Junio C Hamano
@ 2015-03-12 18:44                 ` Junio C Hamano
  2015-03-13  0:58                 ` Duy Nguyen
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2015-03-12 18:44 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> This may or may not fall into the "mix different hash functions"
>> category. In pack files version 4, trees are encoded to point to other
>> trees or blobs by a (pack, offset) tuple. It would be great if the new
>> object_id could support carrying this kind of object id around because
>> it could help reduce object lookup cost a lot. (pack, offset) can be
>> converted back to SHA-1 so no info is lost and hashcmp() can compare
>> (pack, tuple) against an SHA-1 just fine.
>
> You mean "if it came in <pack, offset> format, convert it down to
> <sha1> until the last second that it is needed (e.g. need to put
> that in a tree object in order to compute the object name of the
> containing tree object)"?

Sorry, obviously I meant "do not convert until the last second".

> To me, <pack, offset> information smells to belong more to a "struct
> object" (or its subclass) as an optional annotation---when a caller
> is asked to parse_object(), you would bypass the sha1_read_file()
> that goes and looks the object name up from the list of pack .idx
> and instead go there straight using that annotation.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 18:24               ` Junio C Hamano
  2015-03-12 18:44                 ` Junio C Hamano
@ 2015-03-13  0:58                 ` Duy Nguyen
  2015-03-13  6:03                   ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Duy Nguyen @ 2015-03-13  0:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Fri, Mar 13, 2015 at 1:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> This may or may not fall into the "mix different hash functions"
>> category. In pack files version 4, trees are encoded to point to other
>> trees or blobs by a (pack, offset) tuple. It would be great if the new
>> object_id could support carrying this kind of object id around because
>> it could help reduce object lookup cost a lot. (pack, offset) can be
>> converted back to SHA-1 so no info is lost and hashcmp() can compare
>> (pack, tuple) against an SHA-1 just fine.
>
> You mean "if it came in <pack, offset> format, convert it down to
> <sha1> until the last second that it is needed (e.g. need to put
> that in a tree object in order to compute the object name of the
> containing tree object)"?

I picked my words poorly. It should be <pack, the index in pack>
instead of the _byte_ offset. This index ranges from 0 to the number
of objects minus one. This is essentially what pack v4 stores in
places where pack v2 would store SHA-1, which makes me make the
connection: both are different ways of identifying an object.

> After converting an object name originally represented as <pack,
> offset>, if we are doing the "union in struct" thing, to <sha1>
> representation, you would have to look it up from .idx in order to
> read the contents the usual way.  If that happens often enough, then
> it may not be worth adding complexity to the code to carry the
> <pack, offset> pair around.

I'd keep it in <pack, index> for as long as possible (or until higher
layer decides that it's not worth keeping in this representation
anymore). I can only see two use cases where actual SHA-1 is involved:
sorting by SHA-1 (rarely done, probably only in index-pack and
pack-objects) and output to the screen (or producing v2 packs).

> Unless you fix that "union in struct" assumption, that is.
>
> To me, <pack, offset> information smells to belong more to a "struct
> object" (or its subclass) as an optional annotation---when a caller
> is asked to parse_object(), you would bypass the sha1_read_file()
> that goes and looks the object name up from the list of pack .idx
> and instead go there straight using that annotation.

For pack v4, commits and trees can be encoded this way. parse_object()
could help the commit case (maybe, maybe not, I haven't looked at
commit walkers), not recursive tree walking where we now pass SHA-1
around to user callback and all. Carrying optional annotation around
would impact the code in many places.  Also, storing this info in
struct object seems conflict with v4 goal of reducing "struct object"
lookup cost.  Maybe I'm missing something here.

Then again, we don't know if pack v4 will get merged in the end (I do
hope it will). And we have an option of making specialized commit/tree
walkers that are aware of pack v4 and only use them in hot places to
reduce impact to the rest of the code base. If hash[GIT_MAX_RAWSZ]
looks like a good enough solution, we can go with that and worry about
pack v4 later when/if it comes.
-- 
Duy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-13  0:58                 ` Duy Nguyen
@ 2015-03-13  6:03                   ` Junio C Hamano
  2015-03-14 11:49                     ` Duy Nguyen
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2015-03-13  6:03 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

>> You mean "if it came in <pack, offset> format, convert it down to
>> <sha1> until the last second that it is needed (e.g. need to put
>> that in a tree object in order to compute the object name of the
>> containing tree object)"?
>
> I picked my words poorly. It should be <pack, the index in pack>
> instead of the _byte_ offset.

Thanks for a clarification, but I do not think it affects the main
point of the discussion very much.  If we use "union in struct",
where we can store either an SHA-1 hash or some other identifying
information for the object, but not both, then at some point you
would need to convert <pack, nth> to <sha-1> in a codepath that
needs the sha-1 hash value (e.g. if the object A, that is known to
you as <pack, nth>, is placed in a tree object B, you need the
object name of A in <sha-1> representation in order to compute the
object name of the tree object B.  You can choose to keep it in
<pack, nth> form in "struct object_id { union }" and look up the
<sha-1> from the pack index every time, or you can choose to give
up the <pack, nth> form and upgrade the "struct object_id" to store
<sha-1> at that point.

If you keep both <pack, nth> *and* <sha-1> in "struct object_id" at
the same time, you can choose whichever is convenient, but it would
bloat everything in core.  Not just it bloats "struct object" and
its subclasses, the in-core index entries, which is what I meant
by ...

>> Unless you fix that "union in struct" assumption, that is.

... this.

>> To me, <pack, offset> information smells to belong more to a "struct
>> object" (or its subclass) as an optional annotation---when a caller
>> is asked to parse_object(), you would bypass the sha1_read_file()
>> that goes and looks the object name up from the list of pack .idx
>> and instead go there straight using that annotation.
>
> For pack v4, commits and trees can be encoded this way.

Even if your in-pack representation of a commit object allowed to
store the tree pointer in <pack, nth> format, its object name must
be computed as if you have the commit object in the traditional
format and computed the hash of that (together with the standard
"<type> <size>\0" header), and at that point, you need the contained
object's name in <sha-1> format (imagine how you would implement the
commit-tree command).  Hence, I do not think the v4 encoding changes
the discussion very much.  I see the primary value of v4 encoding is
to shorten the length of various fields take on-disk and in-pack.
If it were <pack, offset>, it could be argued that it would also be
faster to get to the packed data in the packfile, and going from
<pack, nth> to the .idx file and then going to the location in the
data in the packfile would be faster than going from <sha-1> to a
particular pack and its in-pack offset, with the difference of cost
around log(n)*m where n is the number of objects in a pack and m is
the total number of packs in the repository.

It is true that <nth> format (force that the referred-to object
lives in the same pack as the referrer) can help speed up
interpretation of extended SHA-1 expression, e.g. "v1.0^0:t", which
can read v1.0 tag in v4 format, find the <nth> info for the commit
pointed by the tag and get to that data in the pack, find the <nth>
info for the top-tree recorded in that commit and directly get to
the data of that tree, and then find the entry for "t", which will
give the object name for that subtree again in <nth> format, and at
that point you can find the <sha-1> of that final object, without
having to know any object names of the intermediate objects
(i.e. you must start from <sha-1> of the tag you obtain from the
refs API, but you didn't use the object name of the commit and its
top-level tree).  So for such a codepath, I would say it would be
sufficient to use a "union in struct" people have been envisioning
and convert <pack, nth> to <sha-1> when the latter form becomes
necessary for the first time for the object.

Anyway, wouldn't this be all academic?  I do not see how you would
keep the object name in the <pack, nth> format in-core, as the
obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
to a different hash, I cannot see how such a table to ensure the
singleton-ness of in-core objects can be keyed sometimes by <hash>
and by <pack, nth> in some other time.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/10] Use a structure for object IDs.
  2015-03-11 20:35   ` Junio C Hamano
@ 2015-03-13 22:45     ` brian m. carlson
  0 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-13 22:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Haggerty, git, Andreas Schwab, Kyle J. McKay,
	Nguyen Thai Ngoc Duy, Johannes Sixt, David Kastrup,
	James Denholm

[-- Attachment #1: Type: text/plain, Size: 2132 bytes --]

On Wed, Mar 11, 2015 at 01:35:06PM -0700, Junio C Hamano wrote:
>Michael Haggerty <mhagger@alum.mit.edu> writes:
>> 4. We continue to support working with SHA-1s declared to be (unsigned
>> char *) in some performance-critical code, even as we migrate most other
>> code to using SHA-1s embedded within a (struct object_id). This will
>> cost some duplication of code. To accept this approach, we would need an
>> idea of *how much* code duplication would be needed. E.g., how many
>> functions will need both (unsigned char *) versions and (struct
>> object_id *) versions?
>
>Ideally, only the ones that knows the underlying hash function is
>SHA-1 (i.e. anybody who mentions git_SHA_CTX), moves bytes from/to
>in-core object name field and raw range of bytes (e.g. sha1hash());
>everybody else like hashcpy() and hashcmp() should be able to do its
>thing only on structs and a generic-looking constant that denotes
>how many bytes there are in the hash (or even sizeof(struct oid)).
>
>I do not know what kind of code duplication you are worried about,
>though.  If a callee needs "unsigned char *", the caller that has a
>"struct object_id *o" should pass o->hash to the callee.

That's the plan.  My goal (which may or may not be achievable) is to
make hashcpy, hashcmp, and similar functions an implementation detail of
struct object_id functions.  If we have to use o->hash somewhere, okay.
I'm much more interested in practicality than theoretical purity.  I
want these changes to provide easier-to-change, more maintainable code,
not more complex and difficult code.

>And please do not suggest switching to C++; all it would do to
>overload these into a single name is to make the callers harder to
>read.

I'm not considering that.  I've attempted to compile git with g++
before as an idle curiosity, and I gave up almost immediately because it
looked like a bunch of work.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-13  6:03                   ` Junio C Hamano
@ 2015-03-14 11:49                     ` Duy Nguyen
  2015-03-14 22:19                       ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Duy Nguyen @ 2015-03-14 11:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Fri, Mar 13, 2015 at 1:03 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> To me, <pack, offset> information smells to belong more to a "struct
>>> object" (or its subclass) as an optional annotation---when a caller
>>> is asked to parse_object(), you would bypass the sha1_read_file()
>>> that goes and looks the object name up from the list of pack .idx
>>> and instead go there straight using that annotation.
>>
>> For pack v4, commits and trees can be encoded this way.
>
> Even if your in-pack representation of a commit object allowed to
> store the tree pointer in <pack, nth> format, its object name must
> be computed as if you have the commit object in the traditional
> format and computed the hash of that (together with the standard
> "<type> <size>\0" header), and at that point, you need the contained
> object's name in <sha-1> format (imagine how you would implement the
> commit-tree command).

True. But commit-tree is not often executed as, say, rev-list, where
the hash is used as 20-byte key to some pieces somewhere (pack data,
struct object *). If we keep <pack, nth> in object_id (the union
approach), v4-aware code has the chance to go on a fast path.

> Hence, I do not think the v4 encoding changes
> the discussion very much.  I see the primary value of v4 encoding is
> to shorten the length of various fields take on-disk and in-pack.
> If it were <pack, offset>, it could be argued that it would also be
> faster to get to the packed data in the packfile, and going from
> <pack, nth> to the .idx file and then going to the location in the
> data in the packfile would be faster than going from <sha-1> to a
> particular pack and its in-pack offset, with the difference of cost
> around log(n)*m where n is the number of objects in a pack and m is
> the total number of packs in the repository.
>
> It is true that <nth> format (force that the referred-to object
> lives in the same pack as the referrer) can help speed up
> interpretation of extended SHA-1 expression, e.g. "v1.0^0:t", which
> can read v1.0 tag in v4 format, find the <nth> info for the commit
> pointed by the tag and get to that data in the pack, find the <nth>
> info for the top-tree recorded in that commit and directly get to
> the data of that tree, and then find the entry for "t", which will
> give the object name for that subtree again in <nth> format, and at
> that point you can find the <sha-1> of that final object, without
> having to know any object names of the intermediate objects
> (i.e. you must start from <sha-1> of the tag you obtain from the
> refs API, but you didn't use the object name of the commit and its
> top-level tree).  So for such a codepath, I would say it would be
> sufficient to use a "union in struct" people have been envisioning
> and convert <pack, nth> to <sha-1> when the latter form becomes
> necessary for the first time for the object.

It's the point: faster access (to either pack data, or "struct object
*"). Using a separate union from  struct object_id (iow, object_id
only contains SHA-1) works. But imagine to independent code islands
can perform this type of fast access. In order to pass this union from
one island to another, we need to update pretty much every function
call between them. The change to using object_id has a similar impact.
If we go with "union in object_id", we may be able to avoid the second
mass change to make use of <pack, nth>.

> Anyway, wouldn't this be all academic?  I do not see how you would
> keep the object name in the <pack, nth> format in-core, as the
> obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
> to a different hash, I cannot see how such a table to ensure the
> singleton-ness of in-core objects can be keyed sometimes by <hash>
> and by <pack, nth> in some other time.

I'm implementing something to see how much we gain by avoiding object
lookup. The current approach is having "struct object ** obj" in
"struct packed_git", indexed by "nth". So when you have <pack, nth>
and pack->obj[nth] is valid, you'll get to "struct object *" without
hashing. If pack->obj[nth] is NULL, hashing and looking up are
required the first time using SHA-1, then the result is cached in
pack->obj[]. So no, it's not academic :)
-- 
Duy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-14 11:49                     ` Duy Nguyen
@ 2015-03-14 22:19                       ` Junio C Hamano
  2015-03-15  0:17                         ` Duy Nguyen
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2015-03-14 22:19 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

>> Anyway, wouldn't this be all academic?  I do not see how you would
>> keep the object name in the <pack, nth> format in-core, as the
>> obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
>> to a different hash, I cannot see how such a table to ensure the
>> singleton-ness of in-core objects can be keyed sometimes by <hash>
>> and by <pack, nth> in some other time.
>
> I'm implementing something to see how much we gain by avoiding object
> lookup. The current approach is having "struct object ** obj" in
> "struct packed_git", indexed by "nth". So when you have <pack, nth>
> and pack->obj[nth] is valid, you'll get to "struct object *" without
> hashing.

But do you realize that the hashtable serves two purposes?  Grab the
object from its name is one thing, and the other one I am not seeing
how you will make it work with "sometimes <sha-1> sometimes <pack,nth>"
is to ensure that we will have only one in-core copy for the same object.
We even walk the hashtable when we want to drop the flag bits from
all in-core objects, so even if you instanciated an in-core object
without going through the object name layer, the hashtable needs to
have a pointer to such a pointer, no?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-14 22:19                       ` Junio C Hamano
@ 2015-03-15  0:17                         ` Duy Nguyen
  2015-03-15  2:32                           ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Duy Nguyen @ 2015-03-15  0:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Sun, Mar 15, 2015 at 5:19 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>>> Anyway, wouldn't this be all academic?  I do not see how you would
>>> keep the object name in the <pack, nth> format in-core, as the
>>> obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
>>> to a different hash, I cannot see how such a table to ensure the
>>> singleton-ness of in-core objects can be keyed sometimes by <hash>
>>> and by <pack, nth> in some other time.
>>
>> I'm implementing something to see how much we gain by avoiding object
>> lookup. The current approach is having "struct object ** obj" in
>> "struct packed_git", indexed by "nth". So when you have <pack, nth>
>> and pack->obj[nth] is valid, you'll get to "struct object *" without
>> hashing.
>
> But do you realize that the hashtable serves two purposes?  Grab the
> object from its name is one thing, and the other one I am not seeing
> how you will make it work with "sometimes <sha-1> sometimes <pack,nth>"
> is to ensure that we will have only one in-core copy for the same object.
> We even walk the hashtable when we want to drop the flag bits from
> all in-core objects, so even if you instanciated an in-core object
> without going through the object name layer, the hashtable needs to
> have a pointer to such a pointer, no?

Notice that the first time pack->obj[] is filled using
lookup_object(). So yes, the hash table has all the pointers that
pack->obj[] has. If somebody wants to remove an object out of hash
table, then all pack->obj[] are invalidated, but I don't think we ever
delete objects from hash table.
-- 
Duy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-15  0:17                         ` Duy Nguyen
@ 2015-03-15  2:32                           ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2015-03-15  2:32 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

> Notice that the first time pack->obj[] is filled using
> lookup_object(). So yes, the hash table has all the pointers that
> pack->obj[] has.

Are we talking about the same thing?

By "the hash table", I mean **obj_hash that is a hashtable that uses
<sha-1> form of identifier as the key.  So "the hash table has all
the pointers" sounds like all the objects you instantiate from a v4
packfile needs their object name known in the <sha-1> form anyway
when it gets instantiated (or more importantly, when you realize
there is another copy of it already in the **obj_hash hashtable, you
may have to free or refrain from creating it in the first place).

As long as we use **obj_hash hashtable as "all the objects this
process cares about in the world" and "struct object_id.hash" as the
key into it, I do not think <pack,nth> representation belongs to
that layer.

I do think <pack,nth> representation has its use as a short-cut
mechanism to reach an indirectly referenced object directly without
instantiating intermediate objects when dealing with an extended
SHA-1 expression (cf. "v1.0^0:t" example in a few messages
upthread).  I just think it does not belong to "struct object_id"
that are used to refer to in-core objects.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-13 23:39 brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  0 siblings, 0 replies; 37+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Many places throughout the code use "unsigned char [20]" to store object IDs
(SHA-1 values).  This leads to lots of hardcoded numbers throughout the
codebase.  It also leads to confusion about the purposes of a buffer.

Introduce a structure for object IDs.  This allows us to obtain the benefits
of compile-time checking for misuse.  The structure is expected to remain
the same size and have the same alignment requirements on all known
platforms, compared to the array of unsigned char, although this is not
required for correctness.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/cache.h b/cache.h
index 761c570..6582c35 100644
--- a/cache.h
+++ b/cache.h
@@ -43,6 +43,14 @@ int git_deflate_end_gently(git_zstream *);
 int git_deflate(git_zstream *, int flush);
 unsigned long git_deflate_bound(git_zstream *, unsigned long);
 
+/* The length in bytes and in hex digits of an object name (SHA-1 value). */
+#define GIT_SHA1_RAWSZ 20
+#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+
+struct object_id {
+	unsigned char hash[GIT_SHA1_RAWSZ];
+};
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-03-15  2:35 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
2015-03-07 23:23 ` [PATCH v2 01/10] Define " brian m. carlson
     [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
2015-03-11 22:08     ` brian m. carlson
2015-03-12  0:26       ` Junio C Hamano
2015-03-12  9:34         ` brian m. carlson
2015-03-12 10:28         ` Michael Haggerty
2015-03-12 10:46           ` brian m. carlson
2015-03-12 11:16             ` Duy Nguyen
2015-03-12 18:24               ` Junio C Hamano
2015-03-12 18:44                 ` Junio C Hamano
2015-03-13  0:58                 ` Duy Nguyen
2015-03-13  6:03                   ` Junio C Hamano
2015-03-14 11:49                     ` Duy Nguyen
2015-03-14 22:19                       ` Junio C Hamano
2015-03-15  0:17                         ` Duy Nguyen
2015-03-15  2:32                           ` Junio C Hamano
2015-03-07 23:23 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
2015-03-08  9:57   ` Duy Nguyen
2015-03-08 14:48     ` brian m. carlson
2015-03-11 12:44   ` Michael Haggerty
2015-03-07 23:23 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
2015-03-07 23:23 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
2015-03-11 14:20   ` Michael Haggerty
2015-03-11 22:12     ` brian m. carlson
2015-03-07 23:24 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
2015-03-07 23:24 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
2015-03-07 23:24 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
2015-03-07 23:24 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
2015-03-11 14:46   ` Michael Haggerty
2015-03-07 23:24 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
2015-03-07 23:24 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
2015-03-08  7:43 ` [PATCH v2 00/10] Use a structure for object IDs Junio C Hamano
2015-03-11  2:38 ` Kyle J. McKay
2015-03-11 16:08 ` Michael Haggerty
2015-03-11 20:35   ` Junio C Hamano
2015-03-13 22:45     ` brian m. carlson
2015-03-13 23:39 brian m. carlson
2015-03-13 23:39 ` [PATCH v2 01/10] Define " brian m. carlson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.