All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] Clean up notes-related code around `load_subtree()`
@ 2017-08-26  8:28 Michael Haggerty
  2017-08-26  8:28 ` [PATCH 01/12] notes: make GET_NIBBLE macro more robust Michael Haggerty
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

While putzing around in the notes code quite some time ago, I found
this comment:

    /*
     * Determine full path for this non-note entry:
     * The filename is already found in entry.path, but the
     * directory part of the path must be deduced from the subtree
     * containing this entry. We assume here that the overall notes
     * tree follows a strict byte-based progressive fanout
     * structure (i.e. using 2/38, 2/2/36, etc. fanouts, and not
     * e.g. 4/36 fanout). This means that if a non-note is found at
     * path "dead/beef", the following code will register it as
     * being found on "de/ad/beef".
     * On the other hand, if you use such non-obvious non-note
     * paths in the middle of a notes tree, you deserve what's
     * coming to you ;). Note that for non-notes that are not
     * SHA1-like at the top level, there will be no problems.
     *
     * To conclude, it is strongly advised to make sure non-notes
     * have at least one non-hex character in the top-level path
     * component.
     */

This was enough of a nerd snipe to get me to dig into the code.

It turns out that the comment is incorrect, but there was nevertheless
plenty that could be cleaned up in the area:

* Make macro `GIT_NIBBLE` safer by adding some parentheses
* Remove some dead code
* Fix some memory leaks
* Fix some obsolete and incorrect comments
* Reject "notes" that are not blobs

I hope the result is also easier to understand.

This branch is also available from my Git fork [1] as branch
`load-subtree-cleanup`.

Michael

[1] https://github.com/mhagger/git

Michael Haggerty (12):
  notes: make GET_NIBBLE macro more robust
  load_subtree(): remove unnecessary conditional
  load_subtree(): reduce the scope of some local variables
  load_subtree(): fix incorrect comment
  load_subtree(): separate logic for internal vs. terminal entries
  load_subtree(): check earlier whether an internal node is a tree entry
  load_subtree(): only consider blobs to be potential notes
  get_oid_hex_segment(): return 0 on success
  load_subtree(): combine some common code
  get_oid_hex_segment(): don't pad the rest of `oid`
  hex_to_bytes(): simpler replacement for `get_oid_hex_segment()`
  load_subtree(): declare some variables to be `size_t`

 notes.c | 136 +++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 66 insertions(+), 70 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 01/12] notes: make GET_NIBBLE macro more robust
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 02/12] load_subtree(): remove unnecessary conditional Michael Haggerty
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

Put parentheses around sha1. Otherwise it could fail for something
like

    GET_NIBBLE(n, (unsigned char *)data);

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notes.c b/notes.c
index f090c88363..00630a9396 100644
--- a/notes.c
+++ b/notes.c
@@ -64,7 +64,7 @@ struct non_note {
 #define CLR_PTR_TYPE(ptr)       ((void *) ((uintptr_t) (ptr) & ~3))
 #define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))
 
-#define GET_NIBBLE(n, sha1) (((sha1[(n) >> 1]) >> ((~(n) & 0x01) << 2)) & 0x0f)
+#define GET_NIBBLE(n, sha1) ((((sha1)[(n) >> 1]) >> ((~(n) & 0x01) << 2)) & 0x0f)
 
 #define KEY_INDEX (GIT_SHA1_RAWSZ - 1)
 #define FANOUT_PATH_SEPARATORS ((GIT_SHA1_HEXSZ / 2) - 1)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 02/12] load_subtree(): remove unnecessary conditional
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 01/12] notes: make GET_NIBBLE macro more robust Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26 16:38   ` Junio C Hamano
  2017-08-26  8:28 ` [PATCH 03/12] load_subtree(): reduce the scope of some local variables Michael Haggerty
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

At this point in the code, len is *always* <= 20.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/notes.c b/notes.c
index 00630a9396..f7ce64ff48 100644
--- a/notes.c
+++ b/notes.c
@@ -446,25 +446,24 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		 * If object SHA1 is incomplete (len < 20), and current
 		 * component consists of 2 hex chars, assume note subtree
 		 */
-		if (len <= GIT_SHA1_RAWSZ) {
-			type = PTR_TYPE_NOTE;
-			l = (struct leaf_node *)
-				xcalloc(1, sizeof(struct leaf_node));
-			oidcpy(&l->key_oid, &object_oid);
-			oidcpy(&l->val_oid, entry.oid);
-			if (len < GIT_SHA1_RAWSZ) {
-				if (!S_ISDIR(entry.mode) || path_len != 2)
-					goto handle_non_note; /* not subtree */
-				l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
-				type = PTR_TYPE_SUBTREE;
-			}
-			if (note_tree_insert(t, node, n, l, type,
-					     combine_notes_concatenate))
-				die("Failed to load %s %s into notes tree "
-				    "from %s",
-				    type == PTR_TYPE_NOTE ? "note" : "subtree",
-				    oid_to_hex(&l->key_oid), t->ref);
+		type = PTR_TYPE_NOTE;
+		l = (struct leaf_node *)
+			xcalloc(1, sizeof(struct leaf_node));
+		oidcpy(&l->key_oid, &object_oid);
+		oidcpy(&l->val_oid, entry.oid);
+		if (len < GIT_SHA1_RAWSZ) {
+			if (!S_ISDIR(entry.mode) || path_len != 2)
+				goto handle_non_note; /* not subtree */
+			l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
+			type = PTR_TYPE_SUBTREE;
 		}
+		if (note_tree_insert(t, node, n, l, type,
+				     combine_notes_concatenate))
+			die("Failed to load %s %s into notes tree "
+			    "from %s",
+			    type == PTR_TYPE_NOTE ? "note" : "subtree",
+			    oid_to_hex(&l->key_oid), t->ref);
+
 		continue;
 
 handle_non_note:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 03/12] load_subtree(): reduce the scope of some local variables
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 01/12] notes: make GET_NIBBLE macro more robust Michael Haggerty
  2017-08-26  8:28 ` [PATCH 02/12] load_subtree(): remove unnecessary conditional Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 04/12] load_subtree(): fix incorrect comment Michael Haggerty
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

Declare the variables inside the loop, to make it more obvious that
their values are not carried across loop iterations.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/notes.c b/notes.c
index f7ce64ff48..fbed8c3013 100644
--- a/notes.c
+++ b/notes.c
@@ -421,9 +421,6 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 	void *buf;
 	struct tree_desc desc;
 	struct name_entry entry;
-	int len, path_len;
-	unsigned char type;
-	struct leaf_node *l;
 
 	buf = fill_tree_descriptor(&desc, &subtree->val_oid);
 	if (!buf)
@@ -434,7 +431,10 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 	assert(prefix_len * 2 >= n);
 	memcpy(object_oid.hash, subtree->key_oid.hash, prefix_len);
 	while (tree_entry(&desc, &entry)) {
-		path_len = strlen(entry.path);
+		unsigned char type;
+		struct leaf_node *l;
+		int len, path_len = strlen(entry.path);
+
 		len = get_oid_hex_segment(entry.path, path_len,
 				object_oid.hash + prefix_len, GIT_SHA1_RAWSZ - prefix_len);
 		if (len < 0)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 04/12] load_subtree(): fix incorrect comment
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (2 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 03/12] load_subtree(): reduce the scope of some local variables Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 05/12] load_subtree(): separate logic for internal vs. terminal entries Michael Haggerty
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

This comment was added in 851c2b3791 (Teach notes code to properly
preserve non-notes in the notes tree, 2010-02-13) when the
corresponding code was added. But I believe it was incorrect even
then. The condition `path_len != 2` a dozen lines up prevents a path
like "dead/beef" from being converted to "de/ad/beef", and indeed the
test added in commit 851c2b3 verifies that this case works correctly.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/notes.c b/notes.c
index fbed8c3013..62ab3f4ce3 100644
--- a/notes.c
+++ b/notes.c
@@ -468,23 +468,13 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 
 handle_non_note:
 		/*
-		 * Determine full path for this non-note entry:
-		 * The filename is already found in entry.path, but the
-		 * directory part of the path must be deduced from the subtree
-		 * containing this entry. We assume here that the overall notes
-		 * tree follows a strict byte-based progressive fanout
-		 * structure (i.e. using 2/38, 2/2/36, etc. fanouts, and not
-		 * e.g. 4/36 fanout). This means that if a non-note is found at
-		 * path "dead/beef", the following code will register it as
-		 * being found on "de/ad/beef".
-		 * On the other hand, if you use such non-obvious non-note
-		 * paths in the middle of a notes tree, you deserve what's
-		 * coming to you ;). Note that for non-notes that are not
-		 * SHA1-like at the top level, there will be no problems.
-		 *
-		 * To conclude, it is strongly advised to make sure non-notes
-		 * have at least one non-hex character in the top-level path
-		 * component.
+		 * Determine full path for this non-note entry. The
+		 * filename is already found in entry.path, but the
+		 * directory part of the path must be deduced from the
+		 * subtree containing this entry based on our
+		 * knowledge that the overall notes tree follows a
+		 * strict byte-based progressive fanout structure
+		 * (i.e. using 2/38, 2/2/36, etc. fanouts).
 		 */
 		{
 			struct strbuf non_note_path = STRBUF_INIT;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 05/12] load_subtree(): separate logic for internal vs. terminal entries
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (3 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 04/12] load_subtree(): fix incorrect comment Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 06/12] load_subtree(): check earlier whether an internal node is a tree entry Michael Haggerty
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

There are only two legitimate notes path components:

* A hexadecimal string that fills the rest of the SHA-1

* A two-digit hexadecimal string that constitutes another internal
  node.

So handle those two cases at the top level, and reject others as
non-notes without trying to parse them. The logic separation also
simplifies upcoming changes.

This prevents us from leaking memory for a leaf_node in the case of
wrong-sized paths. There are still memory leaks in this code; they will
be fixed in upcoming commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 52 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/notes.c b/notes.c
index 62ab3f4ce3..768902055e 100644
--- a/notes.c
+++ b/notes.c
@@ -433,30 +433,40 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 	while (tree_entry(&desc, &entry)) {
 		unsigned char type;
 		struct leaf_node *l;
-		int len, path_len = strlen(entry.path);
+		int path_len = strlen(entry.path);
+
+		if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
+			/* This is potentially the remainder of the SHA-1 */
+			if (get_oid_hex_segment(entry.path, path_len,
+						object_oid.hash + prefix_len,
+						GIT_SHA1_RAWSZ - prefix_len) < 0)
+				goto handle_non_note; /* entry.path is not a SHA1 */
+
+			type = PTR_TYPE_NOTE;
+			l = (struct leaf_node *)
+				xcalloc(1, sizeof(struct leaf_node));
+			oidcpy(&l->key_oid, &object_oid);
+			oidcpy(&l->val_oid, entry.oid);
+		} else if (path_len == 2) {
+			/* This is potentially an internal node */
+			if (get_oid_hex_segment(entry.path, 2,
+						object_oid.hash + prefix_len,
+						GIT_SHA1_RAWSZ - prefix_len) < 0)
+				goto handle_non_note; /* entry.path is not a SHA1 */
 
-		len = get_oid_hex_segment(entry.path, path_len,
-				object_oid.hash + prefix_len, GIT_SHA1_RAWSZ - prefix_len);
-		if (len < 0)
-			goto handle_non_note; /* entry.path is not a SHA1 */
-		len += prefix_len;
-
-		/*
-		 * If object SHA1 is complete (len == 20), assume note object
-		 * If object SHA1 is incomplete (len < 20), and current
-		 * component consists of 2 hex chars, assume note subtree
-		 */
-		type = PTR_TYPE_NOTE;
-		l = (struct leaf_node *)
-			xcalloc(1, sizeof(struct leaf_node));
-		oidcpy(&l->key_oid, &object_oid);
-		oidcpy(&l->val_oid, entry.oid);
-		if (len < GIT_SHA1_RAWSZ) {
-			if (!S_ISDIR(entry.mode) || path_len != 2)
-				goto handle_non_note; /* not subtree */
-			l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
 			type = PTR_TYPE_SUBTREE;
+			l = (struct leaf_node *)
+				xcalloc(1, sizeof(struct leaf_node));
+			oidcpy(&l->key_oid, &object_oid);
+			oidcpy(&l->val_oid, entry.oid);
+			if (!S_ISDIR(entry.mode))
+				goto handle_non_note; /* not subtree */
+			l->key_oid.hash[KEY_INDEX] = (unsigned char) (prefix_len + 1);
+		} else {
+			/* This can't be part of a note */
+			goto handle_non_note;
 		}
+
 		if (note_tree_insert(t, node, n, l, type,
 				     combine_notes_concatenate))
 			die("Failed to load %s %s into notes tree "
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 06/12] load_subtree(): check earlier whether an internal node is a tree entry
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (4 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 05/12] load_subtree(): separate logic for internal vs. terminal entries Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 07/12] load_subtree(): only consider blobs to be potential notes Michael Haggerty
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

If an entry is not a tree entry, then it cannot possibly be an
internal node. But the old code checked this condition only after
allocating a leaf_node object and therefore leaked that memory.
Instead, check before even entering this branch of the code.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/notes.c b/notes.c
index 768902055e..ac69c5aa18 100644
--- a/notes.c
+++ b/notes.c
@@ -449,6 +449,11 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 			oidcpy(&l->val_oid, entry.oid);
 		} else if (path_len == 2) {
 			/* This is potentially an internal node */
+
+			if (!S_ISDIR(entry.mode))
+				/* internal nodes must be trees */
+				goto handle_non_note;
+
 			if (get_oid_hex_segment(entry.path, 2,
 						object_oid.hash + prefix_len,
 						GIT_SHA1_RAWSZ - prefix_len) < 0)
@@ -459,8 +464,6 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				xcalloc(1, sizeof(struct leaf_node));
 			oidcpy(&l->key_oid, &object_oid);
 			oidcpy(&l->val_oid, entry.oid);
-			if (!S_ISDIR(entry.mode))
-				goto handle_non_note; /* not subtree */
 			l->key_oid.hash[KEY_INDEX] = (unsigned char) (prefix_len + 1);
 		} else {
 			/* This can't be part of a note */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 07/12] load_subtree(): only consider blobs to be potential notes
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (5 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 06/12] load_subtree(): check earlier whether an internal node is a tree entry Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 08/12] get_oid_hex_segment(): return 0 on success Michael Haggerty
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

The old code converted any entry whose path constituted a full SHA-1
as a leaf node, without regard for the type of the entry. But only
blobs can be notes. So treat entries whose paths *look like* notes
paths but that are not blobs as non-notes.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/notes.c b/notes.c
index ac69c5aa18..46ab15b83a 100644
--- a/notes.c
+++ b/notes.c
@@ -437,6 +437,11 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 
 		if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
 			/* This is potentially the remainder of the SHA-1 */
+
+			if (!S_ISREG(entry.mode))
+				/* notes must be blobs */
+				goto handle_non_note;
+
 			if (get_oid_hex_segment(entry.path, path_len,
 						object_oid.hash + prefix_len,
 						GIT_SHA1_RAWSZ - prefix_len) < 0)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 08/12] get_oid_hex_segment(): return 0 on success
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (6 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 07/12] load_subtree(): only consider blobs to be potential notes Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 09/12] load_subtree(): combine some common code Michael Haggerty
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

Nobody cares about the return value of get_oid_hex_segment() except to
check whether it failed. So just return 0 on success.

And while we're updating its docstring, update it for some argument
renaming that happened a while ago.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/notes.c b/notes.c
index 46ab15b83a..6ce71bfedb 100644
--- a/notes.c
+++ b/notes.c
@@ -338,11 +338,10 @@ static void note_tree_free(struct int_node *tree)
  * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
  * - hex      - Partial SHA1 segment in ASCII hex format
  * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
- * - sha1     - Partial SHA1 value is written here
- * - sha1_len - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
- * Returns -1 on error (invalid arguments or invalid SHA1 (not in hex format)).
- * Otherwise, returns number of bytes written to sha1 (i.e. hex_len / 2).
- * Pads sha1 with NULs up to sha1_len (not included in returned length).
+ * - oid      - Partial SHA1 value is written here
+ * - oid_len  - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
+ * Return 0 on success or -1 on error (invalid arguments or input not
+ * in hex format). Pad oid with NULs up to oid_len.
  */
 static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
 		unsigned char *oid, unsigned int oid_len)
@@ -359,7 +358,7 @@ static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
 	}
 	for (; i < oid_len; i++)
 		*oid++ = 0;
-	return len;
+	return 0;
 }
 
 static int non_note_cmp(const struct non_note *a, const struct non_note *b)
@@ -444,7 +443,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 
 			if (get_oid_hex_segment(entry.path, path_len,
 						object_oid.hash + prefix_len,
-						GIT_SHA1_RAWSZ - prefix_len) < 0)
+						GIT_SHA1_RAWSZ - prefix_len))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			type = PTR_TYPE_NOTE;
@@ -461,7 +460,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 
 			if (get_oid_hex_segment(entry.path, 2,
 						object_oid.hash + prefix_len,
-						GIT_SHA1_RAWSZ - prefix_len) < 0)
+						GIT_SHA1_RAWSZ - prefix_len))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			type = PTR_TYPE_SUBTREE;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 09/12] load_subtree(): combine some common code
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (7 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 08/12] get_oid_hex_segment(): return 0 on success Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 10/12] get_oid_hex_segment(): don't pad the rest of `oid` Michael Haggerty
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

Write the length into `object_oid` (before copying) rather than
`l->key_oid` (after copying). Then combine some code from the two `if`
blocks.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/notes.c b/notes.c
index 6ce71bfedb..534fda007e 100644
--- a/notes.c
+++ b/notes.c
@@ -447,10 +447,6 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			type = PTR_TYPE_NOTE;
-			l = (struct leaf_node *)
-				xcalloc(1, sizeof(struct leaf_node));
-			oidcpy(&l->key_oid, &object_oid);
-			oidcpy(&l->val_oid, entry.oid);
 		} else if (path_len == 2) {
 			/* This is potentially an internal node */
 
@@ -463,17 +459,17 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 						GIT_SHA1_RAWSZ - prefix_len))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
+			object_oid.hash[KEY_INDEX] = (unsigned char) (prefix_len + 1);
+
 			type = PTR_TYPE_SUBTREE;
-			l = (struct leaf_node *)
-				xcalloc(1, sizeof(struct leaf_node));
-			oidcpy(&l->key_oid, &object_oid);
-			oidcpy(&l->val_oid, entry.oid);
-			l->key_oid.hash[KEY_INDEX] = (unsigned char) (prefix_len + 1);
 		} else {
 			/* This can't be part of a note */
 			goto handle_non_note;
 		}
 
+		l = xcalloc(1, sizeof(*l));
+		oidcpy(&l->key_oid, &object_oid);
+		oidcpy(&l->val_oid, entry.oid);
 		if (note_tree_insert(t, node, n, l, type,
 				     combine_notes_concatenate))
 			die("Failed to load %s %s into notes tree "
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 10/12] get_oid_hex_segment(): don't pad the rest of `oid`
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (8 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 09/12] load_subtree(): combine some common code Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 11/12] hex_to_bytes(): simpler replacement for `get_oid_hex_segment()` Michael Haggerty
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

Remove the feature of `get_oid_hex_segment()` that it pads the rest of
the `oid` argument with zeros. Instead, do this at the caller who
needs it.

This makes the functionality of this function more coherent and
removes the need for its `oid_len` argument.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/notes.c b/notes.c
index 534fda007e..ce9ba36179 100644
--- a/notes.c
+++ b/notes.c
@@ -339,15 +339,14 @@ static void note_tree_free(struct int_node *tree)
  * - hex      - Partial SHA1 segment in ASCII hex format
  * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
  * - oid      - Partial SHA1 value is written here
- * - oid_len  - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
  * Return 0 on success or -1 on error (invalid arguments or input not
- * in hex format). Pad oid with NULs up to oid_len.
+ * in hex format).
  */
 static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
-		unsigned char *oid, unsigned int oid_len)
+		unsigned char *oid)
 {
 	unsigned int i, len = hex_len >> 1;
-	if (hex_len % 2 != 0 || len > oid_len)
+	if (hex_len % 2 != 0)
 		return -1;
 	for (i = 0; i < len; i++) {
 		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
@@ -356,8 +355,6 @@ static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
 		*oid++ = val;
 		hex += 2;
 	}
-	for (; i < oid_len; i++)
-		*oid++ = 0;
 	return 0;
 }
 
@@ -442,24 +439,29 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				goto handle_non_note;
 
 			if (get_oid_hex_segment(entry.path, path_len,
-						object_oid.hash + prefix_len,
-						GIT_SHA1_RAWSZ - prefix_len))
+						object_oid.hash + prefix_len))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			type = PTR_TYPE_NOTE;
 		} else if (path_len == 2) {
 			/* This is potentially an internal node */
+			size_t len = prefix_len;
 
 			if (!S_ISDIR(entry.mode))
 				/* internal nodes must be trees */
 				goto handle_non_note;
 
 			if (get_oid_hex_segment(entry.path, 2,
-						object_oid.hash + prefix_len,
-						GIT_SHA1_RAWSZ - prefix_len))
+						object_oid.hash + len++))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
-			object_oid.hash[KEY_INDEX] = (unsigned char) (prefix_len + 1);
+			/*
+			 * Pad the rest of the SHA-1 with zeros,
+			 * except for the last byte, where we write
+			 * the length:
+			 */
+			memset(object_oid.hash + len, 0, GIT_SHA1_RAWSZ - len - 1);
+			object_oid.hash[KEY_INDEX] = (unsigned char)len;
 
 			type = PTR_TYPE_SUBTREE;
 		} else {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 11/12] hex_to_bytes(): simpler replacement for `get_oid_hex_segment()`
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (9 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 10/12] get_oid_hex_segment(): don't pad the rest of `oid` Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26  8:28 ` [PATCH 12/12] load_subtree(): declare some variables to be `size_t` Michael Haggerty
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

Now that `get_oid_hex_segment()` does less, it makes sense to rename
it and simplify its semantics:

* Instead of a `hex_len` parameter, which was the number of hex
  characters (and had to be even), use a `len` parameter, which is the
  number of resulting bytes. This removes then need for the check that
  `hex_len` is even and to divide it by two to determine the number of
  bytes. For good hygiene, declare the `len` parameter to be `size_t`
  instead of `unsigned int`.

* Change the order of the arguments to the more traditional (dst,
  src, len).

* Rename the function to `hex_to_bytes()`.

* Remove a loop variable: just count `len` down instead.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/notes.c b/notes.c
index ce9ba36179..d5409b55e3 100644
--- a/notes.c
+++ b/notes.c
@@ -335,25 +335,18 @@ static void note_tree_free(struct int_node *tree)
 }
 
 /*
- * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
- * - hex      - Partial SHA1 segment in ASCII hex format
- * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
- * - oid      - Partial SHA1 value is written here
- * Return 0 on success or -1 on error (invalid arguments or input not
- * in hex format).
+ * Read `len` pairs of hexadecimal digits from `hex` and write the
+ * values to `binary` as `len` bytes. Return 0 on success, or -1 if
+ * the input does not consist of hex digits).
  */
-static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
-		unsigned char *oid)
+static int hex_to_bytes(unsigned char *binary, const char *hex, size_t len)
 {
-	unsigned int i, len = hex_len >> 1;
-	if (hex_len % 2 != 0)
-		return -1;
-	for (i = 0; i < len; i++) {
+	for (; len; len--, hex += 2) {
 		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+
 		if (val & ~0xff)
 			return -1;
-		*oid++ = val;
-		hex += 2;
+		*binary++ = val;
 	}
 	return 0;
 }
@@ -438,8 +431,8 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				/* notes must be blobs */
 				goto handle_non_note;
 
-			if (get_oid_hex_segment(entry.path, path_len,
-						object_oid.hash + prefix_len))
+			if (hex_to_bytes(object_oid.hash + prefix_len, entry.path,
+					 GIT_SHA1_RAWSZ - prefix_len))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			type = PTR_TYPE_NOTE;
@@ -451,8 +444,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				/* internal nodes must be trees */
 				goto handle_non_note;
 
-			if (get_oid_hex_segment(entry.path, 2,
-						object_oid.hash + len++))
+			if (hex_to_bytes(object_oid.hash + len++, entry.path, 1))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			/*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 12/12] load_subtree(): declare some variables to be `size_t`
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (10 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 11/12] hex_to_bytes(): simpler replacement for `get_oid_hex_segment()` Michael Haggerty
@ 2017-08-26  8:28 ` Michael Haggerty
  2017-08-26 23:36 ` [PATCH 00/12] Clean up notes-related code around `load_subtree()` Johan Herland
  2017-09-09 10:31 ` Jeff King
  13 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-08-26  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, git, Michael Haggerty

* `prefix_len`
* `path_len`
* `i`

It's good hygiene.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/notes.c b/notes.c
index d5409b55e3..7f5bfa19c7 100644
--- a/notes.c
+++ b/notes.c
@@ -406,7 +406,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		struct int_node *node, unsigned int n)
 {
 	struct object_id object_oid;
-	unsigned int prefix_len;
+	size_t prefix_len;
 	void *buf;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -422,7 +422,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 	while (tree_entry(&desc, &entry)) {
 		unsigned char type;
 		struct leaf_node *l;
-		int path_len = strlen(entry.path);
+		size_t path_len = strlen(entry.path);
 
 		if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
 			/* This is potentially the remainder of the SHA-1 */
@@ -486,7 +486,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		{
 			struct strbuf non_note_path = STRBUF_INIT;
 			const char *q = oid_to_hex(&subtree->key_oid);
-			int i;
+			size_t i;
 			for (i = 0; i < prefix_len; i++) {
 				strbuf_addch(&non_note_path, *q++);
 				strbuf_addch(&non_note_path, *q++);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/12] load_subtree(): remove unnecessary conditional
  2017-08-26  8:28 ` [PATCH 02/12] load_subtree(): remove unnecessary conditional Michael Haggerty
@ 2017-08-26 16:38   ` Junio C Hamano
  2017-08-27  6:37     ` Michael Haggerty
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2017-08-26 16:38 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Johan Herland, Johannes Schindelin, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> At this point in the code, len is *always* <= 20.

This is the kind of log message that makes me unconfortable, as it
lacks "because", and the readers would need to find out themselves
by following the same codepath the patch author already followed.

There is an assert earlier before the control gets in this loop

	prefix_len = subtree->key_oid.hash[KEY_INDEX];
	assert(prefix_len * 2 >= n);
	memcpy(object_oid.hash, subtree->key_oid.hash, prefix_len);

that tries to ensure there is sufficient number of prefix defined in
that key, and the codeflow may ensure that prefix_len is both an
even number and shorter than 20 (the correctness of the code depends
on these, it seems, and if for some reason prefix_len is much
larger, calls to get_oid_hex_segment() will overflow the oid.hash[]
array without checking).  I'd at least feel safer to have an assert
next to the existing one that catches a bug to throw a randomly
large value into subtree->key_oid.hash[KEY_INDEX].  Then we can
safely say "at this point in the code, len is always <= 20", as that
assert will makes it obvious without looking at anything other than
this code and get_oid_hex_segment() implementaiton (combined with
the fact that this function is the only one that coerces len and
puts it into ->key_oid.hash[KEY_INDEX], but that is a weak assurance
as we cannot tell where "subtree" came from---it may have full
20-byte oid in its key_oid field---without following the callchain a
lot more widely).

> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  notes.c | 35 +++++++++++++++++------------------
>  1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/notes.c b/notes.c
> index 00630a9396..f7ce64ff48 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -446,25 +446,24 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
>  		 * If object SHA1 is incomplete (len < 20), and current
>  		 * component consists of 2 hex chars, assume note subtree
>  		 */
> -		if (len <= GIT_SHA1_RAWSZ) {
> -			type = PTR_TYPE_NOTE;
> -			l = (struct leaf_node *)
> -				xcalloc(1, sizeof(struct leaf_node));
> -			oidcpy(&l->key_oid, &object_oid);
> -			oidcpy(&l->val_oid, entry.oid);
> -			if (len < GIT_SHA1_RAWSZ) {
> -				if (!S_ISDIR(entry.mode) || path_len != 2)
> -					goto handle_non_note; /* not subtree */
> -				l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
> -				type = PTR_TYPE_SUBTREE;
> -			}
> -			if (note_tree_insert(t, node, n, l, type,
> -					     combine_notes_concatenate))
> -				die("Failed to load %s %s into notes tree "
> -				    "from %s",
> -				    type == PTR_TYPE_NOTE ? "note" : "subtree",
> -				    oid_to_hex(&l->key_oid), t->ref);
> +		type = PTR_TYPE_NOTE;
> +		l = (struct leaf_node *)
> +			xcalloc(1, sizeof(struct leaf_node));
> +		oidcpy(&l->key_oid, &object_oid);
> +		oidcpy(&l->val_oid, entry.oid);
> +		if (len < GIT_SHA1_RAWSZ) {
> +			if (!S_ISDIR(entry.mode) || path_len != 2)
> +				goto handle_non_note; /* not subtree */
> +			l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
> +			type = PTR_TYPE_SUBTREE;
>  		}
> +		if (note_tree_insert(t, node, n, l, type,
> +				     combine_notes_concatenate))
> +			die("Failed to load %s %s into notes tree "
> +			    "from %s",
> +			    type == PTR_TYPE_NOTE ? "note" : "subtree",
> +			    oid_to_hex(&l->key_oid), t->ref);
> +
>  		continue;
>  
>  handle_non_note:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (11 preceding siblings ...)
  2017-08-26  8:28 ` [PATCH 12/12] load_subtree(): declare some variables to be `size_t` Michael Haggerty
@ 2017-08-26 23:36 ` Johan Herland
  2017-09-09 10:31 ` Jeff King
  13 siblings, 0 replies; 23+ messages in thread
From: Johan Herland @ 2017-08-26 23:36 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Johannes Schindelin, Git mailing list

On Sat, Aug 26, 2017 at 10:28 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
[...]
> plenty that could be cleaned up in the area:
>
> * Make macro `GIT_NIBBLE` safer by adding some parentheses
> * Remove some dead code
> * Fix some memory leaks
> * Fix some obsolete and incorrect comments
> * Reject "notes" that are not blobs
>
> I hope the result is also easier to understand.

I looked through the series, and the patches look good to me, although
I do agree with Junio's comments on #2.

Thanks for a long-overdue cleanup in one of the hairier parts of
the notes code. The end result reads a lot better IMHO.

...Johan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/12] load_subtree(): remove unnecessary conditional
  2017-08-26 16:38   ` Junio C Hamano
@ 2017-08-27  6:37     ` Michael Haggerty
  2017-08-28  6:55       ` Michael Haggerty
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Haggerty @ 2017-08-27  6:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, Git Mailing List

On Sat, Aug 26, 2017 at 6:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>
>> At this point in the code, len is *always* <= 20.
>
> This is the kind of log message that makes me unconfortable, as it
> lacks "because", and the readers would need to find out themselves
> by following the same codepath the patch author already followed.
> [...]

That's a valid complaint. I've adjusted the patch series to add the
assertion and explain the reasoning better in the commit message. I've
pushed the revised series to GitHub, but I'll wait a couple of days to
see if there's more feedback before resubmitting.

Thanks,
Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/12] load_subtree(): remove unnecessary conditional
  2017-08-27  6:37     ` Michael Haggerty
@ 2017-08-28  6:55       ` Michael Haggerty
  2017-09-01 21:53         ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Haggerty @ 2017-08-28  6:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Johannes Schindelin, Git Mailing List

Junio, I'm surprised that you have merged the `mh/notes-cleanup`
branch into `next` already. Was that intentional? Aside from the fact
that the topic has had very little cooking time, there's the issue of
the assertion that you asked for. I have implemented the assertion in
a new version of the branch that I pushed to GitHub but haven't yet
submitted to the list.

How would you like to proceed? Do you want me to submit a patch
(adding the assertion) that applies on top of this branch?

Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/12] load_subtree(): remove unnecessary conditional
  2017-08-28  6:55       ` Michael Haggerty
@ 2017-09-01 21:53         ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2017-09-01 21:53 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Johan Herland, Johannes Schindelin, Git Mailing List

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Junio, I'm surprised that you have merged the `mh/notes-cleanup`
> branch into `next` already. Was that intentional?

Yup, I was clearing the deck as much as possible before I go
offline, as there didn't seem to be any glaring problem that we do
not even want to see in the history of our codebase in the series,
and I thought it would be better to give wider exposure early, as
long as small improvements are done incrementally.

Thanks.

ps. I'll still be offline for a bit more, so please do not get
disappointed if your updates do not show up in my tree until I come
back.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`
  2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
                   ` (12 preceding siblings ...)
  2017-08-26 23:36 ` [PATCH 00/12] Clean up notes-related code around `load_subtree()` Johan Herland
@ 2017-09-09 10:31 ` Jeff King
  2017-09-10  4:45   ` Michael Haggerty
  13 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2017-09-09 10:31 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Johan Herland, Johannes Schindelin, git, Stefan Beller

On Sat, Aug 26, 2017 at 10:28:00AM +0200, Michael Haggerty wrote:

> It turns out that the comment is incorrect, but there was nevertheless
> plenty that could be cleaned up in the area:
> 
> * Make macro `GIT_NIBBLE` safer by adding some parentheses
> * Remove some dead code
> * Fix some memory leaks
> * Fix some obsolete and incorrect comments
> * Reject "notes" that are not blobs
> 
> I hope the result is also easier to understand.
> 
> This branch is also available from my Git fork [1] as branch
> `load-subtree-cleanup`.

FYI, Coverity seems to complain about "pu" after this series is merged, but
I think it's wrong.  It says:

  *** CID 1417630:  Memory - illegal accesses  (OVERRUN)
  /notes.c: 458 in load_subtree()
  452     
  453     			/*
  454     			 * Pad the rest of the SHA-1 with zeros,
  455     			 * except for the last byte, where we write
  456     			 * the length:
  457     			 */
  >>>     CID 1417630:  Memory - illegal accesses  (OVERRUN)
  >>>     Overrunning array of 20 bytes at byte offset 20 by dereferencing pointer "&object_oid.hash[len]".
  458     			memset(object_oid.hash + len, 0, GIT_SHA1_RAWSZ - len - 1);
  459     			object_oid.hash[KEY_INDEX] = (unsigned char)len;
  460     
  461     			type = PTR_TYPE_SUBTREE;
  462     		} else {
  463     			/* This can't be part of a note */

I agree that if "len" were 20 here that would be a problem, but I don't
think that's possible.

The tool correctly claims that prefix_len can be up to 19, due to the
assert:

     3. cond_at_most: Checking prefix_len >= 20UL implies that prefix_len may be up to 19 on the false branch.
  420        if (prefix_len >= GIT_SHA1_RAWSZ)
  421                BUG("prefix_len (%"PRIuMAX") is out of range", (uintmax_t)prefix_len);

Then it claims:

    13. Condition path_len == 2 * (20 - prefix_len), taking false branch.
  430                if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
  431                        /* This is potentially the remainder of the SHA-1 */

So we know that either prefix_len is not 19, or that path_len is not 2
(since that combination would cause us to take the true branch here).
But then it goes on to say:

    14. Condition path_len == 2, taking true branch.
  442                } else if (path_len == 2) {
  443                        /* This is potentially an internal node */

which I believe must mean that prefix_len cannot be 19 here. And yet it
says:

    15. assignment: Assigning: len = prefix_len. The value of len may now be up to 19.
  444                        size_t len = prefix_len;
  445
  [...]
     17. incr: Incrementing len. The value of len may now be up to 20.
     18. Condition hex_to_bytes(&object_oid.hash[len++], entry.path, 1), taking false branch.
  450                        if (hex_to_bytes(object_oid.hash + len++, entry.path, 1))
  451                                goto handle_non_note; /* entry.path is not a SHA1 */

I think that's impossible, and Coverity simply isn't smart enough to
shrink the set of possible values for prefix_len based on the set of
if-else conditions.

So nothing to see here, but since I spent 20 minutes scratching my head
(and I know others look at Coverity output and may scratch their heads
too), I thought it was worth writing up. And also if I'm wrong, it would
be good to know. ;)

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`
  2017-09-09 10:31 ` Jeff King
@ 2017-09-10  4:45   ` Michael Haggerty
  2017-09-10  7:39     ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Haggerty @ 2017-09-10  4:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Johan Herland, Johannes Schindelin, git, Stefan Beller

On 09/09/2017 12:31 PM, Jeff King wrote:
> On Sat, Aug 26, 2017 at 10:28:00AM +0200, Michael Haggerty wrote:
> 
>> It turns out that the comment is incorrect, but there was nevertheless
>> plenty that could be cleaned up in the area:
>>
>> * Make macro `GIT_NIBBLE` safer by adding some parentheses
>> * Remove some dead code
>> * Fix some memory leaks
>> * Fix some obsolete and incorrect comments
>> * Reject "notes" that are not blobs
>>
>> I hope the result is also easier to understand.
>>
>> This branch is also available from my Git fork [1] as branch
>> `load-subtree-cleanup`.
> 
> FYI, Coverity seems to complain about "pu" after this series is merged, but
> I think it's wrong.  It says:
> 
>   *** CID 1417630:  Memory - illegal accesses  (OVERRUN)
>   /notes.c: 458 in load_subtree()
>   452     
>   453     			/*
>   454     			 * Pad the rest of the SHA-1 with zeros,
>   455     			 * except for the last byte, where we write
>   456     			 * the length:
>   457     			 */
>   >>>     CID 1417630:  Memory - illegal accesses  (OVERRUN)
>   >>>     Overrunning array of 20 bytes at byte offset 20 by dereferencing pointer "&object_oid.hash[len]".
>   458     			memset(object_oid.hash + len, 0, GIT_SHA1_RAWSZ - len - 1);
>   459     			object_oid.hash[KEY_INDEX] = (unsigned char)len;
>   460     
>   461     			type = PTR_TYPE_SUBTREE;
>   462     		} else {
>   463     			/* This can't be part of a note */
> 
> I agree that if "len" were 20 here that would be a problem, but I don't
> think that's possible.
> 
> The tool correctly claims that prefix_len can be up to 19, due to the
> assert:
> 
>      3. cond_at_most: Checking prefix_len >= 20UL implies that prefix_len may be up to 19 on the false branch.
>   420        if (prefix_len >= GIT_SHA1_RAWSZ)
>   421                BUG("prefix_len (%"PRIuMAX") is out of range", (uintmax_t)prefix_len);
> 
> Then it claims:
> 
>     13. Condition path_len == 2 * (20 - prefix_len), taking false branch.
>   430                if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
>   431                        /* This is potentially the remainder of the SHA-1 */
> 
> So we know that either prefix_len is not 19, or that path_len is not 2
> (since that combination would cause us to take the true branch here).
> But then it goes on to say:
> 
>     14. Condition path_len == 2, taking true branch.
>   442                } else if (path_len == 2) {
>   443                        /* This is potentially an internal node */
> 
> which I believe must mean that prefix_len cannot be 19 here. And yet it
> says:
> 
>     15. assignment: Assigning: len = prefix_len. The value of len may now be up to 19.
>   444                        size_t len = prefix_len;
>   445
>   [...]
>      17. incr: Incrementing len. The value of len may now be up to 20.
>      18. Condition hex_to_bytes(&object_oid.hash[len++], entry.path, 1), taking false branch.
>   450                        if (hex_to_bytes(object_oid.hash + len++, entry.path, 1))
>   451                                goto handle_non_note; /* entry.path is not a SHA1 */
> 
> I think that's impossible, and Coverity simply isn't smart enough to
> shrink the set of possible values for prefix_len based on the set of
> if-else conditions.
> 
> So nothing to see here, but since I spent 20 minutes scratching my head
> (and I know others look at Coverity output and may scratch their heads
> too), I thought it was worth writing up. And also if I'm wrong, it would
> be good to know. ;)

Thanks for looking into this. I agree with your analysis.

I wonder whether it is the factor of two between path lengths and byte
lengths that is confusing Coverity. Perhaps the patch below would help.
It requires an extra, superfluous, check, but perhaps makes the code a
tad more readable. I'm neutral on whether we would want to make the change.

Is there a way to ask Coverity whether a hypothetical change would
remove the warning, short of merging the change to master?

Michael

diff --git a/notes.c b/notes.c
index 27d232f294..34f623f7b1 100644
--- a/notes.c
+++ b/notes.c
@@ -426,8 +426,14 @@ static void load_subtree(struct notes_tree *t,
struct leaf_node *subtree,
 		unsigned char type;
 		struct leaf_node *l;
 		size_t path_len = strlen(entry.path);
+		size_t path_bytes;

-		if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
+		if (path_len % 2 != 0)
+			/* Path chunks must come in pairs of hex characters */
+			goto handle_non_note;
+
+		path_bytes = path_len / 2;
+		if (path_bytes == GIT_SHA1_RAWSZ - prefix_len) {
 			/* This is potentially the remainder of the SHA-1 */

 			if (!S_ISREG(entry.mode))
@@ -439,7 +445,7 @@ static void load_subtree(struct notes_tree *t,
struct leaf_node *subtree,
 				goto handle_non_note; /* entry.path is not a SHA1 */

 			type = PTR_TYPE_NOTE;
-		} else if (path_len == 2) {
+		} else if (path_bytes == 1) {
 			/* This is potentially an internal node */
 			size_t len = prefix_len;


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`
  2017-09-10  4:45   ` Michael Haggerty
@ 2017-09-10  7:39     ` Jeff King
  2017-09-12  6:47       ` Michael Haggerty
  2017-09-12 11:55       ` Lars Schneider
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2017-09-10  7:39 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Johan Herland, Johannes Schindelin, git,
	Stefan Beller, Lars Schneider

On Sun, Sep 10, 2017 at 06:45:08AM +0200, Michael Haggerty wrote:

> > So nothing to see here, but since I spent 20 minutes scratching my head
> > (and I know others look at Coverity output and may scratch their heads
> > too), I thought it was worth writing up. And also if I'm wrong, it would
> > be good to know. ;)
> 
> Thanks for looking into this. I agree with your analysis.
> 
> I wonder whether it is the factor of two between path lengths and byte
> lengths that is confusing Coverity. Perhaps the patch below would help.
> It requires an extra, superfluous, check, but perhaps makes the code a
> tad more readable. I'm neutral on whether we would want to make the change.

Yeah, I do agree that it makes the code's assumptions a bit easier to
follow.

> Is there a way to ask Coverity whether a hypothetical change would
> remove the warning, short of merging the change to master?

You can download and run the build portion of the coverity tools
yourself. IIRC, that pushes the build up to their servers which then do
the analysis (you can make your own "project", or use the existing "git"
project -- I checked and you are already listed as an admin). I recall
it being a minor pain to get it set up, but not too bad.

Stefan runs it against "pu" on a regular basis, which is where the
emailed results come from. So just having Junio merge it to "pu" would
be enough to get results.

I noticed that they now have some GitHub/Travis integration:

  https://scan.coverity.com/github

I'm not sure if that is new, or if we just didn't notice it before. ;)
But that probably makes more sense to use than ad-hoc uploading (and
maybe it would make it easy for you to test personal branches, too).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`
  2017-09-10  7:39     ` Jeff King
@ 2017-09-12  6:47       ` Michael Haggerty
  2017-09-12 11:55       ` Lars Schneider
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2017-09-12  6:47 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Johan Herland, Johannes Schindelin,
	Git Mailing List, Stefan Beller, Lars Schneider

On Sun, Sep 10, 2017 at 9:39 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Sep 10, 2017 at 06:45:08AM +0200, Michael Haggerty wrote:
>
>> > So nothing to see here, but since I spent 20 minutes scratching my head
>> > (and I know others look at Coverity output and may scratch their heads
>> > too), I thought it was worth writing up. And also if I'm wrong, it would
>> > be good to know. ;)
>>
>> Thanks for looking into this. I agree with your analysis.
>>
>> I wonder whether it is the factor of two between path lengths and byte
>> lengths that is confusing Coverity. Perhaps the patch below would help.
>> It requires an extra, superfluous, check, but perhaps makes the code a
>> tad more readable. I'm neutral on whether we would want to make the change.
>
> Yeah, I do agree that it makes the code's assumptions a bit easier to
> follow.
>
>> Is there a way to ask Coverity whether a hypothetical change would
>> remove the warning, short of merging the change to master?
>
> You can download and run the build portion of the coverity tools
> yourself. [...]

Thanks for the info.

My suggested tweak doesn't appease Coverity. Given that, I don't think
I'll bother adding it to the patch series.

Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`
  2017-09-10  7:39     ` Jeff King
  2017-09-12  6:47       ` Michael Haggerty
@ 2017-09-12 11:55       ` Lars Schneider
  1 sibling, 0 replies; 23+ messages in thread
From: Lars Schneider @ 2017-09-12 11:55 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, Junio C Hamano, Johan Herland,
	Johannes Schindelin, git, Stefan Beller


> On 10 Sep 2017, at 09:39, Jeff King <peff@peff.net> wrote:
> 
> On Sun, Sep 10, 2017 at 06:45:08AM +0200, Michael Haggerty wrote:
> 
>>> So nothing to see here, but since I spent 20 minutes scratching my head
>>> (and I know others look at Coverity output and may scratch their heads
>>> too), I thought it was worth writing up. And also if I'm wrong, it would
>>> be good to know. ;)
>> 
>> Thanks for looking into this. I agree with your analysis.
>> 
>> I wonder whether it is the factor of two between path lengths and byte
>> lengths that is confusing Coverity. Perhaps the patch below would help.
>> It requires an extra, superfluous, check, but perhaps makes the code a
>> tad more readable. I'm neutral on whether we would want to make the change.
> 
> Yeah, I do agree that it makes the code's assumptions a bit easier to
> follow.
> 
>> Is there a way to ask Coverity whether a hypothetical change would
>> remove the warning, short of merging the change to master?
> 
> You can download and run the build portion of the coverity tools
> yourself. IIRC, that pushes the build up to their servers which then do
> the analysis (you can make your own "project", or use the existing "git"
> project -- I checked and you are already listed as an admin). I recall
> it being a minor pain to get it set up, but not too bad.
> 
> Stefan runs it against "pu" on a regular basis, which is where the
> emailed results come from. So just having Junio merge it to "pu" would
> be enough to get results.
> 
> I noticed that they now have some GitHub/Travis integration:
> 
>  https://scan.coverity.com/github
> 
> I'm not sure if that is new, or if we just didn't notice it before. ;)
> But that probably makes more sense to use than ad-hoc uploading (and
> maybe it would make it easy for you to test personal branches, too).

Coverity scans Git already:
https://scan.coverity.com/projects/70

I requested access to this Coverity project to integrate into our TravisCI
build.

- Lars

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-09-12 11:55 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-26  8:28 [PATCH 00/12] Clean up notes-related code around `load_subtree()` Michael Haggerty
2017-08-26  8:28 ` [PATCH 01/12] notes: make GET_NIBBLE macro more robust Michael Haggerty
2017-08-26  8:28 ` [PATCH 02/12] load_subtree(): remove unnecessary conditional Michael Haggerty
2017-08-26 16:38   ` Junio C Hamano
2017-08-27  6:37     ` Michael Haggerty
2017-08-28  6:55       ` Michael Haggerty
2017-09-01 21:53         ` Junio C Hamano
2017-08-26  8:28 ` [PATCH 03/12] load_subtree(): reduce the scope of some local variables Michael Haggerty
2017-08-26  8:28 ` [PATCH 04/12] load_subtree(): fix incorrect comment Michael Haggerty
2017-08-26  8:28 ` [PATCH 05/12] load_subtree(): separate logic for internal vs. terminal entries Michael Haggerty
2017-08-26  8:28 ` [PATCH 06/12] load_subtree(): check earlier whether an internal node is a tree entry Michael Haggerty
2017-08-26  8:28 ` [PATCH 07/12] load_subtree(): only consider blobs to be potential notes Michael Haggerty
2017-08-26  8:28 ` [PATCH 08/12] get_oid_hex_segment(): return 0 on success Michael Haggerty
2017-08-26  8:28 ` [PATCH 09/12] load_subtree(): combine some common code Michael Haggerty
2017-08-26  8:28 ` [PATCH 10/12] get_oid_hex_segment(): don't pad the rest of `oid` Michael Haggerty
2017-08-26  8:28 ` [PATCH 11/12] hex_to_bytes(): simpler replacement for `get_oid_hex_segment()` Michael Haggerty
2017-08-26  8:28 ` [PATCH 12/12] load_subtree(): declare some variables to be `size_t` Michael Haggerty
2017-08-26 23:36 ` [PATCH 00/12] Clean up notes-related code around `load_subtree()` Johan Herland
2017-09-09 10:31 ` Jeff King
2017-09-10  4:45   ` Michael Haggerty
2017-09-10  7:39     ` Jeff King
2017-09-12  6:47       ` Michael Haggerty
2017-09-12 11:55       ` Lars Schneider

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.