Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/2] Pull objects of various types
@ 2005-06-22  0:33 Daniel Barkalow
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
  2005-06-22  0:35 ` [PATCH 2/2] Pull misc objects Daniel Barkalow
  0 siblings, 2 replies; 142+ messages in thread
From: Daniel Barkalow @ 2005-06-22  0:33 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

This series handles pulling objects of various types, rather than just
commits. In order to support pulling tags (the interesting case), it is
necessary to support getting the struct object for the tagged object when
the tagged object isn't available.

 1: Support getting a valid struct object for the absent object tagged by
    a tag file.
 2: Support processing objects of unknown type in pull.c

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 1/2] Parse tags for absent objects
  2005-06-22  0:33 [PATCH 0/2] Pull objects of various types Daniel Barkalow
@ 2005-06-22  0:35 ` Daniel Barkalow
  2021-03-08 20:04   ` [PATCH 0/7] improve reporting of unexpected objects Ævar Arnfjörð Bjarmason
                     ` (7 more replies)
  2005-06-22  0:35 ` [PATCH 2/2] Pull misc objects Daniel Barkalow
  1 sibling, 8 replies; 142+ messages in thread
From: Daniel Barkalow @ 2005-06-22  0:35 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

Handle parsing a tag for a non-present object. This adds a function to lookup
an object with lookup_* for * in a string, so that it can get the right storage
based on the "type" line in the tag.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>

Index: object.c
===================================================================
--- 5760644db3f9cb044c99aa0f9ce4fc8d4eb76da1/object.c  (mode:100644 sha1:5e8378857028afeb4d1cd91c0de26c8414e137de)
+++ d0df139324abdbf701ffcae26e43bcb0350c270e/object.c  (mode:100644 sha1:21f872ee163e9eeff1a52854791c5b96a2ec2ceb)
@@ -98,6 +98,22 @@
 	}
 }
 
+struct object *lookup_object_type(const unsigned char *sha1, const char *type)
+{
+	if (!strcmp(type, blob_type)) {
+		return &lookup_blob(sha1)->object;
+	} else if (!strcmp(type, tree_type)) {
+		return &lookup_tree(sha1)->object;
+	} else if (!strcmp(type, commit_type)) {
+		return &lookup_commit(sha1)->object;
+	} else if (!strcmp(type, tag_type)) {
+		return &lookup_tag(sha1)->object;
+	} else {
+		error("Unknown type %s", type);
+		return NULL;
+	}
+}
+
 struct object *parse_object(const unsigned char *sha1)
 {
 	unsigned long mapsize;
Index: object.h
===================================================================
--- 5760644db3f9cb044c99aa0f9ce4fc8d4eb76da1/object.h  (mode:100644 sha1:ca455d57117af5f15e83b791a336351b43af6716)
+++ d0df139324abdbf701ffcae26e43bcb0350c270e/object.h  (mode:100644 sha1:1bd59ac6fcf7798e02c2474e630c282f022eff10)
@@ -21,8 +21,12 @@
 extern int nr_objs;
 extern struct object **objs;
 
+/** Internal only **/
 struct object *lookup_object(const unsigned char *sha1);
 
+/** Returns the object, having looked it up as being the given type. **/
+struct object *lookup_object_type(const unsigned char *sha1, const char *type);
+
 void created_object(const unsigned char *sha1, struct object *obj);
 
 /** Returns the object, having parsed it to find out what it is. **/
Index: tag.c
===================================================================
--- 5760644db3f9cb044c99aa0f9ce4fc8d4eb76da1/tag.c  (mode:100644 sha1:4041af2572a4427a03bc8955137b7d2211f9d770)
+++ d0df139324abdbf701ffcae26e43bcb0350c270e/tag.c  (mode:100644 sha1:2b25fc0e1dc53234e38e8ed8fdc1cb99fa4fd84a)
@@ -28,6 +28,7 @@
 	int typelen, taglen;
 	unsigned char object[20];
 	const char *type_line, *tag_line, *sig_line;
+	char type[20];
 
         if (item->object.parsed)
                 return 0;
@@ -38,10 +39,6 @@
 	if (memcmp("object ", data, 7) || get_sha1_hex(data + 7, object))
 		return -1;
 
-	item->tagged = parse_object(object);
-	if (item->tagged)
-		add_ref(&item->object, item->tagged);
-
 	type_line = data + 48;
 	if (memcmp("\ntype ", type_line-1, 6))
 		return -1;
@@ -58,11 +55,17 @@
 	typelen = tag_line - type_line - strlen("type \n");
 	if (typelen >= 20)
 		return -1;
+	memcpy(type, type_line + 5, typelen);
+	type[typelen] = '\0';
 	taglen = sig_line - tag_line - strlen("tag \n");
 	item->tag = xmalloc(taglen + 1);
 	memcpy(item->tag, tag_line + 4, taglen);
 	item->tag[taglen] = '\0';
 
+	item->tagged = lookup_object_type(object, type);
+	if (item->tagged)
+		add_ref(&item->object, item->tagged);
+
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 2/2] Pull misc objects
  2005-06-22  0:33 [PATCH 0/2] Pull objects of various types Daniel Barkalow
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
@ 2005-06-22  0:35 ` Daniel Barkalow
  1 sibling, 0 replies; 142+ messages in thread
From: Daniel Barkalow @ 2005-06-22  0:35 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

Make pull fetch whatever is specified, parse it to figure out what it is, and
then process it appropriately. This also supports getting tag objects, and
getting whatever they tag.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Index: pull.c
===================================================================
--- d0df139324abdbf701ffcae26e43bcb0350c270e/pull.c  (mode:100644 sha1:e70fc02f3bf5b6c626a138d6d76d819fab76f0c8)
+++ b6a510708036fe29a19c33472f5c0b746e2d26d7/pull.c  (mode:100644 sha1:91d9db6c7b1be84e7a5fe21c5194fbf22dadc8cb)
@@ -3,6 +3,8 @@
 #include "cache.h"
 #include "commit.h"
 #include "tree.h"
+#include "tag.h"
+#include "blob.h"
 #include "refs.h"
 
 const char *write_ref = NULL;
@@ -57,6 +59,8 @@
 	return status;
 }
 
+static int process_unknown(unsigned char *sha1);
+
 static int process_tree(unsigned char *sha1)
 {
 	struct tree *tree = lookup_tree(sha1);
@@ -115,6 +119,35 @@
 	return 0;
 }
 
+static int process_tag(unsigned char *sha1)
+{
+	struct tag *obj = lookup_tag(sha1);
+
+	if (parse_tag(obj))
+		return -1;
+	return process_unknown(obj->tagged->sha1);
+}
+
+static int process_unknown(unsigned char *sha1)
+{
+	struct object *obj;
+	if (make_sure_we_have_it("object", sha1))
+		return -1;
+	obj = parse_object(sha1);
+	if (!obj)
+		return error("Unable to parse object %s", sha1_to_hex(sha1));
+	if (obj->type == commit_type)
+		return process_commit(sha1);
+	if (obj->type == tree_type)
+		return process_tree(sha1);
+	if (obj->type == blob_type)
+		return 0;
+	if (obj->type == tag_type)
+		return process_tag(sha1);
+	return error("Unable to determine requirement of type %s for %s",
+		     obj->type, sha1_to_hex(sha1));
+}
+
 static int interpret_target(char *target, unsigned char *sha1)
 {
 	if (!get_sha1_hex(target, sha1))
@@ -142,7 +175,7 @@
 	if (interpret_target(target, sha1))
 		return error("Could not interpret %s as something to pull",
 			     target);
-	if (process_commit(sha1))
+	if (process_unknown(sha1))
 		return -1;
 	
 	if (write_ref) {


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 0/7] improve reporting of unexpected objects
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
  2021-03-08 20:04   ` [PATCH 1/7] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

We have some errors that say "your OID %s is not a blob" or whatever,
now with 5/7 we'll say "object %s is a %s, not a %s" in most cases.

Then 7/7 fixes a regression which AFAICT is from 2005. So it's pretty
obscure. When you craft an invalid tag object saying an OID is a tag
instead of a commit, we'd buggily invert the two types when reporting
that error.

The solution to that in 7/7 may not be ideal, but is the best one I
could come up with, so feedback on that patch & this whole thing is
most welcome.

Ævar Arnfjörð Bjarmason (7):
  object.c: refactor type_from_string_gently()
  object.c: make type_from_string() return "enum object_type"
  oid_object_info(): return "enum object_type"
  tree.c: fix misindentation in parse_tree_gently()
  object.c: add a utility function for "expected type X, got Y"
  object tests: add test for unexpected objects in tags
  tag: don't misreport type of tagged objects in errors

 blob.c                                 |  16 +++-
 blob.h                                 |   3 +
 builtin/blame.c                        |   2 +-
 builtin/cat-file.c                     |   2 +-
 builtin/index-pack.c                   |  15 ++--
 builtin/mktree.c                       |   2 +-
 builtin/pack-objects.c                 |   4 +-
 builtin/replace.c                      |   2 +-
 builtin/tag.c                          |   2 +-
 builtin/unpack-objects.c               |   6 +-
 commit.c                               |  24 ++++--
 commit.h                               |   2 +
 fsck.c                                 |   4 +-
 object-file.c                          |  15 ++--
 object-name.c                          |  18 ++--
 object-store.h                         |   2 +-
 object.c                               |  62 +++++++++++---
 object.h                               |   9 +-
 packfile.c                             |   4 +-
 reachable.c                            |   5 +-
 t/t6102-rev-list-unexpected-objects.sh | 113 ++++++++++++++++++++++++-
 tag.c                                  |  14 ++-
 tag.h                                  |   2 +
 tree.c                                 |  27 ++++--
 tree.h                                 |   2 +
 25 files changed, 282 insertions(+), 75 deletions(-)

-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 1/7] object.c: refactor type_from_string_gently()
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
  2021-03-08 20:04   ` [PATCH 0/7] improve reporting of unexpected objects Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-08 20:52     ` Taylor Blau
  2021-03-09 10:46     ` Jeff King
  2021-03-08 20:04   ` [PATCH 2/7] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
                     ` (5 subsequent siblings)
  7 siblings, 2 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Get rid of the "gently" argument to type_from_string_gently() to make
it consistent with most other *_gently() functions.

This refactoring of adding a third parameter was done in
fe8e3b71805 (Refactor type_from_string() to allow continuing after
detecting an error, 2014-09-10) in preparation for its use in
fsck.c.

Since then no callers of this function have passed a "len < 0" as was
expected might happen, so we can simplify its invocation by knowing
that it's never called like that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c        |  2 +-
 object-file.c |  2 +-
 object.c      | 18 ++++++++++--------
 object.h      |  4 ++--
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index e3030f3b358..6cc4f9ea892 100644
--- a/fsck.c
+++ b/fsck.c
@@ -957,7 +957,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
-	*tagged_type = type_from_string_gently(buffer, eol - buffer, 1);
+	*tagged_type = type_from_string_gently(buffer, eol - buffer);
 	if (*tagged_type < 0)
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
diff --git a/object-file.c b/object-file.c
index 5bcfde84718..42bc579828d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1314,7 +1314,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 		type_len++;
 	}
 
-	type = type_from_string_gently(type_buf, type_len, 1);
+	type = type_from_string_gently(type_buf, type_len);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
 	/*
diff --git a/object.c b/object.c
index 98017bed8ef..c7586e46727 100644
--- a/object.c
+++ b/object.c
@@ -35,22 +35,24 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len, int gentle)
+int type_from_string_gently(const char *str, ssize_t len)
 {
 	int i;
 
-	if (len < 0)
-		len = strlen(str);
-
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
 			return i;
+	return -1;
+}
 
-	if (gentle)
-		return -1;
-
-	die(_("invalid object type \"%s\""), str);
+int type_from_string(const char *str)
+{
+	size_t len = strlen(str);
+	int ret = type_from_string_gently(str, len);
+	if (ret < 0)
+		die(_("invalid object type \"%s\""), str);
+	return ret;
 }
 
 /*
diff --git a/object.h b/object.h
index 59daadce214..ffdc1298300 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,8 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t, int gentle);
-#define type_from_string(str) type_from_string_gently(str, -1, 0)
+int type_from_string_gently(const char *str, ssize_t len);
+int type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 2/7] object.c: make type_from_string() return "enum object_type"
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
  2021-03-08 20:04   ` [PATCH 0/7] improve reporting of unexpected objects Ævar Arnfjörð Bjarmason
  2021-03-08 20:04   ` [PATCH 1/7] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-08 20:56     ` Taylor Blau
  2021-03-08 21:48     ` Junio C Hamano
  2021-03-08 20:04   ` [PATCH 3/7] oid_object_info(): " Ævar Arnfjörð Bjarmason
                     ` (4 subsequent siblings)
  7 siblings, 2 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Change the type_from_string*() functions to return an "enum
object_type", and refactor their callers to check for "== OBJ_BAD"
instead of "< 0".

This helps to distinguish code in object.c where we really do return
-1 from code that returns an "enum object_type", whose OBJ_BAD happens
to be -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c        |  2 +-
 object-file.c |  2 +-
 object.c      | 12 ++++++------
 object.h      |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6cc4f9ea892..a6d00dfa2e6 100644
--- a/fsck.c
+++ b/fsck.c
@@ -958,7 +958,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 		goto done;
 	}
 	*tagged_type = type_from_string_gently(buffer, eol - buffer);
-	if (*tagged_type < 0)
+	if (*tagged_type == OBJ_BAD)
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
 		goto done;
diff --git a/object-file.c b/object-file.c
index 42bc579828d..cd30c2b5590 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1324,7 +1324,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	 */
 	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
 		type = 0;
-	else if (type < 0)
+	else if (type == OBJ_BAD)
 		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
diff --git a/object.c b/object.c
index c7586e46727..eebacc28847 100644
--- a/object.c
+++ b/object.c
@@ -35,22 +35,22 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len)
+enum object_type type_from_string_gently(const char *str, ssize_t len)
 {
-	int i;
+	enum object_type i;
 
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
 			return i;
-	return -1;
+	return OBJ_BAD;
 }
 
-int type_from_string(const char *str)
+enum object_type type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len);
-	if (ret < 0)
+	enum object_type ret = type_from_string_gently(str, len);
+	if (ret == OBJ_BAD)
 		die(_("invalid object type \"%s\""), str);
 	return ret;
 }
diff --git a/object.h b/object.h
index ffdc1298300..5e7a523e858 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,8 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t len);
-int type_from_string(const char *str);
+enum object_type type_from_string_gently(const char *str, ssize_t len);
+enum object_type type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 3/7] oid_object_info(): return "enum object_type"
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
                     ` (2 preceding siblings ...)
  2021-03-08 20:04   ` [PATCH 2/7] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-08 21:54     ` Junio C Hamano
  2021-03-09 10:34     ` Jeff King
  2021-03-08 20:04   ` [PATCH 4/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
                     ` (3 subsequent siblings)
  7 siblings, 2 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Change oid_object_info() to return an "enum object_type", this is what
it did anyway, except that it hardcoded -1 instead of an
OBJ_BAD.

Let's instead have it return the "enum object_type", at which point
callers will expect OBJ_BAD. This allows for refactoring code that
e.g. expected any "< 0" value, but would only have to deal with
OBJ_BAD (= -1).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/blame.c          |  2 +-
 builtin/cat-file.c       |  2 +-
 builtin/index-pack.c     |  6 +++---
 builtin/mktree.c         |  2 +-
 builtin/pack-objects.c   |  4 ++--
 builtin/replace.c        |  2 +-
 builtin/tag.c            |  2 +-
 builtin/unpack-objects.c |  6 +++++-
 object-file.c            | 11 +++++------
 object-name.c            | 18 +++++++++---------
 object-store.h           |  2 +-
 packfile.c               |  4 +---
 reachable.c              |  5 +++--
 13 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..5dd3c38a8c6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -810,7 +810,7 @@ static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
 	oidcpy(&oid, oid_ret);
 	while (1) {
 		struct object *obj;
-		int kind = oid_object_info(r, &oid, NULL);
+		enum object_type kind = oid_object_info(r, &oid, NULL);
 		if (kind == OBJ_COMMIT) {
 			oidcpy(oid_ret, &oid);
 			return 0;
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..1d989c62a4e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -133,7 +133,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 	case 'p':
 		type = oid_object_info(the_repository, &oid, NULL);
-		if (type < 0)
+		if (type == OBJ_BAD)
 			die("Not a valid object name %s", obj_name);
 
 		/* custom pretty-print here */
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bad57488079..253cfb07fbd 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -236,8 +236,8 @@ static unsigned check_object(struct object *obj)
 
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
-		int type = oid_object_info(the_repository, &obj->oid, &size);
-		if (type <= 0)
+		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
+		if (type == OBJ_BAD)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
 		if (type != obj->type)
@@ -820,7 +820,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 		unsigned long has_size;
 		read_lock();
 		has_type = oid_object_info(the_repository, oid, &has_size);
-		if (has_type < 0)
+		if (has_type == OBJ_BAD)
 			die(_("cannot read existing object info %s"), oid_to_hex(oid));
 		if (has_type != type || has_size != size)
 			die(_("SHA1 COLLISION FOUND WITH %s !"), oid_to_hex(oid));
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 891991b00d6..e6f8e0edb23 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -118,7 +118,7 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 
 	/* Check the type of object identified by sha1 */
 	obj_type = oid_object_info(the_repository, &oid, NULL);
-	if (obj_type < 0) {
+	if (obj_type == OBJ_BAD) {
 		if (allow_missing) {
 			; /* no problem - missing objects are presumed to be of the right type */
 		} else {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6d62aaf59a0..e60ae4ebd9a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2222,7 +2222,7 @@ unsigned long oe_get_size_slow(struct packing_data *pack,
 
 	if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
 		packing_data_lock(&to_pack);
-		if (oid_object_info(the_repository, &e->idx.oid, &size) < 0)
+		if (oid_object_info(the_repository, &e->idx.oid, &size) == OBJ_BAD)
 			die(_("unable to get size of %s"),
 			    oid_to_hex(&e->idx.oid));
 		packing_data_unlock(&to_pack);
@@ -3198,7 +3198,7 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 {
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
-	if (type < 0) {
+	if (type == OBJ_BAD) {
 		warning(_("loose object at %s could not be examined"), path);
 		return 0;
 	}
diff --git a/builtin/replace.c b/builtin/replace.c
index cd487659117..e9e151ae957 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -322,7 +322,7 @@ static int edit_and_replace(const char *object_ref, int force, int raw)
 		return error(_("not a valid object name: '%s'"), object_ref);
 
 	type = oid_object_info(the_repository, &old_oid, NULL);
-	if (type < 0)
+	if (type == OBJ_BAD)
 		return error(_("unable to get object type for %s"),
 			     oid_to_hex(&old_oid));
 
diff --git a/builtin/tag.c b/builtin/tag.c
index d403417b562..18206341409 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -260,7 +260,7 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 	char *path = NULL;
 
 	type = oid_object_info(the_repository, object, NULL);
-	if (type <= OBJ_NONE)
+	if (type == OBJ_BAD)
 		die(_("bad object type."));
 
 	if (type == OBJ_TAG)
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..c88daa3a5b1 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -203,7 +203,11 @@ static int check_object(struct object *obj, int type, void *data, struct fsck_op
 	if (!(obj->flags & FLAG_OPEN)) {
 		unsigned long size;
 		int type = oid_object_info(the_repository, &obj->oid, &size);
-		if (type != obj->type || type <= 0)
+		if (type == OBJ_BAD)
+			die(_("unable to get object type for %s"),
+			    oid_to_hex(&obj->oid));
+		if (type != obj->type)
+			/* todo to new function */
 			die("object of unexpected type");
 		obj->flags |= FLAG_WRITTEN;
 		return 0;
diff --git a/object-file.c b/object-file.c
index cd30c2b5590..8be6ce56133 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1573,10 +1573,9 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 }
 
 
-/* returns enum object_type or negative */
-int oid_object_info(struct repository *r,
-		    const struct object_id *oid,
-		    unsigned long *sizep)
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *oid,
+				 unsigned long *sizep)
 {
 	enum object_type type;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1585,7 +1584,7 @@ int oid_object_info(struct repository *r,
 	oi.sizep = sizep;
 	if (oid_object_info_extended(r, oid, &oi,
 				      OBJECT_INFO_LOOKUP_REPLACE) < 0)
-		return -1;
+		return OBJ_BAD;
 	return type;
 }
 
@@ -2265,7 +2264,7 @@ int read_pack_header(int fd, struct pack_header *header)
 void assert_oid_type(const struct object_id *oid, enum object_type expect)
 {
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
-	if (type < 0)
+	if (type == OBJ_BAD)
 		die(_("%s is not a valid object"), oid_to_hex(oid));
 	if (type != expect)
 		die(_("%s is not a valid '%s' object"), oid_to_hex(oid),
diff --git a/object-name.c b/object-name.c
index 64202de60b1..c6c3fd5228b 100644
--- a/object-name.c
+++ b/object-name.c
@@ -239,7 +239,7 @@ static int disambiguate_committish_only(struct repository *r,
 					void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind;
 
 	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_COMMIT)
@@ -258,7 +258,7 @@ static int disambiguate_tree_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_TREE;
 }
 
@@ -267,7 +267,7 @@ static int disambiguate_treeish_only(struct repository *r,
 				     void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind;
 
 	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_TREE || kind == OBJ_COMMIT)
@@ -286,7 +286,7 @@ static int disambiguate_blob_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_BLOB;
 }
 
@@ -361,7 +361,7 @@ static int show_ambiguous_object(const struct object_id *oid, void *data)
 {
 	const struct disambiguate_state *ds = data;
 	struct strbuf desc = STRBUF_INIT;
-	int type;
+	enum object_type type;
 
 	if (ds->fn && !ds->fn(ds->repo, oid, ds->cb_data))
 		return 0;
@@ -405,10 +405,10 @@ static int repo_collect_ambiguous(struct repository *r,
 static int sort_ambiguous(const void *a, const void *b, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
-	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
-	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
-	int a_type_sort;
-	int b_type_sort;
+	enum object_type a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
+	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
+	enum object_type a_type_sort;
+	enum object_type b_type_sort;
 
 	/*
 	 * Sorts by hash within the same object type, just as
diff --git a/object-store.h b/object-store.h
index 541dab08586..e9ab6aff2ab 100644
--- a/object-store.h
+++ b/object-store.h
@@ -203,7 +203,7 @@ static inline void *repo_read_object_file(struct repository *r,
 #endif
 
 /* Read and unpack an object file into memory, write memory to an object file */
-int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
+enum object_type oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
 
 int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 		     unsigned long len, const char *type,
diff --git a/packfile.c b/packfile.c
index 1fec12ac5f4..17c8c3222eb 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1266,7 +1266,7 @@ static int retry_bad_packed_offset(struct repository *r,
 				   struct packed_git *p,
 				   off_t obj_offset)
 {
-	int type;
+	enum object_type type;
 	uint32_t pos;
 	struct object_id oid;
 	if (offset_to_pack_pos(p, obj_offset, &pos) < 0)
@@ -1274,8 +1274,6 @@ static int retry_bad_packed_offset(struct repository *r,
 	nth_packed_object_id(&oid, p, pack_pos_to_index(p, pos));
 	mark_bad_packed_object(p, oid.hash);
 	type = oid_object_info(r, &oid, NULL);
-	if (type <= OBJ_NONE)
-		return OBJ_BAD;
 	return type;
 }
 
diff --git a/reachable.c b/reachable.c
index 77a60c70a5d..85f04492ea1 100644
--- a/reachable.c
+++ b/reachable.c
@@ -80,8 +80,6 @@ static void add_recent_object(const struct object_id *oid,
 	 * commits and tags to have been parsed.
 	 */
 	type = oid_object_info(the_repository, oid, NULL);
-	if (type < 0)
-		die("unable to get object info for %s", oid_to_hex(oid));
 
 	switch (type) {
 	case OBJ_TAG:
@@ -94,6 +92,9 @@ static void add_recent_object(const struct object_id *oid,
 	case OBJ_BLOB:
 		obj = (struct object *)lookup_blob(the_repository, oid);
 		break;
+	case OBJ_BAD:
+		die("unable to get object info for %s", oid_to_hex(oid));
+		break;
 	default:
 		die("unknown object type for %s: %s",
 		    oid_to_hex(oid), type_name(type));
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 4/7] tree.c: fix misindentation in parse_tree_gently()
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
                     ` (3 preceding siblings ...)
  2021-03-08 20:04   ` [PATCH 3/7] oid_object_info(): " Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-08 20:04   ` [PATCH 5/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

The variables declared in parse_tree_gently() had a single space after
the TAB. This dates back to their introduction in bd2c39f58f9 ([PATCH]
don't load and decompress objects twice with parse_object(),
2005-05-06). Let's fix them to follow the style of the rest of the
file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tree.c b/tree.c
index a52479812ce..4820d66a10c 100644
--- a/tree.c
+++ b/tree.c
@@ -216,9 +216,9 @@ int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 
 int parse_tree_gently(struct tree *item, int quiet_on_missing)
 {
-	 enum object_type type;
-	 void *buffer;
-	 unsigned long size;
+	enum object_type type;
+	void *buffer;
+	unsigned long size;
 
 	if (item->object.parsed)
 		return 0;
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 5/7] object.c: add a utility function for "expected type X, got Y"
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
                     ` (4 preceding siblings ...)
  2021-03-08 20:04   ` [PATCH 4/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-08 20:59     ` Taylor Blau
  2021-03-08 22:15     ` Junio C Hamano
  2021-03-08 20:04   ` [PATCH 6/7] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
  2021-03-08 20:04   ` [PATCH 7/7] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  7 siblings, 2 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Refactor various "Object X is not Y" error messages so that they use
the same message as the long-standing object_as_type() error
message. Now we'll consistently report e.g. that we got a commit when
we expected a tag, not just that the object is not a tag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c |  9 +++------
 commit.c             | 10 ++++------
 object.c             | 34 +++++++++++++++++++++++++++++++++-
 object.h             |  5 +++++
 tree.c               |  7 ++++---
 5 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 253cfb07fbd..d9082831bb2 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -217,8 +217,8 @@ static int mark_link(struct object *obj, int type, void *data, struct fsck_optio
 	if (!obj)
 		return -1;
 
-	if (type != OBJ_ANY && obj->type != type)
-		die(_("object type mismatch at %s"), oid_to_hex(&obj->oid));
+	if (type != OBJ_ANY)
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 
 	obj->flags |= FLAG_LINK;
 	return 0;
@@ -240,10 +240,7 @@ static unsigned check_object(struct object *obj)
 		if (type == OBJ_BAD)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
-		if (type != obj->type)
-			die(_("object %s: expected type %s, found %s"),
-			    oid_to_hex(&obj->oid),
-			    type_name(obj->type), type_name(type));
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 		obj->flags |= FLAG_CHECKED;
 		return 1;
 	}
diff --git a/commit.c b/commit.c
index 6ccd774841c..54627b546c3 100644
--- a/commit.c
+++ b/commit.c
@@ -299,9 +299,7 @@ const void *repo_get_commit_buffer(struct repository *r,
 		if (!ret)
 			die("cannot read commit object %s",
 			    oid_to_hex(&commit->object.oid));
-		if (type != OBJ_COMMIT)
-			die("expected commit for %s, got %s",
-			    oid_to_hex(&commit->object.oid), type_name(type));
+		oid_is_type_or_die(&commit->object.oid, OBJ_COMMIT, &type);
 		if (sizep)
 			*sizep = size;
 	}
@@ -489,10 +487,10 @@ int repo_parse_commit_internal(struct repository *r,
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_COMMIT) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_COMMIT, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a commit",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 
 	ret = parse_commit_buffer(r, item, buffer, size, 0);
diff --git a/object.c b/object.c
index eebacc28847..819ee0faa26 100644
--- a/object.c
+++ b/object.c
@@ -28,6 +28,8 @@ static const char *object_type_strings[] = {
 	"tag",		/* OBJ_TAG = 4 */
 };
 
+static const char *oid_is_a_X_not_a_Y = N_("object %s is a %s, not a %s");
+
 const char *type_name(unsigned int type)
 {
 	if (type >= ARRAY_SIZE(object_type_strings))
@@ -159,6 +161,36 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	return obj;
 }
 
+static int oid_is_type_or(const struct object_id *oid,
+			  enum object_type want,
+			  enum object_type type,
+			  int err)
+{
+	if (want == type)
+		return 0;
+	if (err)
+		return error(_(oid_is_a_X_not_a_Y),
+			     oid_to_hex(oid), type_name(type),
+			     type_name(want));
+	else
+		die(_(oid_is_a_X_not_a_Y), oid_to_hex(oid),
+		    type_name(type), type_name(want));
+}
+
+void oid_is_type_or_die(const struct object_id *oid,
+			enum object_type want,
+			enum object_type *type)
+{
+	oid_is_type_or(oid, want, *type, 0);
+}
+
+int oid_is_type_or_error(const struct object_id *oid,
+			 enum object_type want,
+			 enum object_type *type)
+{
+	return oid_is_type_or(oid, want, *type, 1);
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
@@ -172,7 +204,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 	else {
 		if (!quiet)
-			error(_("object %s is a %s, not a %s"),
+			error(_(oid_is_a_X_not_a_Y),
 			      oid_to_hex(&obj->oid),
 			      type_name(obj->type), type_name(type));
 		return NULL;
diff --git a/object.h b/object.h
index 5e7a523e858..d2d4a236d0e 100644
--- a/object.h
+++ b/object.h
@@ -124,6 +124,11 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
+			enum object_type *type);
+int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
+			 enum object_type *type);
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree.c b/tree.c
index 4820d66a10c..d9b1c70b28a 100644
--- a/tree.c
+++ b/tree.c
@@ -219,6 +219,7 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 	enum object_type type;
 	void *buffer;
 	unsigned long size;
+	int ret;
 
 	if (item->object.parsed)
 		return 0;
@@ -227,10 +228,10 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_TREE) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_TREE, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a tree",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 	return parse_tree_buffer(item, buffer, size);
 }
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 6/7] object tests: add test for unexpected objects in tags
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
                     ` (5 preceding siblings ...)
  2021-03-08 20:04   ` [PATCH 5/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  2021-03-09 10:44     ` Jeff King
  2021-03-08 20:04   ` [PATCH 7/7] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
for unexpected object types, 2019-04-09), there were no meaningful
tests for checking how we reported on finding the incorrect object
type in a tag, i.e. one that broke the "type" promise in the tag
header.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6102-rev-list-unexpected-objects.sh | 113 ++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 1 deletion(-)

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 52cde097dd5..2ea1982b9ed 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -31,7 +31,8 @@ test_expect_success 'setup unexpected non-tree entry' '
 '
 
 test_expect_success 'traverse unexpected non-tree entry (lone)' '
-	test_must_fail git rev-list --objects $broken_tree
+	test_must_fail git rev-list --objects $broken_tree >output 2>&1 &&
+	test_i18ngrep "not a tree" output
 '
 
 test_expect_success 'traverse unexpected non-tree entry (seen)' '
@@ -124,4 +125,114 @@ test_expect_success 'traverse unexpected non-blob tag (seen)' '
 	test_i18ngrep "not a blob" output
 '
 
+test_expect_success 'setup unexpected non-tag tag' '
+	test_when_finished "git tag -d tag-commit tag-tag" &&
+
+	git tag -a -m"tagged commit" tag-commit $commit &&
+	tag_commit=$(git rev-parse tag-commit) &&
+	git tag -a -m"tagged tag" tag-tag tag-commit &&
+	tag_tag=$(git rev-parse tag-tag) &&
+
+	git cat-file tag tag-tag >good-tag-tag &&
+	git cat-file tag tag-commit >good-commit-tag &&
+
+	sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit &&
+	sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree &&
+	sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob &&
+
+	sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag &&
+	sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree &&
+	sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob &&
+
+	tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) &&
+	tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) &&
+	tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) &&
+
+	commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) &&
+	commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) &&
+	commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob)
+'
+
+test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
+	cat >expected <<-EOF &&
+	error: object $commit is a tag, not a commit
+	fatal: bad object $commit
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tag 2>err &&
+	cat >expected <<-EOF &&
+	error: object $tag_commit is a commit, not a tag
+	fatal: bad object $tag_commit
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
+	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
+	cat >expected <<-EOF &&
+	error: object $tree is a tag, not a tree
+	fatal: bad object $tree
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tree 2>err &&
+	cat >expected <<-EOF &&
+	error: object $tree is a commit, not a tree
+	fatal: bad object $tree
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
+	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a tag, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a commit, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
+	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a commit, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a tag, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected non-tag tag (blob seen to blob)' '
+	test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a commit
+	error: bad tag pointer to $blob in $commit_tag_blob
+	fatal: bad object $commit_tag_blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $blob $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a tag
+	error: bad tag pointer to $blob in $tag_tag_blob
+	fatal: bad object $tag_tag_blob
+	EOF
+	test_cmp expected err
+'
+
 test_done
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 7/7] tag: don't misreport type of tagged objects in errors
  2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
                     ` (6 preceding siblings ...)
  2021-03-08 20:04   ` [PATCH 6/7] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:04   ` Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 20:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
objects, 2005-06-21) (yes, that ancient!) and correctly report an
error on a tag like:

    object <a tree hash>
    type commit

As:

    error: object <a tree hash> is tree, not a commit

Instead of our long-standing misbehavior of inverting the two, and
reporting:

    error: object <a tree hash> is commit, not a tree

Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
is incorrect.

The reason for this misreporting is that in parse_tag_buffer() we end
up doing a lookup_{blob,commit,tag,tree}() depending on what we read
out of the "type" line.

If we haven't parsed that object before we end up dispatching to the
type-specific lookup functions, e.g. this for commit.c in
lookup_commit_type():

	struct object *obj = lookup_object(r, oid);
	if (!obj)
		return create_object(r, oid, alloc_commit_node(r));

Its allocation will then set the obj->type according to what the tag
told us the type was, but which we've never validated. At this point
we've got an object in memory that hasn't been parsed, and whose type
is incorrect, since we mistrusted a tag to tell us the type.

Then when we actually load the object with parse_object() we read it
and find that it's a "tree". See 8ff226a9d5e (add object_as_type
helper for casting objects, 2014-07-13) for that behavior (that's just
a refactoring commit, but shows all the code involved).

Which explains why we inverted the error report. Normally when
object_as_type() is called it's by the lookup_{blob,commit,tag,tree}()
functions via parse_object(). At that point we can trust the
obj->type.

In the case of parsing objects we've learned about via a tag with an
incorrect type it's the opposite, the obj->type isn't correct and
holds the mislabeled type, but we're parsing the object and know for
sure what object type we're dealing with.

Hence the non-intuitive solution of adding a
lookup_{blob,commit,tag,tree}_type() function. It's to distinguish
parse_object_buffer() where we actually know the type from
parse_tag_buffer() where we're just guessing about the type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                                 | 16 +++++++++++++++-
 blob.h                                 |  3 +++
 commit.c                               | 14 +++++++++++++-
 commit.h                               |  2 ++
 object.c                               |  8 ++++----
 t/t6102-rev-list-unexpected-objects.sh | 16 ++++++++--------
 tag.c                                  | 14 +++++++++++++-
 tag.h                                  |  2 ++
 tree.c                                 | 14 +++++++++++++-
 tree.h                                 |  2 ++
 10 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/blob.c b/blob.c
index 182718aba9f..b233e0daa2f 100644
--- a/blob.c
+++ b/blob.c
@@ -2,17 +2,31 @@
 #include "blob.h"
 #include "repository.h"
 #include "alloc.h"
+#include "object-store.h"
 
 const char *blob_type = "blob";
 
-struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_BLOB;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
+struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+{
+	return lookup_blob_type(r, oid, OBJ_NONE);
+}
+
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
 {
 	item->object.parsed = 1;
diff --git a/blob.h b/blob.h
index 16648720557..066a2effcbf 100644
--- a/blob.h
+++ b/blob.h
@@ -10,6 +10,9 @@ struct blob {
 };
 
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type);
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
 
diff --git a/commit.c b/commit.c
index 54627b546c3..30cc990d003 100644
--- a/commit.c
+++ b/commit.c
@@ -57,14 +57,26 @@ struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref
 	return c;
 }
 
-struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_COMMIT;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
+struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+{
+	return lookup_commit_type(r, oid, OBJ_NONE);
+}
+
 struct commit *lookup_commit_reference_by_name(const char *name)
 {
 	struct object_id oid;
diff --git a/commit.h b/commit.h
index 49c0f503964..9b92f661836 100644
--- a/commit.h
+++ b/commit.h
@@ -64,6 +64,8 @@ void add_name_decoration(enum decoration_type type, const char *name, struct obj
 const struct name_decoration *get_name_decoration(const struct object *obj);
 
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type);
 struct commit *lookup_commit_reference(struct repository *r,
 				       const struct object_id *oid);
 struct commit *lookup_commit_reference_gently(struct repository *r,
diff --git a/object.c b/object.c
index 819ee0faa26..1a45b149e08 100644
--- a/object.c
+++ b/object.c
@@ -227,14 +227,14 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 
 	obj = NULL;
 	if (type == OBJ_BLOB) {
-		struct blob *blob = lookup_blob(r, oid);
+		struct blob *blob = lookup_blob_type(r, oid, type);
 		if (blob) {
 			if (parse_blob_buffer(blob, buffer, size))
 				return NULL;
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
-		struct tree *tree = lookup_tree(r, oid);
+		struct tree *tree = lookup_tree_type(r, oid, type);
 		if (tree) {
 			obj = &tree->object;
 			if (!tree->buffer)
@@ -246,7 +246,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			}
 		}
 	} else if (type == OBJ_COMMIT) {
-		struct commit *commit = lookup_commit(r, oid);
+		struct commit *commit = lookup_commit_type(r, oid, type);
 		if (commit) {
 			if (parse_commit_buffer(r, commit, buffer, size, 1))
 				return NULL;
@@ -257,7 +257,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			obj = &commit->object;
 		}
 	} else if (type == OBJ_TAG) {
-		struct tag *tag = lookup_tag(r, oid);
+		struct tag *tag = lookup_tag_type(r, oid, type);
 		if (tag) {
 			if (parse_tag_buffer(r, tag, buffer, size))
 			       return NULL;
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 2ea1982b9ed..4a6b3cc3b01 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -156,14 +156,14 @@ test_expect_success 'setup unexpected non-tag tag' '
 test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' '
 	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
 	cat >expected <<-EOF &&
-	error: object $commit is a tag, not a commit
+	error: object $commit is a commit, not a tag
 	fatal: bad object $commit
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $commit_tag_tag 2>err &&
 	cat >expected <<-EOF &&
-	error: object $tag_commit is a commit, not a tag
+	error: object $tag_commit is a tag, not a commit
 	fatal: bad object $tag_commit
 	EOF
 	test_cmp expected err
@@ -172,14 +172,14 @@ test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)
 test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
 	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
 	cat >expected <<-EOF &&
-	error: object $tree is a tag, not a tree
+	error: object $tree is a tree, not a tag
 	fatal: bad object $tree
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $commit_tag_tree 2>err &&
 	cat >expected <<-EOF &&
-	error: object $tree is a commit, not a tree
+	error: object $tree is a tree, not a commit
 	fatal: bad object $tree
 	EOF
 	test_cmp expected err
@@ -188,14 +188,14 @@ test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
 test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
 	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a tag, not a blob
+	error: object $blob is a blob, not a tag
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $commit_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a commit, not a blob
+	error: object $blob is a blob, not a commit
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err
@@ -204,14 +204,14 @@ test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
 test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
 	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a commit, not a blob
+	error: object $blob is a blob, not a commit
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a tag, not a blob
+	error: object $blob is a blob, not a tag
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err
diff --git a/tag.c b/tag.c
index 3e18a418414..0ef87897b29 100644
--- a/tag.c
+++ b/tag.c
@@ -99,14 +99,26 @@ struct object *deref_tag_noverify(struct object *o)
 	return o;
 }
 
-struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_TAG;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
+struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tag_type(r, oid, OBJ_NONE);
+}
+
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
 {
 	const char *dateptr;
diff --git a/tag.h b/tag.h
index 3ce8e721924..42bd3e64011 100644
--- a/tag.h
+++ b/tag.h
@@ -12,6 +12,8 @@ struct tag {
 	timestamp_t date;
 };
 struct tag *lookup_tag(struct repository *r, const struct object_id *oid);
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type);
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size);
 int parse_tag(struct tag *item);
 void release_tag_memory(struct tag *t);
diff --git a/tree.c b/tree.c
index d9b1c70b28a..895c66420e8 100644
--- a/tree.c
+++ b/tree.c
@@ -195,14 +195,26 @@ int read_tree(struct repository *r, struct tree *tree, int stage,
 	return 0;
 }
 
-struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_TREE;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
+struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tree_type(r, oid, OBJ_NONE);
+}
+
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 {
 	if (item->object.parsed)
diff --git a/tree.h b/tree.h
index 3eb0484cbf2..49bd44f79b3 100644
--- a/tree.h
+++ b/tree.h
@@ -15,6 +15,8 @@ struct tree {
 extern const char *tree_type;
 
 struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type);
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
 
-- 
2.31.0.rc1.210.g0f8085a843c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/7] object.c: refactor type_from_string_gently()
  2021-03-08 20:04   ` [PATCH 1/7] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:52     ` Taylor Blau
  2021-03-09 10:46     ` Jeff King
  1 sibling, 0 replies; 142+ messages in thread
From: Taylor Blau @ 2021-03-08 20:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin

On Mon, Mar 08, 2021 at 09:04:20PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Get rid of the "gently" argument to type_from_string_gently() to make
> it consistent with most other *_gently() functions.
>
> This refactoring of adding a third parameter was done in
> fe8e3b71805 (Refactor type_from_string() to allow continuing after
> detecting an error, 2014-09-10) in preparation for its use in
> fsck.c.

Makes sense. I don't think it hurts to mention that fe8e3b71805 also
wrote the implementation of type_from_string() as:

  #define type_from_string(str) type_from_string_gently(str, -1, 0)

making it the only caller to pass '0' for the 'gentle' parameter. So by
implementing it as a function in terms of type_from_string_gently()
which checks for a negative return value, we can drop the 'gentle'
parameter entirely.

Makes sense.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/7] object.c: make type_from_string() return "enum object_type"
  2021-03-08 20:04   ` [PATCH 2/7] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:56     ` Taylor Blau
  2021-03-08 21:48     ` Junio C Hamano
  1 sibling, 0 replies; 142+ messages in thread
From: Taylor Blau @ 2021-03-08 20:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin

On Mon, Mar 08, 2021 at 09:04:21PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Change the type_from_string*() functions to return an "enum
> object_type", and refactor their callers to check for "== OBJ_BAD"
> instead of "< 0".
>
> This helps to distinguish code in object.c where we really do return
> -1 from code that returns an "enum object_type", whose OBJ_BAD happens
> to be -1.

I'm not sure I understand the intention of this and the following few
patches. I could imagine that you eventually want to use other negative
return values for a different purpose, but OBJ_BAD _is_ '-1' (as you
note), and we use 'int' and 'enum object_type' interchangeably in other
parts of the code.

That's not to say that I'm necessarily opposed to picking up these few
patches, but rather that I don't fully understand their purpose.

The patch below looks good.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 5/7] object.c: add a utility function for "expected type X, got Y"
  2021-03-08 20:04   ` [PATCH 5/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
@ 2021-03-08 20:59     ` Taylor Blau
  2021-03-08 22:15     ` Junio C Hamano
  1 sibling, 0 replies; 142+ messages in thread
From: Taylor Blau @ 2021-03-08 20:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin

On Mon, Mar 08, 2021 at 09:04:24PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Refactor various "Object X is not Y" error messages so that they use
> the same message as the long-standing object_as_type() error
> message. Now we'll consistently report e.g. that we got a commit when
> we expected a tag, not just that the object is not a tag.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/index-pack.c |  9 +++------
>  commit.c             | 10 ++++------
>  object.c             | 34 +++++++++++++++++++++++++++++++++-
>  object.h             |  5 +++++
>  tree.c               |  7 ++++---
>  5 files changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 253cfb07fbd..d9082831bb2 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -217,8 +217,8 @@ static int mark_link(struct object *obj, int type, void *data, struct fsck_optio
>  	if (!obj)
>  		return -1;
>
> -	if (type != OBJ_ANY && obj->type != type)
> -		die(_("object type mismatch at %s"), oid_to_hex(&obj->oid));
> +	if (type != OBJ_ANY)
> +		oid_is_type_or_die(&obj->oid, obj->type, &type);

Nice. This is definitely an improvement over the existing code.

> +static const char *oid_is_a_X_not_a_Y = N_("object %s is a %s, not a %s");
> +

This name is a little verbose. I'm nitpicking, of course, but would you
consider "object_type_mismatch_msg" instead?

>  const char *type_name(unsigned int type)
>  {
>  	if (type >= ARRAY_SIZE(object_type_strings))
> @@ -159,6 +161,36 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
>  	return obj;
>  }
>
> +static int oid_is_type_or(const struct object_id *oid,
> +			  enum object_type want,
> +			  enum object_type type,
> +			  int err)
> +{
> +	if (want == type)
> +		return 0;
> +	if (err)
> +		return error(_(oid_is_a_X_not_a_Y),
> +			     oid_to_hex(oid), type_name(type),
> +			     type_name(want));
> +	else
> +		die(_(oid_is_a_X_not_a_Y), oid_to_hex(oid),
> +		    type_name(type), type_name(want));
> +}
> +
> +void oid_is_type_or_die(const struct object_id *oid,
> +			enum object_type want,
> +			enum object_type *type)
> +{
> +	oid_is_type_or(oid, want, *type, 0);
> +}
> +
> +int oid_is_type_or_error(const struct object_id *oid,
> +			 enum object_type want,
> +			 enum object_type *type)
> +{
> +	return oid_is_type_or(oid, want, *type, 1);
> +}
> +

I'm not sure that this oid_is_type_or() is really doing all that much.
It allows you to share the 'if (want == type)' conditional between the
other two functions, but the rest of the function is a conditional
itself that behaves differently depending on whether you called the
_die() or _error() variant.

Why not duplicate the 'if (want == type)' between the two functions, and
then remove 'oid_is_type_or()' altogether?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/7] object.c: make type_from_string() return "enum object_type"
  2021-03-08 20:04   ` [PATCH 2/7] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
  2021-03-08 20:56     ` Taylor Blau
@ 2021-03-08 21:48     ` Junio C Hamano
  1 sibling, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-03-08 21:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the type_from_string*() functions to return an "enum
> object_type", and refactor their callers to check for "== OBJ_BAD"
> instead of "< 0".
>
> This helps to distinguish code in object.c where we really do return
> -1 from code that returns an "enum object_type", whose OBJ_BAD happens
> to be -1.

"We will be adding different error modes and the plan is to signal
them by returning negative values other than -1" would make sense
(if that is what we are going to see in the remainder of the
series), but not the above.

To the callees, what comes back from a function is -1 either way,
and they cannot tell -1 resulting from OBJ_BAD and -1 resulting from
error() apart, can they?


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/7] oid_object_info(): return "enum object_type"
  2021-03-08 20:04   ` [PATCH 3/7] oid_object_info(): " Ævar Arnfjörð Bjarmason
@ 2021-03-08 21:54     ` Junio C Hamano
  2021-03-08 22:32       ` Junio C Hamano
  2021-03-09 10:34     ` Jeff King
  1 sibling, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-03-08 21:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change oid_object_info() to return an "enum object_type", this is what
> it did anyway, except that it hardcoded -1 instead of an
> OBJ_BAD.
>
> Let's instead have it return the "enum object_type", at which point
> callers will expect OBJ_BAD. This allows for refactoring code that
> e.g. expected any "< 0" value, but would only have to deal with
> OBJ_BAD (= -1).

Hmph, I have a mixed feeling about this.

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5ebf13359e8..1d989c62a4e 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -133,7 +133,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  
>  	case 'p':
>  		type = oid_object_info(the_repository, &oid, NULL);
> -		if (type < 0)
> +		if (type == OBJ_BAD)
>  			die("Not a valid object name %s", obj_name);

So, when oid_object_info() starts to return different negative value
to signal new kinds of errors, this codepath need to be changed, or
the control falls through to the rest of the case arm, which would
be worse than what the current code does (which is to die with less
specific error message).

> -		int type = oid_object_info(the_repository, &obj->oid, &size);
> -		if (type <= 0)
> +		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
> +		if (type == OBJ_BAD)
>  			die(_("did not receive expected object %s"),
>  			      oid_to_hex(&obj->oid));

Ditto.

And the issue is the same for all the other explicit comparison with
OBJ_BAD.  If we do it the other way around, i.e. leave these callers
as they are and add new negative return values to the function first,
and then convert "if negative, say error" to "if OBJ_BAD, say so,
else if we have this new type of error, say so", then the risk of
mistake becomes smaller.

But hopefully any such potential issue will be resolved by the end
of this short series, so as long as it won't be left as technical
debt, I am OK.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 5/7] object.c: add a utility function for "expected type X, got Y"
  2021-03-08 20:04   ` [PATCH 5/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
  2021-03-08 20:59     ` Taylor Blau
@ 2021-03-08 22:15     ` Junio C Hamano
  1 sibling, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-03-08 22:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Refactor various "Object X is not Y" error messages so that they use
> the same message as the long-standing object_as_type() error
> message. Now we'll consistently report e.g. that we got a commit when
> we expected a tag, not just that the object is not a tag.

This one is quite nice.  There might be some i18n fallout to do this
at this low layer, but everything should be manageable (i.e. we can
just tell the users "don't parse the die() message---they are meant
for humans") immediately, and in the longer term, we probably would
want to move away from dying at this low level anyway and instead
return an error for the higher layer to deal with.

> +static const char *oid_is_a_X_not_a_Y = N_("object %s is a %s, not a %s");
> +
>  const char *type_name(unsigned int type)
>  {
>  	if (type >= ARRAY_SIZE(object_type_strings))
> @@ -159,6 +161,36 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
>  	return obj;
>  }
>  
> +static int oid_is_type_or(const struct object_id *oid,
> +			  enum object_type want,
> +			  enum object_type type,
> +			  int err)

"err" is usually called "gently" in this codebase, isn't it?

> +{
> +	if (want == type)
> +		return 0;
> +	if (err)
> +		return error(_(oid_is_a_X_not_a_Y),
> +			     oid_to_hex(oid), type_name(type),
> +			     type_name(want));

Just a style thing, but breaking line after oid like you did on the
other "!err" side makes it a lot more readable.

> +	else
> +		die(_(oid_is_a_X_not_a_Y), oid_to_hex(oid),
> +		    type_name(type), type_name(want));
> +}

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/7] oid_object_info(): return "enum object_type"
  2021-03-08 21:54     ` Junio C Hamano
@ 2021-03-08 22:32       ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-03-08 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> And the issue is the same for all the other explicit comparison with
> OBJ_BAD.  If we do it the other way around, i.e. leave these callers
> as they are and add new negative return values to the function first,
> and then convert "if negative, say error" to "if OBJ_BAD, say so,
> else if we have this new type of error, say so", then the risk of
> mistake becomes smaller.
>
> But hopefully any such potential issue will be resolved by the end
> of this short series, so as long as it won't be left as technical
> debt, I am OK.

And after reading through the topic to the end, it turns out that
the code did not add new error return value.  So while it probably
is a good idea to make oid_object_info() to return the enum, I am
not convinced that the updates to the caller that used to check for
the negativeness is an improvement.  Rewriting the ones that used to
compare with -1 for equality to instead compare with OBJ_BAD would
be very much welcome, though.


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/7] oid_object_info(): return "enum object_type"
  2021-03-08 20:04   ` [PATCH 3/7] oid_object_info(): " Ævar Arnfjörð Bjarmason
  2021-03-08 21:54     ` Junio C Hamano
@ 2021-03-09 10:34     ` Jeff King
  1 sibling, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-03-09 10:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Mon, Mar 08, 2021 at 09:04:22PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Change oid_object_info() to return an "enum object_type", this is what
> it did anyway, except that it hardcoded -1 instead of an
> OBJ_BAD.

It does return -1 in the error case, and otherwise returns the "type"
field it got from the odb-specific fields. Which presumably is always
greater than 0, but...

> Let's instead have it return the "enum object_type", at which point
> callers will expect OBJ_BAD. This allows for refactoring code that
> e.g. expected any "< 0" value, but would only have to deal with
> OBJ_BAD (= -1).

Some of these conversions are not just "< 0", like:

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index bad57488079..253cfb07fbd 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -236,8 +236,8 @@ static unsigned check_object(struct object *obj)
>  
>  	if (!(obj->flags & FLAG_CHECKED)) {
>  		unsigned long size;
> -		int type = oid_object_info(the_repository, &obj->oid, &size);
> -		if (type <= 0)
> +		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
> +		if (type == OBJ_BAD)

I kind of doubt that we could get OBJ_NONE here, but it seems like a
much riskier change than just "let's prefer OBJ_BAD to -1".

Did you trace through all of the paths that oid_object_info() can end up
in? (I did very briefly and I _think_ it's OK, but...).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 6/7] object tests: add test for unexpected objects in tags
  2021-03-08 20:04   ` [PATCH 6/7] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
@ 2021-03-09 10:44     ` Jeff King
  2021-03-28  1:35       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-03-09 10:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Mon, Mar 08, 2021 at 09:04:25PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
> for unexpected object types, 2019-04-09), there were no meaningful
> tests for checking how we reported on finding the incorrect object
> type in a tag, i.e. one that broke the "type" promise in the tag
> header.

Isn't this covered by tests 16 and 17 ("traverse unexpected non-commit
tag", both "lone" and "seen")? And likewise the matching "non-tree" and
"non-blob" variants afterwards?

The only thing we don't seem to cover is an unexpected non-tag. I don't
mind adding that, but why wouldn't we just follow the template of the
existing tests?

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/7] object.c: refactor type_from_string_gently()
  2021-03-08 20:04   ` [PATCH 1/7] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
  2021-03-08 20:52     ` Taylor Blau
@ 2021-03-09 10:46     ` Jeff King
  1 sibling, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-03-09 10:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Mon, Mar 08, 2021 at 09:04:20PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Get rid of the "gently" argument to type_from_string_gently() to make
> it consistent with most other *_gently() functions.
> 
> This refactoring of adding a third parameter was done in
> fe8e3b71805 (Refactor type_from_string() to allow continuing after
> detecting an error, 2014-09-10) in preparation for its use in
> fsck.c.
> 
> Since then no callers of this function have passed a "len < 0" as was
> expected might happen, so we can simplify its invocation by knowing
> that it's never called like that.

This final paragraph confused me. What does "len < 0" have to do with
"gently"?

I think the answer is just that the non-gentle form never bothered to
take a "len" parameter, so you are doing both simplifications at once.
IMHO it would be easier to understand broken into two commits (but not
necessarily worth re-rolling).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 6/7] object tests: add test for unexpected objects in tags
  2021-03-09 10:44     ` Jeff King
@ 2021-03-28  1:35       ` Ævar Arnfjörð Bjarmason
  2021-03-28  9:06         ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  1:35 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin


On Tue, Mar 09 2021, Jeff King wrote:

> On Mon, Mar 08, 2021 at 09:04:25PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
>> for unexpected object types, 2019-04-09), there were no meaningful
>> tests for checking how we reported on finding the incorrect object
>> type in a tag, i.e. one that broke the "type" promise in the tag
>> header.
>
> Isn't this covered by tests 16 and 17 ("traverse unexpected non-commit
> tag", both "lone" and "seen")? And likewise the matching "non-tree" and
> "non-blob" variants afterwards?

Barely, those tests are mainly testing that rev-list doesn't die, and
only do a very fuzzy match on the output. E.g. checking `grep "not a
commit" err`, not a full test_cmp that'll check what OID is reported
etc.

> The only thing we don't seem to cover is an unexpected non-tag. I don't
> mind adding that, but why wouldn't we just follow the template of the
> existing tests?

I am following that template to some extent, e.g. using
${commit,tree,blob}. It just didn't seem worth it to refactor an earlier
test in the file just to re-use a single hash-object invocation, those
tests e.g. clobber the $tag variable, so bending over backwards to
re-use anything set up in them would mean some refactoring.

I think it's much clearer just do do all the different kinds of setup in
the new setup function.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 00/10] improve reporting of unexpected objects
  2021-03-08 20:04   ` [PATCH 0/7] improve reporting of unexpected objects Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
                         ` (11 more replies)
  0 siblings, 12 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

As noted in v1[1] this is some s/int/enum object_type/ refactoring,
and finally fixing an issue with our error reporting about corrupt
tags being wrong.

This should address all the feedback I got about the v1. Thanks
everyone, and sorry about the time it took to re-roll this.

1. http://lore.kernel.org/git/20210308200426.21824-1-avarab@gmail.com

*** BLURB HERE ***

Ævar Arnfjörð Bjarmason (10):
  object.c: stop supporting len == -1 in type_from_string_gently()
  object.c: refactor type_from_string_gently()
  object.c: make type_from_string() return "enum object_type"
  object-file.c: make oid_object_info() return "enum object_type"
  object-name.c: make dependency on object_type order more obvious
  tree.c: fix misindentation in parse_tree_gently()
  object.c: add a utility function for "expected type X, got Y"
  object.c: add and use oid_is_type_or_die_msg() function
  object tests: add test for unexpected objects in tags
  tag: don't misreport type of tagged objects in errors

 blob.c                                 |  16 +++-
 blob.h                                 |   3 +
 builtin/blame.c                        |   2 +-
 builtin/index-pack.c                   |  11 +--
 combine-diff.c                         |   3 +-
 commit.c                               |  24 ++++--
 commit.h                               |   2 +
 fsck.c                                 |   2 +-
 merge-recursive.c                      |   5 +-
 object-file.c                          |  10 +--
 object-name.c                          |  25 +++---
 object-store.h                         |   4 +-
 object.c                               |  65 +++++++++++---
 object.h                               |  12 ++-
 packfile.c                             |   2 +-
 t/t6102-rev-list-unexpected-objects.sh | 113 ++++++++++++++++++++++++-
 tag.c                                  |  14 ++-
 tag.h                                  |   2 +
 tree.c                                 |  27 ++++--
 tree.h                                 |   2 +
 20 files changed, 279 insertions(+), 65 deletions(-)

Range-diff:
 -:  ----------- >  1:  e51c860a65d object.c: stop supporting len == -1 in type_from_string_gently()
 1:  1f50a33ab5c !  2:  3e3979b6b35 object.c: refactor type_from_string_gently()
    @@ Commit message
         detecting an error, 2014-09-10) in preparation for its use in
         fsck.c.
     
    -    Since then no callers of this function have passed a "len < 0" as was
    -    expected might happen, so we can simplify its invocation by knowing
    -    that it's never called like that.
    +    Simplifying this means we can move the die() into the simpler
    +    type_from_string() function.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ object.c: const char *type_name(unsigned int type)
      {
      	int i;
      
    --	if (len < 0)
    --		len = strlen(str);
    --
    - 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
    +@@ object.c: int type_from_string_gently(const char *str, ssize_t len, int gentle)
      		if (!strncmp(str, object_type_strings[i], len) &&
      		    object_type_strings[i][len] == '\0')
      			return i;
    -+	return -1;
    -+}
    - 
    +-
     -	if (gentle)
     -		return -1;
     -
     -	die(_("invalid object type \"%s\""), str);
    -+int type_from_string(const char *str)
    -+{
    -+	size_t len = strlen(str);
    ++	return -1;
    + }
    + 
    + int type_from_string(const char *str)
    + {
    + 	size_t len = strlen(str);
    +-	int ret = type_from_string_gently(str, len, 0);
     +	int ret = type_from_string_gently(str, len);
     +	if (ret < 0)
     +		die(_("invalid object type \"%s\""), str);
    -+	return ret;
    + 	return ret;
      }
      
    - /*
     
      ## object.h ##
     @@ object.h: struct object {
    @@ object.h: struct object {
      
      const char *type_name(unsigned int type);
     -int type_from_string_gently(const char *str, ssize_t, int gentle);
    --#define type_from_string(str) type_from_string_gently(str, -1, 0)
     +int type_from_string_gently(const char *str, ssize_t len);
    -+int type_from_string(const char *str);
    + int type_from_string(const char *str);
      
      /*
    -  * Return the current number of buckets in the object hashmap.
 2:  a4e444f9274 !  3:  5615730f023 object.c: make type_from_string() return "enum object_type"
    @@ Commit message
         object.c: make type_from_string() return "enum object_type"
     
         Change the type_from_string*() functions to return an "enum
    -    object_type", and refactor their callers to check for "== OBJ_BAD"
    -    instead of "< 0".
    +    object_type", but don't refactor their callers to check for "==
    +    OBJ_BAD" instead of "< 0".
     
    -    This helps to distinguish code in object.c where we really do return
    -    -1 from code that returns an "enum object_type", whose OBJ_BAD happens
    -    to be -1.
    +    Refactoring the check of the return value to check == OBJ_BAD would
    +    now be equivalent to "ret < 0", but the consensus on an earlier
    +    version of this patch was to not do that, and to instead use -1
    +    consistently as a return value. It just so happens that OBJ_BAD == -1,
    +    but let's not put a hard reliance on that.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## fsck.c ##
    -@@ fsck.c: int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
    - 		goto done;
    - 	}
    - 	*tagged_type = type_from_string_gently(buffer, eol - buffer);
    --	if (*tagged_type < 0)
    -+	if (*tagged_type == OBJ_BAD)
    - 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
    - 	if (ret)
    - 		goto done;
    -
    - ## object-file.c ##
    -@@ object-file.c: static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
    - 	 */
    - 	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    - 		type = 0;
    --	else if (type < 0)
    -+	else if (type == OBJ_BAD)
    - 		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    -
      ## object.c ##
     @@ object.c: const char *type_name(unsigned int type)
      	return object_type_strings[type];
    @@ object.c: const char *type_name(unsigned int type)
      
      	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
      		if (!strncmp(str, object_type_strings[i], len) &&
    - 		    object_type_strings[i][len] == '\0')
    - 			return i;
    --	return -1;
    -+	return OBJ_BAD;
    +@@ object.c: int type_from_string_gently(const char *str, ssize_t len)
    + 	return -1;
      }
      
     -int type_from_string(const char *str)
    @@ object.c: const char *type_name(unsigned int type)
      {
      	size_t len = strlen(str);
     -	int ret = type_from_string_gently(str, len);
    --	if (ret < 0)
     +	enum object_type ret = type_from_string_gently(str, len);
    -+	if (ret == OBJ_BAD)
    + 	if (ret < 0)
      		die(_("invalid object type \"%s\""), str);
      	return ret;
    - }
     
      ## object.h ##
     @@ object.h: struct object {
 3:  309fb7b71e7 !  4:  c10082f4fac oid_object_info(): return "enum object_type"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    oid_object_info(): return "enum object_type"
    +    object-file.c: make oid_object_info() return "enum object_type"
     
    -    Change oid_object_info() to return an "enum object_type", this is what
    -    it did anyway, except that it hardcoded -1 instead of an
    -    OBJ_BAD.
    -
    -    Let's instead have it return the "enum object_type", at which point
    -    callers will expect OBJ_BAD. This allows for refactoring code that
    -    e.g. expected any "< 0" value, but would only have to deal with
    -    OBJ_BAD (= -1).
    +    Change oid_object_info() to return an "enum object_type". Unlike
    +    oid_object_info_extended() function the simpler oid_object_info()
    +    explicitly returns the oi.typep member, which is itself an "enum
    +    object_type".
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/blame.c: static int peel_to_commit_oid(struct object_id *oid_ret, void *
      			oidcpy(oid_ret, &oid);
      			return 0;
     
    - ## builtin/cat-file.c ##
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 
    - 	case 'p':
    - 		type = oid_object_info(the_repository, &oid, NULL);
    --		if (type < 0)
    -+		if (type == OBJ_BAD)
    - 			die("Not a valid object name %s", obj_name);
    - 
    - 		/* custom pretty-print here */
    -
      ## builtin/index-pack.c ##
     @@ builtin/index-pack.c: static unsigned check_object(struct object *obj)
      
      	if (!(obj->flags & FLAG_CHECKED)) {
      		unsigned long size;
     -		int type = oid_object_info(the_repository, &obj->oid, &size);
    --		if (type <= 0)
     +		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
    -+		if (type == OBJ_BAD)
    + 		if (type <= 0)
      			die(_("did not receive expected object %s"),
      			      oid_to_hex(&obj->oid));
    - 		if (type != obj->type)
    -@@ builtin/index-pack.c: static void sha1_object(const void *data, struct object_entry *obj_entry,
    - 		unsigned long has_size;
    - 		read_lock();
    - 		has_type = oid_object_info(the_repository, oid, &has_size);
    --		if (has_type < 0)
    -+		if (has_type == OBJ_BAD)
    - 			die(_("cannot read existing object info %s"), oid_to_hex(oid));
    - 		if (has_type != type || has_size != size)
    - 			die(_("SHA1 COLLISION FOUND WITH %s !"), oid_to_hex(oid));
    -
    - ## builtin/mktree.c ##
    -@@ builtin/mktree.c: static void mktree_line(char *buf, int nul_term_line, int allow_missing)
    - 
    - 	/* Check the type of object identified by sha1 */
    - 	obj_type = oid_object_info(the_repository, &oid, NULL);
    --	if (obj_type < 0) {
    -+	if (obj_type == OBJ_BAD) {
    - 		if (allow_missing) {
    - 			; /* no problem - missing objects are presumed to be of the right type */
    - 		} else {
    -
    - ## builtin/pack-objects.c ##
    -@@ builtin/pack-objects.c: unsigned long oe_get_size_slow(struct packing_data *pack,
    - 
    - 	if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
    - 		packing_data_lock(&to_pack);
    --		if (oid_object_info(the_repository, &e->idx.oid, &size) < 0)
    -+		if (oid_object_info(the_repository, &e->idx.oid, &size) == OBJ_BAD)
    - 			die(_("unable to get size of %s"),
    - 			    oid_to_hex(&e->idx.oid));
    - 		packing_data_unlock(&to_pack);
    -@@ builtin/pack-objects.c: static int add_loose_object(const struct object_id *oid, const char *path,
    - {
    - 	enum object_type type = oid_object_info(the_repository, oid, NULL);
    - 
    --	if (type < 0) {
    -+	if (type == OBJ_BAD) {
    - 		warning(_("loose object at %s could not be examined"), path);
    - 		return 0;
    - 	}
    -
    - ## builtin/replace.c ##
    -@@ builtin/replace.c: static int edit_and_replace(const char *object_ref, int force, int raw)
    - 		return error(_("not a valid object name: '%s'"), object_ref);
    - 
    - 	type = oid_object_info(the_repository, &old_oid, NULL);
    --	if (type < 0)
    -+	if (type == OBJ_BAD)
    - 		return error(_("unable to get object type for %s"),
    - 			     oid_to_hex(&old_oid));
    - 
    -
    - ## builtin/tag.c ##
    -@@ builtin/tag.c: static void create_tag(const struct object_id *object, const char *object_ref,
    - 	char *path = NULL;
    - 
    - 	type = oid_object_info(the_repository, object, NULL);
    --	if (type <= OBJ_NONE)
    -+	if (type == OBJ_BAD)
    - 		die(_("bad object type."));
    - 
    - 	if (type == OBJ_TAG)
    -
    - ## builtin/unpack-objects.c ##
    -@@ builtin/unpack-objects.c: static int check_object(struct object *obj, int type, void *data, struct fsck_op
    - 	if (!(obj->flags & FLAG_OPEN)) {
    - 		unsigned long size;
    - 		int type = oid_object_info(the_repository, &obj->oid, &size);
    --		if (type != obj->type || type <= 0)
    -+		if (type == OBJ_BAD)
    -+			die(_("unable to get object type for %s"),
    -+			    oid_to_hex(&obj->oid));
    -+		if (type != obj->type)
    -+			/* todo to new function */
    - 			die("object of unexpected type");
    - 		obj->flags |= FLAG_WRITTEN;
    - 		return 0;
     
      ## object-file.c ##
     @@ object-file.c: int oid_object_info_extended(struct repository *r, const struct object_id *oid,
    + 	return ret;
      }
      
    - 
    +-
     -/* returns enum object_type or negative */
     -int oid_object_info(struct repository *r,
     -		    const struct object_id *oid,
    @@ object-file.c: int oid_object_info_extended(struct repository *r, const struct o
      {
      	enum object_type type;
      	struct object_info oi = OBJECT_INFO_INIT;
    -@@ object-file.c: int oid_object_info(struct repository *r,
    - 	oi.sizep = sizep;
    - 	if (oid_object_info_extended(r, oid, &oi,
    - 				      OBJECT_INFO_LOOKUP_REPLACE) < 0)
    --		return -1;
    -+		return OBJ_BAD;
    - 	return type;
    - }
    - 
    -@@ object-file.c: int read_pack_header(int fd, struct pack_header *header)
    - void assert_oid_type(const struct object_id *oid, enum object_type expect)
    - {
    - 	enum object_type type = oid_object_info(the_repository, oid, NULL);
    --	if (type < 0)
    -+	if (type == OBJ_BAD)
    - 		die(_("%s is not a valid object"), oid_to_hex(oid));
    - 	if (type != expect)
    - 		die(_("%s is not a valid '%s' object"), oid_to_hex(oid),
     
      ## object-name.c ##
     @@ object-name.c: static int disambiguate_committish_only(struct repository *r,
    @@ object-name.c: static int disambiguate_committish_only(struct repository *r,
      {
      	struct object *obj;
     -	int kind;
    -+	enum object_type kind;
    ++	enum object_type kind = oid_object_info(r, oid, NULL);
      
    - 	kind = oid_object_info(r, oid, NULL);
    +-	kind = oid_object_info(r, oid, NULL);
      	if (kind == OBJ_COMMIT)
    + 		return 1;
    + 	if (kind != OBJ_TAG)
     @@ object-name.c: static int disambiguate_tree_only(struct repository *r,
      				  const struct object_id *oid,
      				  void *cb_data_unused)
    @@ object-store.h: static inline void *repo_read_object_file(struct repository *r,
      
      /* Read and unpack an object file into memory, write memory to an object file */
     -int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
    -+enum object_type oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
    ++enum object_type oid_object_info(struct repository *r,
    ++				 const struct object_id *,
    ++				 unsigned long *);
      
      int hash_object_file(const struct git_hash_algo *algo, const void *buf,
      		     unsigned long len, const char *type,
    @@ packfile.c: static int retry_bad_packed_offset(struct repository *r,
      	uint32_t pos;
      	struct object_id oid;
      	if (offset_to_pack_pos(p, obj_offset, &pos) < 0)
    -@@ packfile.c: static int retry_bad_packed_offset(struct repository *r,
    - 	nth_packed_object_id(&oid, p, pack_pos_to_index(p, pos));
    - 	mark_bad_packed_object(p, oid.hash);
    - 	type = oid_object_info(r, &oid, NULL);
    --	if (type <= OBJ_NONE)
    --		return OBJ_BAD;
    - 	return type;
    - }
    - 
    -
    - ## reachable.c ##
    -@@ reachable.c: static void add_recent_object(const struct object_id *oid,
    - 	 * commits and tags to have been parsed.
    - 	 */
    - 	type = oid_object_info(the_repository, oid, NULL);
    --	if (type < 0)
    --		die("unable to get object info for %s", oid_to_hex(oid));
    - 
    - 	switch (type) {
    - 	case OBJ_TAG:
    -@@ reachable.c: static void add_recent_object(const struct object_id *oid,
    - 	case OBJ_BLOB:
    - 		obj = (struct object *)lookup_blob(the_repository, oid);
    - 		break;
    -+	case OBJ_BAD:
    -+		die("unable to get object info for %s", oid_to_hex(oid));
    -+		break;
    - 	default:
    - 		die("unknown object type for %s: %s",
    - 		    oid_to_hex(oid), type_name(type));
 -:  ----------- >  5:  1ebcf1416b8 object-name.c: make dependency on object_type order more obvious
 4:  e93881ed264 =  6:  464c9e35256 tree.c: fix misindentation in parse_tree_gently()
 5:  bed81215646 !  7:  4bf29cbb383 object.c: add a utility function for "expected type X, got Y"
    @@ builtin/index-pack.c: static int mark_link(struct object *obj, int type, void *d
      	obj->flags |= FLAG_LINK;
      	return 0;
     @@ builtin/index-pack.c: static unsigned check_object(struct object *obj)
    - 		if (type == OBJ_BAD)
    + 		if (type <= 0)
      			die(_("did not receive expected object %s"),
      			      oid_to_hex(&obj->oid));
     -		if (type != obj->type)
    @@ builtin/index-pack.c: static unsigned check_object(struct object *obj)
      		return 1;
      	}
     
    + ## combine-diff.c ##
    +@@ combine-diff.c: static char *grab_blob(struct repository *r,
    + 		free_filespec(df);
    + 	} else {
    + 		blob = read_object_file(oid, &type, size);
    +-		if (type != OBJ_BLOB)
    +-			die("object '%s' is not a blob!", oid_to_hex(oid));
    ++		oid_is_type_or_die(oid, OBJ_BLOB, &type);
    + 	}
    + 	return blob;
    + }
    +
      ## commit.c ##
     @@ commit.c: const void *repo_get_commit_buffer(struct repository *r,
      		if (!ret)
    @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      
      	ret = parse_commit_buffer(r, item, buffer, size, 0);
     
    + ## merge-recursive.c ##
    +@@ merge-recursive.c: static int read_oid_strbuf(struct merge_options *opt,
    + 	if (!buf)
    + 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
    + 	if (type != OBJ_BLOB) {
    ++		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
    + 		free(buf);
    + 		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
    + 	}
    +
      ## object.c ##
    -@@ object.c: static const char *object_type_strings[] = {
    - 	"tag",		/* OBJ_TAG = 4 */
    - };
    - 
    -+static const char *oid_is_a_X_not_a_Y = N_("object %s is a %s, not a %s");
    -+
    - const char *type_name(unsigned int type)
    - {
    - 	if (type >= ARRAY_SIZE(object_type_strings))
     @@ object.c: void *create_object(struct repository *r, const struct object_id *oid, void *o)
      	return obj;
      }
      
    -+static int oid_is_type_or(const struct object_id *oid,
    -+			  enum object_type want,
    -+			  enum object_type type,
    -+			  int err)
    -+{
    -+	if (want == type)
    -+		return 0;
    -+	if (err)
    -+		return error(_(oid_is_a_X_not_a_Y),
    -+			     oid_to_hex(oid), type_name(type),
    -+			     type_name(want));
    -+	else
    -+		die(_(oid_is_a_X_not_a_Y), oid_to_hex(oid),
    -+		    type_name(type), type_name(want));
    -+}
    ++static const char *object_type_mismatch_msg = N_("object %s is a %s, not a %s");
     +
     +void oid_is_type_or_die(const struct object_id *oid,
     +			enum object_type want,
     +			enum object_type *type)
     +{
    -+	oid_is_type_or(oid, want, *type, 0);
    ++	if (want == *type)
    ++		return;
    ++	die(_(object_type_mismatch_msg), oid_to_hex(oid),
    ++	    type_name(*type), type_name(want));
     +}
     +
     +int oid_is_type_or_error(const struct object_id *oid,
     +			 enum object_type want,
     +			 enum object_type *type)
     +{
    -+	return oid_is_type_or(oid, want, *type, 1);
    ++	if (want == *type)
    ++		return 0;
    ++	return error(_(object_type_mismatch_msg),
    ++		     oid_to_hex(oid), type_name(*type),
    ++		     type_name(want));
     +}
     +
      void *object_as_type(struct object *obj, enum object_type type, int quiet)
    @@ object.c: void *object_as_type(struct object *obj, enum object_type type, int qu
      	else {
      		if (!quiet)
     -			error(_("object %s is a %s, not a %s"),
    -+			error(_(oid_is_a_X_not_a_Y),
    ++			error(_(object_type_mismatch_msg),
      			      oid_to_hex(&obj->oid),
      			      type_name(obj->type), type_name(type));
      		return NULL;
 -:  ----------- >  8:  351a8ec79c8 object.c: add and use oid_is_type_or_die_msg() function
 6:  6d34b2b80db =  9:  6a43bf897ae object tests: add test for unexpected objects in tags
 7:  f93236c25fd ! 10:  a84f670ac24 tag: don't misreport type of tagged objects in errors
    @@ Commit message
     
         Hence the non-intuitive solution of adding a
         lookup_{blob,commit,tag,tree}_type() function. It's to distinguish
    -    parse_object_buffer() where we actually know the type from
    -    parse_tag_buffer() where we're just guessing about the type.
    +    calls from parse_object_buffer() where we actually know the type, from
    +    a parse_tag_buffer() where we're just guessing about the type.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  5:35         ` Junio C Hamano
  2021-03-28  2:13       ` [PATCH v2 02/10] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
                         ` (10 subsequent siblings)
  11 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Change the type_from_string() macro into a function and drop the
support for passing len < 0.

Support for len < 0 was added in fe8e3b71805 (Refactor
type_from_string() to allow continuing after detecting an error,
2014-09-10), but no callers use that form. Let's drop it to simplify
this, and in preparation for simplifying these even further.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 10 +++++++---
 object.h |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index 78343781ae7..65446172172 100644
--- a/object.c
+++ b/object.c
@@ -39,9 +39,6 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 {
 	int i;
 
-	if (len < 0)
-		len = strlen(str);
-
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
@@ -53,6 +50,13 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 	die(_("invalid object type \"%s\""), str);
 }
 
+int type_from_string(const char *str)
+{
+	size_t len = strlen(str);
+	int ret = type_from_string_gently(str, len, 0);
+	return ret;
+}
+
 /*
  * Return a numerical hash value between 0 and n-1 for the object with
  * the specified sha1.  n must be a power of 2.  Please note that the
diff --git a/object.h b/object.h
index 59daadce214..3ab3eb193d3 100644
--- a/object.h
+++ b/object.h
@@ -94,7 +94,7 @@ struct object {
 
 const char *type_name(unsigned int type);
 int type_from_string_gently(const char *str, ssize_t, int gentle);
-#define type_from_string(str) type_from_string_gently(str, -1, 0)
+int type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 02/10] object.c: refactor type_from_string_gently()
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
                         ` (9 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Get rid of the "gently" argument to type_from_string_gently() to make
it consistent with most other *_gently() functions.

This refactoring of adding a third parameter was done in
fe8e3b71805 (Refactor type_from_string() to allow continuing after
detecting an error, 2014-09-10) in preparation for its use in
fsck.c.

Simplifying this means we can move the die() into the simpler
type_from_string() function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c        |  2 +-
 object-file.c |  2 +-
 object.c      | 12 +++++-------
 object.h      |  2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fsck.c b/fsck.c
index e3030f3b358..6cc4f9ea892 100644
--- a/fsck.c
+++ b/fsck.c
@@ -957,7 +957,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
-	*tagged_type = type_from_string_gently(buffer, eol - buffer, 1);
+	*tagged_type = type_from_string_gently(buffer, eol - buffer);
 	if (*tagged_type < 0)
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
diff --git a/object-file.c b/object-file.c
index 624af408cdc..b7c26b67355 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1314,7 +1314,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 		type_len++;
 	}
 
-	type = type_from_string_gently(type_buf, type_len, 1);
+	type = type_from_string_gently(type_buf, type_len);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
 	/*
diff --git a/object.c b/object.c
index 65446172172..1c36ea606f2 100644
--- a/object.c
+++ b/object.c
@@ -35,7 +35,7 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len, int gentle)
+int type_from_string_gently(const char *str, ssize_t len)
 {
 	int i;
 
@@ -43,17 +43,15 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
 			return i;
-
-	if (gentle)
-		return -1;
-
-	die(_("invalid object type \"%s\""), str);
+	return -1;
 }
 
 int type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len, 0);
+	int ret = type_from_string_gently(str, len);
+	if (ret < 0)
+		die(_("invalid object type \"%s\""), str);
 	return ret;
 }
 
diff --git a/object.h b/object.h
index 3ab3eb193d3..ffdc1298300 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,7 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t, int gentle);
+int type_from_string_gently(const char *str, ssize_t len);
 int type_from_string(const char *str);
 
 /*
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type"
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 02/10] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 04/10] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
                         ` (8 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Change the type_from_string*() functions to return an "enum
object_type", but don't refactor their callers to check for "==
OBJ_BAD" instead of "< 0".

Refactoring the check of the return value to check == OBJ_BAD would
now be equivalent to "ret < 0", but the consensus on an earlier
version of this patch was to not do that, and to instead use -1
consistently as a return value. It just so happens that OBJ_BAD == -1,
but let's not put a hard reliance on that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 8 ++++----
 object.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/object.c b/object.c
index 1c36ea606f2..c0e68d4bbf6 100644
--- a/object.c
+++ b/object.c
@@ -35,9 +35,9 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len)
+enum object_type type_from_string_gently(const char *str, ssize_t len)
 {
-	int i;
+	enum object_type i;
 
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
@@ -46,10 +46,10 @@ int type_from_string_gently(const char *str, ssize_t len)
 	return -1;
 }
 
-int type_from_string(const char *str)
+enum object_type type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len);
+	enum object_type ret = type_from_string_gently(str, len);
 	if (ret < 0)
 		die(_("invalid object type \"%s\""), str);
 	return ret;
diff --git a/object.h b/object.h
index ffdc1298300..5e7a523e858 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,8 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t len);
-int type_from_string(const char *str);
+enum object_type type_from_string_gently(const char *str, ssize_t len);
+enum object_type type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 04/10] object-file.c: make oid_object_info() return "enum object_type"
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
                         ` (7 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Change oid_object_info() to return an "enum object_type". Unlike
oid_object_info_extended() function the simpler oid_object_info()
explicitly returns the oi.typep member, which is itself an "enum
object_type".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/blame.c      |  2 +-
 builtin/index-pack.c |  2 +-
 object-file.c        |  8 +++-----
 object-name.c        | 19 +++++++++----------
 object-store.h       |  4 +++-
 packfile.c           |  2 +-
 6 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..5dd3c38a8c6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -810,7 +810,7 @@ static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
 	oidcpy(&oid, oid_ret);
 	while (1) {
 		struct object *obj;
-		int kind = oid_object_info(r, &oid, NULL);
+		enum object_type kind = oid_object_info(r, &oid, NULL);
 		if (kind == OBJ_COMMIT) {
 			oidcpy(oid_ret, &oid);
 			return 0;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 21899687e2c..17376db1e39 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -236,7 +236,7 @@ static unsigned check_object(struct object *obj)
 
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
-		int type = oid_object_info(the_repository, &obj->oid, &size);
+		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
diff --git a/object-file.c b/object-file.c
index b7c26b67355..8ed54d6f621 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1572,11 +1572,9 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	return ret;
 }
 
-
-/* returns enum object_type or negative */
-int oid_object_info(struct repository *r,
-		    const struct object_id *oid,
-		    unsigned long *sizep)
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *oid,
+				 unsigned long *sizep)
 {
 	enum object_type type;
 	struct object_info oi = OBJECT_INFO_INIT;
diff --git a/object-name.c b/object-name.c
index 64202de60b1..4d7f0c66cf2 100644
--- a/object-name.c
+++ b/object-name.c
@@ -239,9 +239,8 @@ static int disambiguate_committish_only(struct repository *r,
 					void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind = oid_object_info(r, oid, NULL);
 
-	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_COMMIT)
 		return 1;
 	if (kind != OBJ_TAG)
@@ -258,7 +257,7 @@ static int disambiguate_tree_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_TREE;
 }
 
@@ -267,7 +266,7 @@ static int disambiguate_treeish_only(struct repository *r,
 				     void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind;
 
 	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_TREE || kind == OBJ_COMMIT)
@@ -286,7 +285,7 @@ static int disambiguate_blob_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_BLOB;
 }
 
@@ -361,7 +360,7 @@ static int show_ambiguous_object(const struct object_id *oid, void *data)
 {
 	const struct disambiguate_state *ds = data;
 	struct strbuf desc = STRBUF_INIT;
-	int type;
+	enum object_type type;
 
 	if (ds->fn && !ds->fn(ds->repo, oid, ds->cb_data))
 		return 0;
@@ -405,10 +404,10 @@ static int repo_collect_ambiguous(struct repository *r,
 static int sort_ambiguous(const void *a, const void *b, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
-	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
-	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
-	int a_type_sort;
-	int b_type_sort;
+	enum object_type a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
+	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
+	enum object_type a_type_sort;
+	enum object_type b_type_sort;
 
 	/*
 	 * Sorts by hash within the same object type, just as
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..eab9674d085 100644
--- a/object-store.h
+++ b/object-store.h
@@ -208,7 +208,9 @@ static inline void *repo_read_object_file(struct repository *r,
 #endif
 
 /* Read and unpack an object file into memory, write memory to an object file */
-int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *,
+				 unsigned long *);
 
 int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 		     unsigned long len, const char *type,
diff --git a/packfile.c b/packfile.c
index 6661f3325a4..3ee01ea7323 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1266,7 +1266,7 @@ static int retry_bad_packed_offset(struct repository *r,
 				   struct packed_git *p,
 				   off_t obj_offset)
 {
-	int type;
+	enum object_type type;
 	uint32_t pos;
 	struct object_id oid;
 	if (offset_to_pack_pos(p, obj_offset, &pos) < 0)
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 04/10] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 06/10] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
                         ` (6 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Add an assert to make it more obvious that we were effectively
hardcoding OBJ_TAG in sort_ambiguous() as "4".

I wrote this code in 5cc044e0257 (get_short_oid: sort ambiguous
objects by type, then SHA-1, 2018-05-10), there was already a comment
about this magic, but let's make sure that someone doing a potential
reordering of "enum object_type" in the future would notice it
breaking this function (and probably a bunch of other things...).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-name.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/object-name.c b/object-name.c
index 4d7f0c66cf2..b6a7328b7a7 100644
--- a/object-name.c
+++ b/object-name.c
@@ -408,6 +408,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
 	enum object_type a_type_sort;
 	enum object_type b_type_sort;
+	const enum object_type tag_type_offs = OBJ_TAG - OBJ_NONE;
+	assert(tag_type_offs == 4);
 
 	/*
 	 * Sorts by hash within the same object type, just as
@@ -425,8 +427,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	 * cleverly) do that with modulus, since the enum assigns 1 to
 	 * commit, so tag becomes 0.
 	 */
-	a_type_sort = a_type % 4;
-	b_type_sort = b_type % 4;
+	a_type_sort = a_type % tag_type_offs;
+	b_type_sort = b_type % tag_type_offs;
 	return a_type_sort > b_type_sort ? 1 : -1;
 }
 
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 06/10] tree.c: fix misindentation in parse_tree_gently()
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 07/10] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
                         ` (5 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

The variables declared in parse_tree_gently() had a single space after
the TAB. This dates back to their introduction in bd2c39f58f9 ([PATCH]
don't load and decompress objects twice with parse_object(),
2005-05-06). Let's fix them to follow the style of the rest of the
file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tree.c b/tree.c
index a52479812ce..4820d66a10c 100644
--- a/tree.c
+++ b/tree.c
@@ -216,9 +216,9 @@ int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 
 int parse_tree_gently(struct tree *item, int quiet_on_missing)
 {
-	 enum object_type type;
-	 void *buffer;
-	 unsigned long size;
+	enum object_type type;
+	void *buffer;
+	unsigned long size;
 
 	if (item->object.parsed)
 		return 0;
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 07/10] object.c: add a utility function for "expected type X, got Y"
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (5 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 06/10] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 08/10] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Refactor various "Object X is not Y" error messages so that they use
the same message as the long-standing object_as_type() error
message. Now we'll consistently report e.g. that we got a commit when
we expected a tag, not just that the object is not a tag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c |  9 +++------
 combine-diff.c       |  3 +--
 commit.c             | 10 ++++------
 merge-recursive.c    |  1 +
 object.c             | 25 ++++++++++++++++++++++++-
 object.h             |  5 +++++
 tree.c               |  7 ++++---
 7 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 17376db1e39..2a7a4df5f16 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -217,8 +217,8 @@ static int mark_link(struct object *obj, int type, void *data, struct fsck_optio
 	if (!obj)
 		return -1;
 
-	if (type != OBJ_ANY && obj->type != type)
-		die(_("object type mismatch at %s"), oid_to_hex(&obj->oid));
+	if (type != OBJ_ANY)
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 
 	obj->flags |= FLAG_LINK;
 	return 0;
@@ -240,10 +240,7 @@ static unsigned check_object(struct object *obj)
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
-		if (type != obj->type)
-			die(_("object %s: expected type %s, found %s"),
-			    oid_to_hex(&obj->oid),
-			    type_name(obj->type), type_name(type));
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 		obj->flags |= FLAG_CHECKED;
 		return 1;
 	}
diff --git a/combine-diff.c b/combine-diff.c
index 06635f91bc2..aa767dbb8ea 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -333,8 +333,7 @@ static char *grab_blob(struct repository *r,
 		free_filespec(df);
 	} else {
 		blob = read_object_file(oid, &type, size);
-		if (type != OBJ_BLOB)
-			die("object '%s' is not a blob!", oid_to_hex(oid));
+		oid_is_type_or_die(oid, OBJ_BLOB, &type);
 	}
 	return blob;
 }
diff --git a/commit.c b/commit.c
index 8ea55a447fa..b3701003678 100644
--- a/commit.c
+++ b/commit.c
@@ -299,9 +299,7 @@ const void *repo_get_commit_buffer(struct repository *r,
 		if (!ret)
 			die("cannot read commit object %s",
 			    oid_to_hex(&commit->object.oid));
-		if (type != OBJ_COMMIT)
-			die("expected commit for %s, got %s",
-			    oid_to_hex(&commit->object.oid), type_name(type));
+		oid_is_type_or_die(&commit->object.oid, OBJ_COMMIT, &type);
 		if (sizep)
 			*sizep = size;
 	}
@@ -489,10 +487,10 @@ int repo_parse_commit_internal(struct repository *r,
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_COMMIT) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_COMMIT, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a commit",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 
 	ret = parse_commit_buffer(r, item, buffer, size, 0);
diff --git a/merge-recursive.c b/merge-recursive.c
index b69e694d986..feb9bfeb8af 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2971,6 +2971,7 @@ static int read_oid_strbuf(struct merge_options *opt,
 	if (!buf)
 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
+		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
 		free(buf);
 		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
 	}
diff --git a/object.c b/object.c
index c0e68d4bbf6..fa18b243280 100644
--- a/object.c
+++ b/object.c
@@ -159,6 +159,29 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	return obj;
 }
 
+static const char *object_type_mismatch_msg = N_("object %s is a %s, not a %s");
+
+void oid_is_type_or_die(const struct object_id *oid,
+			enum object_type want,
+			enum object_type *type)
+{
+	if (want == *type)
+		return;
+	die(_(object_type_mismatch_msg), oid_to_hex(oid),
+	    type_name(*type), type_name(want));
+}
+
+int oid_is_type_or_error(const struct object_id *oid,
+			 enum object_type want,
+			 enum object_type *type)
+{
+	if (want == *type)
+		return 0;
+	return error(_(object_type_mismatch_msg),
+		     oid_to_hex(oid), type_name(*type),
+		     type_name(want));
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
@@ -172,7 +195,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 	else {
 		if (!quiet)
-			error(_("object %s is a %s, not a %s"),
+			error(_(object_type_mismatch_msg),
 			      oid_to_hex(&obj->oid),
 			      type_name(obj->type), type_name(type));
 		return NULL;
diff --git a/object.h b/object.h
index 5e7a523e858..d2d4a236d0e 100644
--- a/object.h
+++ b/object.h
@@ -124,6 +124,11 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
+			enum object_type *type);
+int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
+			 enum object_type *type);
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree.c b/tree.c
index 4820d66a10c..d9b1c70b28a 100644
--- a/tree.c
+++ b/tree.c
@@ -219,6 +219,7 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 	enum object_type type;
 	void *buffer;
 	unsigned long size;
+	int ret;
 
 	if (item->object.parsed)
 		return 0;
@@ -227,10 +228,10 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_TREE) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_TREE, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a tree",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 	return parse_tree_buffer(item, buffer, size);
 }
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 08/10] object.c: add and use oid_is_type_or_die_msg() function
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (6 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 07/10] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 09/10] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Add a oid_is_type_or_die_msg() function to go with the "error" and
"die" forms for emitting "expected type X, got Y" messages. This is
useful for callers that want the message itself as a char *.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 merge-recursive.c |  6 ++++--
 object.c          | 12 ++++++++++++
 object.h          |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index feb9bfeb8af..152e63e4436 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2971,9 +2971,11 @@ static int read_oid_strbuf(struct merge_options *opt,
 	if (!buf)
 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
-		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
+		char *msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
+		int ret = err(opt, msg);
 		free(buf);
-		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
+		free(msg);
+		return ret;
 	}
 	strbuf_attach(dst, buf, size, size + 1);
 	return 0;
diff --git a/object.c b/object.c
index fa18b243280..0f60743e61f 100644
--- a/object.c
+++ b/object.c
@@ -182,6 +182,18 @@ int oid_is_type_or_error(const struct object_id *oid,
 		     type_name(want));
 }
 
+char* oid_is_type_or_die_msg(const struct object_id *oid,
+				   enum object_type want,
+				   enum object_type *type)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (want == *type)
+		BUG("call this just to get the message!");
+	strbuf_addf(&sb, _(object_type_mismatch_msg), oid_to_hex(oid),
+		    type_name(*type), type_name(want));
+	return strbuf_detach(&sb, NULL);
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
diff --git a/object.h b/object.h
index d2d4a236d0e..cdc3242a128 100644
--- a/object.h
+++ b/object.h
@@ -128,6 +128,9 @@ void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
 			enum object_type *type);
 int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
 			 enum object_type *type);
+char* oid_is_type_or_die_msg(const struct object_id *oid,
+			     enum object_type want,
+			     enum object_type *type);
 
 /*
  * Returns the object, having parsed it to find out what it is.
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 09/10] object tests: add test for unexpected objects in tags
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (7 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 08/10] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:13       ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  11 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
for unexpected object types, 2019-04-09), there were no meaningful
tests for checking how we reported on finding the incorrect object
type in a tag, i.e. one that broke the "type" promise in the tag
header.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6102-rev-list-unexpected-objects.sh | 113 ++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 1 deletion(-)

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 52cde097dd5..2ea1982b9ed 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -31,7 +31,8 @@ test_expect_success 'setup unexpected non-tree entry' '
 '
 
 test_expect_success 'traverse unexpected non-tree entry (lone)' '
-	test_must_fail git rev-list --objects $broken_tree
+	test_must_fail git rev-list --objects $broken_tree >output 2>&1 &&
+	test_i18ngrep "not a tree" output
 '
 
 test_expect_success 'traverse unexpected non-tree entry (seen)' '
@@ -124,4 +125,114 @@ test_expect_success 'traverse unexpected non-blob tag (seen)' '
 	test_i18ngrep "not a blob" output
 '
 
+test_expect_success 'setup unexpected non-tag tag' '
+	test_when_finished "git tag -d tag-commit tag-tag" &&
+
+	git tag -a -m"tagged commit" tag-commit $commit &&
+	tag_commit=$(git rev-parse tag-commit) &&
+	git tag -a -m"tagged tag" tag-tag tag-commit &&
+	tag_tag=$(git rev-parse tag-tag) &&
+
+	git cat-file tag tag-tag >good-tag-tag &&
+	git cat-file tag tag-commit >good-commit-tag &&
+
+	sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit &&
+	sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree &&
+	sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob &&
+
+	sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag &&
+	sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree &&
+	sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob &&
+
+	tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) &&
+	tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) &&
+	tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) &&
+
+	commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) &&
+	commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) &&
+	commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob)
+'
+
+test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
+	cat >expected <<-EOF &&
+	error: object $commit is a tag, not a commit
+	fatal: bad object $commit
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tag 2>err &&
+	cat >expected <<-EOF &&
+	error: object $tag_commit is a commit, not a tag
+	fatal: bad object $tag_commit
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
+	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
+	cat >expected <<-EOF &&
+	error: object $tree is a tag, not a tree
+	fatal: bad object $tree
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tree 2>err &&
+	cat >expected <<-EOF &&
+	error: object $tree is a commit, not a tree
+	fatal: bad object $tree
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
+	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a tag, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a commit, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
+	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a commit, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a tag, not a blob
+	fatal: bad object $blob
+	EOF
+	test_cmp expected err
+'
+
+test_expect_success 'traverse unexpected non-tag tag (blob seen to blob)' '
+	test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a commit
+	error: bad tag pointer to $blob in $commit_tag_blob
+	fatal: bad object $commit_tag_blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $blob $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a tag
+	error: bad tag pointer to $blob in $tag_tag_blob
+	fatal: bad object $tag_tag_blob
+	EOF
+	test_cmp expected err
+'
+
 test_done
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (8 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 09/10] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:13       ` Ævar Arnfjörð Bjarmason
  2021-03-30  5:50         ` Junio C Hamano
  2021-03-28  9:27       ` [PATCH v2 00/10] improve reporting of unexpected objects Jeff King
  2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
  11 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Elijah Newren,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
objects, 2005-06-21) (yes, that ancient!) and correctly report an
error on a tag like:

    object <a tree hash>
    type commit

As:

    error: object <a tree hash> is tree, not a commit

Instead of our long-standing misbehavior of inverting the two, and
reporting:

    error: object <a tree hash> is commit, not a tree

Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
is incorrect.

The reason for this misreporting is that in parse_tag_buffer() we end
up doing a lookup_{blob,commit,tag,tree}() depending on what we read
out of the "type" line.

If we haven't parsed that object before we end up dispatching to the
type-specific lookup functions, e.g. this for commit.c in
lookup_commit_type():

	struct object *obj = lookup_object(r, oid);
	if (!obj)
		return create_object(r, oid, alloc_commit_node(r));

Its allocation will then set the obj->type according to what the tag
told us the type was, but which we've never validated. At this point
we've got an object in memory that hasn't been parsed, and whose type
is incorrect, since we mistrusted a tag to tell us the type.

Then when we actually load the object with parse_object() we read it
and find that it's a "tree". See 8ff226a9d5e (add object_as_type
helper for casting objects, 2014-07-13) for that behavior (that's just
a refactoring commit, but shows all the code involved).

Which explains why we inverted the error report. Normally when
object_as_type() is called it's by the lookup_{blob,commit,tag,tree}()
functions via parse_object(). At that point we can trust the
obj->type.

In the case of parsing objects we've learned about via a tag with an
incorrect type it's the opposite, the obj->type isn't correct and
holds the mislabeled type, but we're parsing the object and know for
sure what object type we're dealing with.

Hence the non-intuitive solution of adding a
lookup_{blob,commit,tag,tree}_type() function. It's to distinguish
calls from parse_object_buffer() where we actually know the type, from
a parse_tag_buffer() where we're just guessing about the type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                                 | 16 +++++++++++++++-
 blob.h                                 |  3 +++
 commit.c                               | 14 +++++++++++++-
 commit.h                               |  2 ++
 object.c                               |  8 ++++----
 t/t6102-rev-list-unexpected-objects.sh | 16 ++++++++--------
 tag.c                                  | 14 +++++++++++++-
 tag.h                                  |  2 ++
 tree.c                                 | 14 +++++++++++++-
 tree.h                                 |  2 ++
 10 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/blob.c b/blob.c
index 182718aba9f..b233e0daa2f 100644
--- a/blob.c
+++ b/blob.c
@@ -2,17 +2,31 @@
 #include "blob.h"
 #include "repository.h"
 #include "alloc.h"
+#include "object-store.h"
 
 const char *blob_type = "blob";
 
-struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_BLOB;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
+struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+{
+	return lookup_blob_type(r, oid, OBJ_NONE);
+}
+
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
 {
 	item->object.parsed = 1;
diff --git a/blob.h b/blob.h
index 16648720557..066a2effcbf 100644
--- a/blob.h
+++ b/blob.h
@@ -10,6 +10,9 @@ struct blob {
 };
 
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type);
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
 
diff --git a/commit.c b/commit.c
index b3701003678..ab6cee1e8c3 100644
--- a/commit.c
+++ b/commit.c
@@ -57,14 +57,26 @@ struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref
 	return c;
 }
 
-struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_COMMIT;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
+struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+{
+	return lookup_commit_type(r, oid, OBJ_NONE);
+}
+
 struct commit *lookup_commit_reference_by_name(const char *name)
 {
 	struct object_id oid;
diff --git a/commit.h b/commit.h
index df42eb434f3..9def4f3f19d 100644
--- a/commit.h
+++ b/commit.h
@@ -64,6 +64,8 @@ void add_name_decoration(enum decoration_type type, const char *name, struct obj
 const struct name_decoration *get_name_decoration(const struct object *obj);
 
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type);
 struct commit *lookup_commit_reference(struct repository *r,
 				       const struct object_id *oid);
 struct commit *lookup_commit_reference_gently(struct repository *r,
diff --git a/object.c b/object.c
index 0f60743e61f..60037422488 100644
--- a/object.c
+++ b/object.c
@@ -230,14 +230,14 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 
 	obj = NULL;
 	if (type == OBJ_BLOB) {
-		struct blob *blob = lookup_blob(r, oid);
+		struct blob *blob = lookup_blob_type(r, oid, type);
 		if (blob) {
 			if (parse_blob_buffer(blob, buffer, size))
 				return NULL;
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
-		struct tree *tree = lookup_tree(r, oid);
+		struct tree *tree = lookup_tree_type(r, oid, type);
 		if (tree) {
 			obj = &tree->object;
 			if (!tree->buffer)
@@ -249,7 +249,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			}
 		}
 	} else if (type == OBJ_COMMIT) {
-		struct commit *commit = lookup_commit(r, oid);
+		struct commit *commit = lookup_commit_type(r, oid, type);
 		if (commit) {
 			if (parse_commit_buffer(r, commit, buffer, size, 1))
 				return NULL;
@@ -260,7 +260,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			obj = &commit->object;
 		}
 	} else if (type == OBJ_TAG) {
-		struct tag *tag = lookup_tag(r, oid);
+		struct tag *tag = lookup_tag_type(r, oid, type);
 		if (tag) {
 			if (parse_tag_buffer(r, tag, buffer, size))
 			       return NULL;
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 2ea1982b9ed..4a6b3cc3b01 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -156,14 +156,14 @@ test_expect_success 'setup unexpected non-tag tag' '
 test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' '
 	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
 	cat >expected <<-EOF &&
-	error: object $commit is a tag, not a commit
+	error: object $commit is a commit, not a tag
 	fatal: bad object $commit
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $commit_tag_tag 2>err &&
 	cat >expected <<-EOF &&
-	error: object $tag_commit is a commit, not a tag
+	error: object $tag_commit is a tag, not a commit
 	fatal: bad object $tag_commit
 	EOF
 	test_cmp expected err
@@ -172,14 +172,14 @@ test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)
 test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
 	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
 	cat >expected <<-EOF &&
-	error: object $tree is a tag, not a tree
+	error: object $tree is a tree, not a tag
 	fatal: bad object $tree
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $commit_tag_tree 2>err &&
 	cat >expected <<-EOF &&
-	error: object $tree is a commit, not a tree
+	error: object $tree is a tree, not a commit
 	fatal: bad object $tree
 	EOF
 	test_cmp expected err
@@ -188,14 +188,14 @@ test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
 test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
 	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a tag, not a blob
+	error: object $blob is a blob, not a tag
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $commit_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a commit, not a blob
+	error: object $blob is a blob, not a commit
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err
@@ -204,14 +204,14 @@ test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
 test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
 	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a commit, not a blob
+	error: object $blob is a blob, not a commit
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err &&
 
 	test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err &&
 	cat >expected <<-EOF &&
-	error: object $blob is a tag, not a blob
+	error: object $blob is a blob, not a tag
 	fatal: bad object $blob
 	EOF
 	test_cmp expected err
diff --git a/tag.c b/tag.c
index 3e18a418414..0ef87897b29 100644
--- a/tag.c
+++ b/tag.c
@@ -99,14 +99,26 @@ struct object *deref_tag_noverify(struct object *o)
 	return o;
 }
 
-struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_TAG;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
+struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tag_type(r, oid, OBJ_NONE);
+}
+
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
 {
 	const char *dateptr;
diff --git a/tag.h b/tag.h
index 3ce8e721924..42bd3e64011 100644
--- a/tag.h
+++ b/tag.h
@@ -12,6 +12,8 @@ struct tag {
 	timestamp_t date;
 };
 struct tag *lookup_tag(struct repository *r, const struct object_id *oid);
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type);
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size);
 int parse_tag(struct tag *item);
 void release_tag_memory(struct tag *t);
diff --git a/tree.c b/tree.c
index d9b1c70b28a..895c66420e8 100644
--- a/tree.c
+++ b/tree.c
@@ -195,14 +195,26 @@ int read_tree(struct repository *r, struct tree *tree, int stage,
 	return 0;
 }
 
-struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
+	if (type != OBJ_NONE &&
+	    obj->type != OBJ_NONE) {
+		enum object_type want = OBJ_TREE;
+		if (oid_is_type_or_error(oid, obj->type, &want))
+			return NULL;
+	}
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
+struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tree_type(r, oid, OBJ_NONE);
+}
+
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 {
 	if (item->object.parsed)
diff --git a/tree.h b/tree.h
index 3eb0484cbf2..49bd44f79b3 100644
--- a/tree.h
+++ b/tree.h
@@ -15,6 +15,8 @@ struct tree {
 extern const char *tree_type;
 
 struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type);
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
 
-- 
2.31.1.442.g6c06c9fe35c


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-03-28  2:13       ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-03-28  5:35         ` Junio C Hamano
  2021-03-28 15:46           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-03-28  5:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the type_from_string() macro into a function and drop the
> support for passing len < 0.
>
> Support for len < 0 was added in fe8e3b71805 (Refactor
> type_from_string() to allow continuing after detecting an error,
> 2014-09-10), but no callers use that form. Let's drop it to simplify
> this, and in preparation for simplifying these even further.

Given the recent fallout of oversimplifying we've seen in other
topic, this line of thinking makes me nauseated, but let's see how
well this works this time around.

At least, replacing an already queued topic with v2 would not
increase the number of topics that are supposedly in-flight but not
quite moving due to lack of reviews and responses, unlike bunch of
totally new patches ;-)

Will replace.  Thanks.

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  object.c | 10 +++++++---
>  object.h |  2 +-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/object.c b/object.c
> index 78343781ae7..65446172172 100644
> --- a/object.c
> +++ b/object.c
> @@ -39,9 +39,6 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
>  {
>  	int i;
>  
> -	if (len < 0)
> -		len = strlen(str);
> -
>  	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
>  		if (!strncmp(str, object_type_strings[i], len) &&
>  		    object_type_strings[i][len] == '\0')
> @@ -53,6 +50,13 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
>  	die(_("invalid object type \"%s\""), str);
>  }
>  
> +int type_from_string(const char *str)
> +{
> +	size_t len = strlen(str);
> +	int ret = type_from_string_gently(str, len, 0);
> +	return ret;
> +}
> +
>  /*
>   * Return a numerical hash value between 0 and n-1 for the object with
>   * the specified sha1.  n must be a power of 2.  Please note that the
> diff --git a/object.h b/object.h
> index 59daadce214..3ab3eb193d3 100644
> --- a/object.h
> +++ b/object.h
> @@ -94,7 +94,7 @@ struct object {
>  
>  const char *type_name(unsigned int type);
>  int type_from_string_gently(const char *str, ssize_t, int gentle);
> -#define type_from_string(str) type_from_string_gently(str, -1, 0)
> +int type_from_string(const char *str);
>  
>  /*
>   * Return the current number of buckets in the object hashmap.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 6/7] object tests: add test for unexpected objects in tags
  2021-03-28  1:35       ` Ævar Arnfjörð Bjarmason
@ 2021-03-28  9:06         ` Jeff King
  2021-03-28 15:39           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-03-28  9:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Sun, Mar 28, 2021 at 03:35:46AM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
> >> for unexpected object types, 2019-04-09), there were no meaningful
> >> tests for checking how we reported on finding the incorrect object
> >> type in a tag, i.e. one that broke the "type" promise in the tag
> >> header.
> >
> > Isn't this covered by tests 16 and 17 ("traverse unexpected non-commit
> > tag", both "lone" and "seen")? And likewise the matching "non-tree" and
> > "non-blob" variants afterwards?
> 
> Barely, those tests are mainly testing that rev-list doesn't die, and
> only do a very fuzzy match on the output. E.g. checking `grep "not a
> commit" err`, not a full test_cmp that'll check what OID is reported
> etc.

I thought the "blind spot" you meant was not testing these cases. But I
guess you mean that we are not checking stderr.

So OK, but...

> > The only thing we don't seem to cover is an unexpected non-tag. I don't
> > mind adding that, but why wouldn't we just follow the template of the
> > existing tests?
> 
> I am following that template to some extent, e.g. using
> ${commit,tree,blob}. It just didn't seem worth it to refactor an earlier
> test in the file just to re-use a single hash-object invocation, those
> tests e.g. clobber the $tag variable, so bending over backwards to
> re-use anything set up in them would mean some refactoring.
> 
> I think it's much clearer just do do all the different kinds of setup in
> the new setup function.

It does not seem to make the resulting test script more clear at all to
create the same situation twice, but test stderr only in the second
case. I.e., why is the change to the test script not just:

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 52cde097dd..4cdc87c913 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -82,12 +82,13 @@ test_expect_success 'setup unexpected non-commit tag' '
 '
 
 test_expect_success 'traverse unexpected non-commit tag (lone)' '
-	test_must_fail git rev-list --objects $tag
+	test_must_fail git rev-list --objects $tag >output 2>&1 &&
+	test_i18ngrep "is a blob, not a commit" output
 '
 
 test_expect_success 'traverse unexpected non-commit tag (seen)' '
 	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
-	test_i18ngrep "not a commit" output
+	test_i18ngrep "is a blob, not a commit" output
 '
 
 test_expect_success 'setup unexpected non-tree tag' '

and so forth (or you can replace it with a full test_cmp if you really
prefer). That does not seem like bending over backwards to me, but
rather keeping the test script tidy and readable.

But I wonder if there is something more going on that led you to have
trouble demonstrating your problem with the existing tests. Looking at
your follow-on patch that flips the "is an X, not a Y", I am not sure
this is something that we can actually handle reliably. At least not
without further access to the object database.

Because when we call, say, lookup_blob() and find that the object is
already in memory as a non-blob, we don't know who the culprit is.
Perhaps an earlier part of the code called parse_object(), found that it
really is a blob on disk, and used that type. But it may equally have
been the case that we saw a reference to the object as a commit, called
lookup_commit() on it, and now our lookup_blob() call is unhappy,
thinking it is really a commit. In that case, one of those references is
wrong, but we don't know which.

I think a robust solution would be one of:

  - make the message more precise: "saw object X as a commit, but
    previously it was referred to as a blob". Or vice versa.

  - when we see such a mismatch, go to the object database to say "aha,
    on disk it is really a blob". That's expensive, but this is an error
    case, so we can afford to be slow. But it does produce unsatisfying
    results when it was the earlier lookup_commit() call that was wrong.
    Because we have to say "object X is really a blob, but some object
    earlier referred to it as a commit. No idea who did that, though!".

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 00/10] improve reporting of unexpected objects
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (9 preceding siblings ...)
  2021-03-28  2:13       ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2021-03-28  9:27       ` Jeff King
  2021-03-29 13:34         ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
  11 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-03-28  9:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Sun, Mar 28, 2021 at 04:13:30AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Ævar Arnfjörð Bjarmason (10):
>   object.c: stop supporting len == -1 in type_from_string_gently()
>   object.c: refactor type_from_string_gently()
>   object.c: make type_from_string() return "enum object_type"
>   object-file.c: make oid_object_info() return "enum object_type"
>   object-name.c: make dependency on object_type order more obvious
>   tree.c: fix misindentation in parse_tree_gently()
>   object.c: add a utility function for "expected type X, got Y"
>   object.c: add and use oid_is_type_or_die_msg() function
>   object tests: add test for unexpected objects in tags
>   tag: don't misreport type of tagged objects in errors

I'm somewhat skeptical of the final patch, given my comments (just now)
in:

  https://lore.kernel.org/git/YGBHH7sAVsPpVKWd@coredump.intra.peff.net/

I'll quote them here:

> Because when we call, say, lookup_blob() and find that the object is
> already in memory as a non-blob, we don't know who the culprit is.
> Perhaps an earlier part of the code called parse_object(), found that it
> really is a blob on disk, and used that type. But it may equally have
> been the case that we saw a reference to the object as a commit, called
> lookup_commit() on it, and now our lookup_blob() call is unhappy,
> thinking it is really a commit. In that case, one of those references is
> wrong, but we don't know which.
>
> I think a robust solution would be one of:
>
>   - make the message more precise: "saw object X as a commit, but
>     previously it was referred to as a blob". Or vice versa.
>
>   - when we see such a mismatch, go to the object database to say "aha,
>     on disk it is really a blob". That's expensive, but this is an error
>     case, so we can afford to be slow. But it does produce unsatisfying
>     results when it was the earlier lookup_commit() call that was wrong.
>     Because we have to say "object X is really a blob, but some object
>     earlier referred to it as a commit. No idea who did that, though!".

Looking at the final patch, I think you side-step the issue to some
degree because it is only touching the parse_object() code paths, where
we really have looked at the bytes in the object database. So it
basically is doing the second thing above (which is "free" because we
were accessing the odb anyway).

But I think it still has the "oops, somebody made a wrong reference much
earlier" problem. The actual bug is in some other object entirely, whose
identity is long forgotten. I think we would be much better off to say
something like "somebody expected X to be a commit, but now somebody
else expects it to be a blob", which is all that we can reliably say.
And the next step really ought to be running "git fsck" to figure out
what is going on (and we should perhaps even say so via advise()).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 6/7] object tests: add test for unexpected objects in tags
  2021-03-28  9:06         ` Jeff King
@ 2021-03-28 15:39           ` Ævar Arnfjörð Bjarmason
  2021-03-29  9:16             ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 15:39 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin


On Sun, Mar 28 2021, Jeff King wrote:

> On Sun, Mar 28, 2021 at 03:35:46AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> >> Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
>> >> for unexpected object types, 2019-04-09), there were no meaningful
>> >> tests for checking how we reported on finding the incorrect object
>> >> type in a tag, i.e. one that broke the "type" promise in the tag
>> >> header.
>> >
>> > Isn't this covered by tests 16 and 17 ("traverse unexpected non-commit
>> > tag", both "lone" and "seen")? And likewise the matching "non-tree" and
>> > "non-blob" variants afterwards?
>> 
>> Barely, those tests are mainly testing that rev-list doesn't die, and
>> only do a very fuzzy match on the output. E.g. checking `grep "not a
>> commit" err`, not a full test_cmp that'll check what OID is reported
>> etc.
>
> I thought the "blind spot" you meant was not testing these cases. But I
> guess you mean that we are not checking stderr.
>
> So OK, but...

Ah yes. I'll clarify that. I thought it was clear since the series is
about the output we emit on errors, not the rev-list traversal itself.

>> > The only thing we don't seem to cover is an unexpected non-tag. I don't
>> > mind adding that, but why wouldn't we just follow the template of the
>> > existing tests?
>> 
>> I am following that template to some extent, e.g. using
>> ${commit,tree,blob}. It just didn't seem worth it to refactor an earlier
>> test in the file just to re-use a single hash-object invocation, those
>> tests e.g. clobber the $tag variable, so bending over backwards to
>> re-use anything set up in them would mean some refactoring.
>> 
>> I think it's much clearer just do do all the different kinds of setup in
>> the new setup function.
>
> It does not seem to make the resulting test script more clear at all to
> create the same situation twice, but test stderr only in the second
> case. I.e., why is the change to the test script not just:
>
> diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
> index 52cde097dd..4cdc87c913 100755
> --- a/t/t6102-rev-list-unexpected-objects.sh
> +++ b/t/t6102-rev-list-unexpected-objects.sh
> @@ -82,12 +82,13 @@ test_expect_success 'setup unexpected non-commit tag' '
>  '
>  
>  test_expect_success 'traverse unexpected non-commit tag (lone)' '
> -	test_must_fail git rev-list --objects $tag
> +	test_must_fail git rev-list --objects $tag >output 2>&1 &&
> +	test_i18ngrep "is a blob, not a commit" output
>  '
>  
>  test_expect_success 'traverse unexpected non-commit tag (seen)' '
>  	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
> -	test_i18ngrep "not a commit" output
> +	test_i18ngrep "is a blob, not a commit" output
>  '
>  
>  test_expect_success 'setup unexpected non-tree tag' '
>
> and so forth (or you can replace it with a full test_cmp if you really
> prefer). That does not seem like bending over backwards to me, but
> rather keeping the test script tidy and readable.

Yeah, it needs to be a test_cmp (or equivalent) since the point of the
test is *which* thing we're reporting as the "is X not Y".

But yes, these can be combined. But I still think it's clearer to have
minimal tests for the traversal and then later (much more verbose) tests
for the output.

So if you the traversal fails you'd be looking at a fairly isolated test
with just two lines, v.s. 10-15 lines of later test_cmp etc.

> [...]

Will reply to the rest, but that discussion seems split, so reading the
thread to see what the best place to continue that chat is...

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-03-28  5:35         ` Junio C Hamano
@ 2021-03-28 15:46           ` Ævar Arnfjörð Bjarmason
  2021-03-28 18:25             ` Junio C Hamano
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28 15:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin


On Sun, Mar 28 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change the type_from_string() macro into a function and drop the
>> support for passing len < 0.
>>
>> Support for len < 0 was added in fe8e3b71805 (Refactor
>> type_from_string() to allow continuing after detecting an error,
>> 2014-09-10), but no callers use that form. Let's drop it to simplify
>> this, and in preparation for simplifying these even further.
>
> Given the recent fallout of oversimplifying we've seen in other
> topic, this line of thinking makes me nauseated, but let's see how
> well this works this time around.

Do you mean related to tree-walk.[ch]? But yeah, this step doesn't
striclly need to be taken, but seemed worth it given that there's just
10 or so callers (none of which used this).

> At least, replacing an already queued topic with v2 would not
> increase the number of topics that are supposedly in-flight but not
> quite moving due to lack of reviews and responses, unlike bunch of
> totally new patches ;-)

I'm not sure what to do to improve things in that area.

I'm obviously for increasing the net velocity of my patches making it to
master, but if it's held up my number of reviews a submission of Y won't
necessarily make X worse, since people who've got an interest in Y will
be different than those with an interest in X.

But some of it's definitely on my end, e.g. re-rolls sometimes taking me
longer than I'd prefer. It's a different activity to dissect outstanding
reviews & re-roll than writing code, and sometimes I'm interested in one
over the other...

>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  object.c | 10 +++++++---
>>  object.h |  2 +-
>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/object.c b/object.c
>> index 78343781ae7..65446172172 100644
>> --- a/object.c
>> +++ b/object.c
>> @@ -39,9 +39,6 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
>>  {
>>  	int i;
>>  
>> -	if (len < 0)
>> -		len = strlen(str);
>> -
>>  	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
>>  		if (!strncmp(str, object_type_strings[i], len) &&
>>  		    object_type_strings[i][len] == '\0')
>> @@ -53,6 +50,13 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
>>  	die(_("invalid object type \"%s\""), str);
>>  }
>>  
>> +int type_from_string(const char *str)
>> +{
>> +	size_t len = strlen(str);
>> +	int ret = type_from_string_gently(str, len, 0);
>> +	return ret;
>> +}
>> +
>>  /*
>>   * Return a numerical hash value between 0 and n-1 for the object with
>>   * the specified sha1.  n must be a power of 2.  Please note that the
>> diff --git a/object.h b/object.h
>> index 59daadce214..3ab3eb193d3 100644
>> --- a/object.h
>> +++ b/object.h
>> @@ -94,7 +94,7 @@ struct object {
>>  
>>  const char *type_name(unsigned int type);
>>  int type_from_string_gently(const char *str, ssize_t, int gentle);
>> -#define type_from_string(str) type_from_string_gently(str, -1, 0)
>> +int type_from_string(const char *str);
>>  
>>  /*
>>   * Return the current number of buckets in the object hashmap.


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-03-28 15:46           ` Ævar Arnfjörð Bjarmason
@ 2021-03-28 18:25             ` Junio C Hamano
  2021-04-22 18:09               ` Felipe Contreras
  0 siblings, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-03-28 18:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> At least, replacing an already queued topic with v2 would not
>> increase the number of topics that are supposedly in-flight but not
>> quite moving due to lack of reviews and responses, unlike bunch of
>> totally new patches ;-)
>
> I'm not sure what to do to improve things in that area.
>
> I'm obviously for increasing the net velocity of my patches making it to
> master, but if it's held up my number of reviews a submission of Y won't
> necessarily make X worse, since people who've got an interest in Y will
> be different than those with an interest in X.
>
> But some of it's definitely on my end, e.g. re-rolls sometimes taking me
> longer than I'd prefer. It's a different activity to dissect outstanding
> reviews & re-roll than writing code, and sometimes I'm interested in one
> over the other...

What I'd like to encourage contributors to think is the velocity in
the whole project, not only their own patches.  The changes proposed
on the list would consume the review bandwidth, which is
unfortunately not an infinite resource.

To balance the supply and the consumption, one way might be to
throttle incoming patches to restrict consumption and distribute the
supply more evenly among authors.  But a more desirable way that
would benefit the community more would be to increase the supply.

If all of those who consume the review bandwidth tip in by reviewing
others' patches, not limited to the area they are interested in but
more in the "I am not so familiar with the area, but I've been here
long enough and know general principles, so let's polish your patch
together" spirit, that would help the community greatly, I would
think, by:

 - replenishing review bandwidth they consumed from the pool;

 - throttling their patch flow that consume review bandwidth (while
   they are reviewing others patches, they won't be throwing new
   patches at the list to consume even more review bandwidth);

 - helping the reviewers themselves become more familiar with the
   parts of the code they are not working in right now.

I am reasonably sure I and a few others on the list are net
suppliers of the reviewer bandwidth.  I do not expect all the
prolific contributors to become net suppliers; after all, designing
and writing their own stuff is always fun.  But I wish that the most
prominent contributors in the community to be reviewing others'
topics and ushering these topics to completion from time to time,
and I am hoping to see that happen more.

Thanks.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 6/7] object tests: add test for unexpected objects in tags
  2021-03-28 15:39           ` Ævar Arnfjörð Bjarmason
@ 2021-03-29  9:16             ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-03-29  9:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Sun, Mar 28, 2021 at 05:39:45PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I thought the "blind spot" you meant was not testing these cases. But I
> > guess you mean that we are not checking stderr.
> >
> > So OK, but...
> 
> Ah yes. I'll clarify that. I thought it was clear since the series is
> about the output we emit on errors, not the rev-list traversal itself.

TBH, I think it was as much a reading comprehension fail on my part. But
it certainly doesn't hurt to make the commit message more clear.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 00/10] improve reporting of unexpected objects
  2021-03-28  9:27       ` [PATCH v2 00/10] improve reporting of unexpected objects Jeff King
@ 2021-03-29 13:34         ` Ævar Arnfjörð Bjarmason
  2021-03-31 10:43           ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-29 13:34 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin


On Sun, Mar 28 2021, Jeff King wrote:

> On Sun, Mar 28, 2021 at 04:13:30AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Ævar Arnfjörð Bjarmason (10):
>>   object.c: stop supporting len == -1 in type_from_string_gently()
>>   object.c: refactor type_from_string_gently()
>>   object.c: make type_from_string() return "enum object_type"
>>   object-file.c: make oid_object_info() return "enum object_type"
>>   object-name.c: make dependency on object_type order more obvious
>>   tree.c: fix misindentation in parse_tree_gently()
>>   object.c: add a utility function for "expected type X, got Y"
>>   object.c: add and use oid_is_type_or_die_msg() function
>>   object tests: add test for unexpected objects in tags
>>   tag: don't misreport type of tagged objects in errors
>
> I'm somewhat skeptical of the final patch, given my comments (just now)
> in:
>
>   https://lore.kernel.org/git/YGBHH7sAVsPpVKWd@coredump.intra.peff.net/
>
> I'll quote them here:

Picking up where we left off in
http://lore.kernel.org/git/8735wfnv7i.fsf@evledraar.gmail.com ...

>> Because when we call, say, lookup_blob() and find that the object is
>> already in memory as a non-blob, we don't know who the culprit is.
>> Perhaps an earlier part of the code called parse_object(), found that it
>> really is a blob on disk, and used that type. But it may equally have
>> been the case that we saw a reference to the object as a commit, called
>> lookup_commit() on it, and now our lookup_blob() call is unhappy,
>> thinking it is really a commit. In that case, one of those references is
>> wrong, but we don't know which.
>>
>> I think a robust solution would be one of:
>>
>>   - make the message more precise: "saw object X as a commit, but
>>     previously it was referred to as a blob". Or vice versa.
>>
>>   - when we see such a mismatch, go to the object database to say "aha,
>>     on disk it is really a blob". That's expensive, but this is an error
>>     case, so we can afford to be slow. But it does produce unsatisfying
>>     results when it was the earlier lookup_commit() call that was wrong.
>>     Because we have to say "object X is really a blob, but some object
>>     earlier referred to it as a commit. No idea who did that, though!".
>
> Looking at the final patch, I think you side-step the issue to some
> degree because it is only touching the parse_object() code paths, where
> we really have looked at the bytes in the object database. So it
> basically is doing the second thing above (which is "free" because we
> were accessing the odb anyway).
>
> But I think it still has the "oops, somebody made a wrong reference much
> earlier" problem. The actual bug is in some other object entirely, whose
> identity is long forgotten. I think we would be much better off to say
> something like "somebody expected X to be a commit, but now somebody
> else expects it to be a blob", which is all that we can reliably say.
> And the next step really ought to be running "git fsck" to figure out
> what is going on (and we should perhaps even say so via advise()).

Yes I'm totally side-stepping the issue, but I don't see a way around
that that doesn't make the whole object lookup code either much slower,
or more complex.

I.e. the whole thing is an emergent effect of us seeing a tag object,
and noting in-memory that we saw a given OID of type X, but we don't
even know if we can look it up at that point, or that it's not type Y.

I don't think it's guaranteed that we're neatly in one single object
traversal at that point (e.g. if we're looking at N tags, and only later
dereferencing their "object" pointers). So passing a "object A which I
have now says B is a X, assert!" wouldn't work.

We could eagerly get the object from disk when parsing tags (slow?), or
have a void* in the object struct or whatever to say "this is the OID
that claimed you were XYZ" (ew!).

Or, which I think makes sense here, just don't worry about it and error
with the limited info we have at hand. Yes we can't report who the
ultimate culprit is without an fsck, but that's not different than a lot
of other error() and die() messages in the object code now.

So if we're going to emit an advise() that seems generally useful for
many of those error()/die() messages, and not something we should tack
onto this incremental improvement to one error.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-28  2:13       ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2021-03-30  5:50         ` Junio C Hamano
  2021-03-31 11:02           ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-03-30  5:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
> objects, 2005-06-21) (yes, that ancient!) and correctly report an
> error on a tag like:
>
>     object <a tree hash>
>     type commit
>
> As:
>
>     error: object <a tree hash> is tree, not a commit
>
> Instead of our long-standing misbehavior of inverting the two, and
> reporting:
>
>     error: object <a tree hash> is commit, not a tree
>
> Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
> is incorrect.

Hmph, I've always thought it is just "supposed to be a" missing in
the sentence ;-)

> Hence the non-intuitive solution of adding a
> lookup_{blob,commit,tag,tree}_type() function. It's to distinguish
> calls from parse_object_buffer() where we actually know the type, from
> a parse_tag_buffer() where we're just guessing about the type.

I think it makes sense to allow the caller to express distinction
between "I know that this object is a blob, because I just read its
object header" and "Another object tells me that this object must be
a blob, because it is in a tree entry whose mode bits are 100644".

I wish we found a set of names better than lookup_<type>_type() for
that, though.  It's just between

      lookup_tag_type(r, oid, OBJ_NONE);
      lookup_tag_type(r, oid, OBJ_TAG);

I cannot quite tell which one is which.  I also wonder if the last
arg should just be a boolean ("I know it is a tag" vs "I heard it
must be a tag").

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 00/10] improve reporting of unexpected objects
  2021-03-29 13:34         ` Ævar Arnfjörð Bjarmason
@ 2021-03-31 10:43           ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-03-31 10:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Elijah Newren, Johannes Schindelin

On Mon, Mar 29, 2021 at 03:34:03PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > But I think it still has the "oops, somebody made a wrong reference much
> > earlier" problem. The actual bug is in some other object entirely, whose
> > identity is long forgotten. I think we would be much better off to say
> > something like "somebody expected X to be a commit, but now somebody
> > else expects it to be a blob", which is all that we can reliably say.
> > And the next step really ought to be running "git fsck" to figure out
> > what is going on (and we should perhaps even say so via advise()).
> 
> Yes I'm totally side-stepping the issue, but I don't see a way around
> that that doesn't make the whole object lookup code either much slower,
> or more complex.
> 
> I.e. the whole thing is an emergent effect of us seeing a tag object,
> and noting in-memory that we saw a given OID of type X, but we don't
> even know if we can look it up at that point, or that it's not type Y.
> 
> I don't think it's guaranteed that we're neatly in one single object
> traversal at that point (e.g. if we're looking at N tags, and only later
> dereferencing their "object" pointers). So passing a "object A which I
> have now says B is a X, assert!" wouldn't work.
> 
> We could eagerly get the object from disk when parsing tags (slow?), or
> have a void* in the object struct or whatever to say "this is the OID
> that claimed you were XYZ" (ew!).
> 
> Or, which I think makes sense here, just don't worry about it and error
> with the limited info we have at hand. Yes we can't report who the
> ultimate culprit is without an fsck, but that's not different than a lot
> of other error() and die() messages in the object code now.

Yes, that "don't worry too much about it" was where my line of thinking
is going. But then I do not see all that much point in your final patch
at all. I.e., I think just changing the message to more clearly say what
we do know in lookup_commit(), etc, would be sufficient.

> So if we're going to emit an advise() that seems generally useful for
> many of those error()/die() messages, and not something we should tack
> onto this incremental improvement to one error.

Yeah, I think doing an advise() is probably overkill. My next step would
always be "run fsck", and I was thinking only that we might point the
user in that direction. But it's probably fine to just emit the error.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-30  5:50         ` Junio C Hamano
@ 2021-03-31 11:02           ` Jeff King
  2021-03-31 18:05             ` Junio C Hamano
                               ` (2 more replies)
  0 siblings, 3 replies; 142+ messages in thread
From: Jeff King @ 2021-03-31 11:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Taylor Blau,
	Elijah Newren, Johannes Schindelin

On Mon, Mar 29, 2021 at 10:50:18PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
> > objects, 2005-06-21) (yes, that ancient!) and correctly report an
> > error on a tag like:
> >
> >     object <a tree hash>
> >     type commit
> >
> > As:
> >
> >     error: object <a tree hash> is tree, not a commit
> >
> > Instead of our long-standing misbehavior of inverting the two, and
> > reporting:
> >
> >     error: object <a tree hash> is commit, not a tree
> >
> > Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
> > is incorrect.
> 
> Hmph, I've always thought it is just "supposed to be a" missing in
> the sentence ;-)

So going with the discussion elsewhere in the thread, I'd probably say
something like:

  error: object <oid> seen as both a commit and a tree

which precisely says what we do know, without implying which is correct.

Ævar's patch tries to improve the case where we might _know_ which is
correct (because we're actually parsing the object contents), but of
course it covers only a fraction of cases. I'm not really opposed to
that per se, but I probably wouldn't bother myself.

  Side note: this is all making the assumption that what is in the
  object itself is "correct", but of course that is not necessarily
  true, even. All of these cases are the result of bugs, so it is
  possible that the bug was in the writing of the original object
  contents, and not the object that is referring to it. Likewise, I'd
  imagine an easy way to get into this situation is with a bogus
  refs/replace object that switches type.

> > Hence the non-intuitive solution of adding a
> > lookup_{blob,commit,tag,tree}_type() function. It's to distinguish
> > calls from parse_object_buffer() where we actually know the type, from
> > a parse_tag_buffer() where we're just guessing about the type.
> 
> I think it makes sense to allow the caller to express distinction
> between "I know that this object is a blob, because I just read its
> object header" and "Another object tells me that this object must be
> a blob, because it is in a tree entry whose mode bits are 100644".
> 
> I wish we found a set of names better than lookup_<type>_type() for
> that, though.  It's just between
> 
>       lookup_tag_type(r, oid, OBJ_NONE);
>       lookup_tag_type(r, oid, OBJ_TAG);
> 
> I cannot quite tell which one is which.  I also wonder if the last
> arg should just be a boolean ("I know it is a tag" vs "I heard it
> must be a tag").

Yeah, I also found that very confusing. AFAICT lookup_tag_type() would
only ever see OBJ_NONE or OBJ_TAG. Making it more than a boolean makes
both the interface and implementation more complicated.

I also think the manual handling of OBJ_NONE in each lookup_* function
is confusing. They all call object_as_type() because the point of that
function is both to type-check the struct and to convert it away from
OBJ_NONE.

If we handled this error there, then I think it would be much more
natural, because we'd have already covered the OBJ_NONE case, and
because it's already the place we're emitting the existing error. E.g.:

diff --git a/object.c b/object.c
index 2c32691dc4..e6345541f7 100644
--- a/object.c
+++ b/object.c
@@ -157,7 +157,7 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	return obj;
 }
 
-void *object_as_type(struct object *obj, enum object_type type, int quiet)
+void *object_as_type(struct object *obj, enum object_type type, unsigned flags)
 {
 	if (obj->type == type)
 		return obj;
@@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 		return obj;
 	}
 	else {
-		if (!quiet)
-			error(_("object %s is a %s, not a %s"),
-			      oid_to_hex(&obj->oid),
-			      type_name(obj->type), type_name(type));
+		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
+			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
+				error(_("object %s is a %s, but was referred to as a %s"),
+				      oid_to_hex(&obj->oid), type_name(obj->type),
+				      type_name(type));
+			else
+				error(_("object %s referred to as both a %s and a %s"),
+				      oid_to_hex(&obj->oid),
+				      type_name(obj->type), type_name(type));
+		}
 		return NULL;
 	}
 }

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 11:02           ` Jeff King
@ 2021-03-31 18:05             ` Junio C Hamano
  2021-03-31 18:31             ` Ævar Arnfjörð Bjarmason
  2021-03-31 18:41             ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Junio C Hamano
  2 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-03-31 18:05 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Taylor Blau,
	Elijah Newren, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> I also think the manual handling of OBJ_NONE in each lookup_* function
> is confusing. They all call object_as_type() because the point of that
> function is both to type-check the struct and to convert it away from
> OBJ_NONE.
>
> If we handled this error there, then I think it would be much more
> natural, because we'd have already covered the OBJ_NONE case, and
> because it's already the place we're emitting the existing error. E.g.:

This makes quite a lot of sense.  If presented with this simple
change and 10-patch series at the same time and are told that the
goal of the changes were more or less the same, I'd pick this one
100% of the time.

> diff --git a/object.c b/object.c
> index 2c32691dc4..e6345541f7 100644
> --- a/object.c
> +++ b/object.c
> @@ -157,7 +157,7 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
>  	return obj;
>  }
>  
> -void *object_as_type(struct object *obj, enum object_type type, int quiet)
> +void *object_as_type(struct object *obj, enum object_type type, unsigned flags)
>  {
>  	if (obj->type == type)
>  		return obj;
> @@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
>  		return obj;
>  	}
>  	else {
> -		if (!quiet)
> -			error(_("object %s is a %s, not a %s"),
> -			      oid_to_hex(&obj->oid),
> -			      type_name(obj->type), type_name(type));
> +		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
> +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> +				error(_("object %s is a %s, but was referred to as a %s"),
> +				      oid_to_hex(&obj->oid), type_name(obj->type),
> +				      type_name(type));
> +			else
> +				error(_("object %s referred to as both a %s and a %s"),
> +				      oid_to_hex(&obj->oid),
> +				      type_name(obj->type), type_name(type));
> +		}
>  		return NULL;
>  	}
>  }
>
> -Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 11:02           ` Jeff King
  2021-03-31 18:05             ` Junio C Hamano
@ 2021-03-31 18:31             ` Ævar Arnfjörð Bjarmason
  2021-03-31 18:59               ` Jeff King
  2021-03-31 18:41             ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Junio C Hamano
  2 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-31 18:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Taylor Blau, Elijah Newren, Johannes Schindelin


On Wed, Mar 31 2021, Jeff King wrote:

> On Mon, Mar 29, 2021 at 10:50:18PM -0700, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>> 
>> > Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
>> > objects, 2005-06-21) (yes, that ancient!) and correctly report an
>> > error on a tag like:
>> >
>> >     object <a tree hash>
>> >     type commit
>> >
>> > As:
>> >
>> >     error: object <a tree hash> is tree, not a commit
>> >
>> > Instead of our long-standing misbehavior of inverting the two, and
>> > reporting:
>> >
>> >     error: object <a tree hash> is commit, not a tree
>> >
>> > Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
>> > is incorrect.
>> 
>> Hmph, I've always thought it is just "supposed to be a" missing in
>> the sentence ;-)
>
> So going with the discussion elsewhere in the thread, I'd probably say
> something like:
>
>   error: object <oid> seen as both a commit and a tree
>
> which precisely says what we do know, without implying which is correct.
>
> Ævar's patch tries to improve the case where we might _know_ which is
> correct (because we're actually parsing the object contents), but of
> course it covers only a fraction of cases. I'm not really opposed to
> that per se, but I probably wouldn't bother myself.

What fraction of cases? As far as I can tell it covers all cases where
we get this error.

If there is a case like what you're describing I haven't found it.

I.e. it happens when we have an un-parsed "struct object" whose type is
inferred, and parse it to find out it's not what we expected.

It's not ambigious at all what the object actually is. It's just that
the previous code was leaking the *assumption* about the type at the
time of emitting the error, due to an apparent oversight with parsed
v.s. non-parsed.

Or in other words, we're leaking the implementation detail that we
pre-allocated an object struct of a given type in anticipation of
holding a parsed version of that object soon.

>   Side note: this is all making the assumption that what is in the
>   object itself is "correct", but of course that is not necessarily
>   true, even. All of these cases are the result of bugs, so it is
>   possible that the bug was in the writing of the original object
>   contents, and not the object that is referring to it. Likewise, I'd
>   imagine an easy way to get into this situation is with a bogus
>   refs/replace object that switches type.

Perhaps, I haven't tested that in any detail.

>> > Hence the non-intuitive solution of adding a
>> > lookup_{blob,commit,tag,tree}_type() function. It's to distinguish
>> > calls from parse_object_buffer() where we actually know the type, from
>> > a parse_tag_buffer() where we're just guessing about the type.
>> 
>> I think it makes sense to allow the caller to express distinction
>> between "I know that this object is a blob, because I just read its
>> object header" and "Another object tells me that this object must be
>> a blob, because it is in a tree entry whose mode bits are 100644".
>> 
>> I wish we found a set of names better than lookup_<type>_type() for
>> that, though.  It's just between
>> 
>>       lookup_tag_type(r, oid, OBJ_NONE);
>>       lookup_tag_type(r, oid, OBJ_TAG);
>> 
>> I cannot quite tell which one is which.  I also wonder if the last
>> arg should just be a boolean ("I know it is a tag" vs "I heard it
>> must be a tag").
>
> Yeah, I also found that very confusing. AFAICT lookup_tag_type() would
> only ever see OBJ_NONE or OBJ_TAG. Making it more than a boolean makes
> both the interface and implementation more complicated.

I don't feel strongly either way, but one concern here is that these are
very hot functions, and maybe it's better to give the compiler a better
chance to work with them without considering an extra argument, but I
haven't tested that...

> I also think the manual handling of OBJ_NONE in each lookup_* function
> is confusing. They all call object_as_type() because the point of that
> function is both to type-check the struct and to convert it away from
> OBJ_NONE.
>
> If we handled this error there, then I think it would be much more
> natural, because we'd have already covered the OBJ_NONE case, and
> because it's already the place we're emitting the existing error. E.g.:
>
> diff --git a/object.c b/object.c
> index 2c32691dc4..e6345541f7 100644
> --- a/object.c
> +++ b/object.c
> @@ -157,7 +157,7 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
>  	return obj;
>  }
>  
> -void *object_as_type(struct object *obj, enum object_type type, int quiet)
> +void *object_as_type(struct object *obj, enum object_type type, unsigned flags)
>  {
>  	if (obj->type == type)
>  		return obj;
> @@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
>  		return obj;
>  	}
>  	else {
> -		if (!quiet)
> -			error(_("object %s is a %s, not a %s"),
> -			      oid_to_hex(&obj->oid),
> -			      type_name(obj->type), type_name(type));
> +		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
> +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> +				error(_("object %s is a %s, but was referred to as a %s"),
> +				      oid_to_hex(&obj->oid), type_name(obj->type),
> +				      type_name(type));
> +			else
> +				error(_("object %s referred to as both a %s and a %s"),
> +				      oid_to_hex(&obj->oid),
> +				      type_name(obj->type), type_name(type));
> +		}
>  		return NULL;
>  	}
>  }

Per the above I don't understand how you think there's any uncertainty
here.

If I'm right and there isn't then first of all I don't see how we could
emit 1/2 of those errors. The whole problem here is that we don't know
the type of the un-parsed object (and presumably don't want to eagerly
know, it would mean hitting the object store).

But when we do know why would we beat around the bush and say "was
referred to as X and Y" once we know what it is. 

AFAICT there's no more reason to think that parse_object_buffer() will
be wrong about the type than "git cat-file -t" will be. They both use
the same underlying functions to get that information.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 11:02           ` Jeff King
  2021-03-31 18:05             ` Junio C Hamano
  2021-03-31 18:31             ` Ævar Arnfjörð Bjarmason
@ 2021-03-31 18:41             ` Junio C Hamano
  2021-03-31 19:00               ` Jeff King
  2 siblings, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-03-31 18:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Taylor Blau,
	Elijah Newren, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> +				error(_("object %s is a %s, but was referred to as a %s"),
> +				      oid_to_hex(&obj->oid), type_name(obj->type),
> +				      type_name(type));
> +			else
> +				error(_("object %s referred to as both a %s and a %s"),
> +				      oid_to_hex(&obj->oid),
> +				      type_name(obj->type), type_name(type));
> +		}

Am I correct to understand that the latter is after we read a tree
that refers to an object with 100644 (blob) and then another tree
that refers to the same object with 40000 (tree), before we have a
need/chance to actually find out what that object is?  The error
would trigger while reading the second tree and find the second
mention of the object that conflicts with the earlier one?


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 18:31             ` Ævar Arnfjörð Bjarmason
@ 2021-03-31 18:59               ` Jeff King
  2021-03-31 20:46                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-03-31 18:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Taylor Blau, Elijah Newren, Johannes Schindelin

On Wed, Mar 31, 2021 at 08:31:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Ævar's patch tries to improve the case where we might _know_ which is
> > correct (because we're actually parsing the object contents), but of
> > course it covers only a fraction of cases. I'm not really opposed to
> > that per se, but I probably wouldn't bother myself.
> 
> What fraction of cases? As far as I can tell it covers all cases where
> we get this error.
> 
> If there is a case like what you're describing I haven't found it.

It would happen any time somebody calls lookup_foo() because they saw an
object referenced, but _doesn't_ parse it. And then somebody later calls
lookup_bar() in the same way. Neither of them consulted the actual
object database.

Try this with your patches:

-- >8 --
git init repo
cd repo

# just for making things deterministic
export GIT_COMMITTER_NAME='A U Thor'
export GIT_COMMITTER_EMAIL='author@example.com'
export GIT_COMMITTER_DATE='@1234567890 +0000'

blob=$(echo foo | git hash-object -w --stdin)
git tag -m 'tag of blob' tag-of-blob $blob
git update-ref refs/tags/tag-of-commit $(
  git cat-file tag tag-of-blob |
  sed s/blob/commit/g |
  git hash-object -w --stdin -t tag
)
git update-ref refs/tags/tag-of-tree $(
  git cat-file tag tag-of-blob |
  sed s/blob/tree/g |
  git hash-object -w --stdin -t tag
)

git fsck
-- >8 --

That fsck produces (257cc5642 is the blob):

  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a blob, not a commit
  error: 257cc5642cb1a054f08cc83f2d943e56fd3ebe99: object could not be parsed: .git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
  error: aaff0d42df150e1a734f6a8516878b2ea315ee0a: object could not be parsed: .git/objects/aa/ff0d42df150e1a734f6a8516878b2ea315ee0a
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a blob
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7
  error: bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7: object could not be parsed: .git/objects/bb/d2b7077cd91ee6175cdc0e4c477c25c230cdc7

So we claim "is X, not Y" in multiple directions for the same object.

It might just be that there are spots in the fsck code that need to be
adjusted to use your new function (if they are indeed parsing the
referred-to object). But there are lots of places that don't actually
parse the object at the moment they're parsing the tag. E.g.:

  $ git for-each-ref --format='%(*objectname)'
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
  Segmentation fault

Neither of those types is the correct one. And the segfault is just a
bonus! :)

I'd expect similar cases with parsing commit parents and tree pointers.
And probably tree entries whose modes are wrong.

> I.e. it happens when we have an un-parsed "struct object" whose type is
> inferred, and parse it to find out it's not what we expected.
> 
> It's not ambigious at all what the object actually is. It's just that
> the previous code was leaking the *assumption* about the type at the
> time of emitting the error, due to an apparent oversight with parsed
> v.s. non-parsed.
> 
> Or in other words, we're leaking the implementation detail that we
> pre-allocated an object struct of a given type in anticipation of
> holding a parsed version of that object soon.

Right. In the case that you are indeed parsing the object later, you can
say definitively "it is X in the odb, but seen as Y previously". But we
do not always hit the "is X, not Y" error when parsing the object. It
might be caused by two of these "pre-allocations" (though really I think
it is not just an implementation detail; the pre-allocation happened
because some other object referred to us as a given type, so it really
is a corruption in the repository. Just not in the object we mention).

> > @@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
> >  		return obj;
> >  	}
> >  	else {
> > -		if (!quiet)
> > -			error(_("object %s is a %s, not a %s"),
> > -			      oid_to_hex(&obj->oid),
> > -			      type_name(obj->type), type_name(type));
> > +		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
> > +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> > +				error(_("object %s is a %s, but was referred to as a %s"),
> > +				      oid_to_hex(&obj->oid), type_name(obj->type),
> > +				      type_name(type));
> > +			else
> > +				error(_("object %s referred to as both a %s and a %s"),
> > +				      oid_to_hex(&obj->oid),
> > +				      type_name(obj->type), type_name(type));
> > +		}
> >  		return NULL;
> >  	}
> >  }
> 
> Per the above I don't understand how you think there's any uncertainty
> here.
> 
> If I'm right and there isn't then first of all I don't see how we could
> emit 1/2 of those errors. The whole problem here is that we don't know
> the type of the un-parsed object (and presumably don't want to eagerly
> know, it would mean hitting the object store).

Forgetting for a moment how to trigger it with actual Git commands, the
root of the problem is that:

  lookup_tree(&oid);
  lookup_blob(&oid);

is going to produce an error message. But we cannot know which object
type is wrong and which is right (if any). So we'd want to produce the
"referred to as both" message.

_If_ the caller happens to know that it has just parsed the object
contents and got a tree, then it would call lookup_parsed_tree(&oid),
which would pass along OBJECT_AS_TYPE_EXPECT_PARSED, and produce the
other message.

In practice, of course those two lookup_foo() calls are not right next
to each other. But they may be triggered on an identical oid by two
references from different objects.

> But when we do know why would we beat around the bush and say "was
> referred to as X and Y" once we know what it is. 
> 
> AFAICT there's no more reason to think that parse_object_buffer() will
> be wrong about the type than "git cat-file -t" will be. They both use
> the same underlying functions to get that information.

My point is that we are not always coming from parse_object_buffer()
when we see these error messages.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 18:41             ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Junio C Hamano
@ 2021-03-31 19:00               ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-03-31 19:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Taylor Blau,
	Elijah Newren, Johannes Schindelin

On Wed, Mar 31, 2021 at 11:41:18AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> > +				error(_("object %s is a %s, but was referred to as a %s"),
> > +				      oid_to_hex(&obj->oid), type_name(obj->type),
> > +				      type_name(type));
> > +			else
> > +				error(_("object %s referred to as both a %s and a %s"),
> > +				      oid_to_hex(&obj->oid),
> > +				      type_name(obj->type), type_name(type));
> > +		}
> 
> Am I correct to understand that the latter is after we read a tree
> that refers to an object with 100644 (blob) and then another tree
> that refers to the same object with 40000 (tree), before we have a
> need/chance to actually find out what that object is?  The error
> would trigger while reading the second tree and find the second
> mention of the object that conflicts with the earlier one?

Yes, exactly (or two tags, or a tag and a tree, or a commit and a tree,
etc).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 18:59               ` Jeff King
@ 2021-03-31 20:46                 ` Ævar Arnfjörð Bjarmason
  2021-04-01  7:54                   ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-31 20:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Taylor Blau, Elijah Newren, Johannes Schindelin


On Wed, Mar 31 2021, Jeff King wrote:

> On Wed, Mar 31, 2021 at 08:31:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Ævar's patch tries to improve the case where we might _know_ which is
>> > correct (because we're actually parsing the object contents), but of
>> > course it covers only a fraction of cases. I'm not really opposed to
>> > that per se, but I probably wouldn't bother myself.
>> 
>> What fraction of cases? As far as I can tell it covers all cases where
>> we get this error.
>> 
>> If there is a case like what you're describing I haven't found it.
>
> It would happen any time somebody calls lookup_foo() because they saw an
> object referenced, but _doesn't_ parse it. And then somebody later calls
> lookup_bar() in the same way. Neither of them consulted the actual
> object database.
>
> Try this with your patches:
>
> -- >8 --
> git init repo
> cd repo
>
> # just for making things deterministic
> export GIT_COMMITTER_NAME='A U Thor'
> export GIT_COMMITTER_EMAIL='author@example.com'
> export GIT_COMMITTER_DATE='@1234567890 +0000'
>
> blob=$(echo foo | git hash-object -w --stdin)
> git tag -m 'tag of blob' tag-of-blob $blob
> git update-ref refs/tags/tag-of-commit $(
>   git cat-file tag tag-of-blob |
>   sed s/blob/commit/g |
>   git hash-object -w --stdin -t tag
> )
> git update-ref refs/tags/tag-of-tree $(
>   git cat-file tag tag-of-blob |
>   sed s/blob/tree/g |
>   git hash-object -w --stdin -t tag
> )
>
> git fsck
> -- >8 --
>
> That fsck produces (257cc5642 is the blob):
>
>   error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a blob, not a commit
>   error: 257cc5642cb1a054f08cc83f2d943e56fd3ebe99: object could not be parsed: .git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
>   error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
>   error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
>   error: aaff0d42df150e1a734f6a8516878b2ea315ee0a: object could not be parsed: .git/objects/aa/ff0d42df150e1a734f6a8516878b2ea315ee0a
>   error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a blob
>   error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7
>   error: bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7: object could not be parsed: .git/objects/bb/d2b7077cd91ee6175cdc0e4c477c25c230cdc7
>
> So we claim "is X, not Y" in multiple directions for the same object.
>
> It might just be that there are spots in the fsck code that need to be
> adjusted to use your new function (if they are indeed parsing the
> referred-to object). But there are lots of places that don't actually
> parse the object at the moment they're parsing the tag. E.g.:
>
>   $ git for-each-ref --format='%(*objectname)'
>   error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
>   error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
>   Segmentation fault
>
> Neither of those types is the correct one. And the segfault is just a
> bonus! :)
>
> I'd expect similar cases with parsing commit parents and tree pointers.
> And probably tree entries whose modes are wrong.

So the segfault happens without my patches, but the change is that
before we'd always get it wrong and say "commit, not a tree", but now
we'll get it right some of the time. Patching the relevant object.c code
to emit different messages from the various functions shows that it's
the oid_is_type*() functions that get it right, but object_as_type() is
wrong as before.

So that's certainly something I missed.

But are there any cases where it makes things worse? Or is it just that
it's not a full fix in all cases, but only a partial one?

>> I.e. it happens when we have an un-parsed "struct object" whose type is
>> inferred, and parse it to find out it's not what we expected.
>> 
>> It's not ambigious at all what the object actually is. It's just that
>> the previous code was leaking the *assumption* about the type at the
>> time of emitting the error, due to an apparent oversight with parsed
>> v.s. non-parsed.
>> 
>> Or in other words, we're leaking the implementation detail that we
>> pre-allocated an object struct of a given type in anticipation of
>> holding a parsed version of that object soon.
>
> Right. In the case that you are indeed parsing the object later, you can
> say definitively "it is X in the odb, but seen as Y previously". But we
> do not always hit the "is X, not Y" error when parsing the object. It
> might be caused by two of these "pre-allocations" (though really I think
> it is not just an implementation detail; the pre-allocation happened
> because some other object referred to us as a given type, so it really
> is a corruption in the repository. Just not in the object we mention).

Indeed, the goal is to emit a sensible message on-the-fly when we see
that corruption.

>> > @@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
>> >  		return obj;
>> >  	}
>> >  	else {
>> > -		if (!quiet)
>> > -			error(_("object %s is a %s, not a %s"),
>> > -			      oid_to_hex(&obj->oid),
>> > -			      type_name(obj->type), type_name(type));
>> > +		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
>> > +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
>> > +				error(_("object %s is a %s, but was referred to as a %s"),
>> > +				      oid_to_hex(&obj->oid), type_name(obj->type),
>> > +				      type_name(type));
>> > +			else
>> > +				error(_("object %s referred to as both a %s and a %s"),
>> > +				      oid_to_hex(&obj->oid),
>> > +				      type_name(obj->type), type_name(type));
>> > +		}
>> >  		return NULL;
>> >  	}
>> >  }
>> 
>> Per the above I don't understand how you think there's any uncertainty
>> here.
>> 
>> If I'm right and there isn't then first of all I don't see how we could
>> emit 1/2 of those errors. The whole problem here is that we don't know
>> the type of the un-parsed object (and presumably don't want to eagerly
>> know, it would mean hitting the object store).
>
> Forgetting for a moment how to trigger it with actual Git commands, the
> root of the problem is that:
>
>   lookup_tree(&oid);
>   lookup_blob(&oid);
>
> is going to produce an error message. But we cannot know which object
> type is wrong and which is right (if any). So we'd want to produce the
> "referred to as both" message.
>
> _If_ the caller happens to know that it has just parsed the object
> contents and got a tree, then it would call lookup_parsed_tree(&oid),
> which would pass along OBJECT_AS_TYPE_EXPECT_PARSED, and produce the
> other message.
>
> In practice, of course those two lookup_foo() calls are not right next
> to each other. But they may be triggered on an identical oid by two
> references from different objects.

[...]

>> But when we do know why would we beat around the bush and say "was
>> referred to as X and Y" once we know what it is. 
>> 
>> AFAICT there's no more reason to think that parse_object_buffer() will
>> be wrong about the type than "git cat-file -t" will be. They both use
>> the same underlying functions to get that information.
>
> My point is that we are not always coming from parse_object_buffer()
> when we see these error messages.

If my solution of relying on the parsed v.s. non-parsed shouldn't we
just devolve to a full object info lookup when emitting the error? It's
more expensive, but we're emitting an error anyway...

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
  2021-03-31 20:46                 ` Ævar Arnfjörð Bjarmason
@ 2021-04-01  7:54                   ` Jeff King
  2021-04-01  8:32                     ` [PATCH] ref-filter: fix NULL check for parse object failure Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-04-01  7:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Taylor Blau, Elijah Newren, Johannes Schindelin

On Wed, Mar 31, 2021 at 10:46:22PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Neither of those types is the correct one. And the segfault is just a
> > bonus! :)
> >
> > I'd expect similar cases with parsing commit parents and tree pointers.
> > And probably tree entries whose modes are wrong.
> 
> So the segfault happens without my patches,

Yeah, sorry if that was unclear. It is definitely a pre-existing bug.

> but the change is that
> before we'd always get it wrong and say "commit, not a tree", but now
> we'll get it right some of the time. Patching the relevant object.c code
> to emit different messages from the various functions shows that it's
> the oid_is_type*() functions that get it right, but object_as_type() is
> wrong as before.
> 
> So that's certainly something I missed.
> 
> But are there any cases where it makes things worse? Or is it just that
> it's not a full fix in all cases, but only a partial one?

Right, I don't think your patch is making anything worse. It's just that
it does not cover all cases where we see an object as two different
types. Nor can it, since it is relying on code paths that actually parse
the object, and not all of them do.

> > My point is that we are not always coming from parse_object_buffer()
> > when we see these error messages.
> 
> If my solution of relying on the parsed v.s. non-parsed shouldn't we
> just devolve to a full object info lookup when emitting the error? It's
> more expensive, but we're emitting an error anyway...

That's certainly one option (that I suggested earlier in [0]). If we go
that route, then we do not need any of this "the caller passes in an
extra bit to say that it is parsing the object, and it found a tree",
because the error routine in object_as_type() would consult the odb
itself.

But I still think it does not make the error messages fully useful. We
might say "object X is really a tree in the odb, but we previously saw
it as a commit". But we will still have to return NULL from
lookup_tree(), so whatever containing object referenced X, _even though
it has the correct type_, will be the one to propagate the failure up
the stack. It was whoever was responsible for that "previously saw" that
is actually corrupt, and we no longer know who that was.

Which is why I wonder if it is worth even bothering to put a lot of
effort in here. If the issue is just that "X is a foo, not a bar" is
sometimes misleading, then we could solve that by simply making the
message more precise ("we saw X as a foo and a bar; one of them is
wrong"). Even if we could know _which_ is wrong with respect to what's
in the object contents, it isn't all that helpful without being able to
tell the user which object reference was the one that led us to the
wrong conclusion.

-Peff

[0] https://lore.kernel.org/git/YGBHH7sAVsPpVKWd@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH] ref-filter: fix NULL check for parse object failure
  2021-04-01  7:54                   ` Jeff King
@ 2021-04-01  8:32                     ` Jeff King
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
  2021-04-01 19:52                       ` [PATCH] ref-filter: fix NULL check for parse object failure Junio C Hamano
  0 siblings, 2 replies; 142+ messages in thread
From: Jeff King @ 2021-04-01  8:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Olga Telezhnaya, Junio C Hamano, git, Taylor Blau, Elijah Newren,
	Johannes Schindelin

On Thu, Apr 01, 2021 at 03:54:56AM -0400, Jeff King wrote:

> On Wed, Mar 31, 2021 at 10:46:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > > Neither of those types is the correct one. And the segfault is just a
> > > bonus! :)
> > >
> > > I'd expect similar cases with parsing commit parents and tree pointers.
> > > And probably tree entries whose modes are wrong.
> > 
> > So the segfault happens without my patches,
> 
> Yeah, sorry if that was unclear. It is definitely a pre-existing bug.

Here's a patch to fix it. This is mostly orthogonal to your patch
series. It happens to use a similar recipe to reproduce, but that is not
the only way to do it, and the fix and the test shouldn't conflict
textually or semantically.

-- >8 --
Subject: [PATCH] ref-filter: fix NULL check for parse object failure

After we run parse_object_buffer() to get an object's contents, we try
to check that the return value wasn't NULL. However, since our "struct
object" is a pointer-to-pointer, and we assign like:

  *obj = parse_object_buffer(...);

it's not correct to check:

  if (!obj)

That will always be true, since our double pointer will continue to
point to the single pointer (which is itself NULL). This is a regression
that was introduced by aa46a0da30 (ref-filter: use oid_object_info() to
get object, 2018-07-17); since that commit we'll segfault on a parse
failure, as we try to look at the NULL object pointer.

There are many ways a parse could fail, but most of them are hard to set
up in the tests (it's easy to make a bogus object, but update-ref will
refuse to point to it). The test here uses a tag which points to a wrong
object type. A parse of just the broken tag object will succeed, but
seeing both tag objects in the same process will lead to a parse error
(since we'll see the pointed-to object as both types).

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c            |  2 +-
 t/t6300-for-each-ref.sh | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index f0bd32f714..a0adb4551d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1608,7 +1608,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 
 	if (oi->info.contentp) {
 		*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
-		if (!obj) {
+		if (!*obj) {
 			if (!eaten)
 				free(oi->content);
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index cac7f443d0..2e7c32d50c 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1134,4 +1134,14 @@ test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
 	test_cmp expect actual
 '
 
+test_expect_success 'for-each-ref reports broken tags' '
+	git tag -m "good tag" broken-tag-good HEAD &&
+	git cat-file tag broken-tag-good >good &&
+	sed s/commit/blob/ <good >bad &&
+	bad=$(git hash-object -w -t tag bad) &&
+	git update-ref refs/tags/broken-tag-bad $bad &&
+	test_must_fail git for-each-ref --format="%(*objectname)" \
+		refs/tags/broken-tag-*
+'
+
 test_done
-- 
2.31.1.478.g72c5357f0d


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 0/5] mktag tests & fix for-each-ref segfault
  2021-04-01  8:32                     ` [PATCH] ref-filter: fix NULL check for parse object failure Jeff King
@ 2021-04-01 13:56                       ` Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 1/5] mktag tests: parse out options in helper Ævar Arnfjörð Bjarmason
                                           ` (5 more replies)
  2021-04-01 19:52                       ` [PATCH] ref-filter: fix NULL check for parse object failure Junio C Hamano
  1 sibling, 6 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-01 13:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Olga Telezhnaya, Ævar Arnfjörð Bjarmason

On Thu, Apr 01 2021, Jeff King wrote:

> On Thu, Apr 01, 2021 at 03:54:56AM -0400, Jeff King wrote:
>
>> On Wed, Mar 31, 2021 at 10:46:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> > > Neither of those types is the correct one. And the segfault is just a
>> > > bonus! :)
>> > >
>> > > I'd expect similar cases with parsing commit parents and tree pointers.
>> > > And probably tree entries whose modes are wrong.
>> > 
>> > So the segfault happens without my patches,
>> 
>> Yeah, sorry if that was unclear. It is definitely a pre-existing bug.
>
> Here's a patch to fix it. This is mostly orthogonal to your patch
> series. It happens to use a similar recipe to reproduce, but that is not
> the only way to do it, and the fix and the test shouldn't conflict
> textually or semantically.

Here's a proposed v2. We test the same case, but I thought it made
sense to test this more exhaustively.

The v1 will also leave t6300 in a bad state for whoever adds the next
test, trivial to fix with a test_create_repo, but this seems better.

Jeff King (1):
  ref-filter: fix NULL check for parse object failure

Ævar Arnfjörð Bjarmason (4):
  mktag tests: parse out options in helper
  mktag tests: invert --no-strict test
  mktag tests: do fsck on failure
  mktag tests: test for maybe segfaulting for-each-ref

 ref-filter.c     |  2 +-
 t/t3800-mktag.sh | 90 +++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 75 insertions(+), 17 deletions(-)

Range-diff:
-:  ----------- > 1:  45e0f100613 mktag tests: parse out options in helper
-:  ----------- > 2:  dd71740447d mktag tests: invert --no-strict test
-:  ----------- > 3:  688d7456843 mktag tests: do fsck on failure
-:  ----------- > 4:  403024b1cca mktag tests: test for maybe segfaulting for-each-ref
1:  9358541ce1f ! 5:  2ffe8f9fe3c ref-filter: fix NULL check for parse object failure
    @@ Commit message
     
         There are many ways a parse could fail, but most of them are hard to set
         up in the tests (it's easy to make a bogus object, but update-ref will
    -    refuse to point to it). The test here uses a tag which points to a wrong
    -    object type. A parse of just the broken tag object will succeed, but
    -    seeing both tag objects in the same process will lead to a parse error
    -    (since we'll see the pointed-to object as both types).
    +    refuse to point to it).
    +
    +    A minimal stand-alone test can be found at, but let's use the newly
    +    amended t3800-mktag.sh tests to test these cases exhaustively on all
    +    sorts of bad tags.
    +
    +    1. http://lore.kernel.org/git/YGWFGMdGcKeaqCQF@coredump.intra.peff.net
     
         Signed-off-by: Jeff King <peff@peff.net>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## ref-filter.c ##
     @@ ref-filter.c: static int get_object(struct ref_array_item *ref, int deref, struct object **obj
    @@ ref-filter.c: static int get_object(struct ref_array_item *ref, int deref, struc
      				free(oi->content);
      			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
     
    - ## t/t6300-for-each-ref.sh ##
    -@@ t/t6300-for-each-ref.sh: test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
    - 	test_cmp expect actual
    - '
    + ## t/t3800-mktag.sh ##
    +@@ t/t3800-mktag.sh: check_verify_failure () {
    + 		git -C bad-tag for-each-ref "$tag_ref" >actual &&
    + 		test_cmp expected actual &&
    + 		# segfaults!
    +-		! git -C bad-tag for-each-ref --format="%(*objectname)"
    ++		test_must_fail git -C bad-tag for-each-ref --format="%(*objectname)"
    + 	'
    + }
      
    -+test_expect_success 'for-each-ref reports broken tags' '
    -+	git tag -m "good tag" broken-tag-good HEAD &&
    -+	git cat-file tag broken-tag-good >good &&
    -+	sed s/commit/blob/ <good >bad &&
    -+	bad=$(git hash-object -w -t tag bad) &&
    -+	git update-ref refs/tags/broken-tag-bad $bad &&
    -+	test_must_fail git for-each-ref --format="%(*objectname)" \
    -+		refs/tags/broken-tag-*
    -+'
    -+
    - test_done
-- 
2.31.1.474.g72d45d12706


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 1/5] mktag tests: parse out options in helper
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
@ 2021-04-01 13:56                         ` Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 2/5] mktag tests: invert --no-strict test Ævar Arnfjörð Bjarmason
                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-01 13:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Olga Telezhnaya, Ævar Arnfjörð Bjarmason

Change check_verify_failure() helper to parse out options from
$@. This makes it easier to add new options in the future. See
06ce79152be (mktag: add a --[no-]strict option, 2021-01-06) for the
initial implementation.

Let's also replace "" quotes with '' for the test body, the varables
we need are eval'd into the body, so there's no need for the quoting
confusion.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t3800-mktag.sh | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 6275c98523f..e9008744e3d 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -12,15 +12,29 @@ test_description='git mktag: tag object verify test'
 # given in the expect.pat file.
 
 check_verify_failure () {
-	test_expect_success "$1" "
-		test_must_fail git mktag <tag.sig 2>message &&
-		grep '$2' message &&
-		if test '$3' != '--no-strict'
+	subject=$1 &&
+	message=$2 &&
+	shift 2 &&
+
+	no_strict= &&
+	while test $# != 0
+	do
+		case "$1" in
+		--no-strict)
+			no_strict=yes
+			;;
+		esac &&
+		shift
+	done &&
+
+	test_expect_success "fail with [--[no-]strict]: $subject" '
+		test_must_fail git mktag <tag.sig 2>err &&
+		if test -z "$no_strict"
 		then
-			test_must_fail git mktag --no-strict <tag.sig 2>message.no-strict &&
-			grep '$2' message.no-strict
+			test_must_fail git mktag <tag.sig 2>err2 &&
+			test_cmp err err2
 		fi
-	"
+	'
 }
 
 test_expect_mktag_success() {
@@ -243,7 +257,8 @@ tagger . <> 0 +0000
 EOF
 
 check_verify_failure 'verify tag-name check' \
-	'^error:.* badTagName:' '--no-strict'
+	'^error:.* badTagName:' \
+	--no-strict
 
 ############################################################
 # 11. tagger line label check #1
@@ -257,7 +272,8 @@ This is filler
 EOF
 
 check_verify_failure '"tagger" line label check #1' \
-	'^error:.* missingTaggerEntry:' '--no-strict'
+	'^error:.* missingTaggerEntry:' \
+	--no-strict
 
 ############################################################
 # 12. tagger line label check #2
@@ -272,7 +288,8 @@ This is filler
 EOF
 
 check_verify_failure '"tagger" line label check #2' \
-	'^error:.* missingTaggerEntry:' '--no-strict'
+	'^error:.* missingTaggerEntry:' \
+	--no-strict
 
 ############################################################
 # 13. allow missing tag author name like fsck
@@ -301,7 +318,8 @@ tagger T A Gger <
 EOF
 
 check_verify_failure 'disallow malformed tagger' \
-	'^error:.* badEmail:' '--no-strict'
+	'^error:.* badEmail:' \
+	--no-strict
 
 ############################################################
 # 15. allow empty tag email
@@ -425,7 +443,8 @@ this line should not be here
 EOF
 
 check_verify_failure 'detect invalid header entry' \
-	'^error:.* extraHeaderEntry:' '--no-strict'
+	'^error:.* extraHeaderEntry:' \
+	--no-strict
 
 test_expect_success 'invalid header entry config & fsck' '
 	test_must_fail git mktag <tag.sig &&
-- 
2.31.1.474.g72d45d12706


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 2/5] mktag tests: invert --no-strict test
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 1/5] mktag tests: parse out options in helper Ævar Arnfjörð Bjarmason
@ 2021-04-01 13:56                         ` Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 3/5] mktag tests: do fsck on failure Ævar Arnfjörð Bjarmason
                                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-01 13:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Olga Telezhnaya, Ævar Arnfjörð Bjarmason

Change the mktag --no-strict test to actually test success under
--no-strict, that test was added in 06ce79152be (mktag: add a
--[no-]strict option, 2021-01-06).

It doesn't make sense to check that we have the same failure except
when we want --no-strict, by doing that we're assuming that the
behavior will be different under --no-strict, bun nothing was testing
for that.

We should instead assert that --strict is the same as --no-strict,
except in the cases where we've declared that it's not.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t3800-mktag.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index e9008744e3d..951e6d39c2a 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -33,6 +33,8 @@ check_verify_failure () {
 		then
 			test_must_fail git mktag <tag.sig 2>err2 &&
 			test_cmp err err2
+		else
+			git mktag --no-strict <tag.sig
 		fi
 	'
 }
-- 
2.31.1.474.g72d45d12706


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 3/5] mktag tests: do fsck on failure
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 1/5] mktag tests: parse out options in helper Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 2/5] mktag tests: invert --no-strict test Ævar Arnfjörð Bjarmason
@ 2021-04-01 13:56                         ` Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 4/5] mktag tests: test for maybe segfaulting for-each-ref Ævar Arnfjörð Bjarmason
                                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-01 13:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Olga Telezhnaya, Ævar Arnfjörð Bjarmason

Change the check_verify_failure() function to do an fsck of the bad
object on failure.

Due to how fsck works and walks the graph the failure will be
different if the object is reachable, so we might succeed before we've
created the ref, let's make sure we always fail after it's created.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t3800-mktag.sh | 51 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 951e6d39c2a..4673235b1fd 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -16,6 +16,8 @@ check_verify_failure () {
 	message=$2 &&
 	shift 2 &&
 
+	no_strict= &&
+	fsck_obj_ok= &&
 	no_strict= &&
 	while test $# != 0
 	do
@@ -23,7 +25,10 @@ check_verify_failure () {
 		--no-strict)
 			no_strict=yes
 			;;
-		esac &&
+		--fsck-obj-ok)
+			fsck_obj_ok=yes
+			;;
+		esac
 		shift
 	done &&
 
@@ -35,7 +40,25 @@ check_verify_failure () {
 			test_cmp err err2
 		else
 			git mktag --no-strict <tag.sig
-		fi
+		fi &&
+
+		test_when_finished "rm -rf bad-tag" &&
+		test_create_repo bad-tag &&
+		bad_tag=$(git -C bad-tag hash-object -t tag -w --stdin --literally <tag.sig) &&
+		if test -n "$fsck_obj_ok"
+		then
+			git -C bad-tag fsck
+		else
+			test_must_fail git -C bad-tag fsck >out 2>err
+		fi &&
+
+		# Do update-ref anyway to see if it segfaults
+		tag_ref=refs/tags/bad_tag &&
+		test_might_fail git -C bad-tag update-ref "$tag_ref" "$bad_tag" &&
+		# The update-ref command itself might fail, but we are
+		# not testing that
+		echo "$bad_tag" >"bad-tag/.git/$tag_ref" &&
+		test_must_fail git -C bad-tag fsck
 	'
 }
 
@@ -183,7 +206,8 @@ tagger . <> 0 +0000
 EOF
 
 check_verify_failure 'verify object (hash/type) check -- correct type, nonexisting object' \
-	'^fatal: could not read tagged object'
+	'^fatal: could not read tagged object' \
+	--fsck-obj-ok
 
 cat >tag.sig <<EOF
 object $head
@@ -216,7 +240,8 @@ tagger . <> 0 +0000
 EOF
 
 check_verify_failure 'verify object (hash/type) check -- mismatched type, valid object' \
-	'^fatal: object.*tagged as.*tree.*but is.*commit'
+	'^fatal: object.*tagged as.*tree.*but is.*commit' \
+	--fsck-obj-ok
 
 ############################################################
 #  9.5. verify object (hash/type) check -- replacement
@@ -245,7 +270,8 @@ tagger . <> 0 +0000
 EOF
 
 check_verify_failure 'verify object (hash/type) check -- mismatched type, valid object' \
-	'^fatal: object.*tagged as.*tree.*but is.*blob'
+	'^fatal: object.*tagged as.*tree.*but is.*blob' \
+	--fsck-obj-ok
 
 ############################################################
 # 10. verify tag-name check
@@ -260,7 +286,8 @@ EOF
 
 check_verify_failure 'verify tag-name check' \
 	'^error:.* badTagName:' \
-	--no-strict
+	--no-strict \
+	--fsck-obj-ok
 
 ############################################################
 # 11. tagger line label check #1
@@ -275,7 +302,8 @@ EOF
 
 check_verify_failure '"tagger" line label check #1' \
 	'^error:.* missingTaggerEntry:' \
-	--no-strict
+	--no-strict \
+	--fsck-obj-ok
 
 ############################################################
 # 12. tagger line label check #2
@@ -291,7 +319,8 @@ EOF
 
 check_verify_failure '"tagger" line label check #2' \
 	'^error:.* missingTaggerEntry:' \
-	--no-strict
+	--no-strict \
+	--fsck-obj-ok
 
 ############################################################
 # 13. allow missing tag author name like fsck
@@ -321,7 +350,8 @@ EOF
 
 check_verify_failure 'disallow malformed tagger' \
 	'^error:.* badEmail:' \
-	--no-strict
+	--no-strict \
+	--fsck-obj-ok
 
 ############################################################
 # 15. allow empty tag email
@@ -446,7 +476,8 @@ EOF
 
 check_verify_failure 'detect invalid header entry' \
 	'^error:.* extraHeaderEntry:' \
-	--no-strict
+	--no-strict \
+	--fsck-obj-ok
 
 test_expect_success 'invalid header entry config & fsck' '
 	test_must_fail git mktag <tag.sig &&
-- 
2.31.1.474.g72d45d12706


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 4/5] mktag tests: test for maybe segfaulting for-each-ref
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
                                           ` (2 preceding siblings ...)
  2021-04-01 13:56                         ` [PATCH v2 3/5] mktag tests: do fsck on failure Ævar Arnfjörð Bjarmason
@ 2021-04-01 13:56                         ` Ævar Arnfjörð Bjarmason
  2021-04-01 13:56                         ` [PATCH v2 5/5] ref-filter: fix NULL check for parse object failure Ævar Arnfjörð Bjarmason
  2021-04-01 19:56                         ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Junio C Hamano
  5 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-01 13:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Olga Telezhnaya, Ævar Arnfjörð Bjarmason

Add a test to check that "for-each-ref" fails on a repository with a
bad tag, this test intentionally uses "! " instead of "test_must_fail
" to hide a segfault. We'll fix the underlying bug in a subsequent
commit and convert it to "test_must_fail".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t3800-mktag.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 4673235b1fd..10e4fde28de 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -58,7 +58,13 @@ check_verify_failure () {
 		# The update-ref command itself might fail, but we are
 		# not testing that
 		echo "$bad_tag" >"bad-tag/.git/$tag_ref" &&
-		test_must_fail git -C bad-tag fsck
+		test_must_fail git -C bad-tag fsck &&
+
+		printf "%s tag\t%s\n" "$bad_tag" "$tag_ref" >expected &&
+		git -C bad-tag for-each-ref "$tag_ref" >actual &&
+		test_cmp expected actual &&
+		# segfaults!
+		! git -C bad-tag for-each-ref --format="%(*objectname)"
 	'
 }
 
-- 
2.31.1.474.g72d45d12706


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 5/5] ref-filter: fix NULL check for parse object failure
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
                                           ` (3 preceding siblings ...)
  2021-04-01 13:56                         ` [PATCH v2 4/5] mktag tests: test for maybe segfaulting for-each-ref Ævar Arnfjörð Bjarmason
@ 2021-04-01 13:56                         ` Ævar Arnfjörð Bjarmason
  2021-04-01 19:19                           ` Ramsay Jones
  2021-04-01 19:56                         ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Junio C Hamano
  5 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-01 13:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Olga Telezhnaya, Jeff King,
	Ævar Arnfjörð Bjarmason

From: Jeff King <peff@peff.net>

After we run parse_object_buffer() to get an object's contents, we try
to check that the return value wasn't NULL. However, since our "struct
object" is a pointer-to-pointer, and we assign like:

  *obj = parse_object_buffer(...);

it's not correct to check:

  if (!obj)

That will always be true, since our double pointer will continue to
point to the single pointer (which is itself NULL). This is a regression
that was introduced by aa46a0da30 (ref-filter: use oid_object_info() to
get object, 2018-07-17); since that commit we'll segfault on a parse
failure, as we try to look at the NULL object pointer.

There are many ways a parse could fail, but most of them are hard to set
up in the tests (it's easy to make a bogus object, but update-ref will
refuse to point to it).

A minimal stand-alone test can be found at, but let's use the newly
amended t3800-mktag.sh tests to test these cases exhaustively on all
sorts of bad tags.

1. http://lore.kernel.org/git/YGWFGMdGcKeaqCQF@coredump.intra.peff.net

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 ref-filter.c     | 2 +-
 t/t3800-mktag.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index f0bd32f7141..a0adb4551d8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1608,7 +1608,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 
 	if (oi->info.contentp) {
 		*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
-		if (!obj) {
+		if (!*obj) {
 			if (!eaten)
 				free(oi->content);
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 10e4fde28de..b175d639013 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -64,7 +64,7 @@ check_verify_failure () {
 		git -C bad-tag for-each-ref "$tag_ref" >actual &&
 		test_cmp expected actual &&
 		# segfaults!
-		! git -C bad-tag for-each-ref --format="%(*objectname)"
+		test_must_fail git -C bad-tag for-each-ref --format="%(*objectname)"
 	'
 }
 
-- 
2.31.1.474.g72d45d12706


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 5/5] ref-filter: fix NULL check for parse object failure
  2021-04-01 13:56                         ` [PATCH v2 5/5] ref-filter: fix NULL check for parse object failure Ævar Arnfjörð Bjarmason
@ 2021-04-01 19:19                           ` Ramsay Jones
  0 siblings, 0 replies; 142+ messages in thread
From: Ramsay Jones @ 2021-04-01 19:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Olga Telezhnaya, Jeff King

On Thu, Apr 01, 2021 at 03:56:30PM +0200, Ævar Arnfjörð Bjarmason wrote:
> From: Jeff King <peff@peff.net>
> 
[snip]
> 
> A minimal stand-alone test can be found at, but let's use the newly

... can be found at, ... Hmm, missing test number?

ATB,
Ramsay Jones

> amended t3800-mktag.sh tests to test these cases exhaustively on all
> sorts of bad tags.
> 
> 1. http://lore.kernel.org/git/YGWFGMdGcKeaqCQF@coredump.intra.peff.net
> 
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  ref-filter.c     | 2 +-
>  t/t3800-mktag.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
[snip]


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH] ref-filter: fix NULL check for parse object failure
  2021-04-01  8:32                     ` [PATCH] ref-filter: fix NULL check for parse object failure Jeff King
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
@ 2021-04-01 19:52                       ` Junio C Hamano
  1 sibling, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-01 19:52 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Olga Telezhnaya, git,
	Taylor Blau, Elijah Newren, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> Here's a patch to fix it. This is mostly orthogonal to your patch
> series. It happens to use a similar recipe to reproduce, but that is not
> the only way to do it, and the fix and the test shouldn't conflict
> textually or semantically.
>
> -- >8 --
> Subject: [PATCH] ref-filter: fix NULL check for parse object failure
>
> After we run parse_object_buffer() to get an object's contents, we try
> to check that the return value wasn't NULL. However, since our "struct
> object" is a pointer-to-pointer, and we assign like:
>
>   *obj = parse_object_buffer(...);
>
> it's not correct to check:
>
>   if (!obj)
>
> That will always be true, since our double pointer will continue to
> point to the single pointer (which is itself NULL). This is a regression
> that was introduced by aa46a0da30 (ref-filter: use oid_object_info() to
> get object, 2018-07-17); since that commit we'll segfault on a parse
> failure, as we try to look at the NULL object pointer.
>
> There are many ways a parse could fail, but most of them are hard to set
> up in the tests (it's easy to make a bogus object, but update-ref will
> refuse to point to it). The test here uses a tag which points to a wrong
> object type. A parse of just the broken tag object will succeed, but
> seeing both tag objects in the same process will lead to a parse error
> (since we'll see the pointed-to object as both types).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  ref-filter.c            |  2 +-
>  t/t6300-for-each-ref.sh | 10 ++++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)

Makes sense.  Will queue.

> diff --git a/ref-filter.c b/ref-filter.c
> index f0bd32f714..a0adb4551d 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1608,7 +1608,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
>  
>  	if (oi->info.contentp) {
>  		*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
> -		if (!obj) {
> +		if (!*obj) {
>  			if (!eaten)
>  				free(oi->content);
>  			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index cac7f443d0..2e7c32d50c 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -1134,4 +1134,14 @@ test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'for-each-ref reports broken tags' '
> +	git tag -m "good tag" broken-tag-good HEAD &&
> +	git cat-file tag broken-tag-good >good &&
> +	sed s/commit/blob/ <good >bad &&
> +	bad=$(git hash-object -w -t tag bad) &&
> +	git update-ref refs/tags/broken-tag-bad $bad &&
> +	test_must_fail git for-each-ref --format="%(*objectname)" \
> +		refs/tags/broken-tag-*
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 0/5] mktag tests & fix for-each-ref segfault
  2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
                                           ` (4 preceding siblings ...)
  2021-04-01 13:56                         ` [PATCH v2 5/5] ref-filter: fix NULL check for parse object failure Ævar Arnfjörð Bjarmason
@ 2021-04-01 19:56                         ` Junio C Hamano
  2021-04-02 11:37                           ` Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-04-01 19:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Olga Telezhnaya

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Here's a proposed v2. We test the same case, but I thought it made
> sense to test this more exhaustively.

Let's first make a targetted fix that can be applied to maint and
below.  After that is merged to 'master', you are free to add more
tests on top, but let's avoid to have more and more topics that go
overboard.

Thanks.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 0/5] mktag tests & fix for-each-ref segfault
  2021-04-01 19:56                         ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Junio C Hamano
@ 2021-04-02 11:37                           ` Ævar Arnfjörð Bjarmason
  2021-04-02 20:51                             ` Junio C Hamano
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-02 11:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Olga Telezhnaya


On Thu, Apr 01 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Here's a proposed v2. We test the same case, but I thought it made
>> sense to test this more exhaustively.
>
> Let's first make a targetted fix that can be applied to maint and
> below.  After that is merged to 'master', you are free to add more
> tests on top, 

Makes sense. I based Jeff's patch on top of mine to demonstrate that
those tests also catch the segfault.

> but let's avoid to have more and more topics that go overboard.

So "submit a new version on-top" or "maybe deal with your existing
topics first as you're overflowing my inbox" ? :) I suspect the latter..

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 0/5] mktag tests & fix for-each-ref segfault
  2021-04-02 11:37                           ` Ævar Arnfjörð Bjarmason
@ 2021-04-02 20:51                             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-02 20:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Olga Telezhnaya

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Apr 01 2021, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> Here's a proposed v2. We test the same case, but I thought it made
>>> sense to test this more exhaustively.
>>
>> Let's first make a targetted fix that can be applied to maint and
>> below.  After that is merged to 'master', you are free to add more
>> tests on top, 
>
> Makes sense. I based Jeff's patch on top of mine to demonstrate that
> those tests also catch the segfault.
>
>> but let's avoid to have more and more topics that go overboard.
>
> So "submit a new version on-top" or "maybe deal with your existing
> topics first as you're overflowing my inbox" ? :) I suspect the latter..

What I meant is...

Comparing the five-patch series with Peff's small fix that is more
to the point, I have a feeling that the five-patch series, like many
other series from you, may be made unnecessarily large by not
resisting the temptation to including unessential "while at it"
changes.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 0/2] blob/object.c: trivial readability improvements
  2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                         ` (10 preceding siblings ...)
  2021-03-28  9:27       ` [PATCH v2 00/10] improve reporting of unexpected objects Jeff King
@ 2021-04-09  8:07       ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:07         ` [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer() Ævar Arnfjörð Bjarmason
                           ` (3 more replies)
  11 siblings, 4 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

This is the initial and very trivil part of a re-starting the
ab/unexpected-object-type topic by splitting it up into more managable
pieces, see a note about the split in [1].

1. https://lore.kernel.org/git/87r1jkgvoc.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (2):
  blob.c: remove buffer & size arguments to parse_blob_buffer()
  object.c: initialize automatic variable in lookup_object()

 blob.c   | 2 +-
 blob.h   | 3 +--
 object.c | 8 ++++----
 3 files changed, 6 insertions(+), 7 deletions(-)

-- 
2.31.1.584.gf4baedee75


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()
  2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:07         ` Ævar Arnfjörð Bjarmason
  2021-04-09 17:51           ` Jeff King
  2021-04-09  8:07         ` [PATCH 2/2] object.c: initialize automatic variable in lookup_object() Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

As noted in the comment introduced in 837d395a5c0 (Replace
parse_blob() with an explanatory comment, 2010-01-18) the old
parse_blob() function and the current parse_blob_buffer() exist merely
to provide consistency in the API.

We're not going to parse blobs like we "parse" commits, trees or
tags. So let's not have the parse_blob_buffer() take arguments that
pretends that we do. Its only use is to set the "parsed" flag.

See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().

I'm moving the prototype of parse_blob_buffer() below the comment
added in 837d395a5c0 while I'm at it. That comment was originally
meant to be a replacement for the missing parse_blob() function, but
it's much less confusing to have it be above the parse_blob_buffer()
function it refers to.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c   | 2 +-
 blob.h   | 3 +--
 object.c | 4 ++--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/blob.c b/blob.c
index 182718aba9..389a7546dc 100644
--- a/blob.c
+++ b/blob.c
@@ -13,7 +13,7 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
-int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
+int parse_blob_buffer(struct blob *item)
 {
 	item->object.parsed = 1;
 	return 0;
diff --git a/blob.h b/blob.h
index 1664872055..ac1d4804a5 100644
--- a/blob.h
+++ b/blob.h
@@ -11,8 +11,6 @@ struct blob {
 
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
 
-int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
-
 /**
  * Blobs do not contain references to other objects and do not have
  * structured data that needs parsing. However, code may use the
@@ -21,5 +19,6 @@ int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
  * parse_blob_buffer() is used (by object.c) to flag that the object
  * has been read successfully from the database.
  **/
+int parse_blob_buffer(struct blob *item);
 
 #endif /* BLOB_H */
diff --git a/object.c b/object.c
index 78343781ae..63896abf01 100644
--- a/object.c
+++ b/object.c
@@ -195,7 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 	if (type == OBJ_BLOB) {
 		struct blob *blob = lookup_blob(r, oid);
 		if (blob) {
-			if (parse_blob_buffer(blob, buffer, size))
+			if (parse_blob_buffer(blob))
 				return NULL;
 			obj = &blob->object;
 		}
@@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
-		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
+		parse_blob_buffer(lookup_blob(r, oid));
 		return lookup_object(r, oid);
 	}
 
-- 
2.31.1.584.gf4baedee75


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 2/2] object.c: initialize automatic variable in lookup_object()
  2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
  2021-04-09  8:07         ` [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:07         ` Ævar Arnfjörð Bjarmason
  2021-04-09 17:53           ` Jeff King
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Initialize a "struct object obj*" variable to NULL explicitly and
return it instead of leaving it uninitialized until the "while"
loop.

There was no bug here, it's just less confusing when debugging if the
"obj" is either NULL or a valid object, not some random invalid
pointer.

See 0556a11a0df (git object hash cleanups, 2006-06-30) for the initial
implementation.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 63896abf01..7fdca3ed1e 100644
--- a/object.c
+++ b/object.c
@@ -87,10 +87,10 @@ static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
 struct object *lookup_object(struct repository *r, const struct object_id *oid)
 {
 	unsigned int i, first;
-	struct object *obj;
+	struct object *obj = NULL;
 
 	if (!r->parsed_objects->obj_hash)
-		return NULL;
+		return obj;
 
 	first = i = hash_obj(oid, r->parsed_objects->obj_hash_size);
 	while ((obj = r->parsed_objects->obj_hash[i]) != NULL) {
-- 
2.31.1.584.gf4baedee75


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change
  2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
  2021-04-09  8:07         ` [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer() Ævar Arnfjörð Bjarmason
  2021-04-09  8:07         ` [PATCH 2/2] object.c: initialize automatic variable in lookup_object() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32         ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:32           ` [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
                             ` (7 more replies)
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
  3 siblings, 8 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Based on top of[1] this is part of a split-up of [2] into more
digestable pieces. I've addressed feedback on this part of the series
this mostly-a-re-roll.

1. http://lore.kernel.org/git/cover-0.3-0000000000-20210409T080534Z-avarab@gmail.com
2. https://lore.kernel.org/git/cover-00.11-00000000000-20210328T021238Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (6):
  object.c: stop supporting len == -1 in type_from_string_gently()
  object.c: remove "gently" argument to type_from_string_gently()
  object.c: make type_from_string() return "enum object_type"
  object-file.c: make oid_object_info() return "enum object_type"
  object-name.c: make dependency on object_type order more obvious
  tag.c: use type_from_string_gently() when parsing tags

 builtin/blame.c      |  2 +-
 builtin/index-pack.c |  2 +-
 fsck.c               |  2 +-
 object-file.c        | 10 ++++------
 object-name.c        | 25 +++++++++++++------------
 object-store.h       |  4 +++-
 object.c             | 20 +++++++++++---------
 object.h             |  4 ++--
 packfile.c           |  2 +-
 tag.c                | 19 ++++++++++---------
 10 files changed, 47 insertions(+), 43 deletions(-)

-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32           ` Ævar Arnfjörð Bjarmason
  2021-04-09 18:06             ` Jeff King
  2021-04-09  8:32           ` [PATCH 2/6] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the type_from_string() macro into a function and drop the
support for passing len < 0.

Support for len < 0 was added in fe8e3b71805 (Refactor
type_from_string() to allow continuing after detecting an error,
2014-09-10), but no callers use that form. Let's drop it to simplify
this, and in preparation for simplifying these even further.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 10 +++++++---
 object.h |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index 7fdca3ed1e..88de01e5ac 100644
--- a/object.c
+++ b/object.c
@@ -39,9 +39,6 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 {
 	int i;
 
-	if (len < 0)
-		len = strlen(str);
-
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
@@ -53,6 +50,13 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 	die(_("invalid object type \"%s\""), str);
 }
 
+int type_from_string(const char *str)
+{
+	size_t len = strlen(str);
+	int ret = type_from_string_gently(str, len, 0);
+	return ret;
+}
+
 /*
  * Return a numerical hash value between 0 and n-1 for the object with
  * the specified sha1.  n must be a power of 2.  Please note that the
diff --git a/object.h b/object.h
index 59daadce21..3ab3eb193d 100644
--- a/object.h
+++ b/object.h
@@ -94,7 +94,7 @@ struct object {
 
 const char *type_name(unsigned int type);
 int type_from_string_gently(const char *str, ssize_t, int gentle);
-#define type_from_string(str) type_from_string_gently(str, -1, 0)
+int type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 2/6] object.c: remove "gently" argument to type_from_string_gently()
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  2021-04-09  8:32           ` [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32           ` Ævar Arnfjörð Bjarmason
  2021-04-09 18:10             ` Jeff King
  2021-04-09  8:32           ` [PATCH 3/6] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Get rid of the "gently" argument to type_from_string_gently() to make
it consistent with most other *_gently() functions. It's already a
"gentle" function, it shouldn't need a boolean argument telling it to
be gentle.

The reason it had a "gentle" parameter was because until the preceding
commit "type_from_string()" was a macro resolving to
"type_from_string_gently()", it's now a function.

This refactoring of adding a third parameter was done in
fe8e3b71805 (Refactor type_from_string() to allow continuing after
detecting an error, 2014-09-10) in preparation for its use in
fsck.c.

Simplifying this means we can move the die() into the simpler
type_from_string() function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c        |  2 +-
 object-file.c |  2 +-
 object.c      | 12 +++++-------
 object.h      |  2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fsck.c b/fsck.c
index f5ed6a2635..8dda548c38 100644
--- a/fsck.c
+++ b/fsck.c
@@ -875,7 +875,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
-	*tagged_type = type_from_string_gently(buffer, eol - buffer, 1);
+	*tagged_type = type_from_string_gently(buffer, eol - buffer);
 	if (*tagged_type < 0)
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
diff --git a/object-file.c b/object-file.c
index 624af408cd..b7c26b6735 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1314,7 +1314,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 		type_len++;
 	}
 
-	type = type_from_string_gently(type_buf, type_len, 1);
+	type = type_from_string_gently(type_buf, type_len);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
 	/*
diff --git a/object.c b/object.c
index 88de01e5ac..5477abc97c 100644
--- a/object.c
+++ b/object.c
@@ -35,7 +35,7 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len, int gentle)
+int type_from_string_gently(const char *str, ssize_t len)
 {
 	int i;
 
@@ -43,17 +43,15 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
 			return i;
-
-	if (gentle)
-		return -1;
-
-	die(_("invalid object type \"%s\""), str);
+	return -1;
 }
 
 int type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len, 0);
+	int ret = type_from_string_gently(str, len);
+	if (ret < 0)
+		die(_("invalid object type \"%s\""), str);
 	return ret;
 }
 
diff --git a/object.h b/object.h
index 3ab3eb193d..ffdc129830 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,7 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t, int gentle);
+int type_from_string_gently(const char *str, ssize_t len);
 int type_from_string(const char *str);
 
 /*
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 3/6] object.c: make type_from_string() return "enum object_type"
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  2021-04-09  8:32           ` [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
  2021-04-09  8:32           ` [PATCH 2/6] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32           ` Ævar Arnfjörð Bjarmason
  2021-04-09 18:14             ` Jeff King
  2021-04-09  8:32           ` [PATCH 4/6] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the type_from_string*() functions to return an "enum
object_type", but don't refactor their callers to check for "==
OBJ_BAD" instead of "< 0".

Refactoring the check of the return value to check == OBJ_BAD would
now be equivalent to "ret < 0", but the consensus on an earlier
version of this patch was to not do that, and to instead use -1
consistently as a return value. It just so happens that OBJ_BAD == -1,
but let's not put a hard reliance on that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 8 ++++----
 object.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/object.c b/object.c
index 5477abc97c..2216cdcda2 100644
--- a/object.c
+++ b/object.c
@@ -35,9 +35,9 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len)
+enum object_type type_from_string_gently(const char *str, ssize_t len)
 {
-	int i;
+	enum object_type i;
 
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
@@ -46,10 +46,10 @@ int type_from_string_gently(const char *str, ssize_t len)
 	return -1;
 }
 
-int type_from_string(const char *str)
+enum object_type type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len);
+	enum object_type ret = type_from_string_gently(str, len);
 	if (ret < 0)
 		die(_("invalid object type \"%s\""), str);
 	return ret;
diff --git a/object.h b/object.h
index ffdc129830..5e7a523e85 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,8 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t len);
-int type_from_string(const char *str);
+enum object_type type_from_string_gently(const char *str, ssize_t len);
+enum object_type type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 4/6] object-file.c: make oid_object_info() return "enum object_type"
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-04-09  8:32           ` [PATCH 3/6] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32           ` Ævar Arnfjörð Bjarmason
  2021-04-09 18:24             ` Jeff King
  2021-04-09  8:32           ` [PATCH 5/6] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change oid_object_info() to return an "enum object_type". Unlike
oid_object_info_extended() function the simpler oid_object_info()
explicitly returns the oi.typep member, which is itself an "enum
object_type".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/blame.c      |  2 +-
 builtin/index-pack.c |  2 +-
 object-file.c        |  8 +++-----
 object-name.c        | 19 +++++++++----------
 object-store.h       |  4 +++-
 packfile.c           |  2 +-
 6 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9a..5dd3c38a8c 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -810,7 +810,7 @@ static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
 	oidcpy(&oid, oid_ret);
 	while (1) {
 		struct object *obj;
-		int kind = oid_object_info(r, &oid, NULL);
+		enum object_type kind = oid_object_info(r, &oid, NULL);
 		if (kind == OBJ_COMMIT) {
 			oidcpy(oid_ret, &oid);
 			return 0;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 15507b5cff..c0e3768c32 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -237,7 +237,7 @@ static unsigned check_object(struct object *obj)
 
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
-		int type = oid_object_info(the_repository, &obj->oid, &size);
+		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
diff --git a/object-file.c b/object-file.c
index b7c26b6735..8ed54d6f62 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1572,11 +1572,9 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	return ret;
 }
 
-
-/* returns enum object_type or negative */
-int oid_object_info(struct repository *r,
-		    const struct object_id *oid,
-		    unsigned long *sizep)
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *oid,
+				 unsigned long *sizep)
 {
 	enum object_type type;
 	struct object_info oi = OBJECT_INFO_INIT;
diff --git a/object-name.c b/object-name.c
index 64202de60b..4d7f0c66cf 100644
--- a/object-name.c
+++ b/object-name.c
@@ -239,9 +239,8 @@ static int disambiguate_committish_only(struct repository *r,
 					void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind = oid_object_info(r, oid, NULL);
 
-	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_COMMIT)
 		return 1;
 	if (kind != OBJ_TAG)
@@ -258,7 +257,7 @@ static int disambiguate_tree_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_TREE;
 }
 
@@ -267,7 +266,7 @@ static int disambiguate_treeish_only(struct repository *r,
 				     void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind;
 
 	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_TREE || kind == OBJ_COMMIT)
@@ -286,7 +285,7 @@ static int disambiguate_blob_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_BLOB;
 }
 
@@ -361,7 +360,7 @@ static int show_ambiguous_object(const struct object_id *oid, void *data)
 {
 	const struct disambiguate_state *ds = data;
 	struct strbuf desc = STRBUF_INIT;
-	int type;
+	enum object_type type;
 
 	if (ds->fn && !ds->fn(ds->repo, oid, ds->cb_data))
 		return 0;
@@ -405,10 +404,10 @@ static int repo_collect_ambiguous(struct repository *r,
 static int sort_ambiguous(const void *a, const void *b, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
-	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
-	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
-	int a_type_sort;
-	int b_type_sort;
+	enum object_type a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
+	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
+	enum object_type a_type_sort;
+	enum object_type b_type_sort;
 
 	/*
 	 * Sorts by hash within the same object type, just as
diff --git a/object-store.h b/object-store.h
index ec32c23dcb..eab9674d08 100644
--- a/object-store.h
+++ b/object-store.h
@@ -208,7 +208,9 @@ static inline void *repo_read_object_file(struct repository *r,
 #endif
 
 /* Read and unpack an object file into memory, write memory to an object file */
-int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *,
+				 unsigned long *);
 
 int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 		     unsigned long len, const char *type,
diff --git a/packfile.c b/packfile.c
index 6661f3325a..3ee01ea732 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1266,7 +1266,7 @@ static int retry_bad_packed_offset(struct repository *r,
 				   struct packed_git *p,
 				   off_t obj_offset)
 {
-	int type;
+	enum object_type type;
 	uint32_t pos;
 	struct object_id oid;
 	if (offset_to_pack_pos(p, obj_offset, &pos) < 0)
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 5/6] object-name.c: make dependency on object_type order more obvious
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-04-09  8:32           ` [PATCH 4/6] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32           ` Ævar Arnfjörð Bjarmason
  2021-04-09 18:36             ` Jeff King
  2021-04-09  8:32           ` [PATCH 6/6] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add an assert to make it more obvious that we were effectively
hardcoding OBJ_TAG in sort_ambiguous() as "4".

I wrote this code in 5cc044e0257 (get_short_oid: sort ambiguous
objects by type, then SHA-1, 2018-05-10), there was already a comment
about this magic, but let's make sure that someone doing a potential
reordering of "enum object_type" in the future would notice it
breaking this function (and probably a bunch of other things...).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-name.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/object-name.c b/object-name.c
index 4d7f0c66cf..b6a7328b7a 100644
--- a/object-name.c
+++ b/object-name.c
@@ -408,6 +408,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
 	enum object_type a_type_sort;
 	enum object_type b_type_sort;
+	const enum object_type tag_type_offs = OBJ_TAG - OBJ_NONE;
+	assert(tag_type_offs == 4);
 
 	/*
 	 * Sorts by hash within the same object type, just as
@@ -425,8 +427,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	 * cleverly) do that with modulus, since the enum assigns 1 to
 	 * commit, so tag becomes 0.
 	 */
-	a_type_sort = a_type % 4;
-	b_type_sort = b_type % 4;
+	a_type_sort = a_type % tag_type_offs;
+	b_type_sort = b_type % tag_type_offs;
 	return a_type_sort > b_type_sort ? 1 : -1;
 }
 
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 6/6] tag.c: use type_from_string_gently() when parsing tags
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-04-09  8:32           ` [PATCH 5/6] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:32           ` Ævar Arnfjörð Bjarmason
  2021-04-09 18:42             ` Jeff King
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change a series of strcmp() to instead use type_from_string_gently()
to get the integer type early, and then use that for comparison.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 tag.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tag.c b/tag.c
index 3e18a41841..871c4c9a14 100644
--- a/tag.c
+++ b/tag.c
@@ -135,7 +135,7 @@ void release_tag_memory(struct tag *t)
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size)
 {
 	struct object_id oid;
-	char type[20];
+	enum object_type type;
 	const char *bufptr = data;
 	const char *tail = bufptr + size;
 	const char *nl;
@@ -162,23 +162,24 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 		return -1;
 	bufptr += 5;
 	nl = memchr(bufptr, '\n', tail - bufptr);
-	if (!nl || sizeof(type) <= (nl - bufptr))
+	if (!nl)
+		return -1;
+	type = type_from_string_gently(bufptr, nl - bufptr);
+	if (type < 0)
 		return -1;
-	memcpy(type, bufptr, nl - bufptr);
-	type[nl - bufptr] = '\0';
 	bufptr = nl + 1;
 
-	if (!strcmp(type, blob_type)) {
+	if (type == OBJ_BLOB) {
 		item->tagged = (struct object *)lookup_blob(r, &oid);
-	} else if (!strcmp(type, tree_type)) {
+	} else if (type == OBJ_TREE) {
 		item->tagged = (struct object *)lookup_tree(r, &oid);
-	} else if (!strcmp(type, commit_type)) {
+	} else if (type == OBJ_COMMIT) {
 		item->tagged = (struct object *)lookup_commit(r, &oid);
-	} else if (!strcmp(type, tag_type)) {
+	} else if (type == OBJ_TAG) {
 		item->tagged = (struct object *)lookup_tag(r, &oid);
 	} else {
 		return error("unknown tag type '%s' in %s",
-			     type, oid_to_hex(&item->object.oid));
+			     type_name(type), oid_to_hex(&item->object.oid));
 	}
 
 	if (!item->tagged)
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-04-09  8:32           ` [PATCH 6/6] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:49           ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 1/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
                               ` (7 more replies)
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  7 siblings, 8 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

This goes on top of [1] and is part of a split-up [2] into more
digestable pieces.

As in [2] we reduce duplication of "object %s is a %s not a %s" in
various places by moving those messages/errors/dies to using a utility
function.

For the more meaty "don't misreport objects" change I'm planning to
submit on top of this I'm then refactoring object_as_type() to not
take a "quiet" argument. This wasn't strictly needed, but makes things
simpler.

As it turned out we had various parts of the codebase (ab)using
object_as_type() just to check if something was of a given type,
anything that deals with parsed objects can/should just use obj->type
== OBJ_{COMMIT,TREE,BLOB,TAG} instead. This leaves object_as_type() as
a low-level function for use in the object API itself.

1. http://lore.kernel.org/git/cover-0.6-0000000000-20210409T082935Z-avarab@gmail.com
2. https://lore.kernel.org/git/cover-00.11-00000000000-20210328T021238Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (7):
  tree.c: fix misindentation in parse_tree_gently()
  object.c: add a utility function for "expected type X, got Y"
  object.c: add and use oid_is_type_or_die_msg() function
  commit-graph: use obj->type, not object_as_type()
  commit.c: don't use deref_tag() -> object_as_type()
  object.c: normalize brace style in object_as_type()
  object.c: remove "quiet" parameter from object_as_type()

 blob.c                 |  2 +-
 builtin/commit-graph.c |  2 +-
 builtin/fsck.c         |  2 +-
 builtin/index-pack.c   |  9 +++-----
 combine-diff.c         |  3 +--
 commit.c               | 28 ++++++++++++++---------
 merge-recursive.c      |  5 ++++-
 object.c               | 51 ++++++++++++++++++++++++++++++++++--------
 object.h               | 10 ++++++++-
 refs.c                 |  2 +-
 t/helper/test-reach.c  |  2 +-
 tag.c                  |  2 +-
 tree.c                 | 15 +++++++------
 13 files changed, 90 insertions(+), 43 deletions(-)

-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 1/7] tree.c: fix misindentation in parse_tree_gently()
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:49             ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 2/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

The variables declared in parse_tree_gently() had a single space after
the TAB. This dates back to their introduction in bd2c39f58f9 ([PATCH]
don't load and decompress objects twice with parse_object(),
2005-05-06). Let's fix them to follow the style of the rest of the
file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tree.c b/tree.c
index 410e3b477e..482a37d8fa 100644
--- a/tree.c
+++ b/tree.c
@@ -123,9 +123,9 @@ int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 
 int parse_tree_gently(struct tree *item, int quiet_on_missing)
 {
-	 enum object_type type;
-	 void *buffer;
-	 unsigned long size;
+	enum object_type type;
+	void *buffer;
+	unsigned long size;
 
 	if (item->object.parsed)
 		return 0;
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 2/7] object.c: add a utility function for "expected type X, got Y"
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 1/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:49             ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 3/7] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Refactor various "Object X is not Y" error messages so that they use
the same message as the long-standing object_as_type() error
message. Now we'll consistently report e.g. that we got a commit when
we expected a tag, not just that the object is not a tag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c |  9 +++------
 combine-diff.c       |  3 +--
 commit.c             | 10 ++++------
 merge-recursive.c    |  1 +
 object.c             | 25 ++++++++++++++++++++++++-
 object.h             |  5 +++++
 tree.c               |  7 ++++---
 7 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index c0e3768c32..eabd9d4677 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -218,8 +218,8 @@ static int mark_link(struct object *obj, enum object_type type,
 	if (!obj)
 		return -1;
 
-	if (type != OBJ_ANY && obj->type != type)
-		die(_("object type mismatch at %s"), oid_to_hex(&obj->oid));
+	if (type != OBJ_ANY)
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 
 	obj->flags |= FLAG_LINK;
 	return 0;
@@ -241,10 +241,7 @@ static unsigned check_object(struct object *obj)
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
-		if (type != obj->type)
-			die(_("object %s: expected type %s, found %s"),
-			    oid_to_hex(&obj->oid),
-			    type_name(obj->type), type_name(type));
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 		obj->flags |= FLAG_CHECKED;
 		return 1;
 	}
diff --git a/combine-diff.c b/combine-diff.c
index 06635f91bc..aa767dbb8e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -333,8 +333,7 @@ static char *grab_blob(struct repository *r,
 		free_filespec(df);
 	} else {
 		blob = read_object_file(oid, &type, size);
-		if (type != OBJ_BLOB)
-			die("object '%s' is not a blob!", oid_to_hex(oid));
+		oid_is_type_or_die(oid, OBJ_BLOB, &type);
 	}
 	return blob;
 }
diff --git a/commit.c b/commit.c
index 8ea55a447f..b370100367 100644
--- a/commit.c
+++ b/commit.c
@@ -299,9 +299,7 @@ const void *repo_get_commit_buffer(struct repository *r,
 		if (!ret)
 			die("cannot read commit object %s",
 			    oid_to_hex(&commit->object.oid));
-		if (type != OBJ_COMMIT)
-			die("expected commit for %s, got %s",
-			    oid_to_hex(&commit->object.oid), type_name(type));
+		oid_is_type_or_die(&commit->object.oid, OBJ_COMMIT, &type);
 		if (sizep)
 			*sizep = size;
 	}
@@ -489,10 +487,10 @@ int repo_parse_commit_internal(struct repository *r,
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_COMMIT) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_COMMIT, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a commit",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 
 	ret = parse_commit_buffer(r, item, buffer, size, 0);
diff --git a/merge-recursive.c b/merge-recursive.c
index ed31f9496c..be7f727b5a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2971,6 +2971,7 @@ static int read_oid_strbuf(struct merge_options *opt,
 	if (!buf)
 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
+		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
 		free(buf);
 		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
 	}
diff --git a/object.c b/object.c
index 2216cdcda2..8b2df3a94c 100644
--- a/object.c
+++ b/object.c
@@ -159,6 +159,29 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	return obj;
 }
 
+static const char *object_type_mismatch_msg = N_("object %s is a %s, not a %s");
+
+void oid_is_type_or_die(const struct object_id *oid,
+			enum object_type want,
+			enum object_type *type)
+{
+	if (want == *type)
+		return;
+	die(_(object_type_mismatch_msg), oid_to_hex(oid),
+	    type_name(*type), type_name(want));
+}
+
+int oid_is_type_or_error(const struct object_id *oid,
+			 enum object_type want,
+			 enum object_type *type)
+{
+	if (want == *type)
+		return 0;
+	return error(_(object_type_mismatch_msg),
+		     oid_to_hex(oid), type_name(*type),
+		     type_name(want));
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
@@ -172,7 +195,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 	else {
 		if (!quiet)
-			error(_("object %s is a %s, not a %s"),
+			error(_(object_type_mismatch_msg),
 			      oid_to_hex(&obj->oid),
 			      type_name(obj->type), type_name(type));
 		return NULL;
diff --git a/object.h b/object.h
index 5e7a523e85..d2d4a236d0 100644
--- a/object.h
+++ b/object.h
@@ -124,6 +124,11 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
+			enum object_type *type);
+int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
+			 enum object_type *type);
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree.c b/tree.c
index 482a37d8fa..6717d982fa 100644
--- a/tree.c
+++ b/tree.c
@@ -126,6 +126,7 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 	enum object_type type;
 	void *buffer;
 	unsigned long size;
+	int ret;
 
 	if (item->object.parsed)
 		return 0;
@@ -134,10 +135,10 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_TREE) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_TREE, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a tree",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 	return parse_tree_buffer(item, buffer, size);
 }
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 3/7] object.c: add and use oid_is_type_or_die_msg() function
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 1/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 2/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:49             ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:49             ` [PATCH 4/7] commit-graph: use obj->type, not object_as_type() Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add a oid_is_type_or_die_msg() function to go with the "error" and
"die" forms for emitting "expected type X, got Y" messages. This is
useful for callers that want the message itself as a char *.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 merge-recursive.c |  6 ++++--
 object.c          | 12 ++++++++++++
 object.h          |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index be7f727b5a..2429d2cb89 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2971,9 +2971,11 @@ static int read_oid_strbuf(struct merge_options *opt,
 	if (!buf)
 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
-		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
+		char *msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
+		int ret = err(opt, msg);
 		free(buf);
-		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
+		free(msg);
+		return ret;
 	}
 	strbuf_attach(dst, buf, size, size + 1);
 	return 0;
diff --git a/object.c b/object.c
index 8b2df3a94c..1573e571de 100644
--- a/object.c
+++ b/object.c
@@ -182,6 +182,18 @@ int oid_is_type_or_error(const struct object_id *oid,
 		     type_name(want));
 }
 
+char* oid_is_type_or_die_msg(const struct object_id *oid,
+				   enum object_type want,
+				   enum object_type *type)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (want == *type)
+		BUG("call this just to get the message!");
+	strbuf_addf(&sb, _(object_type_mismatch_msg), oid_to_hex(oid),
+		    type_name(*type), type_name(want));
+	return strbuf_detach(&sb, NULL);
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
diff --git a/object.h b/object.h
index d2d4a236d0..cdc3242a12 100644
--- a/object.h
+++ b/object.h
@@ -128,6 +128,9 @@ void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
 			enum object_type *type);
 int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
 			 enum object_type *type);
+char* oid_is_type_or_die_msg(const struct object_id *oid,
+			     enum object_type want,
+			     enum object_type *type);
 
 /*
  * Returns the object, having parsed it to find out what it is.
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 4/7] commit-graph: use obj->type, not object_as_type()
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-04-09  8:49             ` [PATCH 3/7] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:49             ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:50             ` [PATCH 5/7] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
                               ` (3 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change a check of a deref_tag() return value to just use obj->type
instead of object_as_type(). The object_as_type() function is for
low-level use by fsck, {commit,tree,blob,tag}.c and the like, here we
can just assume the object is fully initialized.

As can be seen in plenty of existing uses in our codebase the return
value of deref_tag() won't be an obj->type == OBJ_NONE or
!obj->parsed. Fixes code added in 2f00c355cb7 (commit-graph: drop
COMMIT_GRAPH_WRITE_CHECK_OIDS flag, 2020-05-13).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index cd86315221..347d65abc8 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -158,7 +158,7 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 			   NULL, 0);
 	if (!result)
 		return error(_("invalid object: %s"), hash);
-	else if (object_as_type(result, OBJ_COMMIT, 1))
+	else if (result->type == OBJ_COMMIT)
 		oidset_insert(commits, &result->oid);
 
 	display_progress(progress, oidset_size(commits));
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 5/7] commit.c: don't use deref_tag() -> object_as_type()
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-04-09  8:49             ` [PATCH 4/7] commit-graph: use obj->type, not object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:50             ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:50             ` [PATCH 6/7] object.c: normalize brace style in object_as_type() Ævar Arnfjörð Bjarmason
                               ` (2 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change a use of the object_as_type() function introduced in
8ff226a9d5e (add object_as_type helper for casting objects,
2014-07-13) to instead assume that we're not dealing with OBJ_NONE (or
OBJ_BAD) from deref_tag().

This makes this code easier to read, as the reader isn't wondering why
the function would need to deal with that. We're simply doing a check
of OBJ_{COMMIT,TREE,BLOB,TAG} here, not the bare-bones initialization
object_as_type() might be called on to do.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index b370100367..437a8b8548 100644
--- a/commit.c
+++ b/commit.c
@@ -31,13 +31,19 @@ const char *commit_type = "commit";
 struct commit *lookup_commit_reference_gently(struct repository *r,
 		const struct object_id *oid, int quiet)
 {
-	struct object *obj = deref_tag(r,
-				       parse_object(r, oid),
-				       NULL, 0);
+	struct object *tmp = parse_object(r, oid);
+	struct object *obj = deref_tag(r, tmp, NULL, 0);
 
 	if (!obj)
 		return NULL;
-	return object_as_type(obj, OBJ_COMMIT, quiet);
+
+	if (obj->type != OBJ_COMMIT) {
+		enum object_type want = OBJ_COMMIT;
+		if (!quiet)
+			oid_is_type_or_error(oid, OBJ_COMMIT, &want);
+		return NULL;
+	}
+	return (struct commit *)obj;
 }
 
 struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 6/7] object.c: normalize brace style in object_as_type()
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-04-09  8:50             ` [PATCH 5/7] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:50             ` Ævar Arnfjörð Bjarmason
  2021-04-09  8:50             ` [PATCH 7/7] object.c: remove "quiet" parameter from object_as_type() Ævar Arnfjörð Bjarmason
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Normalize the brace style in this function introduced in
8ff226a9d5e (add object_as_type helper for casting objects,
2014-07-13) to be in line with the coding style of the project.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index 1573e571de..e3400d1039 100644
--- a/object.c
+++ b/object.c
@@ -196,16 +196,15 @@ char* oid_is_type_or_die_msg(const struct object_id *oid,
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
-	if (obj->type == type)
+	if (obj->type == type) {
 		return obj;
-	else if (obj->type == OBJ_NONE) {
+	} else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
 			init_commit_node((struct commit *) obj);
 		else
 			obj->type = type;
 		return obj;
-	}
-	else {
+	} else {
 		if (!quiet)
 			error(_(object_type_mismatch_msg),
 			      oid_to_hex(&obj->oid),
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 7/7] object.c: remove "quiet" parameter from object_as_type()
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-04-09  8:50             ` [PATCH 6/7] object.c: normalize brace style in object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-09  8:50             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09  8:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Remove the now-unused "quiet" parameter from object_as_type(). As
shown in preceding commits the previous users of this parameter were
better off with higher-level APIs.

The "quiet" parameter was originally introduced when the
object_as_type() function was added in 8ff226a9d5e (add object_as_type
helper for casting objects,.  2014-07-13), but the commit.c use-case
for it is now gone.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                | 2 +-
 builtin/fsck.c        | 2 +-
 commit.c              | 8 +++++---
 object.c              | 9 ++++-----
 object.h              | 2 +-
 refs.c                | 2 +-
 t/helper/test-reach.c | 2 +-
 tag.c                 | 2 +-
 tree.c                | 2 +-
 9 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/blob.c b/blob.c
index 389a7546dc..b5bd27844e 100644
--- a/blob.c
+++ b/blob.c
@@ -10,7 +10,7 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
-	return object_as_type(obj, OBJ_BLOB, 0);
+	return object_as_type(obj, OBJ_BLOB);
 }
 
 int parse_blob_buffer(struct blob *item)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 70ff95837a..5d534cf218 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -221,7 +221,7 @@ static void mark_unreachable_referents(const struct object_id *oid)
 		enum object_type type = oid_object_info(the_repository,
 							&obj->oid, NULL);
 		if (type > 0)
-			object_as_type(obj, type, 0);
+			object_as_type(obj, type);
 	}
 
 	options.walk = mark_used;
diff --git a/commit.c b/commit.c
index 437a8b8548..3014559d66 100644
--- a/commit.c
+++ b/commit.c
@@ -38,9 +38,11 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
 		return NULL;
 
 	if (obj->type != OBJ_COMMIT) {
-		enum object_type want = OBJ_COMMIT;
+		if (obj->type <= 0)
+			BUG("noes");
 		if (!quiet)
-			oid_is_type_or_error(oid, OBJ_COMMIT, &want);
+			fprintf(stderr, "noes ohes");/*
+			oid_is_type_or_error(oid, OBJ_COMMIT, &obj->type);*/
 		return NULL;
 	}
 	return (struct commit *)obj;
@@ -68,7 +70,7 @@ struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
-	return object_as_type(obj, OBJ_COMMIT, 0);
+	return object_as_type(obj, OBJ_COMMIT);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/object.c b/object.c
index e3400d1039..715e358603 100644
--- a/object.c
+++ b/object.c
@@ -194,7 +194,7 @@ char* oid_is_type_or_die_msg(const struct object_id *oid,
 	return strbuf_detach(&sb, NULL);
 }
 
-void *object_as_type(struct object *obj, enum object_type type, int quiet)
+void *object_as_type(struct object *obj, enum object_type type)
 {
 	if (obj->type == type) {
 		return obj;
@@ -205,10 +205,9 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 			obj->type = type;
 		return obj;
 	} else {
-		if (!quiet)
-			error(_(object_type_mismatch_msg),
-			      oid_to_hex(&obj->oid),
-			      type_name(obj->type), type_name(type));
+		error(_(object_type_mismatch_msg),
+		      oid_to_hex(&obj->oid),
+		      type_name(obj->type), type_name(type));
 		return NULL;
 	}
 }
diff --git a/object.h b/object.h
index cdc3242a12..61857ee48c 100644
--- a/object.h
+++ b/object.h
@@ -122,7 +122,7 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid);
 
 void *create_object(struct repository *r, const struct object_id *oid, void *obj);
 
-void *object_as_type(struct object *obj, enum object_type type, int quiet);
+void *object_as_type(struct object *obj, enum object_type type);
 
 void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
 			enum object_type *type);
diff --git a/refs.c b/refs.c
index 261fd82beb..7f4ca3441c 100644
--- a/refs.c
+++ b/refs.c
@@ -341,7 +341,7 @@ enum peel_status peel_object(const struct object_id *name, struct object_id *oid
 
 	if (o->type == OBJ_NONE) {
 		int type = oid_object_info(the_repository, name, NULL);
-		if (type < 0 || !object_as_type(o, type, 0))
+		if (type < 0 || !object_as_type(o, type))
 			return PEEL_INVALID;
 	}
 
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index cda804ed79..c9fd74b21f 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -67,7 +67,7 @@ int cmd__reach(int ac, const char **av)
 			die("failed to load commit for input %s resulting in oid %s\n",
 			    buf.buf, oid_to_hex(&oid));
 
-		c = object_as_type(peeled, OBJ_COMMIT, 0);
+		c = object_as_type(peeled, OBJ_COMMIT);
 
 		if (!c)
 			die("failed to load commit for input %s resulting in oid %s\n",
diff --git a/tag.c b/tag.c
index 871c4c9a14..e750b00cf5 100644
--- a/tag.c
+++ b/tag.c
@@ -104,7 +104,7 @@ struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
-	return object_as_type(obj, OBJ_TAG, 0);
+	return object_as_type(obj, OBJ_TAG);
 }
 
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
diff --git a/tree.c b/tree.c
index 6717d982fa..0bd18abd64 100644
--- a/tree.c
+++ b/tree.c
@@ -107,7 +107,7 @@ struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
-	return object_as_type(obj, OBJ_TREE, 0);
+	return object_as_type(obj, OBJ_TREE);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
-- 
2.31.1.592.gdf54ba9003


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()
  2021-04-09  8:07         ` [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer() Ævar Arnfjörð Bjarmason
@ 2021-04-09 17:51           ` Jeff King
  2021-04-09 22:31             ` Junio C Hamano
  2021-04-10 12:57             ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 17:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote:

> As noted in the comment introduced in 837d395a5c0 (Replace
> parse_blob() with an explanatory comment, 2010-01-18) the old
> parse_blob() function and the current parse_blob_buffer() exist merely
> to provide consistency in the API.
> 
> We're not going to parse blobs like we "parse" commits, trees or
> tags. So let's not have the parse_blob_buffer() take arguments that
> pretends that we do. Its only use is to set the "parsed" flag.
> 
> See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
> parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().

OK. Calling it parse_blob_buffer() is a little silly since it doesn't
even take a buffer anymore. But I guess parse_blob() might imply that it
actually loads the contents from disk to check them (which the other
parse_foo() functions do), so that's not a good name.

So this might be the least bad thing. Given that there are only two
callers, just setting blob->object.parsed might not be unreasonable,
either. But I don't think it's worth spending too much time on.

> @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;
>  		}
> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
> +		parse_blob_buffer(lookup_blob(r, oid));
>  		return lookup_object(r, oid);

Not new in your patch, but I wondered if this could cause a segfault
when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit
this code path when either:

  - lookup_object() returns an object with type OBJ_BLOB, in which case
    lookup_blob() would return that same object

  - lookup_object() returned NULL, in which case lookup_blob() will call
    it again, get NULL again, and then auto-create the blob and return
    it

So I think it is OK. But there are a bunch of duplicate hash lookups in
this code. It would be clearer and more efficient as:

diff --git a/object.c b/object.c
index 2c32691dc4..2dfa038f13 100644
--- a/object.c
+++ b/object.c
@@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+		if (!obj)
+			obj = create_object(r, oid, alloc_blob_node(r));
 		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
-		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
-		return lookup_object(r, oid);
+		parse_blob_buffer(obj, NULL, 0);
+		return obj;
 	}
 
 	buffer = repo_read_object_file(r, oid, &type, &size);

but I doubt the efficiency matters much in practice. Those hash lookups
will be lost in the noise of computing the hash of the blob contents.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/2] object.c: initialize automatic variable in lookup_object()
  2021-04-09  8:07         ` [PATCH 2/2] object.c: initialize automatic variable in lookup_object() Ævar Arnfjörð Bjarmason
@ 2021-04-09 17:53           ` Jeff King
  2021-04-09 22:32             ` Junio C Hamano
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-04-09 17:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:07:28AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Initialize a "struct object obj*" variable to NULL explicitly and
> return it instead of leaving it uninitialized until the "while"
> loop.
> 
> There was no bug here, it's just less confusing when debugging if the
> "obj" is either NULL or a valid object, not some random invalid
> pointer.
>
> [...]
>
>  struct object *lookup_object(struct repository *r, const struct object_id *oid)
>  {
>  	unsigned int i, first;
> -	struct object *obj;
> +	struct object *obj = NULL;
>  
>  	if (!r->parsed_objects->obj_hash)
> -		return NULL;
> +		return obj;

I actually prefer the original style (where any "can we bail early"
checks just explicitly return NULL, rather than making you check to see
that obj is NULL). But it's pretty subjective, and I don't feel
strongly.

>  	first = i = hash_obj(oid, r->parsed_objects->obj_hash_size);
>  	while ((obj = r->parsed_objects->obj_hash[i]) != NULL) {

The important thing is that "obj" is not used uninitialized, which it
isn't (before or after).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-04-09  8:32           ` [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-09 18:06             ` Jeff King
  2021-04-09 18:10               ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:32:49AM +0200, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/object.c b/object.c
> index 7fdca3ed1e..88de01e5ac 100644
> --- a/object.c
> +++ b/object.c
> @@ -39,9 +39,6 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
>  {
>  	int i;
>  
> -	if (len < 0)
> -		len = strlen(str);
> -

The "ssize_t len" in the parameters could become a size_t now, right?

Not strictly necessary, but in theory it may help static analysis catch
a caller who mistakenly tries to pass -1 (though in practice I suspect
it does not help that much, because any of gcc's sign-conversion
warnings generate far too much noise to be useful with our current
codebase).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-04-09 18:06             ` Jeff King
@ 2021-04-09 18:10               ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 02:06:51PM -0400, Jeff King wrote:

> On Fri, Apr 09, 2021 at 10:32:49AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > diff --git a/object.c b/object.c
> > index 7fdca3ed1e..88de01e5ac 100644
> > --- a/object.c
> > +++ b/object.c
> > @@ -39,9 +39,6 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
> >  {
> >  	int i;
> >  
> > -	if (len < 0)
> > -		len = strlen(str);
> > -
> 
> The "ssize_t len" in the parameters could become a size_t now, right?
> 
> Not strictly necessary, but in theory it may help static analysis catch
> a caller who mistakenly tries to pass -1 (though in practice I suspect
> it does not help that much, because any of gcc's sign-conversion
> warnings generate far too much noise to be useful with our current
> codebase).

Actually, seeing patch 2, which changes the signature, mostly deals with
this. The compiler would complain about any existing calls because of
dropping the "gentle" parameter (it is up to the human to realize that
they need to make sure we are not passing a negative len, but hopefully
they would look at the other commits at that point).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/6] object.c: remove "gently" argument to type_from_string_gently()
  2021-04-09  8:32           ` [PATCH 2/6] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-09 18:10             ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:32:50AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Get rid of the "gently" argument to type_from_string_gently() to make
> it consistent with most other *_gently() functions. It's already a
> "gentle" function, it shouldn't need a boolean argument telling it to
> be gentle.
> 
> The reason it had a "gentle" parameter was because until the preceding
> commit "type_from_string()" was a macro resolving to
> "type_from_string_gently()", it's now a function.
> 
> This refactoring of adding a third parameter was done in
> fe8e3b71805 (Refactor type_from_string() to allow continuing after
> detecting an error, 2014-09-10) in preparation for its use in
> fsck.c.
> 
> Simplifying this means we can move the die() into the simpler
> type_from_string() function.

Yeah, this makes sense (and the change of signature will catch any
callers in topics-in-flight).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/6] object.c: make type_from_string() return "enum object_type"
  2021-04-09  8:32           ` [PATCH 3/6] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
@ 2021-04-09 18:14             ` Jeff King
  2021-04-09 19:42               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:32:51AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Change the type_from_string*() functions to return an "enum
> object_type", but don't refactor their callers to check for "==
> OBJ_BAD" instead of "< 0".
> 
> Refactoring the check of the return value to check == OBJ_BAD would
> now be equivalent to "ret < 0", but the consensus on an earlier
> version of this patch was to not do that, and to instead use -1
> consistently as a return value. It just so happens that OBJ_BAD == -1,
> but let's not put a hard reliance on that.

I think what the patch is doing is good, but this rationale misses the
main point of that discussion, I think. I doubt that the value of
OBJ_BAD would ever change. But the point was that we could grow a new
"failure" value at "-2", and we would want to catch here (I do consider
it relatively unlikely, but that IMHO is the reason to keep the negative
check).

I think for the same reason that "return OBJ_BAD" instead of "return -1"
would be just fine (it is not "just so happens" that OBJ_BAD is
negative; that was deliberate to allow exactly this convention). But I
am also OK with leaving the "return -1" calls.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 4/6] object-file.c: make oid_object_info() return "enum object_type"
  2021-04-09  8:32           ` [PATCH 4/6] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
@ 2021-04-09 18:24             ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:32:52AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Change oid_object_info() to return an "enum object_type". Unlike
> oid_object_info_extended() function the simpler oid_object_info()
> explicitly returns the oi.typep member, which is itself an "enum
> object_type".

OK. I don't think there is much difference from the compiler perspective
(because of the equivalence of enums and ints in C), but it gives a clue
to the reader about how the value is meant to be interpreted.

> @@ -405,10 +404,10 @@ static int repo_collect_ambiguous(struct repository *r,
>  static int sort_ambiguous(const void *a, const void *b, void *ctx)
>  {
>  	struct repository *sort_ambiguous_repo = ctx;
> -	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
> -	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
> -	int a_type_sort;
> -	int b_type_sort;
> +	enum object_type a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
> +	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
> +	enum object_type a_type_sort;
> +	enum object_type b_type_sort;

Not new in your patch, but the way this function uses modulo is
interesting:

  a_type_sort = a_type % 4;
  b_type_sort = b_type % 4;

What happens if we got OBJ_BAD as one of the results? We are not
indexing any arrays here, so I guess the worst case is that we sort
things in a weird way (and presumably we'd barf later when trying to
show the output anyway).

> --- a/object-store.h
> +++ b/object-store.h
> @@ -208,7 +208,9 @@ static inline void *repo_read_object_file(struct repository *r,
>  #endif
>  
>  /* Read and unpack an object file into memory, write memory to an object file */
> -int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
> +enum object_type oid_object_info(struct repository *r,
> +				 const struct object_id *,
> +				 unsigned long *);

Also not new in your patch, but that comment sure is misleading. :)

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 5/6] object-name.c: make dependency on object_type order more obvious
  2021-04-09  8:32           ` [PATCH 5/6] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
@ 2021-04-09 18:36             ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:32:53AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Add an assert to make it more obvious that we were effectively
> hardcoding OBJ_TAG in sort_ambiguous() as "4".
> 
> I wrote this code in 5cc044e0257 (get_short_oid: sort ambiguous
> objects by type, then SHA-1, 2018-05-10), there was already a comment
> about this magic, but let's make sure that someone doing a potential
> reordering of "enum object_type" in the future would notice it
> breaking this function (and probably a bunch of other things...).

Yeah, those object type values are used for the on-disk formats, so
quite a lot would break.

> @@ -408,6 +408,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
>  	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
>  	enum object_type a_type_sort;
>  	enum object_type b_type_sort;
> +	const enum object_type tag_type_offs = OBJ_TAG - OBJ_NONE;
> +	assert(tag_type_offs == 4);

This protects us against shifting of the values or reordering within the
main 4 types, but it doesn't protect against new types, nor reordering
in which the main 4 types are no longer contiguous. E.g.:

  enum object_type {
          OBJ_NONE = 0,
	  OBJ_REF_DELTA = 1,
	  OBJ_OFS_DELTA = 2,
	  OBJ_COMMIT = 3,
	  OBJ_TAG = 4,
	  OBJ_BLOB = 5,
	  OBJ_TREE = 6,
  };

would be wrong. I dunno. I guess in some sense I am glad to see an
attempt at automated enforcement of assumptions. But I think if we are
worried about the object_type enum changing, we'd do better to write
this function in a less-clever way.

  /* sort tags before anything else */
  if (a_type == OBJ_TAG)
          a_type = 0;
  if (b_type == OBJ_TAG)
          b_type = 0;

Of course that is still assuming that normal values are all positive,
but that seems reasonable. If you really wanted to be agnostic, you
could assign the minimum value.  But you can't easily know that for an
enum. So you'd want to store them as ints (reversing your previous
commit!) and using INT_MIN.

The conditional probably performs less well in a tight loop, but I doubt
that matters for the size of array we expect to sort (if we cared about
performance we would not call oid_object_info() twice inside the
comparator!).

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 6/6] tag.c: use type_from_string_gently() when parsing tags
  2021-04-09  8:32           ` [PATCH 6/6] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
@ 2021-04-09 18:42             ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 18:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 10:32:54AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Change a series of strcmp() to instead use type_from_string_gently()
> to get the integer type early, and then use that for comparison.

The result looks much nicer.

One interesting note...

> @@ -162,23 +162,24 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
>  		return -1;
>  	bufptr += 5;
>  	nl = memchr(bufptr, '\n', tail - bufptr);
> -	if (!nl || sizeof(type) <= (nl - bufptr))
> +	if (!nl)
> +		return -1;
> +	type = type_from_string_gently(bufptr, nl - bufptr);
> +	if (type < 0)
>  		return -1;

If we got anything but the main-4 types here, we'll return an error.
So we know here:

> -	if (!strcmp(type, blob_type)) {
> +	if (type == OBJ_BLOB) {
>  		item->tagged = (struct object *)lookup_blob(r, &oid);
> -	} else if (!strcmp(type, tree_type)) {
> +	} else if (type == OBJ_TREE) {
>  		item->tagged = (struct object *)lookup_tree(r, &oid);
> -	} else if (!strcmp(type, commit_type)) {
> +	} else if (type == OBJ_COMMIT) {
>  		item->tagged = (struct object *)lookup_commit(r, &oid);
> -	} else if (!strcmp(type, tag_type)) {
> +	} else if (type == OBJ_TAG) {
>  		item->tagged = (struct object *)lookup_tag(r, &oid);
>  	} else {
>  		return error("unknown tag type '%s' in %s",
> -			     type, oid_to_hex(&item->object.oid));
> +			     type_name(type), oid_to_hex(&item->object.oid));
>  	}

That the final "else" clause can't be reached. I don't mind being
defensive, but if it _is_ reached, then we'd feed that unknown type to
type_name(), which will return NULL for unknown items (unless I guess it
has also learned about the hypothetical new type).

I think this should just be:

  else {
          BUG("type_from_string gave us an unknown type: %d", (int)type);
  }

which makes it clear we expect the code can't be reached, and doesn't
make any assumptions about what we can do with the odd value.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/6] object.c: make type_from_string() return "enum object_type"
  2021-04-09 18:14             ` Jeff King
@ 2021-04-09 19:42               ` Ævar Arnfjörð Bjarmason
  2021-04-09 21:29                 ` Jeff King
  0 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-09 19:42 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren


On Fri, Apr 09 2021, Jeff King wrote:

> On Fri, Apr 09, 2021 at 10:32:51AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the type_from_string*() functions to return an "enum
>> object_type", but don't refactor their callers to check for "==
>> OBJ_BAD" instead of "< 0".
>> 
>> Refactoring the check of the return value to check == OBJ_BAD would
>> now be equivalent to "ret < 0", but the consensus on an earlier
>> version of this patch was to not do that, and to instead use -1
>> consistently as a return value. It just so happens that OBJ_BAD == -1,
>> but let's not put a hard reliance on that.
>
> I think what the patch is doing is good, but this rationale misses the
> main point of that discussion, I think. I doubt that the value of
> OBJ_BAD would ever change. But the point was that we could grow a new
> "failure" value at "-2", and we would want to catch here (I do consider
> it relatively unlikely, but that IMHO is the reason to keep the negative
> check).
>
> I think for the same reason that "return OBJ_BAD" instead of "return -1"
> would be just fine (it is not "just so happens" that OBJ_BAD is
> negative; that was deliberate to allow exactly this convention). But I
> am also OK with leaving the "return -1" calls.

I'm beginning to think in response to this and the comment on 5/6 that
it might be cleaner to split up the object_type enum, as demonstrated
for a config.[ch] feature in [1].

Converting back and forth between them is a bit nasty, and having
multiple interchangable OBJ_* constants with identical values just to
satisfy them being in different enums, but it would allow having the
compiler explicitly help check that callers cover all possible cases of
values they could get.

Most callers just get OBJ_{COMMIT,TREE,BLOB,TAG} some more get that plus
OBJ_{BAD,NONE}, almost nobody gets OBJ_{OFS,REF}_DELTA, and AFAICT just
the peel code cares about OBJ_ANY. We then have an OBJ_MAX nobody's ever
used for anything (I've got some unsubmitted patch somewhere to remove
it).

What do you think about that sort of approach? I haven't convinced
myself that it's a good idea, so far I just thought bridging the gap of
things that return "enum" actually having that as part of their
signature for human legibility, even if C itself doesn't care about the
difference, and we currently can't get much/any of the benefits of the
compiler catching non-exhaustive "case" statements (unless every
callsite is to include OBJ_OFS etc.).

1. https://lore.kernel.org/git/875z0wicmp.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/6] object.c: make type_from_string() return "enum object_type"
  2021-04-09 19:42               ` Ævar Arnfjörð Bjarmason
@ 2021-04-09 21:29                 ` Jeff King
  0 siblings, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-09 21:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Fri, Apr 09, 2021 at 09:42:17PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I think what the patch is doing is good, but this rationale misses the
> > main point of that discussion, I think. I doubt that the value of
> > OBJ_BAD would ever change. But the point was that we could grow a new
> > "failure" value at "-2", and we would want to catch here (I do consider
> > it relatively unlikely, but that IMHO is the reason to keep the negative
> > check).
> >
> > I think for the same reason that "return OBJ_BAD" instead of "return -1"
> > would be just fine (it is not "just so happens" that OBJ_BAD is
> > negative; that was deliberate to allow exactly this convention). But I
> > am also OK with leaving the "return -1" calls.
> 
> I'm beginning to think in response to this and the comment on 5/6 that
> it might be cleaner to split up the object_type enum, as demonstrated
> for a config.[ch] feature in [1].
> 
> Converting back and forth between them is a bit nasty, and having
> multiple interchangable OBJ_* constants with identical values just to
> satisfy them being in different enums, but it would allow having the
> compiler explicitly help check that callers cover all possible cases of
> values they could get.
> 
> Most callers just get OBJ_{COMMIT,TREE,BLOB,TAG} some more get that plus
> OBJ_{BAD,NONE}, almost nobody gets OBJ_{OFS,REF}_DELTA, and AFAICT just
> the peel code cares about OBJ_ANY. We then have an OBJ_MAX nobody's ever
> used for anything (I've got some unsubmitted patch somewhere to remove
> it).
> 
> What do you think about that sort of approach? I haven't convinced
> myself that it's a good idea, so far I just thought bridging the gap of
> things that return "enum" actually having that as part of their
> signature for human legibility, even if C itself doesn't care about the
> difference, and we currently can't get much/any of the benefits of the
> compiler catching non-exhaustive "case" statements (unless every
> callsite is to include OBJ_OFS etc.).

I suspect you'll end up with a lot of awkward spots. I do agree that
_most_ callers only care about getting one of the actual 4 object types,
or an error. But those values are tied to the delta ones internally
(when we see a delta type, and then later decide to promote it to the
"real" type). And of course those are all used to read and write the
on-disk bits, too.

So while there might be some way of doing this cleaner, it hasn't
historically been a place we've seen a lot of problems (at least not
that I recall). So it seems like a pretty deep rabbit hole that is not
likely to give a lot of benefit.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()
  2021-04-09 17:51           ` Jeff King
@ 2021-04-09 22:31             ` Junio C Hamano
  2021-04-10 12:57             ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-09 22:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Taylor Blau, Elijah Newren

Jeff King <peff@peff.net> writes:

> OK. Calling it parse_blob_buffer() is a little silly since it doesn't
> even take a buffer anymore. But I guess parse_blob() might imply that it
> actually loads the contents from disk to check them (which the other
> parse_foo() functions do), so that's not a good name.

mark_object_as_parsed(), perhaps?

> So this might be the least bad thing. Given that there are only two
> callers, just setting blob->object.parsed might not be unreasonable,
> either. But I don't think it's worth spending too much time on.

Yup.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/2] object.c: initialize automatic variable in lookup_object()
  2021-04-09 17:53           ` Jeff King
@ 2021-04-09 22:32             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-09 22:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Taylor Blau, Elijah Newren

Jeff King <peff@peff.net> writes:

>>  struct object *lookup_object(struct repository *r, const struct object_id *oid)
>>  {
>>  	unsigned int i, first;
>> -	struct object *obj;
>> +	struct object *obj = NULL;
>>  
>>  	if (!r->parsed_objects->obj_hash)
>> -		return NULL;
>> +		return obj;
>
> I actually prefer the original style (where any "can we bail early"
> checks just explicitly return NULL, rather than making you check to see
> that obj is NULL).

Perhaps I should return "me too" here.




^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()
  2021-04-09 17:51           ` Jeff King
  2021-04-09 22:31             ` Junio C Hamano
@ 2021-04-10 12:57             ` Ævar Arnfjörð Bjarmason
  2021-04-10 13:01               ` Ævar Arnfjörð Bjarmason
  2021-04-13  8:25               ` Jeff King
  1 sibling, 2 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-10 12:57 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren


On Fri, Apr 09 2021, Jeff King wrote:

> On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> As noted in the comment introduced in 837d395a5c0 (Replace
>> parse_blob() with an explanatory comment, 2010-01-18) the old
>> parse_blob() function and the current parse_blob_buffer() exist merely
>> to provide consistency in the API.
>> 
>> We're not going to parse blobs like we "parse" commits, trees or
>> tags. So let's not have the parse_blob_buffer() take arguments that
>> pretends that we do. Its only use is to set the "parsed" flag.
>> 
>> See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
>> parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().
>
> OK. Calling it parse_blob_buffer() is a little silly since it doesn't
> even take a buffer anymore. But I guess parse_blob() might imply that it
> actually loads the contents from disk to check them (which the other
> parse_foo() functions do), so that's not a good name.
>
> So this might be the least bad thing. Given that there are only two
> callers, just setting blob->object.parsed might not be unreasonable,
> either. But I don't think it's worth spending too much time on.
>
>> @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>>  			return NULL;
>>  		}
>> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
>> +		parse_blob_buffer(lookup_blob(r, oid));
>>  		return lookup_object(r, oid);
>
> Not new in your patch, but I wondered if this could cause a segfault
> when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit
> this code path when either:
>
>   - lookup_object() returns an object with type OBJ_BLOB, in which case
>     lookup_blob() would return that same object
>
>   - lookup_object() returned NULL, in which case lookup_blob() will call
>     it again, get NULL again, and then auto-create the blob and return
>     it
>
> So I think it is OK. But there are a bunch of duplicate hash lookups in
> this code. It would be clearer and more efficient as:
>
> diff --git a/object.c b/object.c
> index 2c32691dc4..2dfa038f13 100644
> --- a/object.c
> +++ b/object.c
> @@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>  	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
>  	    (!obj && repo_has_object_file(r, oid) &&
>  	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
> +		if (!obj)
> +			obj = create_object(r, oid, alloc_blob_node(r));
>  		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;
>  		}
> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
> -		return lookup_object(r, oid);
> +		parse_blob_buffer(obj, NULL, 0);
> +		return obj;
>  	}
>  
>  	buffer = repo_read_object_file(r, oid, &type, &size);
>
> but I doubt the efficiency matters much in practice. Those hash lookups
> will be lost in the noise of computing the hash of the blob contents.

I was trying to keep the changes smaller, but what about just doing this?:

diff --git a/blob.c b/blob.c
index 182718aba9..69293e7d8e 100644
--- a/blob.c
+++ b/blob.c
@@ -5,16 +5,16 @@
 
 const char *blob_type = "blob";
 
+struct blob *create_blob(struct repository *r, const struct object_id *oid)
+{
+	return create_object(r, oid, alloc_blob_node(r));
+}
+
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
-		return create_object(r, oid, alloc_blob_node(r));
-	return object_as_type(obj, OBJ_BLOB, 0);
-}
+		return create_blob(r, oid);
 
-int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
-{
-	item->object.parsed = 1;
-	return 0;
+	return object_as_type(obj, OBJ_BLOB, 0);
 }
diff --git a/blob.h b/blob.h
index 1664872055..ad34f0e9cc 100644
--- a/blob.h
+++ b/blob.h
@@ -9,10 +9,9 @@ struct blob {
 	struct object object;
 };
 
+struct blob *create_blob(struct repository *r, const struct object_id *oid);
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
 
-int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
-
 /**
  * Blobs do not contain references to other objects and do not have
  * structured data that needs parsing. However, code may use the
diff --git a/object.c b/object.c
index 78343781ae..2699431404 100644
--- a/object.c
+++ b/object.c
@@ -195,8 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 	if (type == OBJ_BLOB) {
 		struct blob *blob = lookup_blob(r, oid);
 		if (blob) {
-			if (parse_blob_buffer(blob, buffer, size))
-				return NULL;
+			blob->object.parsed = 1;
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
@@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+		struct blob *blob;
+		if (!obj)
+			blob = create_blob(r, oid);
 		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
-		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
-		return lookup_object(r, oid);
+		obj = &blob->object;
+		obj->parsed = 1;
+		return obj;
 	}
 
 	buffer = repo_read_object_file(r, oid, &type, &size);

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()
  2021-04-10 12:57             ` Ævar Arnfjörð Bjarmason
@ 2021-04-10 13:01               ` Ævar Arnfjörð Bjarmason
  2021-04-13  8:25               ` Jeff King
  1 sibling, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-10 13:01 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren


On Sat, Apr 10 2021, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Apr 09 2021, Jeff King wrote:
>
>> On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> As noted in the comment introduced in 837d395a5c0 (Replace
>>> parse_blob() with an explanatory comment, 2010-01-18) the old
>>> parse_blob() function and the current parse_blob_buffer() exist merely
>>> to provide consistency in the API.
>>> 
>>> We're not going to parse blobs like we "parse" commits, trees or
>>> tags. So let's not have the parse_blob_buffer() take arguments that
>>> pretends that we do. Its only use is to set the "parsed" flag.
>>> 
>>> See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
>>> parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().
>>
>> OK. Calling it parse_blob_buffer() is a little silly since it doesn't
>> even take a buffer anymore. But I guess parse_blob() might imply that it
>> actually loads the contents from disk to check them (which the other
>> parse_foo() functions do), so that's not a good name.
>>
>> So this might be the least bad thing. Given that there are only two
>> callers, just setting blob->object.parsed might not be unreasonable,
>> either. But I don't think it's worth spending too much time on.
>>
>>> @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>>>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>>>  			return NULL;
>>>  		}
>>> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
>>> +		parse_blob_buffer(lookup_blob(r, oid));
>>>  		return lookup_object(r, oid);
>>
>> Not new in your patch, but I wondered if this could cause a segfault
>> when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit
>> this code path when either:
>>
>>   - lookup_object() returns an object with type OBJ_BLOB, in which case
>>     lookup_blob() would return that same object
>>
>>   - lookup_object() returned NULL, in which case lookup_blob() will call
>>     it again, get NULL again, and then auto-create the blob and return
>>     it
>>
>> So I think it is OK. But there are a bunch of duplicate hash lookups in
>> this code. It would be clearer and more efficient as:
>>
>> diff --git a/object.c b/object.c
>> index 2c32691dc4..2dfa038f13 100644
>> --- a/object.c
>> +++ b/object.c
>> @@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>>  	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
>>  	    (!obj && repo_has_object_file(r, oid) &&
>>  	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
>> +		if (!obj)
>> +			obj = create_object(r, oid, alloc_blob_node(r));
>>  		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
>>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>>  			return NULL;
>>  		}
>> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
>> -		return lookup_object(r, oid);
>> +		parse_blob_buffer(obj, NULL, 0);
>> +		return obj;
>>  	}
>>  
>>  	buffer = repo_read_object_file(r, oid, &type, &size);
>>
>> but I doubt the efficiency matters much in practice. Those hash lookups
>> will be lost in the noise of computing the hash of the blob contents.
>
> I was trying to keep the changes smaller, but what about just doing this?:

Sent a bit too soon...:

> diff --git a/blob.c b/blob.c
> index 182718aba9..69293e7d8e 100644
> --- a/blob.c
> +++ b/blob.c
> @@ -5,16 +5,16 @@
>  
>  const char *blob_type = "blob";
>  
> +struct blob *create_blob(struct repository *r, const struct object_id *oid)
> +{
> +	return create_object(r, oid, alloc_blob_node(r));
> +}
> +
>  struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
>  {
>  	struct object *obj = lookup_object(r, oid);
>  	if (!obj)
> -		return create_object(r, oid, alloc_blob_node(r));
> -	return object_as_type(obj, OBJ_BLOB, 0);
> -}
> +		return create_blob(r, oid);
>  
> -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
> -{
> -	item->object.parsed = 1;
> -	return 0;
> +	return object_as_type(obj, OBJ_BLOB, 0);
>  }
> diff --git a/blob.h b/blob.h
> index 1664872055..ad34f0e9cc 100644
> --- a/blob.h
> +++ b/blob.h
> @@ -9,10 +9,9 @@ struct blob {
>  	struct object object;
>  };
>  
> +struct blob *create_blob(struct repository *r, const struct object_id *oid);
>  struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
>  
> -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
> -
>  /**
>   * Blobs do not contain references to other objects and do not have
>   * structured data that needs parsing. However, code may use the
> diff --git a/object.c b/object.c
> index 78343781ae..2699431404 100644
> --- a/object.c
> +++ b/object.c
> @@ -195,8 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
>  	if (type == OBJ_BLOB) {
>  		struct blob *blob = lookup_blob(r, oid);
>  		if (blob) {
> -			if (parse_blob_buffer(blob, buffer, size))
> -				return NULL;
> +			blob->object.parsed = 1;
>  			obj = &blob->object;
>  		}
>  	} else if (type == OBJ_TREE) {
> @@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>  	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
>  	    (!obj && repo_has_object_file(r, oid) &&
>  	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
> +		struct blob *blob;
> +		if (!obj)
> +			blob = create_blob(r, oid);
>
>  		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;
>  		}
> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
> -		return lookup_object(r, oid);
> +		obj = &blob->object;
> +		obj->parsed = 1;
> +		return obj;
>  	}
>  
>  	buffer = repo_read_object_file(r, oid, &type, &size);

Well, aside from this segfault think-o introduced while
experimenting. Needs to be:

@@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+		if (!obj) {
+			struct blob *blob = create_blob(r, oid);
+			obj = &blob->object;
+		}
 		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
-		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
-		return lookup_object(r, oid);
+		obj->parsed = 1;
+		return obj;
 	}
 
 	buffer = repo_read_object_file(r, oid, &type, &size);




^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()
  2021-04-10 12:57             ` Ævar Arnfjörð Bjarmason
  2021-04-10 13:01               ` Ævar Arnfjörð Bjarmason
@ 2021-04-13  8:25               ` Jeff King
  1 sibling, 0 replies; 142+ messages in thread
From: Jeff King @ 2021-04-13  8:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren

On Sat, Apr 10, 2021 at 02:57:12PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Not new in your patch, but I wondered if this could cause a segfault
> > when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit
> > this code path when either:
> [...]
> I was trying to keep the changes smaller, but what about just doing this?:
> [...]

Yeah, that seems pretty reasonable to me. It cleans up the extra lookups
in parse_object() and gets rid of the funny-named "parse_blob_buffer()"
that takes no buffer.

-Peff

^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 00/10] object.c et al: tests, small bug fixes etc.
  2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50         ` Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 01/10] cat-file tests: test for bogus type name handling Ævar Arnfjörð Bjarmason
                             ` (9 more replies)
  3 siblings, 10 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

This is a re-roll of my v1 "blob/object.c: trivial readability
improvements"[1] which has grown from 2 patches to 10. As suggested by
Jeff King in v1 we can entirely get rid of parse_blob_buffer(), so we
now do that.

The other reason this has grown is to resolve a semantic conflict that
came up with type_from_string() and type_from_string_gently() early,
between this and the series I'm about to re-roll on top of it we'll
also remove the "emit an error" argument to type_from_string_gently().

This involves going through some existing callers ande improving their
error messages, and fixing some subtle existing bugs.

1. https://lore.kernel.org/git/cover-0.3-0000000000-20210409T080534Z-avarab@gmail.com

Ævar Arnfjörð Bjarmason (10):
  cat-file tests: test for bogus type name handling
  hash-object tests: more detailed test for invalid type
  mktree tests: add test for invalid object type
  object-file.c: take type id, not string, in
    read_object_with_reference()
  {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}()
  blob.c: remove parse_blob_buffer()
  object.c: simplify return semantic of parse_object_buffer()
  object.c: don't go past "len" under die() in type_from_string_gently()
  mktree: stop setting *ntr++ to NIL
  mktree: emit a more detailed error when the <type> is invalid

 blob.c                      | 13 ++++++-------
 blob.h                      | 12 +-----------
 builtin/cat-file.c          |  7 ++++---
 builtin/fast-import.c       |  6 +++---
 builtin/grep.c              |  4 ++--
 builtin/mktree.c            | 23 ++++++++++++++++-------
 builtin/pack-objects.c      |  2 +-
 cache.h                     |  2 +-
 commit-graph.c              |  2 +-
 commit.c                    |  7 ++++++-
 commit.h                    |  1 +
 object-file.c               |  7 +++----
 object.c                    | 26 +++++++++++++-------------
 t/helper/test-fast-rebase.c |  4 ++--
 t/t1006-cat-file.sh         | 16 ++++++++++++++++
 t/t1007-hash-object.sh      | 12 ++++++++++--
 t/t1010-mktree.sh           | 10 ++++++++++
 tag.c                       |  7 ++++++-
 tree-walk.c                 |  6 +++---
 tree.c                      |  7 ++++++-
 20 files changed, 111 insertions(+), 63 deletions(-)

Range-diff against v1:
 -:  ---------- >  1:  5818eca45d cat-file tests: test for bogus type name handling
 -:  ---------- >  2:  0b48389325 hash-object tests: more detailed test for invalid type
 -:  ---------- >  3:  cd585017a9 mktree tests: add test for invalid object type
 -:  ---------- >  4:  48aca62864 object-file.c: take type id, not string, in read_object_with_reference()
 -:  ---------- >  5:  5213d500b9 {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}()
 1:  68a7709fe5 !  6:  02c8d2a9ba blob.c: remove buffer & size arguments to parse_blob_buffer()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    blob.c: remove buffer & size arguments to parse_blob_buffer()
    +    blob.c: remove parse_blob_buffer()
     
         As noted in the comment introduced in 837d395a5c0 (Replace
         parse_blob() with an explanatory comment, 2010-01-18) the old
    -    parse_blob() function and the current parse_blob_buffer() exist merely
    -    to provide consistency in the API.
    +    parse_blob() function and the parse_blob_buffer() existed to provide
    +    consistency in the API.
    +
    +    See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
    +    parse_object(), 2005-05-06) for the introduction of
    +    parse_blob_buffer().
     
         We're not going to parse blobs like we "parse" commits, trees or
    -    tags. So let's not have the parse_blob_buffer() take arguments that
    -    pretends that we do. Its only use is to set the "parsed" flag.
    +    tags. So we should not have the parse_blob_buffer() take arguments
    +    that pretends that we do. Its only use is to set the "parsed" flag.
     
    -    See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
    -    parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().
    +    So let's entirely remove the function, and use our newly created
    +    create_blob() for the allocation. We can then set the "parsed" flag
    +    directly in parse_object_buffer() and parse_object() instead.
     
    -    I'm moving the prototype of parse_blob_buffer() below the comment
    -    added in 837d395a5c0 while I'm at it. That comment was originally
    -    meant to be a replacement for the missing parse_blob() function, but
    -    it's much less confusing to have it be above the parse_blob_buffer()
    -    function it refers to.
    +    At this point I could move the comment added in 837d395a5c0 to one or
    +    both of those object.c function, but let's just delete it instead. I
    +    think it's obvious from the flow of the code what's going on
    +    here. Setting the parsed flag no longer happens at a distance, so why
    +    we're doing it isn't unclear anymore.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## blob.c ##
    +@@
    + 
    + const char *blob_type = "blob";
    + 
    +-static struct blob *create_blob(struct repository *r, const struct object_id *oid)
    ++struct blob *create_blob(struct repository *r, const struct object_id *oid)
    + {
    + 	return create_object(r, oid, alloc_blob_node(r));
    + }
     @@ blob.c: struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
    + 		return create_blob(r, oid);
      	return object_as_type(obj, OBJ_BLOB, 0);
      }
    - 
    +-
     -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
    -+int parse_blob_buffer(struct blob *item)
    - {
    - 	item->object.parsed = 1;
    - 	return 0;
    +-{
    +-	item->object.parsed = 1;
    +-	return 0;
    +-}
     
      ## blob.h ##
     @@ blob.h: struct blob {
    + 	struct object object;
    + };
      
    ++struct blob *create_blob(struct repository *r, const struct object_id *oid);
      struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
      
     -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
     -
    - /**
    -  * Blobs do not contain references to other objects and do not have
    -  * structured data that needs parsing. However, code may use the
    -@@ blob.h: int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
    -  * parse_blob_buffer() is used (by object.c) to flag that the object
    -  * has been read successfully from the database.
    -  **/
    -+int parse_blob_buffer(struct blob *item);
    - 
    +-/**
    +- * Blobs do not contain references to other objects and do not have
    +- * structured data that needs parsing. However, code may use the
    +- * "parsed" bit in the struct object for a blob to determine whether
    +- * its content has been found to actually be available, so
    +- * parse_blob_buffer() is used (by object.c) to flag that the object
    +- * has been read successfully from the database.
    +- **/
    +-
      #endif /* BLOB_H */
     
      ## object.c ##
    @@ object.c: struct object *parse_object_buffer(struct repository *r, const struct
      		struct blob *blob = lookup_blob(r, oid);
      		if (blob) {
     -			if (parse_blob_buffer(blob, buffer, size))
    -+			if (parse_blob_buffer(blob))
    - 				return NULL;
    +-				return NULL;
    ++			blob->object.parsed = 1;
      			obj = &blob->object;
      		}
    + 	} else if (type == OBJ_TREE) {
     @@ object.c: struct object *parse_object(struct repository *r, const struct object_id *oid)
    + 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
    + 	    (!obj && repo_has_object_file(r, oid) &&
    + 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
    ++		if (!obj) {
    ++			struct blob *blob = create_blob(r, oid);
    ++			obj = &blob->object;
    ++		}
    + 		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
      			error(_("hash mismatch %s"), oid_to_hex(oid));
      			return NULL;
      		}
     -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
    -+		parse_blob_buffer(lookup_blob(r, oid));
    - 		return lookup_object(r, oid);
    +-		return lookup_object(r, oid);
    ++		obj->parsed = 1;
    ++		return obj;
      	}
      
    + 	buffer = repo_read_object_file(r, oid, &type, &size);
 2:  f1fcc31717 !  7:  ee0b572f7d object.c: initialize automatic variable in lookup_object()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    object.c: initialize automatic variable in lookup_object()
    +    object.c: simplify return semantic of parse_object_buffer()
     
    -    Initialize a "struct object obj*" variable to NULL explicitly and
    -    return it instead of leaving it uninitialized until the "while"
    -    loop.
    +    Remove the local "obj" variable from parse_object_buffer() and return
    +    the object directly instead.
     
    -    There was no bug here, it's just less confusing when debugging if the
    -    "obj" is either NULL or a valid object, not some random invalid
    -    pointer.
    +    The reason this variable was introduced was to free() a variable
    +    before returning in bd2c39f58f9 ([PATCH] don't load and decompress
    +    objects twice with parse_object() 2005-05-06). But that was when
    +    parse_object_buffer() didn't exist, there was only the parse_object()
    +    function.
     
    -    See 0556a11a0df (git object hash cleanups, 2006-06-30) for the initial
    -    implementation.
    +    Since the split-up of the two in 9f613ddd21c (Add git-for-each-ref:
    +    helper for language bindings, 2006-09-15) we have not needed this
    +    variable, and as demonstrated here not having to set it to (re)set it
    +    to NULL simplifies the function.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object.c ##
    -@@ object.c: static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
    - struct object *lookup_object(struct repository *r, const struct object_id *oid)
    +@@ object.c: struct object *lookup_unknown_object(const struct object_id *oid)
    + 
    + struct object *parse_object_buffer(struct repository *r, const struct object_id *oid, enum object_type type, unsigned long size, void *buffer, int *eaten_p)
      {
    - 	unsigned int i, first;
     -	struct object *obj;
    -+	struct object *obj = NULL;
    + 	*eaten_p = 0;
      
    - 	if (!r->parsed_objects->obj_hash)
    --		return NULL;
    -+		return obj;
    +-	obj = NULL;
    + 	if (type == OBJ_BLOB) {
    + 		struct blob *blob = lookup_blob(r, oid);
    + 		if (blob) {
    + 			blob->object.parsed = 1;
    +-			obj = &blob->object;
    ++			return &blob->object;
    + 		}
    + 	} else if (type == OBJ_TREE) {
    + 		struct tree *tree = lookup_tree(r, oid);
    + 		if (tree) {
    +-			obj = &tree->object;
    + 			if (!tree->buffer)
    + 				tree->object.parsed = 0;
    + 			if (!tree->object.parsed) {
    +@@ object.c: struct object *parse_object_buffer(struct repository *r, const struct object_id
    + 					return NULL;
    + 				*eaten_p = 1;
    + 			}
    ++			return &tree->object;
    + 		}
    + 	} else if (type == OBJ_COMMIT) {
    + 		struct commit *commit = lookup_commit(r, oid);
    +@@ object.c: struct object *parse_object_buffer(struct repository *r, const struct object_id
    + 				set_commit_buffer(r, commit, buffer, size);
    + 				*eaten_p = 1;
    + 			}
    +-			obj = &commit->object;
    ++			return &commit->object;
    + 		}
    + 	} else if (type == OBJ_TAG) {
    + 		struct tag *tag = lookup_tag(r, oid);
    + 		if (tag) {
    + 			if (parse_tag_buffer(r, tag, buffer, size))
    + 			       return NULL;
    +-			obj = &tag->object;
    ++			return &tag->object;
    + 		}
    + 	} else {
    + 		warning(_("object %s has unknown type id %d"), oid_to_hex(oid), type);
    +-		obj = NULL;
    + 	}
    +-	return obj;
    ++	return NULL;
    + }
      
    - 	first = i = hash_obj(oid, r->parsed_objects->obj_hash_size);
    - 	while ((obj = r->parsed_objects->obj_hash[i]) != NULL) {
    + struct object *parse_object_or_die(const struct object_id *oid,
 -:  ---------- >  8:  f652d0fb5c object.c: don't go past "len" under die() in type_from_string_gently()
 -:  ---------- >  9:  e463fe5f6a mktree: stop setting *ntr++ to NIL
 -:  ---------- > 10:  fe75526a65 mktree: emit a more detailed error when the <type> is invalid
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 01/10] cat-file tests: test for bogus type name handling
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-29  4:15             ` Junio C Hamano
  2021-04-20 12:50           ` [PATCH v2 02/10] hash-object tests: more detailed test for invalid type Ævar Arnfjörð Bjarmason
                             ` (8 subsequent siblings)
  9 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add a test of how "cat-file" behaves when given a bogus type in its
"git cat-file <TYPE> <OBJECT>" mode. There were existing tests (just
below this one) for "-t bogus" or "--allow-unknown-type" modes, but
none for the switch-less mode.

This test is similar to the one that exists for "git hash-object"
already, see b7994af0f92 (type_from_string_gently: make sure length
matches, 2015-04-17).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74..908797dcae 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,22 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+test_expect_success 'cat-file complains about bogus type name' '
+	test_must_fail git cat-file co HEAD >out 2>err &&
+	test_must_be_empty out &&
+	cat >expected <<-\EOF &&
+	fatal: invalid object type "co"
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git cat-file bogus HEAD >out 2>err &&
+	test_must_be_empty out &&
+	cat >expected <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_cmp expected err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 02/10] hash-object tests: more detailed test for invalid type
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 01/10] cat-file tests: test for bogus type name handling Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 03/10] mktree tests: add test for invalid object type Ævar Arnfjörð Bjarmason
                             ` (7 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the tests added in b7994af0f92 (type_from_string_gently: make
sure length matches, 2015-04-17) to check the return code and error
that's emitted.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1007-hash-object.sh | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64b340f227..74486f6f1a 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -230,11 +230,19 @@ test_expect_success 'corrupt tag' '
 '
 
 test_expect_success 'hash-object complains about bogus type name' '
-	test_must_fail git hash-object -t bogus --stdin </dev/null
+	test_must_fail git hash-object -t bogus --stdin 2>actual </dev/null &&
+	cat >expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_cmp expect actual
 '
 
 test_expect_success 'hash-object complains about truncated type name' '
-	test_must_fail git hash-object -t bl --stdin </dev/null
+	test_must_fail git hash-object -t bl --stdin 2>actual </dev/null &&
+	cat >expect <<-\EOF &&
+	fatal: invalid object type "bl"
+	EOF
+	test_cmp expect actual
 '
 
 test_expect_success '--literally' '
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 03/10] mktree tests: add test for invalid object type
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 01/10] cat-file tests: test for bogus type name handling Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 02/10] hash-object tests: more detailed test for invalid type Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 04/10] object-file.c: take type id, not string, in read_object_with_reference() Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add a missing test for an invalid object type to the mktree tests,
making it consistent with the same tests in cat-file.c. This tests the
interaction of mktree_line() and type_from_string().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1010-mktree.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index b946f87686..2a7b04aed8 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -58,6 +58,16 @@ test_expect_success 'allow missing object with --missing' '
 	test_cmp tree.missing actual
 '
 
+test_expect_success 'invalid object type' '
+	sed "s/tree/whee/g" <top >bad-type &&
+	test_must_fail git mktree <bad-type >out 2>err &&
+	test_must_be_empty out &&
+	cat >expected <<-\EOF &&
+	fatal: invalid object type "whee"
+	EOF
+	test_cmp expected err
+'
+
 test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
 	test_must_fail git mktree <all >actual
 '
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 04/10] object-file.c: take type id, not string, in read_object_with_reference()
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 03/10] mktree tests: add test for invalid object type Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-29  4:37             ` Junio C Hamano
  2021-04-20 12:50           ` [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}() Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  9 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Make the read_object_with_reference() function take "enum object_type"
instead of a "const char *" with a type name that it converted via
type_from_string().

Out of the nine callers of this function, only one wanted to pass a
"const char *". The others were simply passing along the
{commit,tree}_type string constants.

That one caller in builtin/cat-file.c did not expect to pass a "raw"
type (i.e. in invalid "--literally" type, but one gotten from
type_from_string(). Furthermore the read_object_with_reference()
function itself was calling type_from_string(), so this whole thing
amounted to unnecessarily going back and forth.

This API design dates back to f4913f91a96 ([PATCH] Accept commit in
some places when tree is needed., 2005-04-20). At that time there
wasn't an API like type_from_string(). That only arrived in
df8436622fb (formalize typename(), and add its reverse
type_from_string(), 2007-02-26).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c     | 7 ++++---
 builtin/fast-import.c  | 6 +++---
 builtin/grep.c         | 4 ++--
 builtin/pack-objects.c | 2 +-
 cache.h                | 2 +-
 object-file.c          | 7 +++----
 tree-walk.c            | 6 +++---
 7 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e..46fc7a32ba 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -66,7 +66,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 			int unknown_type)
 {
 	struct object_id oid;
-	enum object_type type;
+	enum object_type type, exp_type_id;
 	char *buf;
 	unsigned long size;
 	struct object_context obj_context;
@@ -154,7 +154,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		break;
 
 	case 0:
-		if (type_from_string(exp_type) == OBJ_BLOB) {
+		exp_type_id = type_from_string(exp_type);
+		if (exp_type_id == OBJ_BLOB) {
 			struct object_id blob_oid;
 			if (oid_object_info(the_repository, &oid, NULL) == OBJ_TAG) {
 				char *buffer = read_object_file(&oid, &type,
@@ -177,7 +178,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 			 */
 		}
 		buf = read_object_with_reference(the_repository,
-						 &oid, exp_type, &size, NULL);
+						 &oid, exp_type_id, &size, NULL);
 		break;
 
 	default:
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 3afa81cf9a..ee52be02f8 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2481,7 +2481,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		unsigned long size;
 		char *buf = read_object_with_reference(the_repository,
 						       &commit_oid,
-						       commit_type, &size,
+						       OBJ_COMMIT, &size,
 						       &commit_oid);
 		if (!buf || size < the_hash_algo->hexsz + 6)
 			die("Not a valid commit: %s", p);
@@ -2553,7 +2553,7 @@ static void parse_from_existing(struct branch *b)
 		char *buf;
 
 		buf = read_object_with_reference(the_repository,
-						 &b->oid, commit_type, &size,
+						 &b->oid, OBJ_COMMIT, &size,
 						 &b->oid);
 		parse_from_commit(b, buf, size);
 		free(buf);
@@ -2649,7 +2649,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 			unsigned long size;
 			char *buf = read_object_with_reference(the_repository,
 							       &n->oid,
-							       commit_type,
+							       OBJ_COMMIT,
 							       &size, &n->oid);
 			if (!buf || size < the_hash_algo->hexsz + 6)
 				die("Not a valid commit: %s", from);
diff --git a/builtin/grep.c b/builtin/grep.c
index 5de725f904..f2a73c92fa 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -467,7 +467,7 @@ static int grep_submodule(struct grep_opt *opt,
 		object = parse_object_or_die(oid, NULL);
 		obj_read_unlock();
 		data = read_object_with_reference(&subrepo,
-						  &object->oid, tree_type,
+						  &object->oid, OBJ_TREE,
 						  &size, NULL);
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&object->oid));
@@ -635,7 +635,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		int hit, len;
 
 		data = read_object_with_reference(opt->repo,
-						  &obj->oid, tree_type,
+						  &obj->oid, OBJ_TREE,
 						  &size, NULL);
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a1e33d7507..feb0320371 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1649,7 +1649,7 @@ static void add_preferred_base(struct object_id *oid)
 		return;
 
 	data = read_object_with_reference(the_repository, oid,
-					  tree_type, &size, &tree_oid);
+					  OBJ_TREE, &size, &tree_oid);
 	if (!data)
 		return;
 
diff --git a/cache.h b/cache.h
index 148d9ab5f1..dad9792c74 100644
--- a/cache.h
+++ b/cache.h
@@ -1508,7 +1508,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 
 void *read_object_with_reference(struct repository *r,
 				 const struct object_id *oid,
-				 const char *required_type,
+				 enum object_type object_type,
 				 unsigned long *size,
 				 struct object_id *oid_ret);
 
diff --git a/object-file.c b/object-file.c
index 624af408cd..d2f223dcef 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1669,25 +1669,24 @@ void *read_object_file_extended(struct repository *r,
 
 void *read_object_with_reference(struct repository *r,
 				 const struct object_id *oid,
-				 const char *required_type_name,
+				 enum object_type object_type,
 				 unsigned long *size,
 				 struct object_id *actual_oid_return)
 {
-	enum object_type type, required_type;
 	void *buffer;
 	unsigned long isize;
 	struct object_id actual_oid;
 
-	required_type = type_from_string(required_type_name);
 	oidcpy(&actual_oid, oid);
 	while (1) {
 		int ref_length = -1;
 		const char *ref_type = NULL;
+		enum object_type type;
 
 		buffer = repo_read_object_file(r, &actual_oid, &type, &isize);
 		if (!buffer)
 			return NULL;
-		if (type == required_type) {
+		if (type == object_type) {
 			*size = isize;
 			if (actual_oid_return)
 				oidcpy(actual_oid_return, &actual_oid);
diff --git a/tree-walk.c b/tree-walk.c
index 2d6226d5f1..e5db9291e1 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -89,7 +89,7 @@ void *fill_tree_descriptor(struct repository *r,
 	void *buf = NULL;
 
 	if (oid) {
-		buf = read_object_with_reference(r, oid, tree_type, &size, NULL);
+		buf = read_object_with_reference(r, oid, OBJ_TREE, &size, NULL);
 		if (!buf)
 			die("unable to read tree %s", oid_to_hex(oid));
 	}
@@ -605,7 +605,7 @@ int get_tree_entry(struct repository *r,
 	unsigned long size;
 	struct object_id root;
 
-	tree = read_object_with_reference(r, tree_oid, tree_type, &size, &root);
+	tree = read_object_with_reference(r, tree_oid, OBJ_TREE, &size, &root);
 	if (!tree)
 		return -1;
 
@@ -677,7 +677,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			unsigned long size;
 			tree = read_object_with_reference(r,
 							  &current_tree_oid,
-							  tree_type, &size,
+							  OBJ_TREE, &size,
 							  &root);
 			if (!tree)
 				goto done;
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}()
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 04/10] object-file.c: take type id, not string, in read_object_with_reference() Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-29  4:45             ` Junio C Hamano
  2021-04-20 12:50           ` [PATCH v2 06/10] blob.c: remove parse_blob_buffer() Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  9 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add a create_*() function for our built-in types as a handy but
trivial wrapper around their calls to create_object().

This allows for slightly simplifying code added in
96af91d410c (commit-graph: verify objects exist, 2018-06-27). The
remaining three functions are added for consistency for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                      | 7 ++++++-
 commit-graph.c              | 2 +-
 commit.c                    | 7 ++++++-
 commit.h                    | 1 +
 t/helper/test-fast-rebase.c | 4 ++--
 tag.c                       | 7 ++++++-
 tree.c                      | 7 ++++++-
 7 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/blob.c b/blob.c
index 182718aba9..d98b6badc7 100644
--- a/blob.c
+++ b/blob.c
@@ -5,11 +5,16 @@
 
 const char *blob_type = "blob";
 
+static struct blob *create_blob(struct repository *r, const struct object_id *oid)
+{
+	return create_object(r, oid, alloc_blob_node(r));
+}
+
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
-		return create_object(r, oid, alloc_blob_node(r));
+		return create_blob(r, oid);
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index f18380b922..c456f84f41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2504,7 +2504,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
-		odb_commit = (struct commit *)create_object(r, &cur_oid, alloc_commit_node(r));
+		odb_commit = create_commit(r, &cur_oid);
 		if (parse_commit_internal(odb_commit, 0, 0)) {
 			graph_report(_("failed to parse commit %s from object database for commit-graph"),
 				     oid_to_hex(&cur_oid));
diff --git a/commit.c b/commit.c
index 8ea55a447f..3580c62b92 100644
--- a/commit.c
+++ b/commit.c
@@ -57,11 +57,16 @@ struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref
 	return c;
 }
 
+struct commit *create_commit(struct repository *r, const struct object_id *oid)
+{
+	return create_object(r, oid, alloc_commit_node(r));
+}
+
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
-		return create_object(r, oid, alloc_commit_node(r));
+		return create_commit(r, oid);
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/commit.h b/commit.h
index df42eb434f..09e2167f8c 100644
--- a/commit.h
+++ b/commit.h
@@ -63,6 +63,7 @@ enum decoration_type {
 void add_name_decoration(enum decoration_type type, const char *name, struct object *obj);
 const struct name_decoration *get_name_decoration(const struct object *obj);
 
+struct commit *create_commit(struct repository *r, const struct object_id *oid);
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
 struct commit *lookup_commit_reference(struct repository *r,
 				       const struct object_id *oid);
diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 373212256a..e3d3e991a5 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -51,7 +51,7 @@ static char *get_author(const char *message)
 	return NULL;
 }
 
-static struct commit *create_commit(struct tree *tree,
+static struct commit *make_a_commit(struct tree *tree,
 				    struct commit *based_on,
 				    struct commit *parent)
 {
@@ -177,7 +177,7 @@ int cmd__fast_rebase(int argc, const char **argv)
 		if (!result.clean)
 			die("Aborting: Hit a conflict and restarting is not implemented.");
 		last_picked_commit = commit;
-		last_commit = create_commit(result.tree, commit, last_commit);
+		last_commit = make_a_commit(result.tree, commit, last_commit);
 	}
 	fprintf(stderr, "\nDone.\n");
 	/* TODO: There should be some kind of rev_info_free(&revs) call... */
diff --git a/tag.c b/tag.c
index 3e18a41841..ed7037256e 100644
--- a/tag.c
+++ b/tag.c
@@ -99,11 +99,16 @@ struct object *deref_tag_noverify(struct object *o)
 	return o;
 }
 
+static struct tag *create_tag(struct repository *r, const struct object_id *oid)
+{
+	return create_object(r, oid, alloc_tag_node(r));
+}
+
 struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
-		return create_object(r, oid, alloc_tag_node(r));
+		return create_tag(r, oid);
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
diff --git a/tree.c b/tree.c
index 410e3b477e..00958c581e 100644
--- a/tree.c
+++ b/tree.c
@@ -102,11 +102,16 @@ int cmp_cache_name_compare(const void *a_, const void *b_)
 				  ce2->name, ce2->ce_namelen, ce_stage(ce2));
 }
 
+static struct tree *create_tree(struct repository *r, const struct object_id *oid)
+{
+	return create_object(r, oid, alloc_tree_node(r));
+}
+
 struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
-		return create_object(r, oid, alloc_tree_node(r));
+		return create_tree(r, oid);
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 06/10] blob.c: remove parse_blob_buffer()
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}() Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-29  4:51             ` Junio C Hamano
  2021-04-20 12:50           ` [PATCH v2 07/10] object.c: simplify return semantic of parse_object_buffer() Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  9 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

As noted in the comment introduced in 837d395a5c0 (Replace
parse_blob() with an explanatory comment, 2010-01-18) the old
parse_blob() function and the parse_blob_buffer() existed to provide
consistency in the API.

See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
parse_object(), 2005-05-06) for the introduction of
parse_blob_buffer().

We're not going to parse blobs like we "parse" commits, trees or
tags. So we should not have the parse_blob_buffer() take arguments
that pretends that we do. Its only use is to set the "parsed" flag.

So let's entirely remove the function, and use our newly created
create_blob() for the allocation. We can then set the "parsed" flag
directly in parse_object_buffer() and parse_object() instead.

At this point I could move the comment added in 837d395a5c0 to one or
both of those object.c function, but let's just delete it instead. I
think it's obvious from the flow of the code what's going on
here. Setting the parsed flag no longer happens at a distance, so why
we're doing it isn't unclear anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c   |  8 +-------
 blob.h   | 12 +-----------
 object.c | 11 +++++++----
 3 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/blob.c b/blob.c
index d98b6badc7..1308299eab 100644
--- a/blob.c
+++ b/blob.c
@@ -5,7 +5,7 @@
 
 const char *blob_type = "blob";
 
-static struct blob *create_blob(struct repository *r, const struct object_id *oid)
+struct blob *create_blob(struct repository *r, const struct object_id *oid)
 {
 	return create_object(r, oid, alloc_blob_node(r));
 }
@@ -17,9 +17,3 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 		return create_blob(r, oid);
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
-
-int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
-{
-	item->object.parsed = 1;
-	return 0;
-}
diff --git a/blob.h b/blob.h
index 1664872055..6e6b23a769 100644
--- a/blob.h
+++ b/blob.h
@@ -9,17 +9,7 @@ struct blob {
 	struct object object;
 };
 
+struct blob *create_blob(struct repository *r, const struct object_id *oid);
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
 
-int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
-
-/**
- * Blobs do not contain references to other objects and do not have
- * structured data that needs parsing. However, code may use the
- * "parsed" bit in the struct object for a blob to determine whether
- * its content has been found to actually be available, so
- * parse_blob_buffer() is used (by object.c) to flag that the object
- * has been read successfully from the database.
- **/
-
 #endif /* BLOB_H */
diff --git a/object.c b/object.c
index 78343781ae..f4e419e5c3 100644
--- a/object.c
+++ b/object.c
@@ -195,8 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 	if (type == OBJ_BLOB) {
 		struct blob *blob = lookup_blob(r, oid);
 		if (blob) {
-			if (parse_blob_buffer(blob, buffer, size))
-				return NULL;
+			blob->object.parsed = 1;
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
@@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+		if (!obj) {
+			struct blob *blob = create_blob(r, oid);
+			obj = &blob->object;
+		}
 		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
-		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
-		return lookup_object(r, oid);
+		obj->parsed = 1;
+		return obj;
 	}
 
 	buffer = repo_read_object_file(r, oid, &type, &size);
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 07/10] object.c: simplify return semantic of parse_object_buffer()
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 06/10] blob.c: remove parse_blob_buffer() Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 08/10] object.c: don't go past "len" under die() in type_from_string_gently() Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Remove the local "obj" variable from parse_object_buffer() and return
the object directly instead.

The reason this variable was introduced was to free() a variable
before returning in bd2c39f58f9 ([PATCH] don't load and decompress
objects twice with parse_object() 2005-05-06). But that was when
parse_object_buffer() didn't exist, there was only the parse_object()
function.

Since the split-up of the two in 9f613ddd21c (Add git-for-each-ref:
helper for language bindings, 2006-09-15) we have not needed this
variable, and as demonstrated here not having to set it to (re)set it
to NULL simplifies the function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index f4e419e5c3..70af833ca1 100644
--- a/object.c
+++ b/object.c
@@ -188,20 +188,17 @@ struct object *lookup_unknown_object(const struct object_id *oid)
 
 struct object *parse_object_buffer(struct repository *r, const struct object_id *oid, enum object_type type, unsigned long size, void *buffer, int *eaten_p)
 {
-	struct object *obj;
 	*eaten_p = 0;
 
-	obj = NULL;
 	if (type == OBJ_BLOB) {
 		struct blob *blob = lookup_blob(r, oid);
 		if (blob) {
 			blob->object.parsed = 1;
-			obj = &blob->object;
+			return &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
 		struct tree *tree = lookup_tree(r, oid);
 		if (tree) {
-			obj = &tree->object;
 			if (!tree->buffer)
 				tree->object.parsed = 0;
 			if (!tree->object.parsed) {
@@ -209,6 +206,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 					return NULL;
 				*eaten_p = 1;
 			}
+			return &tree->object;
 		}
 	} else if (type == OBJ_COMMIT) {
 		struct commit *commit = lookup_commit(r, oid);
@@ -219,20 +217,19 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 				set_commit_buffer(r, commit, buffer, size);
 				*eaten_p = 1;
 			}
-			obj = &commit->object;
+			return &commit->object;
 		}
 	} else if (type == OBJ_TAG) {
 		struct tag *tag = lookup_tag(r, oid);
 		if (tag) {
 			if (parse_tag_buffer(r, tag, buffer, size))
 			       return NULL;
-			obj = &tag->object;
+			return &tag->object;
 		}
 	} else {
 		warning(_("object %s has unknown type id %d"), oid_to_hex(oid), type);
-		obj = NULL;
 	}
-	return obj;
+	return NULL;
 }
 
 struct object *parse_object_or_die(const struct object_id *oid,
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 08/10] object.c: don't go past "len" under die() in type_from_string_gently()
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 07/10] object.c: simplify return semantic of parse_object_buffer() Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-29  4:55             ` Junio C Hamano
  2021-04-20 12:50           ` [PATCH v2 09/10] mktree: stop setting *ntr++ to NIL Ævar Arnfjörð Bjarmason
  2021-04-20 12:50           ` [PATCH v2 10/10] mktree: emit a more detailed error when the <type> is invalid Ævar Arnfjörð Bjarmason
  9 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Fix a bug that's been with us ever since type_from_string_gently() was
split off from type_from_string() in fe8e3b71805 (Refactor
type_from_string() to allow continuing after detecting an error,
2014-09-10).

When the type was invalid and we were in the non-gently mode we'd die,
and then proceed to run off past the "len" of the buffer we were
provided with.

Luckily, I think that nothing ever used this function in that way. Any
non-gentle invocation came via type_from_string(), which was passing a
buffer with a NIL at the same place as the "len" would take us (we got
it via strlen()).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object.c b/object.c
index 70af833ca1..bad9e17f25 100644
--- a/object.c
+++ b/object.c
@@ -50,7 +50,7 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 	if (gentle)
 		return -1;
 
-	die(_("invalid object type \"%s\""), str);
+	die(_("invalid object type \"%.*s\""), (int)len, str);
 }
 
 /*
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 09/10] mktree: stop setting *ntr++ to NIL
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (7 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 08/10] object.c: don't go past "len" under die() in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  2021-04-29  5:01             ` Junio C Hamano
  2021-04-20 12:50           ` [PATCH v2 10/10] mktree: emit a more detailed error when the <type> is invalid Ævar Arnfjörð Bjarmason
  9 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Since 58ce21b819e (builtin/mktree: remove hard-coded constant,
2018-10-15) we have not made any subsequent use of the ntr variable
itself, but we did rely on it to NIL-delimit the string we were about
to feed to type_from_string().

Using type_from_string() here results in needless work, as we'd do a
strlen() on it, just to find point at which we had a SPC
character (now NIL) earlier in this function.

We can instead skip incrementing the ntr pointer, then pass the
pointer and length to the type_from_string() function instead.

Doing so would have been buggy in cases where the type was invalid
until a preceding commit fixed the die() invocation in
type_from_string() to also pay attention to the length. A preceding
commit added a test to t1010-mktree.sh which would fail if not for
that fix in type_from_string(), i.e. we'd end up printing the rest of
the line, not just the invalid type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktree.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 891991b00d..7a27cfa2e0 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -95,9 +95,6 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	if (S_ISGITLINK(mode))
 		allow_missing = 1;
 
-
-	*ntr++ = 0; /* now at the beginning of SHA1 */
-
 	path = (char *)p + 1;  /* at the beginning of name */
 	if (!nul_term_line && path[0] == '"') {
 		struct strbuf p_uq = STRBUF_INIT;
@@ -111,7 +108,7 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	 * These should all agree.
 	 */
 	mode_type = object_type(mode);
-	if (mode_type != type_from_string(ptr)) {
+	if (mode_type != type_from_string_gently(ptr, ntr - ptr, 0)) {
 		die("entry '%s' object type (%s) doesn't match mode type (%s)",
 			path, ptr, type_name(mode_type));
 	}
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 10/10] mktree: emit a more detailed error when the <type> is invalid
  2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
                             ` (8 preceding siblings ...)
  2021-04-20 12:50           ` [PATCH v2 09/10] mktree: stop setting *ntr++ to NIL Ævar Arnfjörð Bjarmason
@ 2021-04-20 12:50           ` Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 12:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

When given an invalid <type> as part of a "<mode> SP [...]" line (see
the added comment) we'd use the generic die() message in
type_from_string_gently().

Let's do a bit better in that case and emit a message at the same
level of detail as the existing die() message if the type was valid,
but didn't match the mode.

In preceding commits we fixed the type_from_string_gently() function
for cases where gentle=0, now there are no more callers of it that
pass "gentle=0" that aren't type_from_string() itself.

So that fixing of a bug wasn't strictly needed for this end-state, but
helps to incrementally explain and test the changes we're making, and
of course leaves type_from_string_gently() in a good state for any
future gently=0 callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktree.c  | 20 ++++++++++++++++----
 t/t1010-mktree.sh |  2 +-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 7a27cfa2e0..67e11d8562 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -72,8 +72,17 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	char *ptr, *ntr;
 	const char *p;
 	unsigned mode;
-	enum object_type mode_type; /* object type derived from mode */
-	enum object_type obj_type; /* object type derived from sha */
+	/*
+	 * For a line like:
+	 *
+	 *     <mode> SP <type> SP <object> SP <object size> TAB <file>"
+	 *
+	 * We'll discover and validate the type from all of <mode>,
+	 * <type> and <object>
+	 */
+	enum object_type mode_type;
+	enum object_type type_type;
+	enum object_type obj_type;
 	char *path, *to_free = NULL;
 	struct object_id oid;
 
@@ -108,10 +117,13 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	 * These should all agree.
 	 */
 	mode_type = object_type(mode);
-	if (mode_type != type_from_string_gently(ptr, ntr - ptr, 0)) {
+	type_type = type_from_string_gently(ptr, ntr - ptr, 1);
+	if (type_type < 0)
+		die("entry '%s' object type '%.*s' is invalid (our derived mode type is '%s')",
+			path, (int)(ntr - ptr), ptr, type_name(mode_type));
+	else if (mode_type != type_type)
 		die("entry '%s' object type (%s) doesn't match mode type (%s)",
 			path, ptr, type_name(mode_type));
-	}
 
 	/* Check the type of object identified by sha1 */
 	obj_type = oid_object_info(the_repository, &oid, NULL);
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 2a7b04aed8..fe8601e7bb 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -63,7 +63,7 @@ test_expect_success 'invalid object type' '
 	test_must_fail git mktree <bad-type >out 2>err &&
 	test_must_be_empty out &&
 	cat >expected <<-\EOF &&
-	fatal: invalid object type "whee"
+	fatal: entry '"'"'a.'"'"' object type '"'"'whee'"'"' is invalid (our derived mode type is '"'"'tree'"'"')
 	EOF
 	test_cmp expected err
 '
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change
  2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00           ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
                               ` (9 more replies)
  7 siblings, 10 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

This goes on top of my just-submitted
https://lore.kernel.org/git/cover-00.10-0000000000-20210420T124428Z-avarab@gmail.com/
for v1 of this see
https://lore.kernel.org/git/cover-0.6-0000000000-20210409T082935Z-avarab@gmail.com/

As noted in the What's Cooking etc. the previous version of this
series's base and this on had a semantic conflict with brian's
hash-object.c work. See
https://lore.kernel.org/git/87mttx121j.fsf@evledraar.gmail.com/

That's now solved by this and the preceding series going through the
codebase and (culminating in this one) removing type_from_string()
entirely. There's still a "BUG" here in case we have any other
in-flight "-1" caller.

This will textually conflict with brian's hash-object.c work. But as
noted in the above-linked thread I've proposed another way forward
with brian's series independent of this one. In any case, the conflict
isn't hard to resolve, and I wanted to re-roll this sooner than later.

Ævar Arnfjörð Bjarmason (10):
  object.c: stop supporting len == -1 in type_from_string_gently()
  object.c: remove "gently" argument to type_from_string_gently()
  object.c: make type_from_string() return "enum object_type"
  object-file.c: make oid_object_info() return "enum object_type"
  object-name.c: make dependency on object_type order more obvious
  tag.c: use type_from_string_gently() when parsing tags
  hash-object: pass along type length to object.c
  hash-object: refactor nested else/if/if into else if/else if
  hash-object: show usage on invalid --type
  object.c: move type_from_string() code to its last user

 builtin/blame.c        |  2 +-
 builtin/cat-file.c     |  4 ++-
 builtin/hash-object.c  | 63 ++++++++++++++++++++++++++----------------
 builtin/index-pack.c   |  2 +-
 builtin/mktree.c       |  2 +-
 fsck.c                 |  2 +-
 object-file.c          | 36 ++++++++++++------------
 object-name.c          | 25 +++++++++--------
 object-store.h         |  8 ++++--
 object.c               | 14 ++++------
 object.h               |  3 +-
 packfile.c             |  2 +-
 t/t1007-hash-object.sh | 10 ++++---
 tag.c                  | 19 +++++++------
 14 files changed, 106 insertions(+), 86 deletions(-)

Range-diff against v1:
 3:  820f3aed21 !  1:  0ff9c653c3 object.c: stop supporting len == -1 in type_from_string_gently()
    @@ Commit message
         2014-09-10), but no callers use that form. Let's drop it to simplify
         this, and in preparation for simplifying these even further.
     
    +    Even though the argument was changed from ssize_t to the unsigned
    +    size_t C is by design forgiving about passing -1 as an unsigned
    +    type (it's just an alias for "set all bits)", let's detect any
    +    outstanding in-flight callers passing a -1.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object.c ##
    -@@ object.c: int type_from_string_gently(const char *str, ssize_t len, int gentle)
    +@@ object.c: const char *type_name(unsigned int type)
    + 	return object_type_strings[type];
    + }
    + 
    +-int type_from_string_gently(const char *str, ssize_t len, int gentle)
    ++int type_from_string_gently(const char *str, size_t len, int gentle)
      {
      	int i;
      
     -	if (len < 0)
     -		len = strlen(str);
    --
    ++	if (len == ~(size_t)0)
    ++		BUG("type-from-string-gently no longer allows unspecified length");
    + 
      	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
      		if (!strncmp(str, object_type_strings[i], len) &&
    - 		    object_type_strings[i][len] == '\0')
     @@ object.c: int type_from_string_gently(const char *str, ssize_t len, int gentle)
    - 	die(_("invalid object type \"%s\""), str);
    + 	die(_("invalid object type \"%.*s\""), (int)len, str);
      }
      
     +int type_from_string(const char *str)
    @@ object.c: int type_from_string_gently(const char *str, ssize_t len, int gentle)
     
      ## object.h ##
     @@ object.h: struct object {
    + };
      
      const char *type_name(unsigned int type);
    - int type_from_string_gently(const char *str, ssize_t, int gentle);
    +-int type_from_string_gently(const char *str, ssize_t, int gentle);
     -#define type_from_string(str) type_from_string_gently(str, -1, 0)
    ++int type_from_string_gently(const char *str, size_t, int gentle);
     +int type_from_string(const char *str);
      
      /*
 4:  daed40c479 !  2:  5fa3128127 object.c: remove "gently" argument to type_from_string_gently()
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## builtin/mktree.c ##
    +@@ builtin/mktree.c: static void mktree_line(char *buf, int nul_term_line, int allow_missing)
    + 	 * These should all agree.
    + 	 */
    + 	mode_type = object_type(mode);
    +-	type_type = type_from_string_gently(ptr, ntr - ptr, 1);
    ++	type_type = type_from_string_gently(ptr, ntr - ptr);
    + 	if (type_type < 0)
    + 		die("entry '%s' object type '%.*s' is invalid (our derived mode type is '%s')",
    + 			path, (int)(ntr - ptr), ptr, type_name(mode_type));
    +
      ## fsck.c ##
     @@ fsck.c: int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
      		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
    @@ object.c: const char *type_name(unsigned int type)
      	return object_type_strings[type];
      }
      
    --int type_from_string_gently(const char *str, ssize_t len, int gentle)
    -+int type_from_string_gently(const char *str, ssize_t len)
    +-int type_from_string_gently(const char *str, size_t len, int gentle)
    ++int type_from_string_gently(const char *str, size_t len)
      {
      	int i;
      
    -@@ object.c: int type_from_string_gently(const char *str, ssize_t len, int gentle)
    +@@ object.c: int type_from_string_gently(const char *str, size_t len, int gentle)
      		if (!strncmp(str, object_type_strings[i], len) &&
      		    object_type_strings[i][len] == '\0')
      			return i;
    @@ object.c: int type_from_string_gently(const char *str, ssize_t len, int gentle)
     -	if (gentle)
     -		return -1;
     -
    --	die(_("invalid object type \"%s\""), str);
    +-	die(_("invalid object type \"%.*s\""), (int)len, str);
     +	return -1;
      }
      
    @@ object.h: struct object {
      };
      
      const char *type_name(unsigned int type);
    --int type_from_string_gently(const char *str, ssize_t, int gentle);
    -+int type_from_string_gently(const char *str, ssize_t len);
    +-int type_from_string_gently(const char *str, size_t, int gentle);
    ++int type_from_string_gently(const char *str, size_t len);
      int type_from_string(const char *str);
      
      /*
 5:  7fd86f6699 !  3:  f0ec7d1dbb object.c: make type_from_string() return "enum object_type"
    @@ object.c: const char *type_name(unsigned int type)
      	return object_type_strings[type];
      }
      
    --int type_from_string_gently(const char *str, ssize_t len)
    -+enum object_type type_from_string_gently(const char *str, ssize_t len)
    +-int type_from_string_gently(const char *str, size_t len)
    ++enum object_type type_from_string_gently(const char *str, size_t len)
      {
     -	int i;
     +	enum object_type i;
      
    - 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
    - 		if (!strncmp(str, object_type_strings[i], len) &&
    -@@ object.c: int type_from_string_gently(const char *str, ssize_t len)
    + 	if (len == ~(size_t)0)
    + 		BUG("type-from-string-gently no longer allows unspecified length");
    +@@ object.c: int type_from_string_gently(const char *str, size_t len)
      	return -1;
      }
      
    @@ object.h: struct object {
      };
      
      const char *type_name(unsigned int type);
    --int type_from_string_gently(const char *str, ssize_t len);
    +-int type_from_string_gently(const char *str, size_t len);
     -int type_from_string(const char *str);
    -+enum object_type type_from_string_gently(const char *str, ssize_t len);
    ++enum object_type type_from_string_gently(const char *str, size_t len);
     +enum object_type type_from_string(const char *str);
      
      /*
 6:  ebea1b2b50 =  4:  fa97396517 object-file.c: make oid_object_info() return "enum object_type"
 7:  94e13611f0 =  5:  a607239b56 object-name.c: make dependency on object_type order more obvious
 8:  3279d67d2b =  6:  e6fe7ce064 tag.c: use type_from_string_gently() when parsing tags
 1:  68a7709fe5 !  7:  a5ac9f1dd8 blob.c: remove buffer & size arguments to parse_blob_buffer()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    blob.c: remove buffer & size arguments to parse_blob_buffer()
    +    hash-object: pass along type length to object.c
     
    -    As noted in the comment introduced in 837d395a5c0 (Replace
    -    parse_blob() with an explanatory comment, 2010-01-18) the old
    -    parse_blob() function and the current parse_blob_buffer() exist merely
    -    to provide consistency in the API.
    +    Change the functions to do with passing the type down to
    +    hash_object_file_literally() to pass the length of the type as well as
    +    the "const char *" type name.
     
    -    We're not going to parse blobs like we "parse" commits, trees or
    -    tags. So let's not have the parse_blob_buffer() take arguments that
    -    pretends that we do. Its only use is to set the "parsed" flag.
    +    The immediate motivation for this is to move hash-object.c over to
    +    type_from_string_gently() to emit a better error message, but it will
    +    also allow us in the future to craft an invalid object with a "\0" in
    +    the type name.
     
    -    See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
    -    parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().
    -
    -    I'm moving the prototype of parse_blob_buffer() below the comment
    -    added in 837d395a5c0 while I'm at it. That comment was originally
    -    meant to be a replacement for the missing parse_blob() function, but
    -    it's much less confusing to have it be above the parse_blob_buffer()
    -    function it refers to.
    +    We'd need to learn a --type-file=* option or similar (we can't of
    +    course, pass a string with "\0" on the command-line). Right now such
    +    an object can be manually crafted, but we can't test for it with
    +    --literally.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## blob.c ##
    -@@ blob.c: struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
    - 	return object_as_type(obj, OBJ_BLOB, 0);
    + ## builtin/hash-object.c ##
    +@@ builtin/hash-object.c: static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
    + 	if (strbuf_read(&buf, fd, 4096) < 0)
    + 		ret = -1;
    + 	else
    +-		ret = hash_object_file_literally(buf.buf, buf.len, type, oid,
    +-						 flags);
    ++		ret = hash_object_file_literally(buf.buf, buf.len, type,
    ++						 strlen(type), oid, flags);
    + 	strbuf_release(&buf);
    + 	return ret;
    + }
    + 
    +-static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
    +-		    int literally)
    ++static void hash_fd(int fd, const char *type, size_t type_len,
    ++		    const char *path, unsigned flags, int literally)
    + {
    + 	struct stat st;
    + 	struct object_id oid;
    +@@ builtin/hash-object.c: static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
    + 	maybe_flush_or_die(stdout, "hash to stdout");
    + }
    + 
    +-static void hash_object(const char *path, const char *type, const char *vpath,
    +-			unsigned flags, int literally)
    ++static void hash_object(const char *path, const char *type, size_t type_len,
    ++			const char *vpath, unsigned flags, int literally)
    + {
    + 	int fd;
    + 	fd = open(path, O_RDONLY);
    + 	if (fd < 0)
    + 		die_errno("Cannot open '%s'", path);
    +-	hash_fd(fd, type, vpath, flags, literally);
    ++	hash_fd(fd, type, type_len, vpath, flags, literally);
      }
      
    --int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
    -+int parse_blob_buffer(struct blob *item)
    +-static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
    +-			     int literally)
    ++static void hash_stdin_paths(const char *type, size_t type_len, int no_filters,
    ++			     unsigned flags, int literally)
      {
    - 	item->object.parsed = 1;
    + 	struct strbuf buf = STRBUF_INIT;
    + 	struct strbuf unquoted = STRBUF_INIT;
    + 
    + 	while (strbuf_getline(&buf, stdin) != EOF) {
    ++		const char *vpath;
    + 		if (buf.buf[0] == '"') {
    + 			strbuf_reset(&unquoted);
    + 			if (unquote_c_style(&unquoted, buf.buf, NULL))
    + 				die("line is badly quoted");
    + 			strbuf_swap(&buf, &unquoted);
    + 		}
    +-		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
    +-			    literally);
    ++		vpath = no_filters ? NULL : buf.buf;
    ++		hash_object(buf.buf, type, type_len, vpath , flags, literally);
    + 	}
    + 	strbuf_release(&buf);
    + 	strbuf_release(&unquoted);
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 		NULL
    + 	};
    + 	const char *type = blob_type;
    ++	size_t type_len;
    + 	int hashstdin = 0;
    + 	int stdin_paths = 0;
    + 	int no_filters = 0;
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 		usage_with_options(hash_object_usage, hash_object_options);
    + 	}
    + 
    ++	type_len = strlen(type);
    + 	if (hashstdin)
    +-		hash_fd(0, type, vpath, flags, literally);
    ++		hash_fd(0, type, type_len, vpath, flags, literally);
    + 
    + 	for (i = 0 ; i < argc; i++) {
    + 		const char *arg = argv[i];
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 
    + 		if (prefix)
    + 			arg = to_free = prefix_filename(prefix, arg);
    +-		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
    ++		hash_object(arg, type, type_len, no_filters ? NULL : vpath ? vpath : arg,
    + 			    flags, literally);
    + 		free(to_free);
    + 	}
    + 
    + 	if (stdin_paths)
    +-		hash_stdin_paths(type, no_filters, flags, literally);
    ++		hash_stdin_paths(type, type_len, no_filters, flags, literally);
    + 
      	return 0;
    + }
     
    - ## blob.h ##
    -@@ blob.h: struct blob {
    - 
    - struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
    - 
    --int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
    --
    - /**
    -  * Blobs do not contain references to other objects and do not have
    -  * structured data that needs parsing. However, code may use the
    -@@ blob.h: int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
    -  * parse_blob_buffer() is used (by object.c) to flag that the object
    -  * has been read successfully from the database.
    -  **/
    -+int parse_blob_buffer(struct blob *item);
    - 
    - #endif /* BLOB_H */
    + ## object-file.c ##
    +@@ object-file.c: void *read_object_with_reference(struct repository *r,
    + 
    + static void write_object_file_prepare(const struct git_hash_algo *algo,
    + 				      const void *buf, unsigned long len,
    +-				      const char *type, struct object_id *oid,
    +-				      char *hdr, int *hdrlen)
    ++				      const char *type, size_t type_len,
    ++				      struct object_id *oid, char *hdr,
    ++				      int *hdrlen)
    + {
    + 	git_hash_ctx c;
    + 
    + 	/* Generate the header */
    +-	*hdrlen = xsnprintf(hdr, *hdrlen, "%s %"PRIuMAX , type, (uintmax_t)len)+1;
    ++	*hdrlen = xsnprintf(hdr, *hdrlen, "%.*s %"PRIuMAX,
    ++			    (int)type_len, type, (uintmax_t)len) + 1;
    + 
    + 	/* Sha1.. */
    + 	algo->init_fn(&c);
    +@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
    + {
    + 	char hdr[MAX_HEADER_LEN];
    + 	int hdrlen = sizeof(hdr);
    +-	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
    ++	write_object_file_prepare(algo, buf, len, type, strlen(type), oid, hdr,
    ++				  &hdrlen);
    + 	return 0;
    + }
    + 
    +@@ object-file.c: int write_object_file(const void *buf, unsigned long len, const char *type,
    + {
    + 	char hdr[MAX_HEADER_LEN];
    + 	int hdrlen = sizeof(hdr);
    ++	size_t type_len = strlen(type);
    + 
    + 	/* Normally if we have it in the pack then we do not bother writing
    + 	 * it out into .git/objects/??/?{38} file.
    + 	 */
    +-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
    +-				  &hdrlen);
    ++	write_object_file_prepare(the_hash_algo, buf, len, type, type_len, oid,
    ++				  hdr, &hdrlen);
    + 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
    + 		return 0;
    + 	return write_loose_object(oid, hdr, hdrlen, buf, len, 0);
    + }
    + 
    + int hash_object_file_literally(const void *buf, unsigned long len,
    +-			       const char *type, struct object_id *oid,
    +-			       unsigned flags)
    ++			       const char *type, size_t type_len,
    ++			       struct object_id *oid, unsigned flags)
    + {
    + 	char *header;
    + 	int hdrlen, status = 0;
    + 
    + 	/* type string, SP, %lu of the length plus NUL must fit this */
    +-	hdrlen = strlen(type) + MAX_HEADER_LEN;
    ++	hdrlen = type_len + MAX_HEADER_LEN;
    + 	header = xmalloc(hdrlen);
    +-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, header,
    +-				  &hdrlen);
    ++	write_object_file_prepare(the_hash_algo, buf, len, type, type_len, oid,
    ++				  header, &hdrlen);
    + 
    + 	if (!(flags & HASH_WRITE_OBJECT))
    + 		goto cleanup;
     
    - ## object.c ##
    -@@ object.c: struct object *parse_object_buffer(struct repository *r, const struct object_id
    - 	if (type == OBJ_BLOB) {
    - 		struct blob *blob = lookup_blob(r, oid);
    - 		if (blob) {
    --			if (parse_blob_buffer(blob, buffer, size))
    -+			if (parse_blob_buffer(blob))
    - 				return NULL;
    - 			obj = &blob->object;
    - 		}
    -@@ object.c: struct object *parse_object(struct repository *r, const struct object_id *oid)
    - 			error(_("hash mismatch %s"), oid_to_hex(oid));
    - 			return NULL;
    - 		}
    --		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
    -+		parse_blob_buffer(lookup_blob(r, oid));
    - 		return lookup_object(r, oid);
    - 	}
    + ## object-store.h ##
    +@@ object-store.h: int write_object_file(const void *buf, unsigned long len,
    + 		      const char *type, struct object_id *oid);
    + 
    + int hash_object_file_literally(const void *buf, unsigned long len,
    +-			       const char *type, struct object_id *oid,
    +-			       unsigned flags);
    ++			       const char *type, size_t type_len,
    ++			       struct object_id *oid, unsigned flags);
      
    + /*
    +  * Add an object file to the in-memory object store, without writing it
 -:  ---------- >  8:  7bf04edc74 hash-object: refactor nested else/if/if into else if/else if
 2:  f1fcc31717 !  9:  eaa1b8f44c object.c: initialize automatic variable in lookup_object()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    object.c: initialize automatic variable in lookup_object()
    +    hash-object: show usage on invalid --type
     
    -    Initialize a "struct object obj*" variable to NULL explicitly and
    -    return it instead of leaving it uninitialized until the "while"
    -    loop.
    +    Change the error displayed on "hash-object -t bogus" (without
    +    --literally) to show the usage_with_options(), like we do for the
    +    other usage errors.
     
    -    There was no bug here, it's just less confusing when debugging if the
    -    "obj" is either NULL or a valid object, not some random invalid
    -    pointer.
    -
    -    See 0556a11a0df (git object hash cleanups, 2006-06-30) for the initial
    -    implementation.
    +    As noted in a preceding commit it makes sense to pass the "len" down
    +    to the object.c code, so now that we're using
    +    type_from_string_gently() let's do that.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## object.c ##
    -@@ object.c: static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
    - struct object *lookup_object(struct repository *r, const struct object_id *oid)
    + ## builtin/hash-object.c ##
    +@@ builtin/hash-object.c: static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
    + 	return ret;
    + }
    + 
    +-static void hash_fd(int fd, const char *type, size_t type_len,
    +-		    const char *path, unsigned flags, int literally)
    ++static void hash_fd(int fd, enum object_type otype, const char *type,
    ++		    size_t type_len, const char *path, unsigned flags,
    ++		    int literally)
    + {
    + 	struct stat st;
    + 	struct object_id oid;
    +@@ builtin/hash-object.c: static void hash_fd(int fd, const char *type, size_t type_len,
    + 	if (fstat(fd, &st) < 0 ||
    + 	    (literally
    + 	     ? hash_literally(&oid, fd, type, flags)
    +-	     : index_fd(the_repository->index, &oid, fd, &st,
    +-			type_from_string(type), path, flags)))
    ++	     : index_fd(the_repository->index, &oid, fd, &st, otype, path,
    ++			flags)))
    + 		die((flags & HASH_WRITE_OBJECT)
    + 		    ? "Unable to add %s to database"
    + 		    : "Unable to hash %s", path);
    +@@ builtin/hash-object.c: static void hash_fd(int fd, const char *type, size_t type_len,
    + 	maybe_flush_or_die(stdout, "hash to stdout");
    + }
    + 
    +-static void hash_object(const char *path, const char *type, size_t type_len,
    ++static void hash_object(const char *path, enum object_type otype,
    ++			const char *type, size_t type_len,
    + 			const char *vpath, unsigned flags, int literally)
      {
    - 	unsigned int i, first;
    --	struct object *obj;
    -+	struct object *obj = NULL;
    + 	int fd;
    + 	fd = open(path, O_RDONLY);
    + 	if (fd < 0)
    + 		die_errno("Cannot open '%s'", path);
    +-	hash_fd(fd, type, type_len, vpath, flags, literally);
    ++	hash_fd(fd, otype, type, type_len, vpath, flags, literally);
    + }
    + 
    +-static void hash_stdin_paths(const char *type, size_t type_len, int no_filters,
    ++static void hash_stdin_paths(enum object_type otype, const char *type,
    ++			     size_t type_len, int no_filters,
    + 			     unsigned flags, int literally)
    + {
    + 	struct strbuf buf = STRBUF_INIT;
    +@@ builtin/hash-object.c: static void hash_stdin_paths(const char *type, size_t type_len, int no_filters,
    + 			strbuf_swap(&buf, &unquoted);
    + 		}
    + 		vpath = no_filters ? NULL : buf.buf;
    +-		hash_object(buf.buf, type, type_len, vpath , flags, literally);
    ++		hash_object(buf.buf, otype, type, type_len, vpath , flags, literally);
    + 	}
    + 	strbuf_release(&buf);
    + 	strbuf_release(&unquoted);
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 	};
    + 	const char *type = blob_type;
    + 	size_t type_len;
    ++	enum object_type otype = OBJ_BAD;
    + 	int hashstdin = 0;
    + 	int stdin_paths = 0;
    + 	int no_filters = 0;
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 	};
    + 	int i;
    + 	const char *errstr = NULL;
    ++	int errstr_arg_type = 0;
    + 
    + 	argc = parse_options(argc, argv, prefix, hash_object_options,
    + 			     hash_object_usage, 0);
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 
    + 	git_config(git_default_config, NULL);
    + 
    +-	if (stdin_paths) {
    ++	type_len = strlen(type);
    ++	otype = type_from_string_gently(type, type_len);
    ++	if (otype < 0 && !literally) {
    ++		errstr = "the object type \"%.*s\" is invalid, did you mean to use --literally?";
    ++		errstr_arg_type = 1;
    ++	} else if (stdin_paths) {
    + 		if (hashstdin)
    + 			errstr = "Can't use --stdin-paths with --stdin";
    + 		else if (argc)
    +@@ builtin/hash-object.c: int cmd_hash_object(int argc, const char **argv, const char *prefix)
    + 	}
    + 
    + 	if (errstr) {
    +-		error("%s", errstr);
    ++		if (errstr_arg_type)
    ++			error(errstr, (int)type_len, type);
    ++		else
    ++			error("%s", errstr);
    + 		usage_with_options(hash_object_usage, hash_object_options);
    + 	}
    + 
    +-	type_len = strlen(type);
    + 	if (hashstdin)
    +-		hash_fd(0, type, type_len, vpath, flags, literally);
    ++		hash_fd(0, otype, type, type_len, vpath, flags, literally);
    + 
    + 	for (i = 0 ; i < argc; i++) {
    + 		const char *arg = argv[i];
    + 		char *to_free = NULL;
    ++		const char *tmp;
    + 
    + 		if (prefix)
    + 			arg = to_free = prefix_filename(prefix, arg);
    +-		hash_object(arg, type, type_len, no_filters ? NULL : vpath ? vpath : arg,
    +-			    flags, literally);
    ++		tmp = no_filters ? NULL : vpath ? vpath : arg;
    ++		hash_object(arg, otype, type, type_len, tmp, flags, literally);
    + 		free(to_free);
    + 	}
    + 
    + 	if (stdin_paths)
    +-		hash_stdin_paths(type, type_len, no_filters, flags, literally);
    ++		hash_stdin_paths(otype, type, type_len, no_filters, flags,
    ++				 literally);
    + 
    + 	return 0;
    + }
    +
    + ## t/t1007-hash-object.sh ##
    +@@ t/t1007-hash-object.sh: test_expect_success 'corrupt tag' '
    + '
      
    - 	if (!r->parsed_objects->obj_hash)
    --		return NULL;
    -+		return obj;
    + test_expect_success 'hash-object complains about bogus type name' '
    +-	test_must_fail git hash-object -t bogus --stdin 2>actual </dev/null &&
    ++	test_expect_code 129 git hash-object -t bogus --stdin 2>err </dev/null &&
    ++	grep ^error err >actual &&
    + 	cat >expect <<-\EOF &&
    +-	fatal: invalid object type "bogus"
    ++	error: the object type "bogus" is invalid, did you mean to use --literally?
    + 	EOF
    + 	test_cmp expect actual
    + '
      
    - 	first = i = hash_obj(oid, r->parsed_objects->obj_hash_size);
    - 	while ((obj = r->parsed_objects->obj_hash[i]) != NULL) {
    + test_expect_success 'hash-object complains about truncated type name' '
    +-	test_must_fail git hash-object -t bl --stdin 2>actual </dev/null &&
    ++	test_expect_code 129 git hash-object -t bl --stdin 2>err </dev/null &&
    ++	grep ^error err >actual &&
    + 	cat >expect <<-\EOF &&
    +-	fatal: invalid object type "bl"
    ++	error: the object type "bl" is invalid, did you mean to use --literally?
    + 	EOF
    + 	test_cmp expect actual
    + '
 -:  ---------- > 10:  cb0ea49279 object.c: move type_from_string() code to its last user
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 02/10] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
                               ` (8 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the type_from_string() macro into a function and drop the
support for passing len < 0.

Support for len < 0 was added in fe8e3b71805 (Refactor
type_from_string() to allow continuing after detecting an error,
2014-09-10), but no callers use that form. Let's drop it to simplify
this, and in preparation for simplifying these even further.

Even though the argument was changed from ssize_t to the unsigned
size_t C is by design forgiving about passing -1 as an unsigned
type (it's just an alias for "set all bits)", let's detect any
outstanding in-flight callers passing a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 13 ++++++++++---
 object.h |  4 ++--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/object.c b/object.c
index bad9e17f25..aae2a27e55 100644
--- a/object.c
+++ b/object.c
@@ -35,12 +35,12 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, ssize_t len, int gentle)
+int type_from_string_gently(const char *str, size_t len, int gentle)
 {
 	int i;
 
-	if (len < 0)
-		len = strlen(str);
+	if (len == ~(size_t)0)
+		BUG("type-from-string-gently no longer allows unspecified length");
 
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strncmp(str, object_type_strings[i], len) &&
@@ -53,6 +53,13 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 	die(_("invalid object type \"%.*s\""), (int)len, str);
 }
 
+int type_from_string(const char *str)
+{
+	size_t len = strlen(str);
+	int ret = type_from_string_gently(str, len, 0);
+	return ret;
+}
+
 /*
  * Return a numerical hash value between 0 and n-1 for the object with
  * the specified sha1.  n must be a power of 2.  Please note that the
diff --git a/object.h b/object.h
index 59daadce21..f9d8f4d22b 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,8 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, ssize_t, int gentle);
-#define type_from_string(str) type_from_string_gently(str, -1, 0)
+int type_from_string_gently(const char *str, size_t, int gentle);
+int type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 02/10] object.c: remove "gently" argument to type_from_string_gently()
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
                               ` (7 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Get rid of the "gently" argument to type_from_string_gently() to make
it consistent with most other *_gently() functions. It's already a
"gentle" function, it shouldn't need a boolean argument telling it to
be gentle.

The reason it had a "gentle" parameter was because until the preceding
commit "type_from_string()" was a macro resolving to
"type_from_string_gently()", it's now a function.

This refactoring of adding a third parameter was done in
fe8e3b71805 (Refactor type_from_string() to allow continuing after
detecting an error, 2014-09-10) in preparation for its use in
fsck.c.

Simplifying this means we can move the die() into the simpler
type_from_string() function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktree.c |  2 +-
 fsck.c           |  2 +-
 object-file.c    |  2 +-
 object.c         | 12 +++++-------
 object.h         |  2 +-
 5 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 67e11d8562..7d3f323209 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -117,7 +117,7 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	 * These should all agree.
 	 */
 	mode_type = object_type(mode);
-	type_type = type_from_string_gently(ptr, ntr - ptr, 1);
+	type_type = type_from_string_gently(ptr, ntr - ptr);
 	if (type_type < 0)
 		die("entry '%s' object type '%.*s' is invalid (our derived mode type is '%s')",
 			path, (int)(ntr - ptr), ptr, type_name(mode_type));
diff --git a/fsck.c b/fsck.c
index f5ed6a2635..8dda548c38 100644
--- a/fsck.c
+++ b/fsck.c
@@ -875,7 +875,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
-	*tagged_type = type_from_string_gently(buffer, eol - buffer, 1);
+	*tagged_type = type_from_string_gently(buffer, eol - buffer);
 	if (*tagged_type < 0)
 		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
diff --git a/object-file.c b/object-file.c
index d2f223dcef..4af4748edd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1314,7 +1314,7 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 		type_len++;
 	}
 
-	type = type_from_string_gently(type_buf, type_len, 1);
+	type = type_from_string_gently(type_buf, type_len);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
 	/*
diff --git a/object.c b/object.c
index aae2a27e55..7028243c9a 100644
--- a/object.c
+++ b/object.c
@@ -35,7 +35,7 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, size_t len, int gentle)
+int type_from_string_gently(const char *str, size_t len)
 {
 	int i;
 
@@ -46,17 +46,15 @@ int type_from_string_gently(const char *str, size_t len, int gentle)
 		if (!strncmp(str, object_type_strings[i], len) &&
 		    object_type_strings[i][len] == '\0')
 			return i;
-
-	if (gentle)
-		return -1;
-
-	die(_("invalid object type \"%.*s\""), (int)len, str);
+	return -1;
 }
 
 int type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len, 0);
+	int ret = type_from_string_gently(str, len);
+	if (ret < 0)
+		die(_("invalid object type \"%s\""), str);
 	return ret;
 }
 
diff --git a/object.h b/object.h
index f9d8f4d22b..470b3c1b86 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,7 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, size_t, int gentle);
+int type_from_string_gently(const char *str, size_t len);
 int type_from_string(const char *str);
 
 /*
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type"
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 02/10] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 04/10] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the type_from_string*() functions to return an "enum
object_type", but don't refactor their callers to check for "==
OBJ_BAD" instead of "< 0".

Refactoring the check of the return value to check == OBJ_BAD would
now be equivalent to "ret < 0", but the consensus on an earlier
version of this patch was to not do that, and to instead use -1
consistently as a return value. It just so happens that OBJ_BAD == -1,
but let's not put a hard reliance on that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 8 ++++----
 object.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/object.c b/object.c
index 7028243c9a..8f3ddfc8f4 100644
--- a/object.c
+++ b/object.c
@@ -35,9 +35,9 @@ const char *type_name(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string_gently(const char *str, size_t len)
+enum object_type type_from_string_gently(const char *str, size_t len)
 {
-	int i;
+	enum object_type i;
 
 	if (len == ~(size_t)0)
 		BUG("type-from-string-gently no longer allows unspecified length");
@@ -49,10 +49,10 @@ int type_from_string_gently(const char *str, size_t len)
 	return -1;
 }
 
-int type_from_string(const char *str)
+enum object_type type_from_string(const char *str)
 {
 	size_t len = strlen(str);
-	int ret = type_from_string_gently(str, len);
+	enum object_type ret = type_from_string_gently(str, len);
 	if (ret < 0)
 		die(_("invalid object type \"%s\""), str);
 	return ret;
diff --git a/object.h b/object.h
index 470b3c1b86..a4eca10d72 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,8 @@ struct object {
 };
 
 const char *type_name(unsigned int type);
-int type_from_string_gently(const char *str, size_t len);
-int type_from_string(const char *str);
+enum object_type type_from_string_gently(const char *str, size_t len);
+enum object_type type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 04/10] object-file.c: make oid_object_info() return "enum object_type"
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change oid_object_info() to return an "enum object_type". Unlike
oid_object_info_extended() function the simpler oid_object_info()
explicitly returns the oi.typep member, which is itself an "enum
object_type".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/blame.c      |  2 +-
 builtin/index-pack.c |  2 +-
 object-file.c        |  8 +++-----
 object-name.c        | 19 +++++++++----------
 object-store.h       |  4 +++-
 packfile.c           |  2 +-
 6 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9a..5dd3c38a8c 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -810,7 +810,7 @@ static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
 	oidcpy(&oid, oid_ret);
 	while (1) {
 		struct object *obj;
-		int kind = oid_object_info(r, &oid, NULL);
+		enum object_type kind = oid_object_info(r, &oid, NULL);
 		if (kind == OBJ_COMMIT) {
 			oidcpy(oid_ret, &oid);
 			return 0;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 15507b5cff..c0e3768c32 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -237,7 +237,7 @@ static unsigned check_object(struct object *obj)
 
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
-		int type = oid_object_info(the_repository, &obj->oid, &size);
+		enum object_type type = oid_object_info(the_repository, &obj->oid, &size);
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
diff --git a/object-file.c b/object-file.c
index 4af4748edd..398f2b60f9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1572,11 +1572,9 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	return ret;
 }
 
-
-/* returns enum object_type or negative */
-int oid_object_info(struct repository *r,
-		    const struct object_id *oid,
-		    unsigned long *sizep)
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *oid,
+				 unsigned long *sizep)
 {
 	enum object_type type;
 	struct object_info oi = OBJECT_INFO_INIT;
diff --git a/object-name.c b/object-name.c
index 64202de60b..4d7f0c66cf 100644
--- a/object-name.c
+++ b/object-name.c
@@ -239,9 +239,8 @@ static int disambiguate_committish_only(struct repository *r,
 					void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind = oid_object_info(r, oid, NULL);
 
-	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_COMMIT)
 		return 1;
 	if (kind != OBJ_TAG)
@@ -258,7 +257,7 @@ static int disambiguate_tree_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_TREE;
 }
 
@@ -267,7 +266,7 @@ static int disambiguate_treeish_only(struct repository *r,
 				     void *cb_data_unused)
 {
 	struct object *obj;
-	int kind;
+	enum object_type kind;
 
 	kind = oid_object_info(r, oid, NULL);
 	if (kind == OBJ_TREE || kind == OBJ_COMMIT)
@@ -286,7 +285,7 @@ static int disambiguate_blob_only(struct repository *r,
 				  const struct object_id *oid,
 				  void *cb_data_unused)
 {
-	int kind = oid_object_info(r, oid, NULL);
+	enum object_type kind = oid_object_info(r, oid, NULL);
 	return kind == OBJ_BLOB;
 }
 
@@ -361,7 +360,7 @@ static int show_ambiguous_object(const struct object_id *oid, void *data)
 {
 	const struct disambiguate_state *ds = data;
 	struct strbuf desc = STRBUF_INIT;
-	int type;
+	enum object_type type;
 
 	if (ds->fn && !ds->fn(ds->repo, oid, ds->cb_data))
 		return 0;
@@ -405,10 +404,10 @@ static int repo_collect_ambiguous(struct repository *r,
 static int sort_ambiguous(const void *a, const void *b, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
-	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
-	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
-	int a_type_sort;
-	int b_type_sort;
+	enum object_type a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
+	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
+	enum object_type a_type_sort;
+	enum object_type b_type_sort;
 
 	/*
 	 * Sorts by hash within the same object type, just as
diff --git a/object-store.h b/object-store.h
index ec32c23dcb..eab9674d08 100644
--- a/object-store.h
+++ b/object-store.h
@@ -208,7 +208,9 @@ static inline void *repo_read_object_file(struct repository *r,
 #endif
 
 /* Read and unpack an object file into memory, write memory to an object file */
-int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
+enum object_type oid_object_info(struct repository *r,
+				 const struct object_id *,
+				 unsigned long *);
 
 int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 		     unsigned long len, const char *type,
diff --git a/packfile.c b/packfile.c
index 8668345d93..ccfebc0567 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1269,7 +1269,7 @@ static int retry_bad_packed_offset(struct repository *r,
 				   struct packed_git *p,
 				   off_t obj_offset)
 {
-	int type;
+	enum object_type type;
 	uint32_t pos;
 	struct object_id oid;
 	if (offset_to_pack_pos(p, obj_offset, &pos) < 0)
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 04/10] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 06/10] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add an assert to make it more obvious that we were effectively
hardcoding OBJ_TAG in sort_ambiguous() as "4".

I wrote this code in 5cc044e0257 (get_short_oid: sort ambiguous
objects by type, then SHA-1, 2018-05-10), there was already a comment
about this magic, but let's make sure that someone doing a potential
reordering of "enum object_type" in the future would notice it
breaking this function (and probably a bunch of other things...).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-name.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/object-name.c b/object-name.c
index 4d7f0c66cf..b6a7328b7a 100644
--- a/object-name.c
+++ b/object-name.c
@@ -408,6 +408,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	enum object_type b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
 	enum object_type a_type_sort;
 	enum object_type b_type_sort;
+	const enum object_type tag_type_offs = OBJ_TAG - OBJ_NONE;
+	assert(tag_type_offs == 4);
 
 	/*
 	 * Sorts by hash within the same object type, just as
@@ -425,8 +427,8 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	 * cleverly) do that with modulus, since the enum assigns 1 to
 	 * commit, so tag becomes 0.
 	 */
-	a_type_sort = a_type % 4;
-	b_type_sort = b_type % 4;
+	a_type_sort = a_type % tag_type_offs;
+	b_type_sort = b_type % tag_type_offs;
 	return a_type_sort > b_type_sort ? 1 : -1;
 }
 
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 06/10] tag.c: use type_from_string_gently() when parsing tags
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 07/10] hash-object: pass along type length to object.c Ævar Arnfjörð Bjarmason
                               ` (3 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change a series of strcmp() to instead use type_from_string_gently()
to get the integer type early, and then use that for comparison.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 tag.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tag.c b/tag.c
index ed7037256e..1bd81bf1d1 100644
--- a/tag.c
+++ b/tag.c
@@ -140,7 +140,7 @@ void release_tag_memory(struct tag *t)
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size)
 {
 	struct object_id oid;
-	char type[20];
+	enum object_type type;
 	const char *bufptr = data;
 	const char *tail = bufptr + size;
 	const char *nl;
@@ -167,23 +167,24 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 		return -1;
 	bufptr += 5;
 	nl = memchr(bufptr, '\n', tail - bufptr);
-	if (!nl || sizeof(type) <= (nl - bufptr))
+	if (!nl)
+		return -1;
+	type = type_from_string_gently(bufptr, nl - bufptr);
+	if (type < 0)
 		return -1;
-	memcpy(type, bufptr, nl - bufptr);
-	type[nl - bufptr] = '\0';
 	bufptr = nl + 1;
 
-	if (!strcmp(type, blob_type)) {
+	if (type == OBJ_BLOB) {
 		item->tagged = (struct object *)lookup_blob(r, &oid);
-	} else if (!strcmp(type, tree_type)) {
+	} else if (type == OBJ_TREE) {
 		item->tagged = (struct object *)lookup_tree(r, &oid);
-	} else if (!strcmp(type, commit_type)) {
+	} else if (type == OBJ_COMMIT) {
 		item->tagged = (struct object *)lookup_commit(r, &oid);
-	} else if (!strcmp(type, tag_type)) {
+	} else if (type == OBJ_TAG) {
 		item->tagged = (struct object *)lookup_tag(r, &oid);
 	} else {
 		return error("unknown tag type '%s' in %s",
-			     type, oid_to_hex(&item->object.oid));
+			     type_name(type), oid_to_hex(&item->object.oid));
 	}
 
 	if (!item->tagged)
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 07/10] hash-object: pass along type length to object.c
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 06/10] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 08/10] hash-object: refactor nested else/if/if into else if/else if Ævar Arnfjörð Bjarmason
                               ` (2 subsequent siblings)
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the functions to do with passing the type down to
hash_object_file_literally() to pass the length of the type as well as
the "const char *" type name.

The immediate motivation for this is to move hash-object.c over to
type_from_string_gently() to emit a better error message, but it will
also allow us in the future to craft an invalid object with a "\0" in
the type name.

We'd need to learn a --type-file=* option or similar (we can't of
course, pass a string with "\0" on the command-line). Right now such
an object can be manually crafted, but we can't test for it with
--literally.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/hash-object.c | 31 +++++++++++++++++--------------
 object-file.c         | 26 +++++++++++++++-----------
 object-store.h        |  4 ++--
 3 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 640ef4ded5..4d3b8c49d2 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -25,14 +25,14 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
 	if (strbuf_read(&buf, fd, 4096) < 0)
 		ret = -1;
 	else
-		ret = hash_object_file_literally(buf.buf, buf.len, type, oid,
-						 flags);
+		ret = hash_object_file_literally(buf.buf, buf.len, type,
+						 strlen(type), oid, flags);
 	strbuf_release(&buf);
 	return ret;
 }
 
-static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
-		    int literally)
+static void hash_fd(int fd, const char *type, size_t type_len,
+		    const char *path, unsigned flags, int literally)
 {
 	struct stat st;
 	struct object_id oid;
@@ -49,31 +49,32 @@ static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
 	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
-static void hash_object(const char *path, const char *type, const char *vpath,
-			unsigned flags, int literally)
+static void hash_object(const char *path, const char *type, size_t type_len,
+			const char *vpath, unsigned flags, int literally)
 {
 	int fd;
 	fd = open(path, O_RDONLY);
 	if (fd < 0)
 		die_errno("Cannot open '%s'", path);
-	hash_fd(fd, type, vpath, flags, literally);
+	hash_fd(fd, type, type_len, vpath, flags, literally);
 }
 
-static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
-			     int literally)
+static void hash_stdin_paths(const char *type, size_t type_len, int no_filters,
+			     unsigned flags, int literally)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct strbuf unquoted = STRBUF_INIT;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
+		const char *vpath;
 		if (buf.buf[0] == '"') {
 			strbuf_reset(&unquoted);
 			if (unquote_c_style(&unquoted, buf.buf, NULL))
 				die("line is badly quoted");
 			strbuf_swap(&buf, &unquoted);
 		}
-		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
-			    literally);
+		vpath = no_filters ? NULL : buf.buf;
+		hash_object(buf.buf, type, type_len, vpath , flags, literally);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&unquoted);
@@ -87,6 +88,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		NULL
 	};
 	const char *type = blob_type;
+	size_t type_len;
 	int hashstdin = 0;
 	int stdin_paths = 0;
 	int no_filters = 0;
@@ -141,8 +143,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		usage_with_options(hash_object_usage, hash_object_options);
 	}
 
+	type_len = strlen(type);
 	if (hashstdin)
-		hash_fd(0, type, vpath, flags, literally);
+		hash_fd(0, type, type_len, vpath, flags, literally);
 
 	for (i = 0 ; i < argc; i++) {
 		const char *arg = argv[i];
@@ -150,13 +153,13 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 
 		if (prefix)
 			arg = to_free = prefix_filename(prefix, arg);
-		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
+		hash_object(arg, type, type_len, no_filters ? NULL : vpath ? vpath : arg,
 			    flags, literally);
 		free(to_free);
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, no_filters, flags, literally);
+		hash_stdin_paths(type, type_len, no_filters, flags, literally);
 
 	return 0;
 }
diff --git a/object-file.c b/object-file.c
index 398f2b60f9..b27ed57e0b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1715,13 +1715,15 @@ void *read_object_with_reference(struct repository *r,
 
 static void write_object_file_prepare(const struct git_hash_algo *algo,
 				      const void *buf, unsigned long len,
-				      const char *type, struct object_id *oid,
-				      char *hdr, int *hdrlen)
+				      const char *type, size_t type_len,
+				      struct object_id *oid, char *hdr,
+				      int *hdrlen)
 {
 	git_hash_ctx c;
 
 	/* Generate the header */
-	*hdrlen = xsnprintf(hdr, *hdrlen, "%s %"PRIuMAX , type, (uintmax_t)len)+1;
+	*hdrlen = xsnprintf(hdr, *hdrlen, "%.*s %"PRIuMAX,
+			    (int)type_len, type, (uintmax_t)len) + 1;
 
 	/* Sha1.. */
 	algo->init_fn(&c);
@@ -1786,7 +1788,8 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 {
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
-	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
+	write_object_file_prepare(algo, buf, len, type, strlen(type), oid, hdr,
+				  &hdrlen);
 	return 0;
 }
 
@@ -1940,29 +1943,30 @@ int write_object_file(const void *buf, unsigned long len, const char *type,
 {
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
+	size_t type_len = strlen(type);
 
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
-				  &hdrlen);
+	write_object_file_prepare(the_hash_algo, buf, len, type, type_len, oid,
+				  hdr, &hdrlen);
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		return 0;
 	return write_loose_object(oid, hdr, hdrlen, buf, len, 0);
 }
 
 int hash_object_file_literally(const void *buf, unsigned long len,
-			       const char *type, struct object_id *oid,
-			       unsigned flags)
+			       const char *type, size_t type_len,
+			       struct object_id *oid, unsigned flags)
 {
 	char *header;
 	int hdrlen, status = 0;
 
 	/* type string, SP, %lu of the length plus NUL must fit this */
-	hdrlen = strlen(type) + MAX_HEADER_LEN;
+	hdrlen = type_len + MAX_HEADER_LEN;
 	header = xmalloc(hdrlen);
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, header,
-				  &hdrlen);
+	write_object_file_prepare(the_hash_algo, buf, len, type, type_len, oid,
+				  header, &hdrlen);
 
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
diff --git a/object-store.h b/object-store.h
index eab9674d08..8f043a6069 100644
--- a/object-store.h
+++ b/object-store.h
@@ -220,8 +220,8 @@ int write_object_file(const void *buf, unsigned long len,
 		      const char *type, struct object_id *oid);
 
 int hash_object_file_literally(const void *buf, unsigned long len,
-			       const char *type, struct object_id *oid,
-			       unsigned flags);
+			       const char *type, size_t type_len,
+			       struct object_id *oid, unsigned flags);
 
 /*
  * Add an object file to the in-memory object store, without writing it
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 08/10] hash-object: refactor nested else/if/if into else if/else if
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 07/10] hash-object: pass along type length to object.c Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 09/10] hash-object: show usage on invalid --type Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 10/10] object.c: move type_from_string() code to its last user Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Refactor code that was changed to this form in 4a3d85dcf67 (add
--no-filters option to git hash-object, 2008-08-03), seemingly for no
good reason.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/hash-object.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 4d3b8c49d2..4b337e0d25 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -130,12 +130,10 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 			errstr = "Can't specify files with --stdin-paths";
 		else if (vpath)
 			errstr = "Can't use --stdin-paths with --path";
-	}
-	else {
-		if (hashstdin > 1)
-			errstr = "Multiple --stdin arguments are not supported";
-		if (vpath && no_filters)
-			errstr = "Can't use --path with --no-filters";
+	} else if (hashstdin > 1) {
+		errstr = "Multiple --stdin arguments are not supported";
+	} else if (vpath && no_filters) {
+		errstr = "Can't use --path with --no-filters";
 	}
 
 	if (errstr) {
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 09/10] hash-object: show usage on invalid --type
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (7 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 08/10] hash-object: refactor nested else/if/if into else if/else if Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:00             ` [PATCH v2 10/10] object.c: move type_from_string() code to its last user Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change the error displayed on "hash-object -t bogus" (without
--literally) to show the usage_with_options(), like we do for the
other usage errors.

As noted in a preceding commit it makes sense to pass the "len" down
to the object.c code, so now that we're using
type_from_string_gently() let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/hash-object.c  | 44 ++++++++++++++++++++++++++++--------------
 t/t1007-hash-object.sh | 10 ++++++----
 2 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 4b337e0d25..705521bc7f 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -31,8 +31,9 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
 	return ret;
 }
 
-static void hash_fd(int fd, const char *type, size_t type_len,
-		    const char *path, unsigned flags, int literally)
+static void hash_fd(int fd, enum object_type otype, const char *type,
+		    size_t type_len, const char *path, unsigned flags,
+		    int literally)
 {
 	struct stat st;
 	struct object_id oid;
@@ -40,8 +41,8 @@ static void hash_fd(int fd, const char *type, size_t type_len,
 	if (fstat(fd, &st) < 0 ||
 	    (literally
 	     ? hash_literally(&oid, fd, type, flags)
-	     : index_fd(the_repository->index, &oid, fd, &st,
-			type_from_string(type), path, flags)))
+	     : index_fd(the_repository->index, &oid, fd, &st, otype, path,
+			flags)))
 		die((flags & HASH_WRITE_OBJECT)
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
@@ -49,17 +50,19 @@ static void hash_fd(int fd, const char *type, size_t type_len,
 	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
-static void hash_object(const char *path, const char *type, size_t type_len,
+static void hash_object(const char *path, enum object_type otype,
+			const char *type, size_t type_len,
 			const char *vpath, unsigned flags, int literally)
 {
 	int fd;
 	fd = open(path, O_RDONLY);
 	if (fd < 0)
 		die_errno("Cannot open '%s'", path);
-	hash_fd(fd, type, type_len, vpath, flags, literally);
+	hash_fd(fd, otype, type, type_len, vpath, flags, literally);
 }
 
-static void hash_stdin_paths(const char *type, size_t type_len, int no_filters,
+static void hash_stdin_paths(enum object_type otype, const char *type,
+			     size_t type_len, int no_filters,
 			     unsigned flags, int literally)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -74,7 +77,7 @@ static void hash_stdin_paths(const char *type, size_t type_len, int no_filters,
 			strbuf_swap(&buf, &unquoted);
 		}
 		vpath = no_filters ? NULL : buf.buf;
-		hash_object(buf.buf, type, type_len, vpath , flags, literally);
+		hash_object(buf.buf, otype, type, type_len, vpath , flags, literally);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&unquoted);
@@ -89,6 +92,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	};
 	const char *type = blob_type;
 	size_t type_len;
+	enum object_type otype = OBJ_BAD;
 	int hashstdin = 0;
 	int stdin_paths = 0;
 	int no_filters = 0;
@@ -109,6 +113,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	};
 	int i;
 	const char *errstr = NULL;
+	int errstr_arg_type = 0;
 
 	argc = parse_options(argc, argv, prefix, hash_object_options,
 			     hash_object_usage, 0);
@@ -123,7 +128,12 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
-	if (stdin_paths) {
+	type_len = strlen(type);
+	otype = type_from_string_gently(type, type_len);
+	if (otype < 0 && !literally) {
+		errstr = "the object type \"%.*s\" is invalid, did you mean to use --literally?";
+		errstr_arg_type = 1;
+	} else if (stdin_paths) {
 		if (hashstdin)
 			errstr = "Can't use --stdin-paths with --stdin";
 		else if (argc)
@@ -137,27 +147,31 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	}
 
 	if (errstr) {
-		error("%s", errstr);
+		if (errstr_arg_type)
+			error(errstr, (int)type_len, type);
+		else
+			error("%s", errstr);
 		usage_with_options(hash_object_usage, hash_object_options);
 	}
 
-	type_len = strlen(type);
 	if (hashstdin)
-		hash_fd(0, type, type_len, vpath, flags, literally);
+		hash_fd(0, otype, type, type_len, vpath, flags, literally);
 
 	for (i = 0 ; i < argc; i++) {
 		const char *arg = argv[i];
 		char *to_free = NULL;
+		const char *tmp;
 
 		if (prefix)
 			arg = to_free = prefix_filename(prefix, arg);
-		hash_object(arg, type, type_len, no_filters ? NULL : vpath ? vpath : arg,
-			    flags, literally);
+		tmp = no_filters ? NULL : vpath ? vpath : arg;
+		hash_object(arg, otype, type, type_len, tmp, flags, literally);
 		free(to_free);
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, type_len, no_filters, flags, literally);
+		hash_stdin_paths(otype, type, type_len, no_filters, flags,
+				 literally);
 
 	return 0;
 }
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 74486f6f1a..cb1517bd90 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -230,17 +230,19 @@ test_expect_success 'corrupt tag' '
 '
 
 test_expect_success 'hash-object complains about bogus type name' '
-	test_must_fail git hash-object -t bogus --stdin 2>actual </dev/null &&
+	test_expect_code 129 git hash-object -t bogus --stdin 2>err </dev/null &&
+	grep ^error err >actual &&
 	cat >expect <<-\EOF &&
-	fatal: invalid object type "bogus"
+	error: the object type "bogus" is invalid, did you mean to use --literally?
 	EOF
 	test_cmp expect actual
 '
 
 test_expect_success 'hash-object complains about truncated type name' '
-	test_must_fail git hash-object -t bl --stdin 2>actual </dev/null &&
+	test_expect_code 129 git hash-object -t bl --stdin 2>err </dev/null &&
+	grep ^error err >actual &&
 	cat >expect <<-\EOF &&
-	fatal: invalid object type "bl"
+	error: the object type "bl" is invalid, did you mean to use --literally?
 	EOF
 	test_cmp expect actual
 '
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 10/10] object.c: move type_from_string() code to its last user
  2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
                               ` (8 preceding siblings ...)
  2021-04-20 13:00             ` [PATCH v2 09/10] hash-object: show usage on invalid --type Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:00             ` Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

In commits leading up to this one various errors have been improved
and bugs fixed by moving various callers to
type_from_string_gently(). Now that there's no caller left of
type_from_string() except cat-file.c, let's move this function over to
it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c | 4 +++-
 object.c           | 9 ---------
 object.h           | 1 -
 3 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 46fc7a32ba..20c60045f6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -154,7 +154,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		break;
 
 	case 0:
-		exp_type_id = type_from_string(exp_type);
+		exp_type_id = type_from_string_gently(exp_type, strlen(exp_type));
+		if (exp_type_id < 0)
+			die(_("invalid object type \"%s\""), exp_type);
 		if (exp_type_id == OBJ_BLOB) {
 			struct object_id blob_oid;
 			if (oid_object_info(the_repository, &oid, NULL) == OBJ_TAG) {
diff --git a/object.c b/object.c
index 8f3ddfc8f4..3c962da6c9 100644
--- a/object.c
+++ b/object.c
@@ -49,15 +49,6 @@ enum object_type type_from_string_gently(const char *str, size_t len)
 	return -1;
 }
 
-enum object_type type_from_string(const char *str)
-{
-	size_t len = strlen(str);
-	enum object_type ret = type_from_string_gently(str, len);
-	if (ret < 0)
-		die(_("invalid object type \"%s\""), str);
-	return ret;
-}
-
 /*
  * Return a numerical hash value between 0 and n-1 for the object with
  * the specified sha1.  n must be a power of 2.  Please note that the
diff --git a/object.h b/object.h
index a4eca10d72..85e7491815 100644
--- a/object.h
+++ b/object.h
@@ -94,7 +94,6 @@ struct object {
 
 const char *type_name(unsigned int type);
 enum object_type type_from_string_gently(const char *str, size_t len);
-enum object_type type_from_string(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use
  2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-04-09  8:50             ` [PATCH 7/7] object.c: remove "quiet" parameter from object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36             ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 1/8] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
                                 ` (7 more replies)
  7 siblings, 8 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

This relatively simple series builds on [1] and [2] to intorduce a
utility function for the "expected type X, got Y" error messages. See
[3] for the v1 of this series.

There was an embarrassing error in v1 where I left in some WIP
assertion code, and we'd print a nonsensical error message in the
lookup_commit_reference_gently() codepath.

That's now fixed gone, and there's a new test to assert the exact
output we get from a failure in
lookup_commit_reference_gently(). Aside from not reading my own
patches carefully enough before submission, I didn't catch that case
because nothing tested for the output.

1. https://lore.kernel.org/git/cover-00.10-0000000000-20210420T124428Z-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-00.10-0000000000-20210420T125415Z-avarab@gmail.com/
3. https://lore.kernel.org/git/cover-0.7-0000000000-20210409T083436Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (8):
  tree.c: fix misindentation in parse_tree_gently()
  object.c: add a utility function for "expected type X, got Y"
  object.c: add and use oid_is_type_or_die_msg() function
  commit-graph: use obj->type, not object_as_type()
  branch tests: assert lookup_commit_reference_gently() error
  commit.c: don't use deref_tag() -> object_as_type()
  object.c: normalize brace style in object_as_type()
  object.c: remove "quiet" parameter from object_as_type()

 blob.c                     |  2 +-
 builtin/commit-graph.c     |  2 +-
 builtin/fsck.c             |  2 +-
 builtin/index-pack.c       |  9 +++----
 combine-diff.c             |  3 +--
 commit.c                   | 29 ++++++++++++++--------
 merge-recursive.c          |  5 +++-
 object.c                   | 51 +++++++++++++++++++++++++++++++-------
 object.h                   | 10 +++++++-
 refs.c                     |  2 +-
 t/helper/test-reach.c      |  2 +-
 t/t3201-branch-contains.sh |  8 +++++-
 tag.c                      |  2 +-
 tree.c                     | 15 +++++------
 14 files changed, 98 insertions(+), 44 deletions(-)

Range-diff against v1:
1:  4bf9a4f7a1 = 1:  c39b235035 tree.c: fix misindentation in parse_tree_gently()
2:  0be843e838 = 2:  1b472fcd85 object.c: add a utility function for "expected type X, got Y"
3:  fb2e4feb3d = 3:  22e7d9a3db object.c: add and use oid_is_type_or_die_msg() function
4:  995135c814 = 4:  8e43d44911 commit-graph: use obj->type, not object_as_type()
-:  ---------- > 5:  8982c42127 branch tests: assert lookup_commit_reference_gently() error
5:  754d5ae267 ! 6:  f337a5442d commit.c: don't use deref_tag() -> object_as_type()
    @@ Commit message
         of OBJ_{COMMIT,TREE,BLOB,TAG} here, not the bare-bones initialization
         object_as_type() might be called on to do.
     
    +    Even though we can read deref_tag() and see that it won't return
    +    OBJ_NONE and friends, let's add a BUG() assertion here to help future
    +    maintenance.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## commit.c ##
    @@ commit.c: const char *commit_type = "commit";
      		return NULL;
     -	return object_as_type(obj, OBJ_COMMIT, quiet);
     +
    ++	if (obj->type <= 0)
    ++		BUG("should have initialized obj->type = OBJ_{COMMIT,TREE,BLOB,TAG} from deref_tag()");
     +	if (obj->type != OBJ_COMMIT) {
    -+		enum object_type want = OBJ_COMMIT;
    -+		if (!quiet)
    -+			oid_is_type_or_error(oid, OBJ_COMMIT, &want);
    ++		if (!quiet) {
    ++			enum object_type have = obj->type;
    ++			oid_is_type_or_error(oid, OBJ_COMMIT, &have);
    ++		}
     +		return NULL;
     +	}
     +	return (struct commit *)obj;
6:  e414cfe40c = 7:  893b178573 object.c: normalize brace style in object_as_type()
7:  64360ac260 ! 8:  a47d23f1b1 object.c: remove "quiet" parameter from object_as_type()
    @@ blob.c
     @@ blob.c: struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
      	struct object *obj = lookup_object(r, oid);
      	if (!obj)
    - 		return create_object(r, oid, alloc_blob_node(r));
    + 		return create_blob(r, oid);
     -	return object_as_type(obj, OBJ_BLOB, 0);
     +	return object_as_type(obj, OBJ_BLOB);
      }
    - 
    - int parse_blob_buffer(struct blob *item)
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: static void mark_unreachable_referents(const struct object_id *oid)
    @@ builtin/fsck.c: static void mark_unreachable_referents(const struct object_id *o
      	options.walk = mark_used;
     
      ## commit.c ##
    -@@ commit.c: struct commit *lookup_commit_reference_gently(struct repository *r,
    - 		return NULL;
    - 
    - 	if (obj->type != OBJ_COMMIT) {
    --		enum object_type want = OBJ_COMMIT;
    -+		if (obj->type <= 0)
    -+			BUG("noes");
    - 		if (!quiet)
    --			oid_is_type_or_error(oid, OBJ_COMMIT, &want);
    -+			fprintf(stderr, "noes ohes");/*
    -+			oid_is_type_or_error(oid, OBJ_COMMIT, &obj->type);*/
    - 		return NULL;
    - 	}
    - 	return (struct commit *)obj;
     @@ commit.c: struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
      	struct object *obj = lookup_object(r, oid);
      	if (!obj)
    - 		return create_object(r, oid, alloc_commit_node(r));
    + 		return create_commit(r, oid);
     -	return object_as_type(obj, OBJ_COMMIT, 0);
     +	return object_as_type(obj, OBJ_COMMIT);
      }
    @@ tag.c
     @@ tag.c: struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
      	struct object *obj = lookup_object(r, oid);
      	if (!obj)
    - 		return create_object(r, oid, alloc_tag_node(r));
    + 		return create_tag(r, oid);
     -	return object_as_type(obj, OBJ_TAG, 0);
     +	return object_as_type(obj, OBJ_TAG);
      }
    @@ tree.c
     @@ tree.c: struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
      	struct object *obj = lookup_object(r, oid);
      	if (!obj)
    - 		return create_object(r, oid, alloc_tree_node(r));
    + 		return create_tree(r, oid);
     -	return object_as_type(obj, OBJ_TREE, 0);
     +	return object_as_type(obj, OBJ_TREE);
      }
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 1/8] tree.c: fix misindentation in parse_tree_gently()
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
                                 ` (6 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

The variables declared in parse_tree_gently() had a single space after
the TAB. This dates back to their introduction in bd2c39f58f9 ([PATCH]
don't load and decompress objects twice with parse_object(),
2005-05-06). Let's fix them to follow the style of the rest of the
file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tree.c b/tree.c
index 00958c581e..e9d2bd7ffd 100644
--- a/tree.c
+++ b/tree.c
@@ -128,9 +128,9 @@ int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 
 int parse_tree_gently(struct tree *item, int quiet_on_missing)
 {
-	 enum object_type type;
-	 void *buffer;
-	 unsigned long size;
+	enum object_type type;
+	void *buffer;
+	unsigned long size;
 
 	if (item->object.parsed)
 		return 0;
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y"
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 1/8] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-21 22:02                 ` Jonathan Tan
  2021-04-20 13:36               ` [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
                                 ` (5 subsequent siblings)
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Refactor various "Object X is not Y" error messages so that they use
the same message as the long-standing object_as_type() error
message. Now we'll consistently report e.g. that we got a commit when
we expected a tag, not just that the object is not a tag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c |  9 +++------
 combine-diff.c       |  3 +--
 commit.c             | 10 ++++------
 merge-recursive.c    |  1 +
 object.c             | 25 ++++++++++++++++++++++++-
 object.h             |  5 +++++
 tree.c               |  7 ++++---
 7 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index c0e3768c32..eabd9d4677 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -218,8 +218,8 @@ static int mark_link(struct object *obj, enum object_type type,
 	if (!obj)
 		return -1;
 
-	if (type != OBJ_ANY && obj->type != type)
-		die(_("object type mismatch at %s"), oid_to_hex(&obj->oid));
+	if (type != OBJ_ANY)
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 
 	obj->flags |= FLAG_LINK;
 	return 0;
@@ -241,10 +241,7 @@ static unsigned check_object(struct object *obj)
 		if (type <= 0)
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
-		if (type != obj->type)
-			die(_("object %s: expected type %s, found %s"),
-			    oid_to_hex(&obj->oid),
-			    type_name(obj->type), type_name(type));
+		oid_is_type_or_die(&obj->oid, obj->type, &type);
 		obj->flags |= FLAG_CHECKED;
 		return 1;
 	}
diff --git a/combine-diff.c b/combine-diff.c
index 06635f91bc..aa767dbb8e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -333,8 +333,7 @@ static char *grab_blob(struct repository *r,
 		free_filespec(df);
 	} else {
 		blob = read_object_file(oid, &type, size);
-		if (type != OBJ_BLOB)
-			die("object '%s' is not a blob!", oid_to_hex(oid));
+		oid_is_type_or_die(oid, OBJ_BLOB, &type);
 	}
 	return blob;
 }
diff --git a/commit.c b/commit.c
index 3580c62b92..3d7f1fba0c 100644
--- a/commit.c
+++ b/commit.c
@@ -304,9 +304,7 @@ const void *repo_get_commit_buffer(struct repository *r,
 		if (!ret)
 			die("cannot read commit object %s",
 			    oid_to_hex(&commit->object.oid));
-		if (type != OBJ_COMMIT)
-			die("expected commit for %s, got %s",
-			    oid_to_hex(&commit->object.oid), type_name(type));
+		oid_is_type_or_die(&commit->object.oid, OBJ_COMMIT, &type);
 		if (sizep)
 			*sizep = size;
 	}
@@ -494,10 +492,10 @@ int repo_parse_commit_internal(struct repository *r,
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_COMMIT) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_COMMIT, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a commit",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 
 	ret = parse_commit_buffer(r, item, buffer, size, 0);
diff --git a/merge-recursive.c b/merge-recursive.c
index 7618303f7b..b952106203 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2999,6 +2999,7 @@ static int read_oid_strbuf(struct merge_options *opt,
 	if (!buf)
 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
+		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
 		free(buf);
 		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
 	}
diff --git a/object.c b/object.c
index 3c962da6c9..9e06c0ee92 100644
--- a/object.c
+++ b/object.c
@@ -153,6 +153,29 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	return obj;
 }
 
+static const char *object_type_mismatch_msg = N_("object %s is a %s, not a %s");
+
+void oid_is_type_or_die(const struct object_id *oid,
+			enum object_type want,
+			enum object_type *type)
+{
+	if (want == *type)
+		return;
+	die(_(object_type_mismatch_msg), oid_to_hex(oid),
+	    type_name(*type), type_name(want));
+}
+
+int oid_is_type_or_error(const struct object_id *oid,
+			 enum object_type want,
+			 enum object_type *type)
+{
+	if (want == *type)
+		return 0;
+	return error(_(object_type_mismatch_msg),
+		     oid_to_hex(oid), type_name(*type),
+		     type_name(want));
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
@@ -166,7 +189,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 	else {
 		if (!quiet)
-			error(_("object %s is a %s, not a %s"),
+			error(_(object_type_mismatch_msg),
 			      oid_to_hex(&obj->oid),
 			      type_name(obj->type), type_name(type));
 		return NULL;
diff --git a/object.h b/object.h
index 85e7491815..f8609a8518 100644
--- a/object.h
+++ b/object.h
@@ -123,6 +123,11 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
+			enum object_type *type);
+int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
+			 enum object_type *type);
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree.c b/tree.c
index e9d2bd7ffd..f1c6e8f647 100644
--- a/tree.c
+++ b/tree.c
@@ -131,6 +131,7 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 	enum object_type type;
 	void *buffer;
 	unsigned long size;
+	int ret;
 
 	if (item->object.parsed)
 		return 0;
@@ -139,10 +140,10 @@ int parse_tree_gently(struct tree *item, int quiet_on_missing)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_TREE) {
+	ret = oid_is_type_or_error(&item->object.oid, OBJ_TREE, &type);
+	if (ret) {
 		free(buffer);
-		return error("Object %s not a tree",
-			     oid_to_hex(&item->object.oid));
+		return ret;
 	}
 	return parse_tree_buffer(item, buffer, size);
 }
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 1/8] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-21 22:07                 ` Jonathan Tan
  2021-04-21 23:28                 ` Josh Steadmon
  2021-04-20 13:36               ` [PATCH v2 4/8] commit-graph: use obj->type, not object_as_type() Ævar Arnfjörð Bjarmason
                                 ` (4 subsequent siblings)
  7 siblings, 2 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Add a oid_is_type_or_die_msg() function to go with the "error" and
"die" forms for emitting "expected type X, got Y" messages. This is
useful for callers that want the message itself as a char *.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 merge-recursive.c |  6 ++++--
 object.c          | 12 ++++++++++++
 object.h          |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b952106203..c74239544f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2999,9 +2999,11 @@ static int read_oid_strbuf(struct merge_options *opt,
 	if (!buf)
 		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
-		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
+		char *msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
+		int ret = err(opt, msg);
 		free(buf);
-		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
+		free(msg);
+		return ret;
 	}
 	strbuf_attach(dst, buf, size, size + 1);
 	return 0;
diff --git a/object.c b/object.c
index 9e06c0ee92..0f07f976fb 100644
--- a/object.c
+++ b/object.c
@@ -176,6 +176,18 @@ int oid_is_type_or_error(const struct object_id *oid,
 		     type_name(want));
 }
 
+char* oid_is_type_or_die_msg(const struct object_id *oid,
+				   enum object_type want,
+				   enum object_type *type)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (want == *type)
+		BUG("call this just to get the message!");
+	strbuf_addf(&sb, _(object_type_mismatch_msg), oid_to_hex(oid),
+		    type_name(*type), type_name(want));
+	return strbuf_detach(&sb, NULL);
+}
+
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
diff --git a/object.h b/object.h
index f8609a8518..7ae6407598 100644
--- a/object.h
+++ b/object.h
@@ -127,6 +127,9 @@ void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
 			enum object_type *type);
 int oid_is_type_or_error(const struct object_id *oid, enum object_type want,
 			 enum object_type *type);
+char* oid_is_type_or_die_msg(const struct object_id *oid,
+			     enum object_type want,
+			     enum object_type *type);
 
 /*
  * Returns the object, having parsed it to find out what it is.
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 4/8] commit-graph: use obj->type, not object_as_type()
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                                 ` (2 preceding siblings ...)
  2021-04-20 13:36               ` [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 5/8] branch tests: assert lookup_commit_reference_gently() error Ævar Arnfjörð Bjarmason
                                 ` (3 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change a check of a deref_tag() return value to just use obj->type
instead of object_as_type(). The object_as_type() function is for
low-level use by fsck, {commit,tree,blob,tag}.c and the like, here we
can just assume the object is fully initialized.

As can be seen in plenty of existing uses in our codebase the return
value of deref_tag() won't be an obj->type == OBJ_NONE or
!obj->parsed. Fixes code added in 2f00c355cb7 (commit-graph: drop
COMMIT_GRAPH_WRITE_CHECK_OIDS flag, 2020-05-13).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index cd86315221..347d65abc8 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -158,7 +158,7 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 			   NULL, 0);
 	if (!result)
 		return error(_("invalid object: %s"), hash);
-	else if (object_as_type(result, OBJ_COMMIT, 1))
+	else if (result->type == OBJ_COMMIT)
 		oidset_insert(commits, &result->oid);
 
 	display_progress(progress, oidset_size(commits));
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 5/8] branch tests: assert lookup_commit_reference_gently() error
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                                 ` (3 preceding siblings ...)
  2021-04-20 13:36               ` [PATCH v2 4/8] commit-graph: use obj->type, not object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:36               ` [PATCH v2 6/8] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
                                 ` (2 subsequent siblings)
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Assert the exact error returned by lookup_commit_reference_gently()
and "branch' itself in the non-quiet mode invoked by branch.c (via
parse_opt_commits()). This will be used to assert a subsequent change
that changes the lookup_commit_reference_gently() code.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t3201-branch-contains.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/t3201-branch-contains.sh b/t/t3201-branch-contains.sh
index 349a810cee..5bd4b05b6e 100755
--- a/t/t3201-branch-contains.sh
+++ b/t/t3201-branch-contains.sh
@@ -166,7 +166,13 @@ test_expect_success 'implicit --list conflicts with modification options' '
 '
 
 test_expect_success 'Assert that --contains only works on commits, not trees & blobs' '
-	test_must_fail git branch --contains main^{tree} &&
+	tree=$(git rev-parse main^{tree}) &&
+	test_must_fail git branch --contains main^{tree} 2>actual &&
+	cat >expect <<-EOF &&
+	error: object $tree is a tree, not a commit
+	error: no such commit main^{tree}
+	EOF
+	test_cmp expect actual &&
 	blob=$(git hash-object -w --stdin <<-\EOF
 	Some blob
 	EOF
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 6/8] commit.c: don't use deref_tag() -> object_as_type()
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                                 ` (4 preceding siblings ...)
  2021-04-20 13:36               ` [PATCH v2 5/8] branch tests: assert lookup_commit_reference_gently() error Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-21 22:26                 ` Jonathan Tan
  2021-04-20 13:36               ` [PATCH v2 7/8] object.c: normalize brace style in object_as_type() Ævar Arnfjörð Bjarmason
  2021-04-20 13:37               ` [PATCH v2 8/8] object.c: remove "quiet" parameter from object_as_type() Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Change a use of the object_as_type() function introduced in
8ff226a9d5e (add object_as_type helper for casting objects,
2014-07-13) to instead assume that we're not dealing with OBJ_NONE (or
OBJ_BAD) from deref_tag().

This makes this code easier to read, as the reader isn't wondering why
the function would need to deal with that. We're simply doing a check
of OBJ_{COMMIT,TREE,BLOB,TAG} here, not the bare-bones initialization
object_as_type() might be called on to do.

Even though we can read deref_tag() and see that it won't return
OBJ_NONE and friends, let's add a BUG() assertion here to help future
maintenance.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 3d7f1fba0c..c3bc6cbec4 100644
--- a/commit.c
+++ b/commit.c
@@ -31,13 +31,22 @@ const char *commit_type = "commit";
 struct commit *lookup_commit_reference_gently(struct repository *r,
 		const struct object_id *oid, int quiet)
 {
-	struct object *obj = deref_tag(r,
-				       parse_object(r, oid),
-				       NULL, 0);
+	struct object *tmp = parse_object(r, oid);
+	struct object *obj = deref_tag(r, tmp, NULL, 0);
 
 	if (!obj)
 		return NULL;
-	return object_as_type(obj, OBJ_COMMIT, quiet);
+
+	if (obj->type <= 0)
+		BUG("should have initialized obj->type = OBJ_{COMMIT,TREE,BLOB,TAG} from deref_tag()");
+	if (obj->type != OBJ_COMMIT) {
+		if (!quiet) {
+			enum object_type have = obj->type;
+			oid_is_type_or_error(oid, OBJ_COMMIT, &have);
+		}
+		return NULL;
+	}
+	return (struct commit *)obj;
 }
 
 struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 7/8] object.c: normalize brace style in object_as_type()
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                                 ` (5 preceding siblings ...)
  2021-04-20 13:36               ` [PATCH v2 6/8] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:36               ` Ævar Arnfjörð Bjarmason
  2021-04-20 13:37               ` [PATCH v2 8/8] object.c: remove "quiet" parameter from object_as_type() Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Normalize the brace style in this function introduced in
8ff226a9d5e (add object_as_type helper for casting objects,
2014-07-13) to be in line with the coding style of the project.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index 0f07f976fb..f694db7e87 100644
--- a/object.c
+++ b/object.c
@@ -190,16 +190,15 @@ char* oid_is_type_or_die_msg(const struct object_id *oid,
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
-	if (obj->type == type)
+	if (obj->type == type) {
 		return obj;
-	else if (obj->type == OBJ_NONE) {
+	} else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
 			init_commit_node((struct commit *) obj);
 		else
 			obj->type = type;
 		return obj;
-	}
-	else {
+	} else {
 		if (!quiet)
 			error(_(object_type_mismatch_msg),
 			      oid_to_hex(&obj->oid),
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH v2 8/8] object.c: remove "quiet" parameter from object_as_type()
  2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
                                 ` (6 preceding siblings ...)
  2021-04-20 13:36               ` [PATCH v2 7/8] object.c: normalize brace style in object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-20 13:37               ` Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-20 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren, Ævar Arnfjörð Bjarmason

Remove the now-unused "quiet" parameter from object_as_type(). As
shown in preceding commits the previous users of this parameter were
better off with higher-level APIs.

The "quiet" parameter was originally introduced when the
object_as_type() function was added in 8ff226a9d5e (add object_as_type
helper for casting objects,.  2014-07-13), but the commit.c use-case
for it is now gone.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                | 2 +-
 builtin/fsck.c        | 2 +-
 commit.c              | 2 +-
 object.c              | 9 ++++-----
 object.h              | 2 +-
 refs.c                | 2 +-
 t/helper/test-reach.c | 2 +-
 tag.c                 | 2 +-
 tree.c                | 2 +-
 9 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/blob.c b/blob.c
index 1308299eab..f8d8f0b84e 100644
--- a/blob.c
+++ b/blob.c
@@ -15,5 +15,5 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_blob(r, oid);
-	return object_as_type(obj, OBJ_BLOB, 0);
+	return object_as_type(obj, OBJ_BLOB);
 }
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 70ff95837a..5d534cf218 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -221,7 +221,7 @@ static void mark_unreachable_referents(const struct object_id *oid)
 		enum object_type type = oid_object_info(the_repository,
 							&obj->oid, NULL);
 		if (type > 0)
-			object_as_type(obj, type, 0);
+			object_as_type(obj, type);
 	}
 
 	options.walk = mark_used;
diff --git a/commit.c b/commit.c
index c3bc6cbec4..918c7c7a66 100644
--- a/commit.c
+++ b/commit.c
@@ -76,7 +76,7 @@ struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_commit(r, oid);
-	return object_as_type(obj, OBJ_COMMIT, 0);
+	return object_as_type(obj, OBJ_COMMIT);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/object.c b/object.c
index f694db7e87..9f6f36707b 100644
--- a/object.c
+++ b/object.c
@@ -188,7 +188,7 @@ char* oid_is_type_or_die_msg(const struct object_id *oid,
 	return strbuf_detach(&sb, NULL);
 }
 
-void *object_as_type(struct object *obj, enum object_type type, int quiet)
+void *object_as_type(struct object *obj, enum object_type type)
 {
 	if (obj->type == type) {
 		return obj;
@@ -199,10 +199,9 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 			obj->type = type;
 		return obj;
 	} else {
-		if (!quiet)
-			error(_(object_type_mismatch_msg),
-			      oid_to_hex(&obj->oid),
-			      type_name(obj->type), type_name(type));
+		error(_(object_type_mismatch_msg),
+		      oid_to_hex(&obj->oid),
+		      type_name(obj->type), type_name(type));
 		return NULL;
 	}
 }
diff --git a/object.h b/object.h
index 7ae6407598..bb65a6cd5a 100644
--- a/object.h
+++ b/object.h
@@ -121,7 +121,7 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid);
 
 void *create_object(struct repository *r, const struct object_id *oid, void *obj);
 
-void *object_as_type(struct object *obj, enum object_type type, int quiet);
+void *object_as_type(struct object *obj, enum object_type type);
 
 void oid_is_type_or_die(const struct object_id *oid, enum object_type want,
 			enum object_type *type);
diff --git a/refs.c b/refs.c
index 261fd82beb..7f4ca3441c 100644
--- a/refs.c
+++ b/refs.c
@@ -341,7 +341,7 @@ enum peel_status peel_object(const struct object_id *name, struct object_id *oid
 
 	if (o->type == OBJ_NONE) {
 		int type = oid_object_info(the_repository, name, NULL);
-		if (type < 0 || !object_as_type(o, type, 0))
+		if (type < 0 || !object_as_type(o, type))
 			return PEEL_INVALID;
 	}
 
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index cda804ed79..c9fd74b21f 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -67,7 +67,7 @@ int cmd__reach(int ac, const char **av)
 			die("failed to load commit for input %s resulting in oid %s\n",
 			    buf.buf, oid_to_hex(&oid));
 
-		c = object_as_type(peeled, OBJ_COMMIT, 0);
+		c = object_as_type(peeled, OBJ_COMMIT);
 
 		if (!c)
 			die("failed to load commit for input %s resulting in oid %s\n",
diff --git a/tag.c b/tag.c
index 1bd81bf1d1..25d79c3db3 100644
--- a/tag.c
+++ b/tag.c
@@ -109,7 +109,7 @@ struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_tag(r, oid);
-	return object_as_type(obj, OBJ_TAG, 0);
+	return object_as_type(obj, OBJ_TAG);
 }
 
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
diff --git a/tree.c b/tree.c
index f1c6e8f647..fd3ad18051 100644
--- a/tree.c
+++ b/tree.c
@@ -112,7 +112,7 @@ struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_tree(r, oid);
-	return object_as_type(obj, OBJ_TREE, 0);
+	return object_as_type(obj, OBJ_TREE);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
-- 
2.31.1.723.ga5d7868e4a


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y"
  2021-04-20 13:36               ` [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
@ 2021-04-21 22:02                 ` Jonathan Tan
  2021-04-22  6:10                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Jonathan Tan @ 2021-04-21 22:02 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, Johannes.Schindelin, peff, me, newren, Jonathan Tan

> diff --git a/merge-recursive.c b/merge-recursive.c
> index 7618303f7b..b952106203 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -2999,6 +2999,7 @@ static int read_oid_strbuf(struct merge_options *opt,
>  	if (!buf)
>  		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
>  	if (type != OBJ_BLOB) {
> +		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
>  		free(buf);
>  		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
>  	}

Stray extra line.

> +void oid_is_type_or_die(const struct object_id *oid,
> +			enum object_type want,
> +			enum object_type *type)
> +{

Thanks - this looks like a good simplification.

Why is type a pointer? Maybe it's to better distinguish the values at
the call site (one pointer, one not), but this solution is confusing
too.

> +int oid_is_type_or_error(const struct object_id *oid,
> +			 enum object_type want,
> +			 enum object_type *type)
> +{

Same comment.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function
  2021-04-20 13:36               ` [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
@ 2021-04-21 22:07                 ` Jonathan Tan
  2021-04-21 23:28                 ` Josh Steadmon
  1 sibling, 0 replies; 142+ messages in thread
From: Jonathan Tan @ 2021-04-21 22:07 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, Johannes.Schindelin, peff, me, newren, Jonathan Tan

> +char* oid_is_type_or_die_msg(const struct object_id *oid,
> +				   enum object_type want,
> +				   enum object_type *type)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	if (want == *type)
> +		BUG("call this just to get the message!");
> +	strbuf_addf(&sb, _(object_type_mismatch_msg), oid_to_hex(oid),
> +		    type_name(*type), type_name(want));
> +	return strbuf_detach(&sb, NULL);
> +}

It would be more convenient for the caller if this function also checks
want vs type and returns NULL if they match. That would also be more
consistent with the other functions, and the caller wouldn't need to
remember that this function only works if want and type are different.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 6/8] commit.c: don't use deref_tag() -> object_as_type()
  2021-04-20 13:36               ` [PATCH v2 6/8] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
@ 2021-04-21 22:26                 ` Jonathan Tan
  0 siblings, 0 replies; 142+ messages in thread
From: Jonathan Tan @ 2021-04-21 22:26 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, Johannes.Schindelin, peff, me, newren, Jonathan Tan

> Change a use of the object_as_type() function introduced in
> 8ff226a9d5e (add object_as_type helper for casting objects,
> 2014-07-13) to instead assume that we're not dealing with OBJ_NONE (or
> OBJ_BAD) from deref_tag().
> 
> This makes this code easier to read, as the reader isn't wondering why
> the function would need to deal with that. We're simply doing a check
> of OBJ_{COMMIT,TREE,BLOB,TAG} here, not the bare-bones initialization
> object_as_type() might be called on to do.

I think the benefit of using object_as_type() here (functionality that
checks the object type, with optional "quiet" behavior) outweighs the
drawback (additional functionality that we don't need). If we're worried
that the reader would wonder about the OBJ_NONE case, we can include the
BUG check as you did.

> Even though we can read deref_tag() and see that it won't return
> OBJ_NONE and friends, let's add a BUG() assertion here to help future
> maintenance.

This is reasonable.

> diff --git a/commit.c b/commit.c
> index 3d7f1fba0c..c3bc6cbec4 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -31,13 +31,22 @@ const char *commit_type = "commit";
>  struct commit *lookup_commit_reference_gently(struct repository *r,
>  		const struct object_id *oid, int quiet)
>  {
> -	struct object *obj = deref_tag(r,
> -				       parse_object(r, oid),
> -				       NULL, 0);
> +	struct object *tmp = parse_object(r, oid);
> +	struct object *obj = deref_tag(r, tmp, NULL, 0);

This change isn't unnecessary, I think. "tmp" isn't used anywhere else.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function
  2021-04-20 13:36               ` [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
  2021-04-21 22:07                 ` Jonathan Tan
@ 2021-04-21 23:28                 ` Josh Steadmon
  2021-04-28  4:12                   ` Junio C Hamano
  1 sibling, 1 reply; 142+ messages in thread
From: Josh Steadmon @ 2021-04-21 23:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jeff King, Taylor Blau,
	Elijah Newren

On 2021.04.20 15:36, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/object.c b/object.c
> index 9e06c0ee92..0f07f976fb 100644
> --- a/object.c
> +++ b/object.c
> @@ -176,6 +176,18 @@ int oid_is_type_or_error(const struct object_id *oid,
>  		     type_name(want));
>  }
>  
> +char* oid_is_type_or_die_msg(const struct object_id *oid,

It's kind of a nitpick, but I found the function name to be confusing.
It sounds like you're going to die with a custom message. Maybe
something like "get_oid_type_mismatch_msg()" would be more
straightforward.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y"
  2021-04-21 22:02                 ` Jonathan Tan
@ 2021-04-22  6:10                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-22  6:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, Johannes.Schindelin, peff, me, newren


On Thu, Apr 22 2021, Jonathan Tan wrote:

>> diff --git a/merge-recursive.c b/merge-recursive.c
>> index 7618303f7b..b952106203 100644
>> --- a/merge-recursive.c
>> +++ b/merge-recursive.c
>> @@ -2999,6 +2999,7 @@ static int read_oid_strbuf(struct merge_options *opt,
>>  	if (!buf)
>>  		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
>>  	if (type != OBJ_BLOB) {
>> +		const char* msg = oid_is_type_or_die_msg(oid, OBJ_BLOB, &type);
>>  		free(buf);
>>  		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
>>  	}
>
> Stray extra line.
>
>> +void oid_is_type_or_die(const struct object_id *oid,
>> +			enum object_type want,
>> +			enum object_type *type)
>> +{
>
> Thanks - this looks like a good simplification.
>
> Why is type a pointer? Maybe it's to better distinguish the values at
> the call site (one pointer, one not), but this solution is confusing
> too.

Yeah I came up with it because of that, so you wouldn't confuse the
OBJ_COMMIT with (presumably) a variable with the same.

But in some other cases I end up having to do:

    enum object_type type = OBJ_COMMIT;

And then pass that &type in, do you think it's worth it? Maybe I should
just change it...

>> +int oid_is_type_or_error(const struct object_id *oid,
>> +			 enum object_type want,
>> +			 enum object_type *type)
>> +{
>
> Same comment.


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently()
  2021-03-28 18:25             ` Junio C Hamano
@ 2021-04-22 18:09               ` Felipe Contreras
  0 siblings, 0 replies; 142+ messages in thread
From: Felipe Contreras @ 2021-04-22 18:09 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Taylor Blau, Elijah Newren, Johannes Schindelin

Junio C Hamano wrote:
> I am reasonably sure I and a few others on the list are net
> suppliers of the reviewer bandwidth.  I do not expect all the
> prolific contributors to become net suppliers; after all, designing
> and writing their own stuff is always fun.  But I wish that the most
> prominent contributors in the community to be reviewing others'
> topics and ushering these topics to completion from time to time,
> and I am hoping to see that happen more.

The problem is that the suppliers are a club who often agree on what
code is not ready to be merged, which is most of it, and also agree it's
better to apply the reject hammer.

This club is by definition small.

There's a spectrum of perfectness, and the suppliers are on the far left
side: code has to be perfect, while most of the consumers are on the
right side, or even on the sensible middle side: do not let the perfect
be the enemy of the good.

For the suppliers club good is usually not good enough.

I'm fairly confident most of the consumers would agree the bar on what
constitutes "good enough" is too damn high, so why would they spend time
raising it any higher? They won't.

If anything they are more often going to dissagree with the suppliers
club, in order to increase the likelihood of perfectly good patches to
be merged.

As long as you keep insisting on making the perfect being the enemy of
the good, you are going to ensure the supply is *always* going to be
low.

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function
  2021-04-21 23:28                 ` Josh Steadmon
@ 2021-04-28  4:12                   ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-28  4:12 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jeff King, Taylor Blau, Elijah Newren

Josh Steadmon <steadmon@google.com> writes:

> On 2021.04.20 15:36, Ævar Arnfjörð Bjarmason wrote:
>> diff --git a/object.c b/object.c
>> index 9e06c0ee92..0f07f976fb 100644
>> --- a/object.c
>> +++ b/object.c
>> @@ -176,6 +176,18 @@ int oid_is_type_or_error(const struct object_id *oid,
>>  		     type_name(want));
>>  }
>>  
>> +char* oid_is_type_or_die_msg(const struct object_id *oid,
>
> It's kind of a nitpick, but I found the function name to be confusing.
> It sounds like you're going to die with a custom message. Maybe
> something like "get_oid_type_mismatch_msg()" would be more
> straightforward.

Yeah, in an older round I found this function's name was confusing,
too.

Also, there is a style (in our codebase, asterisk to signal the
pointer-ness sticks to the identifier, not to the type name).

Thanks.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 01/10] cat-file tests: test for bogus type name handling
  2021-04-20 12:50           ` [PATCH v2 01/10] cat-file tests: test for bogus type name handling Ævar Arnfjörð Bjarmason
@ 2021-04-29  4:15             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-29  4:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a test of how "cat-file" behaves when given a bogus type in its
> "git cat-file <TYPE> <OBJECT>" mode. There were existing tests (just
> below this one) for "-t bogus" or "--allow-unknown-type" modes, but
> none for the switch-less mode.
>
> This test is similar to the one that exists for "git hash-object"
> already, see b7994af0f92 (type_from_string_gently: make sure length
> matches, 2015-04-17).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t1006-cat-file.sh | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 5d2dc99b74..908797dcae 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -315,6 +315,22 @@ test_expect_success '%(deltabase) reports packed delta bases' '
>  	}
>  '
>  
> +test_expect_success 'cat-file complains about bogus type name' '
> +	test_must_fail git cat-file co HEAD >out 2>err &&
> +	test_must_be_empty out &&
> +	cat >expected <<-\EOF &&
> +	fatal: invalid object type "co"
> +	EOF
> +	test_cmp expected err &&
> +
> +	test_must_fail git cat-file bogus HEAD >out 2>err &&
> +	test_must_be_empty out &&
> +	cat >expected <<-\EOF &&
> +	fatal: invalid object type "bogus"
> +	EOF
> +	test_cmp expected err
> +'

I am not 100% sure if it is worth testing both "co" and "bogus", but
if it were, then I'd prefer to have these two as independent test
pieces, because a convincing "we need to have both of these tested
for this reason" would very much likely say that these two kinds of
bugs can come from different/independent bug sources.

FWIW, the commit b7994af0 (type_from_string_gently: make sure length
matches, 2015-04-17) used as a model for this patch uses two
separate tests (one for truncated, the other for bogus).

>  bogus_type="bogus"
>  bogus_content="bogus"
>  bogus_size=$(strlen "$bogus_content")

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 04/10] object-file.c: take type id, not string, in read_object_with_reference()
  2021-04-20 12:50           ` [PATCH v2 04/10] object-file.c: take type id, not string, in read_object_with_reference() Ævar Arnfjörð Bjarmason
@ 2021-04-29  4:37             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-29  4:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Make the read_object_with_reference() function take "enum object_type"
> instead of a "const char *" with a type name that it converted via
> type_from_string().
>
> Out of the nine callers of this function, only one wanted to pass a
> "const char *". The others were simply passing along the
> {commit,tree}_type string constants.

s/wanted to pass a/wanted to pass an arbitrary/, I would think.
Other than that, nicely analyzed.

> That one caller in builtin/cat-file.c did not expect to pass a "raw"
> type (i.e. in invalid "--literally" type, but one gotten from
> type_from_string().

It is unclear what you are trying to say here.  Missing closing ')'
that should close '(i.e.' does not help, either, because it muddles
what you wanted to imply by bringing up the "--literally" option.

    The one caller in builtin/cat-file.c passes the typename the
    end-user typed on the command line i.e. "git cat-file <TYPE>
    <NAME>", but read_object_with_reference() called in the codepath
    is *NOT* expected to deal with objects with unknown/bogus type
    crafted with "hash-object --literally" (note: "git cat-file"
    does have the "--allow-unknown-type" option, but it can only be
    used with the "-s" and "-t" options).  It won't result in loss
    of functionality if we restricted the required_type parameter
    given to read_object_with_reference() to the four known types by
    changing the function signature to take the enum instead of
    string.

is probably what you meant to say, but I am only guessing.

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5ebf13359e..46fc7a32ba 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -66,7 +66,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  			int unknown_type)
>  {
>  	struct object_id oid;
> -	enum object_type type;
> +	enum object_type type, exp_type_id;
>  	char *buf;
>  	unsigned long size;
>  	struct object_context obj_context;
> @@ -154,7 +154,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  		break;
>  
>  	case 0:
> -		if (type_from_string(exp_type) == OBJ_BLOB) {
> +		exp_type_id = type_from_string(exp_type);
> +		if (exp_type_id == OBJ_BLOB) {
>  			struct object_id blob_oid;
>  			if (oid_object_info(the_repository, &oid, NULL) == OBJ_TAG) {
>  				char *buffer = read_object_file(&oid, &type,
> @@ -177,7 +178,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  			 */
>  		}
>  		buf = read_object_with_reference(the_repository,
> -						 &oid, exp_type, &size, NULL);
> +						 &oid, exp_type_id, &size, NULL);
>  		break;
>  
>  	default:

And this is the caller we just discussed.

> diff --git a/object-file.c b/object-file.c
> index 624af408cd..d2f223dcef 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1669,25 +1669,24 @@ void *read_object_file_extended(struct repository *r,
>  
>  void *read_object_with_reference(struct repository *r,
>  				 const struct object_id *oid,
> -				 const char *required_type_name,
> +				 enum object_type object_type,
>  				 unsigned long *size,
>  				 struct object_id *actual_oid_return)
>  {
> -	enum object_type type, required_type;
>  	void *buffer;
>  	unsigned long isize;
>  	struct object_id actual_oid;
>  
> -	required_type = type_from_string(required_type_name);
>  	oidcpy(&actual_oid, oid);
>  	while (1) {
>  		int ref_length = -1;
>  		const char *ref_type = NULL;
> +		enum object_type type;
>  
>  		buffer = repo_read_object_file(r, &actual_oid, &type, &isize);
>  		if (!buffer)
>  			return NULL;
> -		if (type == required_type) {
> +		if (type == object_type) {
>  			*size = isize;
>  			if (actual_oid_return)
>  				oidcpy(actual_oid_return, &actual_oid);

I do not think it is a good change to effectively rename
required_type to object_type.  Swapping the required_type_name
parameter with required_type parameter of type "enum object_type"
and dropping the now unneeded type_from_string() call would have
been much easier to follow.


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}()
  2021-04-20 12:50           ` [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}() Ævar Arnfjörð Bjarmason
@ 2021-04-29  4:45             ` Junio C Hamano
  2021-04-29 12:01               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 142+ messages in thread
From: Junio C Hamano @ 2021-04-29  4:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a create_*() function for our built-in types as a handy but
> trivial wrapper around their calls to create_object().
>
> This allows for slightly simplifying code added in
> 96af91d410c (commit-graph: verify objects exist, 2018-06-27). The
> remaining three functions are added for consistency for now.

"for now" puzzles me.  As file-scope static functions, they do not
hurt all that much, but on the other hand, having to say
"create_object(r, oid, alloc_blob_node(r))" is not hurting at all.

The worst part of this "consistency" is that callers cannot call
create_blob() because it is not external, even though they learn
create_commit() as a handy way to use the create_object() API, which
is not consistent at all.

And since most callers should be calling lookup_blob() etc., and
should not be calling create_blob(), we shouldn't tempt people to
push for making them externally available.

Which in turn makes me wonder if the use of create_object() added to
the commit-graph.c was a good idea to begin with.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 06/10] blob.c: remove parse_blob_buffer()
  2021-04-20 12:50           ` [PATCH v2 06/10] blob.c: remove parse_blob_buffer() Ævar Arnfjörð Bjarmason
@ 2021-04-29  4:51             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-29  4:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
> -{
> -	item->object.parsed = 1;
> -	return 0;
> -}

Understandable.

> diff --git a/object.c b/object.c
> index 78343781ae..f4e419e5c3 100644
> --- a/object.c
> +++ b/object.c
> @@ -195,8 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
>  	if (type == OBJ_BLOB) {
>  		struct blob *blob = lookup_blob(r, oid);
>  		if (blob) {
> -			if (parse_blob_buffer(blob, buffer, size))
> -				return NULL;
> +			blob->object.parsed = 1;
>  			obj = &blob->object;
>  		}

Understandable, too.

>  	} else if (type == OBJ_TREE) {
> @@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>  	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
>  	    (!obj && repo_has_object_file(r, oid) &&
>  	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
> +		if (!obj) {
> +			struct blob *blob = create_blob(r, oid);
> +			obj = &blob->object;
> +		}

I do not recall this change explained or justified in the log
message.  What am I missing?

>  		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;
>  		}
> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
> -		return lookup_object(r, oid);
> +		obj->parsed = 1;
> +		return obj;

Likewise.  Why isn't it just a call to lookup_blob() followed by
setting of its .parsed bit?  

In other words, it is not clear why we need to expose create_blob().

>  	}
>  
>  	buffer = repo_read_object_file(r, oid, &type, &size);

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 08/10] object.c: don't go past "len" under die() in type_from_string_gently()
  2021-04-20 12:50           ` [PATCH v2 08/10] object.c: don't go past "len" under die() in type_from_string_gently() Ævar Arnfjörð Bjarmason
@ 2021-04-29  4:55             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-29  4:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a bug that's been with us ever since type_from_string_gently() was
> split off from type_from_string() in fe8e3b71805 (Refactor
> type_from_string() to allow continuing after detecting an error,
> 2014-09-10).
>
> When the type was invalid and we were in the non-gently mode we'd die,
> and then proceed to run off past the "len" of the buffer we were
> provided with.
>
> Luckily, I think that nothing ever used this function in that way. Any
> non-gentle invocation came via type_from_string(), which was passing a
> buffer with a NIL at the same place as the "len" would take us (we got
> it via strlen()).

NIL???


>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  object.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/object.c b/object.c
> index 70af833ca1..bad9e17f25 100644
> --- a/object.c
> +++ b/object.c
> @@ -50,7 +50,7 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
>  	if (gentle)
>  		return -1;
>  
> -	die(_("invalid object type \"%s\""), str);
> +	die(_("invalid object type \"%.*s\""), (int)len, str);
>  }

This makes total sense.  This is one of the reasons why I hate to
review your topics---many patches in them seem unwarranted churn,
but there are clear gems like this commit buried in late steps in
them so I need to read through them to find these anyway :-)



^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 09/10] mktree: stop setting *ntr++ to NIL
  2021-04-20 12:50           ` [PATCH v2 09/10] mktree: stop setting *ntr++ to NIL Ævar Arnfjörð Bjarmason
@ 2021-04-29  5:01             ` Junio C Hamano
  0 siblings, 0 replies; 142+ messages in thread
From: Junio C Hamano @ 2021-04-29  5:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Since 58ce21b819e (builtin/mktree: remove hard-coded constant,
> 2018-10-15) we have not made any subsequent use of the ntr variable
> itself, but we did rely on it to NIL-delimit the string we were about
> to feed to type_from_string().
>
> Using type_from_string() here results in needless work, as we'd do a
> strlen() on it, just to find point at which we had a SPC
> character (now NIL) earlier in this function.

Since when do we write in LISP? ;-)  The name of the ASCII character
with value 0 is NUL (null).

> We can instead skip incrementing the ntr pointer, then pass the
> pointer and length to the type_from_string() function instead.

Makes sense.  Not clobbering the input buffer is good.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}()
  2021-04-29  4:45             ` Junio C Hamano
@ 2021-04-29 12:01               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 142+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-29 12:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Jeff King, Taylor Blau, Elijah Newren


On Thu, Apr 29 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Add a create_*() function for our built-in types as a handy but
>> trivial wrapper around their calls to create_object().
>>
>> This allows for slightly simplifying code added in
>> 96af91d410c (commit-graph: verify objects exist, 2018-06-27). The
>> remaining three functions are added for consistency for now.
>
> "for now" puzzles me.  As file-scope static functions, they do not
> hurt all that much, but on the other hand, having to say
> "create_object(r, oid, alloc_blob_node(r))" is not hurting at all.
>
> The worst part of this "consistency" is that callers cannot call
> create_blob() because it is not external, even though they learn
> create_commit() as a handy way to use the create_object() API, which
> is not consistent at all.
>
> And since most callers should be calling lookup_blob() etc., and
> should not be calling create_blob(), we shouldn't tempt people to
> push for making them externally available.

The API is for our own internal use. So I figured it was better to leave
the ones that aren't used elsewhere "static" for now, and if anyone
needed them in the future that commit could remove the "static".

> Which in turn makes me wonder if the use of create_object() added to
> the commit-graph.c was a good idea to begin with.

Yes we could just drop this and inline the various "alloc", i.e. not
this & similar in the future:

-		odb_commit = (struct commit *)create_object(r, &cur_oid, alloc_commit_node(r));
+		odb_commit = create_commit(r, &cur_oid);

It just seemed like a net improvement for maintenance/readability to
have the simpler wrapper for the allocation / object creation v.s. the
existing alloc_X_node() + cast.

^ permalink raw reply	[flat|nested] 142+ messages in thread

end of thread, back to index

Thread overview: 142+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-22  0:33 [PATCH 0/2] Pull objects of various types Daniel Barkalow
2005-06-22  0:35 ` [PATCH 1/2] Parse tags for absent objects Daniel Barkalow
2021-03-08 20:04   ` [PATCH 0/7] improve reporting of unexpected objects Ævar Arnfjörð Bjarmason
2021-03-28  2:13     ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-03-28  5:35         ` Junio C Hamano
2021-03-28 15:46           ` Ævar Arnfjörð Bjarmason
2021-03-28 18:25             ` Junio C Hamano
2021-04-22 18:09               ` Felipe Contreras
2021-03-28  2:13       ` [PATCH v2 02/10] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 04/10] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 06/10] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 07/10] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 08/10] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 09/10] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
2021-03-28  2:13       ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
2021-03-30  5:50         ` Junio C Hamano
2021-03-31 11:02           ` Jeff King
2021-03-31 18:05             ` Junio C Hamano
2021-03-31 18:31             ` Ævar Arnfjörð Bjarmason
2021-03-31 18:59               ` Jeff King
2021-03-31 20:46                 ` Ævar Arnfjörð Bjarmason
2021-04-01  7:54                   ` Jeff King
2021-04-01  8:32                     ` [PATCH] ref-filter: fix NULL check for parse object failure Jeff King
2021-04-01 13:56                       ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Ævar Arnfjörð Bjarmason
2021-04-01 13:56                         ` [PATCH v2 1/5] mktag tests: parse out options in helper Ævar Arnfjörð Bjarmason
2021-04-01 13:56                         ` [PATCH v2 2/5] mktag tests: invert --no-strict test Ævar Arnfjörð Bjarmason
2021-04-01 13:56                         ` [PATCH v2 3/5] mktag tests: do fsck on failure Ævar Arnfjörð Bjarmason
2021-04-01 13:56                         ` [PATCH v2 4/5] mktag tests: test for maybe segfaulting for-each-ref Ævar Arnfjörð Bjarmason
2021-04-01 13:56                         ` [PATCH v2 5/5] ref-filter: fix NULL check for parse object failure Ævar Arnfjörð Bjarmason
2021-04-01 19:19                           ` Ramsay Jones
2021-04-01 19:56                         ` [PATCH v2 0/5] mktag tests & fix for-each-ref segfault Junio C Hamano
2021-04-02 11:37                           ` Ævar Arnfjörð Bjarmason
2021-04-02 20:51                             ` Junio C Hamano
2021-04-01 19:52                       ` [PATCH] ref-filter: fix NULL check for parse object failure Junio C Hamano
2021-03-31 18:41             ` [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors Junio C Hamano
2021-03-31 19:00               ` Jeff King
2021-03-28  9:27       ` [PATCH v2 00/10] improve reporting of unexpected objects Jeff King
2021-03-29 13:34         ` Ævar Arnfjörð Bjarmason
2021-03-31 10:43           ` Jeff King
2021-04-09  8:07       ` [PATCH 0/2] blob/object.c: trivial readability improvements Ævar Arnfjörð Bjarmason
2021-04-09  8:07         ` [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer() Ævar Arnfjörð Bjarmason
2021-04-09 17:51           ` Jeff King
2021-04-09 22:31             ` Junio C Hamano
2021-04-10 12:57             ` Ævar Arnfjörð Bjarmason
2021-04-10 13:01               ` Ævar Arnfjörð Bjarmason
2021-04-13  8:25               ` Jeff King
2021-04-09  8:07         ` [PATCH 2/2] object.c: initialize automatic variable in lookup_object() Ævar Arnfjörð Bjarmason
2021-04-09 17:53           ` Jeff King
2021-04-09 22:32             ` Junio C Hamano
2021-04-09  8:32         ` [PATCH 0/6] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
2021-04-09  8:32           ` [PATCH 1/6] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-04-09 18:06             ` Jeff King
2021-04-09 18:10               ` Jeff King
2021-04-09  8:32           ` [PATCH 2/6] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-04-09 18:10             ` Jeff King
2021-04-09  8:32           ` [PATCH 3/6] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
2021-04-09 18:14             ` Jeff King
2021-04-09 19:42               ` Ævar Arnfjörð Bjarmason
2021-04-09 21:29                 ` Jeff King
2021-04-09  8:32           ` [PATCH 4/6] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
2021-04-09 18:24             ` Jeff King
2021-04-09  8:32           ` [PATCH 5/6] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
2021-04-09 18:36             ` Jeff King
2021-04-09  8:32           ` [PATCH 6/6] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
2021-04-09 18:42             ` Jeff King
2021-04-09  8:49           ` [PATCH 0/7] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
2021-04-09  8:49             ` [PATCH 1/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
2021-04-09  8:49             ` [PATCH 2/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
2021-04-09  8:49             ` [PATCH 3/7] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
2021-04-09  8:49             ` [PATCH 4/7] commit-graph: use obj->type, not object_as_type() Ævar Arnfjörð Bjarmason
2021-04-09  8:50             ` [PATCH 5/7] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
2021-04-09  8:50             ` [PATCH 6/7] object.c: normalize brace style in object_as_type() Ævar Arnfjörð Bjarmason
2021-04-09  8:50             ` [PATCH 7/7] object.c: remove "quiet" parameter from object_as_type() Ævar Arnfjörð Bjarmason
2021-04-20 13:36             ` [PATCH v2 0/8] object.c: add and use "is expected" utility function + object_as_type() use Ævar Arnfjörð Bjarmason
2021-04-20 13:36               ` [PATCH v2 1/8] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
2021-04-20 13:36               ` [PATCH v2 2/8] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
2021-04-21 22:02                 ` Jonathan Tan
2021-04-22  6:10                   ` Ævar Arnfjörð Bjarmason
2021-04-20 13:36               ` [PATCH v2 3/8] object.c: add and use oid_is_type_or_die_msg() function Ævar Arnfjörð Bjarmason
2021-04-21 22:07                 ` Jonathan Tan
2021-04-21 23:28                 ` Josh Steadmon
2021-04-28  4:12                   ` Junio C Hamano
2021-04-20 13:36               ` [PATCH v2 4/8] commit-graph: use obj->type, not object_as_type() Ævar Arnfjörð Bjarmason
2021-04-20 13:36               ` [PATCH v2 5/8] branch tests: assert lookup_commit_reference_gently() error Ævar Arnfjörð Bjarmason
2021-04-20 13:36               ` [PATCH v2 6/8] commit.c: don't use deref_tag() -> object_as_type() Ævar Arnfjörð Bjarmason
2021-04-21 22:26                 ` Jonathan Tan
2021-04-20 13:36               ` [PATCH v2 7/8] object.c: normalize brace style in object_as_type() Ævar Arnfjörð Bjarmason
2021-04-20 13:37               ` [PATCH v2 8/8] object.c: remove "quiet" parameter from object_as_type() Ævar Arnfjörð Bjarmason
2021-04-20 13:00           ` [PATCH v2 00/10] {tag,object}*.c: refactorings + prep for a larger change Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 01/10] object.c: stop supporting len == -1 in type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 02/10] object.c: remove "gently" argument to type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 03/10] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 04/10] object-file.c: make oid_object_info() " Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 05/10] object-name.c: make dependency on object_type order more obvious Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 06/10] tag.c: use type_from_string_gently() when parsing tags Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 07/10] hash-object: pass along type length to object.c Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 08/10] hash-object: refactor nested else/if/if into else if/else if Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 09/10] hash-object: show usage on invalid --type Ævar Arnfjörð Bjarmason
2021-04-20 13:00             ` [PATCH v2 10/10] object.c: move type_from_string() code to its last user Ævar Arnfjörð Bjarmason
2021-04-20 12:50         ` [PATCH v2 00/10] object.c et al: tests, small bug fixes etc Ævar Arnfjörð Bjarmason
2021-04-20 12:50           ` [PATCH v2 01/10] cat-file tests: test for bogus type name handling Ævar Arnfjörð Bjarmason
2021-04-29  4:15             ` Junio C Hamano
2021-04-20 12:50           ` [PATCH v2 02/10] hash-object tests: more detailed test for invalid type Ævar Arnfjörð Bjarmason
2021-04-20 12:50           ` [PATCH v2 03/10] mktree tests: add test for invalid object type Ævar Arnfjörð Bjarmason
2021-04-20 12:50           ` [PATCH v2 04/10] object-file.c: take type id, not string, in read_object_with_reference() Ævar Arnfjörð Bjarmason
2021-04-29  4:37             ` Junio C Hamano
2021-04-20 12:50           ` [PATCH v2 05/10] {commit,tree,blob,tag}.c: add a create_{commit,tree,blob,tag}() Ævar Arnfjörð Bjarmason
2021-04-29  4:45             ` Junio C Hamano
2021-04-29 12:01               ` Ævar Arnfjörð Bjarmason
2021-04-20 12:50           ` [PATCH v2 06/10] blob.c: remove parse_blob_buffer() Ævar Arnfjörð Bjarmason
2021-04-29  4:51             ` Junio C Hamano
2021-04-20 12:50           ` [PATCH v2 07/10] object.c: simplify return semantic of parse_object_buffer() Ævar Arnfjörð Bjarmason
2021-04-20 12:50           ` [PATCH v2 08/10] object.c: don't go past "len" under die() in type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-04-29  4:55             ` Junio C Hamano
2021-04-20 12:50           ` [PATCH v2 09/10] mktree: stop setting *ntr++ to NIL Ævar Arnfjörð Bjarmason
2021-04-29  5:01             ` Junio C Hamano
2021-04-20 12:50           ` [PATCH v2 10/10] mktree: emit a more detailed error when the <type> is invalid Ævar Arnfjörð Bjarmason
2021-03-08 20:04   ` [PATCH 1/7] object.c: refactor type_from_string_gently() Ævar Arnfjörð Bjarmason
2021-03-08 20:52     ` Taylor Blau
2021-03-09 10:46     ` Jeff King
2021-03-08 20:04   ` [PATCH 2/7] object.c: make type_from_string() return "enum object_type" Ævar Arnfjörð Bjarmason
2021-03-08 20:56     ` Taylor Blau
2021-03-08 21:48     ` Junio C Hamano
2021-03-08 20:04   ` [PATCH 3/7] oid_object_info(): " Ævar Arnfjörð Bjarmason
2021-03-08 21:54     ` Junio C Hamano
2021-03-08 22:32       ` Junio C Hamano
2021-03-09 10:34     ` Jeff King
2021-03-08 20:04   ` [PATCH 4/7] tree.c: fix misindentation in parse_tree_gently() Ævar Arnfjörð Bjarmason
2021-03-08 20:04   ` [PATCH 5/7] object.c: add a utility function for "expected type X, got Y" Ævar Arnfjörð Bjarmason
2021-03-08 20:59     ` Taylor Blau
2021-03-08 22:15     ` Junio C Hamano
2021-03-08 20:04   ` [PATCH 6/7] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
2021-03-09 10:44     ` Jeff King
2021-03-28  1:35       ` Ævar Arnfjörð Bjarmason
2021-03-28  9:06         ` Jeff King
2021-03-28 15:39           ` Ævar Arnfjörð Bjarmason
2021-03-29  9:16             ` Jeff King
2021-03-08 20:04   ` [PATCH 7/7] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
2005-06-22  0:35 ` [PATCH 2/2] Pull misc objects Daniel Barkalow

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git