From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Derrick Stolee <stolee@gmail.com>,
	git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 5/5] load_ref_decorations(): avoid parsing non-tag objects
Date: Tue, 22 Jun 2021 20:27:53 +0200
Message-ID: <87zgvh20j3.fsf@evledraar.gmail.com>
In-Reply-To: <YNIYqFFFti73UT5+@coredump.intra.peff.net>


On Tue, Jun 22 2021, Jeff King wrote:

> On Tue, Jun 22, 2021 at 12:35:46PM -0400, Derrick Stolee wrote:
>
>> On 6/22/2021 12:08 PM, Jeff King wrote:
>> 
>> > -	obj = parse_object(the_repository, oid);
>> > -	if (!obj)
>> > +	objtype = oid_object_info(the_repository, oid, NULL);
>> > +	if (type < 0)
>> >  		return 0;
>> 
>> Do you mean "if (objtype < 0)" here? There is a 'type' variable,
>> but it is an enum decoration_type and I can't find a reason why
>> it would be negative. oid_object_info() _does_ return -1 if there
>> is a problem loading the object, so that would make sense.
>
> Whoops, thanks for catching that. I originally called it "enum
> object_type type", but then of course the compiler informed that there
> was already a "type" variable in the function. So I renamed it to
> "objtype" but missed updating that line. But it still compiled. Yikes. :)

[Enter Captain Hindsight]

If you use a slightly different coding style and lean on the type
information the compiler already has, you can get it to error out for
you. E.g. this on top of your original patch would catch it:

	diff --git a/log-tree.c b/log-tree.c
	index 8b700e9c142..7e3a011b533 100644
	--- a/log-tree.c
	+++ b/log-tree.c
	@@ -157,9 +157,12 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
	 	}
	 
	 	objtype = oid_object_info(the_repository, oid, NULL);
	-	if (type < 0)
	+	switch (type) {
	+	case OBJ_BAD:
	 		return 0;
	-	obj = lookup_object_by_type(the_repository, oid, objtype);
	+	default:
	+		obj = lookup_object_by_type(the_repository, oid, objtype);
	+	}
	 
	 	if (starts_with(refname, "refs/heads/"))
	 		type = DECORATION_REF_LOCAL;

IMO the real problem is an over-reliance on C being so happy to treat
enums as ints (well, they are ints). If you consistently use the enum
labels, you get the compiler to do the checking. With that on top, gcc
and clang respectively give me:
	
	log-tree.c:161:2: error: case value ‘4294967295’ not in enumerated type ‘enum decoration_type’ [-Werror=switch]
	  case OBJ_BAD:
	  ^~~~
	log-tree.c:161:7: error: case value not in enumerated type 'enum decoration_type' [-Werror,-Wswitch]
	        case OBJ_BAD:
	             ^

I think we disagreed on that exact point recently, i.e. you
think we shouldn't rely on OBJ_BAD in that way, and should instead check
for any negative value:
https://lore.kernel.org/git/YHCZh5nLNVEHCWV2@coredump.intra.peff.net/

This sort of thing is a good reason to pick the opposite pattern. You
get the same type checking you'd usually get with anything else in C.
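
To make that concrete without git's headers, here is a minimal
standalone sketch of the pattern. Everything in it is a hypothetical
stand-in (the enum names, lookup_kind() and is_usable() are made up for
illustration, they are not git's definitions). The point is only that
switching on labels lets -Wswitch (part of -Wall in gcc and clang)
notice when the wrong enum variable is used:

```c
/* Hypothetical stand-ins for two unrelated enums; not git's real types. */
enum obj_kind { KIND_BAD = -1, KIND_NONE = 0, KIND_COMMIT, KIND_TAG };
enum deco_kind { DECO_NONE = 0, DECO_LOCAL, DECO_TAG };

/* Stand-in for a lookup that reports failure as KIND_BAD. */
static enum obj_kind lookup_kind(int id)
{
	switch (id) {
	case 1:
		return KIND_COMMIT;
	case 2:
		return KIND_TAG;
	default:
		return KIND_BAD;
	}
}

/* Returns 1 if "id" names a usable object, 0 otherwise. */
static int is_usable(int id)
{
	enum deco_kind deco = DECO_NONE; /* the "other" variable in scope */
	enum obj_kind kind = lookup_kind(id);

	(void)deco; /* only here to illustrate the mix-up described below */

	/*
	 * Writing "if (deco < 0)" here by mistake compiles quietly.
	 * Writing "switch (deco)" by mistake should not: KIND_BAD is
	 * not a value of "enum deco_kind", which -Wswitch reports.
	 */
	switch (kind) {
	case KIND_BAD:
	case KIND_NONE:
		return 0;
	case KIND_COMMIT:
	case KIND_TAG:
		return 1;
	}
	return 0; /* value outside the enum */
}
```

Changing "switch (kind)" to "switch (deco)" in that sketch should
reproduce the class of diagnostic quoted above, though the exact wording
will depend on the compiler version.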

Yes, it is more verbose, e.g. in this case, particularly (as noted
downthread of what I linked to) because "enum object_type" contains so
many uncommon things and really should be split up.

In practice I don't think it's too verbose, because once you start
consistently using the pattern you'll usually not be doing conversions
all over the place. You'd just do this sort of thing via a helper
that does the type checking, e.g. something like this (or anything else
where you don't lose the type & labels):
	
	diff --git a/log-tree.c b/log-tree.c
	index 8b700e9c142..a61fb01ba3f 100644
	--- a/log-tree.c
	+++ b/log-tree.c
	@@ -130,6 +130,30 @@ static int ref_filter_match(const char *refname,
	 	return 1;
	 }
	 
	+static enum object_type oid_object_info_ok(struct repository *repo,
	+					   struct object_id *oid,
	+					   enum object_type *typep,
	+					   unsigned long *sizep)
	+{
	+	enum object_type type = oid_object_info(repo, oid, sizep);
	+	*typep = type;
	+	switch (type) {
	+	case OBJ_BAD:
	+		return 0;
	+	case OBJ_COMMIT:
	+	case OBJ_TREE:
	+	case OBJ_BLOB:
	+	case OBJ_TAG:
	+		return 1;
	+	case OBJ_NONE:
	+	case OBJ_OFS_DELTA:
	+	case OBJ_REF_DELTA:
	+	case OBJ_ANY:
	+	case OBJ_MAX:
	+		BUG("the enum_object type is too large!");
	+	}
	+}
	+
	 static int add_ref_decoration(const char *refname, const struct object_id *oid,
	 			      int flags, void *cb_data)
	 {
	@@ -156,8 +180,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
	 		return 0;
	 	}
	 
	-	objtype = oid_object_info(the_repository, oid, NULL);
	-	if (type < 0)
	+	if (!oid_object_info_ok(the_repository, oid, &type, NULL))
	 		return 0;
	 	obj = lookup_object_by_type(the_repository, oid, objtype);
	 

With that pattern GCC narrowly pulls ahead, showing 4 warnings just
about the loss of the type, with Clang at 3 :)
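
For reference, a standalone sketch of what a warning-free version of
such a helper could look like. Again the names (enum obj_kind,
lookup_kind(), lookup_kind_ok()) are hypothetical stand-ins and not
git's API. It assumes the helper returns a plain int for "ok" and hands
the typed value back through a pointer:

```c
/* Hypothetical stand-in enum; not git's "enum object_type". */
enum obj_kind {
	KIND_BAD = -1,
	KIND_NONE = 0,
	KIND_COMMIT,
	KIND_TREE,
	KIND_BLOB,
	KIND_TAG
};

/* Stand-in for a lookup that reports failure as KIND_BAD. */
static enum obj_kind lookup_kind(int id)
{
	switch (id) {
	case 1:
		return KIND_COMMIT;
	case 2:
		return KIND_TREE;
	case 3:
		return KIND_BLOB;
	case 4:
		return KIND_TAG;
	default:
		return KIND_BAD;
	}
}

/*
 * Returns 1 if "id" names a real object and 0 otherwise. The kind
 * that was found is stored in *kindp when kindp is non-NULL, so the
 * caller keeps the enum type instead of a bare int.
 */
static int lookup_kind_ok(int id, enum obj_kind *kindp)
{
	enum obj_kind kind = lookup_kind(id);

	if (kindp)
		*kindp = kind;

	/* Every label is listed, so -Wswitch flags a newly added one. */
	switch (kind) {
	case KIND_BAD:
	case KIND_NONE:
		return 0;
	case KIND_COMMIT:
	case KIND_TREE:
	case KIND_BLOB:
	case KIND_TAG:
		return 1;
	}
	return 0; /* value outside the enum */
}
```

A caller would do "enum obj_kind kind; if (!lookup_kind_ok(id, &kind))
return 0;". Passing the address of a variable of some other enum type
as the second argument should draw an incompatible-pointer-type
diagnostic from gcc and clang, which is the kind of check the
"if (type < 0)" version never got.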
