All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Stefan Beller <sbeller@google.com>,
	Ramsay Jones <ramsay@ramsayjones.plus.com>
Subject: Re: Re*: [PATCH v3] fetch: replace string-list used as a look-up table with a hashmap
Date: Wed, 31 Oct 2018 15:50:00 +0100 (STD)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1810311542560.4546@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <xmqqk1m3c2dz.fsf_-_@gitster-ct.c.googlers.com>

Hi Junio,

On Sat, 27 Oct 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Just one thing^W^Wa couple of things:
> >
> > It would probably make more sense to `hashmap_get_from_hash()` and
> > `strhash()` here (and `strhash()` should probably be used everywhere
> > instead of `memhash(str, strlen(str))`).
> 
> hashmap_get_from_hash() certainly is much better suited for simpler
> usage pattern like these callsites, and the ones in sequencer.c.  It
> is a shame that a more complex variant takes the shorter-and-sweeter
> name hashmap_get().

I agree, at least in part.

From what I understand, hashmap_get_from_hash() needs a little assistance
from the comparison function with which the hashmap is configured, see
e.g. this function in the sequencer:

	static int labels_cmp(const void *fndata, const struct labels_entry *a,
			      const struct labels_entry *b, const void *key)
	{
		return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
	}

See how that first tests whether `key` is non-`NULL`, and then takes a
shortcut, not even looking at `b`? This is important, because `b` does not
refer to a complete `labels_entry` when we call `hashmap_get_from_hash()`.
It only refers to a `hashmap_entry`. Looking at `b->label` would access
some random memory, and do most certainly the wrong thing.

> I wish we named the latter hashmap_get_fullblown_feature_rich() and
> called the _from_hash() thing a simple hashmap_get() from day one,
> but it is way too late.
> 
> I looked briefly the users of the _get() variant, and some of their
> uses are legitimately not-simple and cannot be reduced to use the
> simpler _get_from_hash variant, it seems.  But others like those in
> builtin/difftool.c should be straight-forward to convert to use the
> simpler get_from_hash variant.  It could be a low-hanging fruit left
> for later clean-up, perhaps.

Right. #leftoverbits

> >> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
> >>  			    !has_object_file_with_flags(&ref->old_oid,
> >>  							OBJECT_INFO_QUICK) &&
> >>  			    !will_fetch(head, ref->old_oid.hash) &&
> >> -			    !has_sha1_file_with_flags(item->util,
> >> +			    !has_sha1_file_with_flags(item->oid.hash,
> >
> > I am not sure that we need to test for null OIDs here, given that...
> > ...
> > Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> > null OIDs, I guess.
> 
> Yup.  An alternative is to make item->oid a pointer to oid, not an
> oid object itself, so that we can express "no OID for this ref" in a
> more explicit way, but is_null_oid() is already used as "no OID" in
> many other codepaths, so...

Right, and it would complicate the code. So I am fine with your version of
it.

> >> +	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> >> +		const char *refname = remote_ref_item->string;
> >> +		struct hashmap_entry key;
> >> +
> >> +		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> >> +		item = hashmap_get(&remote_refs, &key, refname);
> >> +		if (!item)
> >> +			continue; /* can this happen??? */
> >
> > This would indicate a BUG, no?
> 
> Possibly.  Alternatively, we can just use item without checking and
> let the runtime segfault.

Hahaha! Yep. We could also cause a crash. I do prefer the BUG() call.

> Here is an incremental on top that can be squashed in to turn v3
> into v4.

Nice.

Thanks!
Dscho

> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0f8e333022..aee1d9bf21 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -259,7 +259,7 @@ static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
>  	size_t len = strlen(refname);
>  
>  	FLEX_ALLOC_MEM(ent, refname, refname, len);
> -	hashmap_entry_init(ent, memhash(refname, len));
> +	hashmap_entry_init(ent, strhash(refname));
>  	oidcpy(&ent->oid, oid);
>  	hashmap_add(map, ent);
>  	return ent;
> @@ -282,11 +282,7 @@ static void refname_hash_init(struct hashmap *map)
>  
>  static int refname_hash_exists(struct hashmap *map, const char *refname)
>  {
> -	struct hashmap_entry key;
> -	size_t len = strlen(refname);
> -	hashmap_entry_init(&key, memhash(refname, len));
> -
> -	return !!hashmap_get(map, &key, refname);
> +	return !!hashmap_get_from_hash(map, strhash(refname), refname);
>  }
>  
>  static void find_non_local_tags(const struct ref *refs,
> @@ -365,12 +361,10 @@ static void find_non_local_tags(const struct ref *refs,
>  	 */
>  	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
>  		const char *refname = remote_ref_item->string;
> -		struct hashmap_entry key;
>  
> -		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -		item = hashmap_get(&remote_refs, &key, refname);
> +		item = hashmap_get_from_hash(&remote_refs, strhash(refname), refname);
>  		if (!item)
> -			continue; /* can this happen??? */
> +			BUG("unseen remote ref?");
>  
>  		/* Unless we have already decided to ignore this item... */
>  		if (!is_null_oid(&item->oid)) {
> @@ -497,12 +491,12 @@ static struct ref *get_ref_map(struct remote *remote,
>  
>  	for (rm = ref_map; rm; rm = rm->next) {
>  		if (rm->peer_ref) {
> -			struct hashmap_entry key;
>  			const char *refname = rm->peer_ref->name;
>  			struct refname_hash_entry *peer_item;
>  
> -			hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -			peer_item = hashmap_get(&existing_refs, &key, refname);
> +			peer_item = hashmap_get_from_hash(&existing_refs,
> +							  strhash(refname),
> +							  refname);
>  			if (peer_item) {
>  				struct object_id *old_oid = &peer_item->oid;
>  				oidcpy(&rm->peer_ref->old_oid, old_oid);
> 

      reply	other threads:[~2018-10-31 14:50 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-26 21:28 [PATCH] fetch: replace string-list used as a look-up table with a hashmap Junio C Hamano
2018-09-26 22:59 ` Stefan Beller
2018-09-27  5:54   ` Jeff King
2018-09-29 18:31     ` Junio C Hamano
2018-09-30  4:57       ` Jeff King
2018-09-27  5:34 ` Jeff King
2018-09-30  2:11   ` Junio C Hamano
2018-10-19  3:48     ` [PATCH v3] " Junio C Hamano
2018-10-22  9:57       ` Johannes Schindelin
2018-10-27  6:47         ` Re*: " Junio C Hamano
2018-10-31 14:50           ` Johannes Schindelin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1810311542560.4546@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.