All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap
Date: Thu, 27 Sep 2018 01:34:19 -0400	[thread overview]
Message-ID: <20180927053418.GB14178@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqqin2sj6df.fsf@gitster-ct.c.googlers.com>

On Wed, Sep 26, 2018 at 02:28:28PM -0700, Junio C Hamano wrote:

> In find_non_local_tags() helper function (used to implement the
> "follow tags"), we use string_list_has_string() on two string lists
> as a way to see if a refname has already been processed, etc.
> 
> All this code predates more modern in-core lookup API like hashmap;
> replace them with two hashmaps and one string list---the hashmaps
> are used for look-ups and the string list is to keep the order of
> items in the returned result stable (which is the only single thing
> hashmap does worse than lookups on string-list).
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> 
>  * Just to remind ourselves that we talked about getting rid of
>    string-list used as look-up tables by replacing them with hashmap
>    but haven't seen much effort expended on it.  I think this is my
>    first semi-serious use of hashmap, and the code is expected to be
>    full of newbie mistakes, inefficiencies and ignorance of idioms;
>    pointing out any of which is much appreciated.

As you probably noticed, there's a bit of boilerplate required to use a
hashmap. I had figured we could replace most of these with a single
"strmap" API to map a string to a void pointer (which is basically
what string-list gives you).

That would save on the boilerplate. But your solution here replaces a
"void *" pointer with an actual "struct object_id" member, which
improves type-safety. It also removes questions about memory lifetimes
(at the minor cost of copying the oids). So this path is probably better
if we don't mind a little extra code.

I do note that your struct just has the same information as "struct
ref":

> +struct refname_hash_entry {
> +	struct hashmap_entry ent; /* must be the first member */
> +	struct object_id oid;
> +	char refname[FLEX_ARRAY];
> +};

So yet another alternative here is to just define a single hashmap that
stores void pointers. That also throws away some type safety, but is
maybe conceptually simpler. I dunno. It's actually a pain to do that
with "struct hashmap" because it requires the caller to handle
allocation. An open-addressed hash table, as we use elsewhere (and in
khash.h) is a bit simpler, since it doesn't need to do any per-entry
malloc.

To be clear, I'm perfectly happy with the approach in your patch here.
I'm just musing on what might might be the least painful thing for doing
more of them.

-Peff

  parent reply	other threads:[~2018-09-27  5:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-26 21:28 [PATCH] fetch: replace string-list used as a look-up table with a hashmap Junio C Hamano
2018-09-26 22:59 ` Stefan Beller
2018-09-27  5:54   ` Jeff King
2018-09-29 18:31     ` Junio C Hamano
2018-09-30  4:57       ` Jeff King
2018-09-27  5:34 ` Jeff King [this message]
2018-09-30  2:11   ` Junio C Hamano
2018-10-19  3:48     ` [PATCH v3] " Junio C Hamano
2018-10-22  9:57       ` Johannes Schindelin
2018-10-27  6:47         ` Re*: " Junio C Hamano
2018-10-31 14:50           ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180927053418.GB14178@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.