All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] strmap: new utility functions
Date: Fri, 21 Aug 2020 15:48:57 -0400	[thread overview]
Message-ID: <20200821194857.GD1165@coredump.intra.peff.net> (raw)
In-Reply-To: <a86fd5fdcc47baf85046eb07257f4db9f9498084.1598035949.git.gitgitgadget@gmail.com>

On Fri, Aug 21, 2020 at 06:52:26PM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Add strmap as a new struct and associated utility functions,
> specifically for hashmaps that map strings to some value.  The API is
> taken directly from Peff's proposal at
> https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Uh oh. You are encouraging me in the belief that I can send half-baked
ideas to the list and somebody will come along and implement them for
me. ;)

> Peff only included the header, not the implementation, so it isn't clear what
> the structure was he was going to use for the hash entries.  Instead of having
> my str_entry struct have three subfields (the hashmap_entry, the string, and
> the void* value), I made it only have two -- the hashmap_entry and a
> string_list_item, for two reasons:

I'd probably have done:

  struct strmap_entry {
	struct hashmap_entry ent;
	void *value;
	char key[FLEX_ALLOC];
  };

That saves 8 bytes (plus malloc overhead)per item, plus avoids an extra
pointer-chase for each item we consider when looking up.

>   1) a strmap is often the data structure we want where string_list has
>      been used in the past.  Using the same building block for
>      individual entries in both makes it easier to adopt and reuse
>      parts of the string_list API in strmap.

I can see that there might be some value in being able to interchange
the items for code that's expecting a string_list_item. But I have to
wonder if the potential for confusion is worth it. I.e., should that
code really be expecting a raw string pointer (possibly with a separate
void pointer, or even better an actual typed pointer).

I'll keep an eye out as I read the rest of the series for code which
uses this.

>   2) In some cases, after doing lots of other work, I want to be able
>      to iterate over the items in my strmap in sorted order.  hashmap
>      obviously doesn't support that, but I wanted to be able to export
>      the strmap to a string_list easily and then use its functions.
>      (Note: I do not need the data structure to both be sorted and have
>      efficient lookup at all times.  If I did, I might use a B-tree
>      instead, as suggested by brian in response to Peff in the thread
>      noted above.  In my case, most strmaps will never need sorting, but
>      in one special case at the very end of a bunch of other work I want
>      to iterate over the items in sorted order without doing any more
>      lookups afterward.)

Hmm. Likewise, I'll keep an eye open for how this works in practice. I
do suspect that a B-tree might be a better solution here, but
implementing it is non-trivial, and most callers don't care about this
property.

If the interim solution is to just dump it to a string_list and sort
that, that's really not that bad, assuming it just happens once after
we've added everything. I'm not sure there's that big a benefit to using
string_list_item internally, since presumably that conversion needs to
write a whole new array of string_list_items anyway.

> Also, I removed the STRMAP_INIT macro, since it cannot be used to
> correctly initialize a strmap; the underlying hashmap needs a call to
> hashmap_init() to allocate the hash table first.

Since access to the underlying hashmap happens through strmap functions,
they could lazily initialize it. That's how oidmap works.

> diff --git a/strmap.c b/strmap.c
> new file mode 100644
> index 0000000000..1c9fdb3b1e
> --- /dev/null
> +++ b/strmap.c
> @@ -0,0 +1,81 @@
> +#include "git-compat-util.h"
> +#include "strmap.h"
> +
> +static int cmp_str_entry(const void *hashmap_cmp_fn_data,
> +			 const struct hashmap_entry *entry1,
> +			 const struct hashmap_entry *entry2,
> +			 const void *keydata)
> +{
> +	const struct str_entry *e1, *e2;
> +
> +	e1 = container_of(entry1, const struct str_entry, ent);
> +	e2 = container_of(entry2, const struct str_entry, ent);
> +	return strcmp(e1->item.string, e2->item.string);
> +}

If you do go the FLEX_ALLOC route, obviously lookups won't want to
allocate a str_entry struct for the lookup key. You'd use keydata there
(and prefer it over looking at entry2 at all). See remotes_hash_cmp()
for an example.

> +static struct str_entry *find_str_entry(struct strmap *map,
> +					const char *str)
> +{
> +	struct str_entry entry;
> +	hashmap_entry_init(&entry.ent, strhash(str));
> +	entry.item.string = (char *)str;
> +	return hashmap_get_entry(&map->map, &entry, ent, NULL);
> +}

Casting away constness here is awkward. It could likewise benefit from
using keydata, so you wouldn't need to create a temporary
string_list_item (which is where the non-constness comes from).

> +void strmap_clear(struct strmap *map, int free_util)
> +{
> +	struct hashmap_iter iter;
> +	struct str_entry *e;
> +
> +	if (!map)
> +		return;

In a lazy-init world, this becomes:

  if (!map || !map->map.table)

Of course it would be better still if the hashmap code learned to do the
lazy-init stuff itself.

> +	hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
> +		free(e->item.string);
> +		if (free_util)
> +			free(e->item.util);
> +	}
> +	hashmap_free_entries(&map->map, struct str_entry, ent);

With a flex-alloc struct, you can avoid the extra string free. But I
guess you still wouldn't avoid the loop if you want to support
free_entries().

I wonder if it would make the API simpler if the struct knew whether it
owned the void pointer values or not. Then you'd do:

  struct strmap foo = { .free_values = 1 };
  ...
  strmap_put(&foo, "key", value);
  ...
  strmap_clear(&foo);

and wouldn't have to remember to do the right thing at clear-time. It is
a little less flexible (e.g., if you transfer ownership after a certain
point in the code), but I wonder if any callers actually need that (and
they could always set the free_values flag then).

> +/*
> + * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
> + * it does not need to persist after the this function is called.
> + *
> + * If an entry for "str" already exists, its data pointer is overwritten, and
> + * the original data pointer returned. Otherwise, returns NULL.
> + */
> +void *strmap_put(struct strmap *map, const char *str, void *data)

Minor, but IMHO we should avoid copying the docstrings to the
implementation, since it gives two places that people have to remember
to update if the API changes.

> +void *strmap_put(struct strmap *map, const char *str, void *data)
> +{
> +	struct str_entry *entry = find_str_entry(map, str);
> +	void *old = NULL;

In a lazy-init world, this is:

  if (!map->map.table) {
	strmap_init(map);
	entry = NULL;
  } else {
        entry = find_str_entry(map, str);
  }

(or just call find_str_entry() in both cases and let it realize there's
nothing to find).

> +	if (entry) {
> +		old = entry->item.util;
> +		entry->item.util = data;
> +	} else {
> +		entry = xmalloc(sizeof(*entry));
> +		hashmap_entry_init(&entry->ent, strhash(str));
> +		entry->item.string = strdup(str);
> +		entry->item.util = data;
> +		hashmap_add(&map->map, &entry->ent);
> +	}

And in a flex-alloc world, this second block is:

  FLEX_ALLOC_STR(entry, key, str);
  hashmap_entry_init(&entry->ent, strhash(str));
  entry->value = data;
  hashmap_add(&map->map, &entry->ent);

> +void *strmap_get(struct strmap *map, const char *str)
> +{
> +	struct str_entry *entry = find_str_entry(map, str);
> +	return entry ? entry->item.util : NULL;
> +}

In a lazy world, this is:

  if (!map->map.table)
          return NULL;

> +int strmap_contains(struct strmap *map, const char *str)
> +{
> +	return find_str_entry(map, str) != NULL;
> +}

And likewise:

  if (!map->map.table)
          return NULL;

It might actually be easier to stick that in find_str_entry().

The rest of it all looked good to me.

-Peff

  reply	other threads:[~2020-08-21 19:49 UTC|newest]

Thread overview: 144+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-08-21 19:22   ` Jeff King
2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget
2020-08-21 19:48   ` Jeff King [this message]
2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget
2020-08-21 19:58   ` Jeff King
2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget
2020-08-21 20:01   ` Jeff King
2020-08-21 20:41     ` Elijah Newren
2020-08-21 21:03       ` Jeff King
2020-08-21 22:25         ` Elijah Newren
2020-08-28  7:08           ` Jeff King
2020-08-28 17:20             ` Elijah Newren
2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-08-21 20:10   ` Jeff King
2020-08-21 20:51     ` Elijah Newren
2020-08-21 21:05       ` Jeff King
2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King
2020-08-21 21:33   ` Elijah Newren
2020-08-21 22:28     ` Elijah Newren
2020-08-28  7:03     ` Jeff King
2020-08-28 15:29       ` Elijah Newren
2020-09-01  9:27         ` Jeff King
2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
2020-10-13  0:40   ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-10-30 12:50     ` Jeff King
2020-10-30 19:55       ` Elijah Newren
2020-11-03 16:26         ` Jeff King
2020-11-03 16:48           ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-10-30 12:51     ` Jeff King
2020-10-13  0:40   ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-10-30 13:35     ` Jeff King
2020-10-30 15:37       ` Elijah Newren
2020-11-03 16:08         ` Jeff King
2020-11-03 16:16           ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-10-30 13:41     ` Jeff King
2020-10-30 16:03       ` Elijah Newren
2020-11-03 16:10         ` Jeff King
2020-10-13  0:40   ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget
2020-10-30 14:12     ` Jeff King
2020-10-30 16:26       ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget
2020-10-30 14:23     ` Jeff King
2020-10-30 16:43       ` Elijah Newren
2020-11-03 16:12         ` Jeff King
2020-10-13  0:40   ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-10-30 14:27     ` Jeff King
2020-10-13  0:40   ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-10-30 14:39     ` Jeff King
2020-10-30 17:28       ` Elijah Newren
2020-11-03 16:20         ` Jeff King
2020-11-03 16:46           ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-10-30 14:44     ` Jeff King
2020-10-30 18:02       ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-10-30 14:56     ` Jeff King
2020-10-30 19:31       ` Elijah Newren
2020-11-03 16:24         ` Jeff King
2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget
2020-11-04 20:13       ` Jeff King
2020-11-04 20:24         ` Elijah Newren
2020-11-02 18:55     ` [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-04 20:21       ` Jeff King
2020-11-02 18:55     ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-04 20:31       ` Jeff King
2020-11-02 18:55     ` [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-04 20:43       ` Jeff King
2020-11-02 18:55     ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-04 20:48       ` Jeff King
2020-11-04 20:52     ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King
2020-11-04 22:20       ` Elijah Newren
2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 07/13] strmap: add more " Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-05 13:29       ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King
2020-11-05 20:25         ` Junio C Hamano
2020-11-05 21:17           ` Jeff King
2020-11-05 21:22           ` Elijah Newren
2020-11-05 22:15             ` Junio C Hamano
2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 07/15] strmap: add more " Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-11 17:33           ` Phillip Wood
2020-11-11 18:49             ` Elijah Newren
2020-11-11 19:01             ` Jeff King
2020-11-11 20:34               ` Chris Torek
2020-11-06  0:24         ` [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
2020-11-06  2:00         ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano
2020-11-06  2:42           ` Elijah Newren
2020-11-06  2:48             ` Jeff King
2020-11-06 17:32               ` Junio C Hamano
2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 07/15] strmap: add more " Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
2020-11-11 20:07           ` [PATCH v6 00/15] Add struct strmap and associated utility functions Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200821194857.GD1165@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.