Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2 05/10] strmap: new utility functions
Date: Fri, 30 Oct 2020 09:26:02 -0700
Message-ID: <CABPp-BGTjUdgmiEneVNsLWyODZtH-d+x6uTFfwbOtyba3K76HQ@mail.gmail.com> (raw)
In-Reply-To: <20201030141233.GE3277724@coredump.intra.peff.net>

On Fri, Oct 30, 2020 at 7:12 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:45AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Add strmap as a new struct and associated utility functions,
> > specifically for hashmaps that map strings to some value.  The API is
> > taken directly from Peff's proposal at
> > https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
>
> This looks overall sane to me. I mentioned elsewhere that we could be
> using FLEXPTR_ALLOC to save an extra allocation. I think it's easy and
> worth doing here, as the logic would be completely contained within
> strmap_put():
>
>   if (strdup_strings)
>         FLEXPTR_ALLOC_STR(entry, key, str);
>   else {
>         entry = xmalloc(sizeof(*entry));
>         entry->key = str;
>   }
>
> And free_entries() then doesn't even have to care about strdup_strings.

Yeah, as you noted in your review of 10/10 this idea wouldn't play
well with the later mem_pool changes.

> > A couple of items of note:
> >
> >   * Similar to string-list, I have a strdup_strings setting.  However,
> >     unlike string-list, strmap_init() does not take a parameter for this
> >     setting and instead automatically sets it to 1; callers who want to
> >     control this detail need to instead call strmap_ocd_init().
>
> That seems reasonable. It could just be a parameter, but I like that you
> push people in the direction of doing the simple and safe thing, rather
> than having them wonder whether they ought to set strdup_strings or not.

Well, in my first round of the series where I did make it a parameter
you balked pretty loudly.  ;-)

> >   * I do not have a STRMAP_INIT macro.  I could possibly add one, but
> >       #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
> >     feels a bit unwieldy and possibly error-prone in terms of future
> >     expansion of the hashmap struct.  The fact that cmp_str_entry needs to
> >     be in there prevents us from passing all zeros for the hashmap, and makes
> >     me worry that STRMAP_INIT would just be more trouble than it is worth.
>
> You can actually omit everything after cmp_str_entry, and those fields
> would all get zero-initialized. But we also allow C99 designed
> initializers these days. Coupled with the HASHMAP_INIT() I mentioned in
> the earlier email, you'd have:
>
>   #define STRMAP_INIT { \
>                 .map = HASHMAP_INIT(cmp_strmap_entry, NULL), \
>                 .strdup_strings = 1, \
>           }
>
> which seems pretty maintainable.

Makes sense; will add.

> > +static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
> > +                         const struct hashmap_entry *entry1,
> > +                         const struct hashmap_entry *entry2,
> > +                         const void *keydata)
> > +{
> > +     const struct strmap_entry *e1, *e2;
> > +
> > +     e1 = container_of(entry1, const struct strmap_entry, ent);
> > +     e2 = container_of(entry2, const struct strmap_entry, ent);
> > +     return strcmp(e1->key, e2->key);
> > +}
>
> I expected to use keydata here, but it's pretty easy to make a fake
> strmap_entry because of the use of the "key" pointer. So that makes
> sense.
>
> > +static void strmap_free_entries_(struct strmap *map, int free_util)
>
> You use the term "value" for the mapped-to value in this iteration. So
> perhaps free_values here (and in other functions) would be a better
> name?

Oops, yes, definitely.

> > +     /*
> > +      * We need to iterate over the hashmap entries and free
> > +      * e->key and e->value ourselves; hashmap has no API to
> > +      * take care of that for us.  Since we're already iterating over
> > +      * the hashmap, though, might as well free e too and avoid the need
> > +      * to make some call into the hashmap API to do that.
> > +      */
> > +     hashmap_for_each_entry(&map->map, &iter, e, ent) {
> > +             if (free_util)
> > +                     free(e->value);
> > +             if (map->strdup_strings)
> > +                     free((char*)e->key);
> > +             free(e);
> > +     }
> > +}
>
> Yep, makes sense.
>
> > +void strmap_clear(struct strmap *map, int free_util)
> > +{
> > +     strmap_free_entries_(map, free_util);
> > +     hashmap_free(&map->map);
> > +}
>
> This made me wonder about a partial_clear(), but it looks like that
> comes later.
>
> > +void *strmap_put(struct strmap *map, const char *str, void *data)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(map, str);
> > +     void *old = NULL;
> > +
> > +     if (entry) {
> > +             old = entry->value;
> > +             entry->value = data;
>
> Here's a weird hypothetical. If strdup_strings is not set and I do:
>
>   const char *one = xstrdup("foo");
>   const char *two = xstrdup("foo");
>
>   hashmap_put(map, one, x);
>   hashmap_put(map, two, y);
>
> it's clear that the value should be pointing to "y" afterwards (and you
> return "x" so the caller can free it or whatever, good).
>
> But which key should the entry be pointing to? The old one or the new
> one? I'm trying and failing to think of a case where it would matter.
> Certainly I could add a free() to the toy above where it would, but it
> feels like a real caller would have to have pretty convoluted memory
> lifetime semantics for it to make a difference.
>
> So I'm not really asking for a particular behavior, but just bringing it
> up in case you can think of something relevant.

I'll keep mulling it over, but I likewise can't currently think of a
case where it'd matter.

>
> > +     } else {
> > +             /*
> > +              * We won't modify entry->key so it really should be const.
> > +              */
> > +             const char *key = str;
>
> The "should be" here confused me. It _is_ const. I'd probably just
> delete the comment entirely, but perhaps:
>
>   /*
>    * We'll store a const pointer. For non-duplicated strings, they belong
>    * to the caller and we received them as const in the first place. For
>    * our duplicated ones, they do point to memory we own, but they're
>    * still conceptually constant within the lifetime of an entry.
>    */
>
> Though it might make more sense in the struct definition, not here.

Either I was (mistakenly) worried about "I'm going to allocate and
copy, but during the copy it isn't actually const", or this is a
leftover artifact from some of the other iterations I tried.  Anyway,
I think this comment isn't useful; I'll just strike it.

> > +void *strmap_get(struct strmap *map, const char *str)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(map, str);
> > +     return entry ? entry->value : NULL;
> > +}
>
> Just noting that the caller can't tell the difference between "no such
> entry" and "the entry is storing NULL". I think the simplicity offered
> by this interface makes it worth having (and being the primary one). If
> some caller really needs to tell the difference between the two, we can
> add another function later.
>
> Obviously they could use strmap_contains(), but that would mean two hash
> lookups.

Yep, addressed later by strmap_get_entry() in another patch, as you
noticed in your later review.

> > +/*
> > + * Same as strmap_init, but for those who want to control the memory management
> > + * carefully instead of using the default of strdup_strings=1.
> > + * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
> > + * are obsessing over minor details.)
> > + */
> > +void strmap_ocd_init(struct strmap *map,
> > +                  int strdup_strings);
>
> I'm not personally bothered by this name, but I wonder if some people
> may be (because they have or know somebody who actually has OCD).
>
> Perhaps strmap_init_with_options() would be a better name? It likewise
> would extend well if we want to add other non-default options later.

Doh!  That's going to push a bunch of lines past 80 characters.  Sigh...

It's probably a better name though; I'll change it.

  reply index

Thread overview: 144+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated " Elijah Newren via GitGitGadget
2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-08-21 19:22   ` Jeff King
2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget
2020-08-21 19:48   ` Jeff King
2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget
2020-08-21 19:58   ` Jeff King
2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget
2020-08-21 20:01   ` Jeff King
2020-08-21 20:41     ` Elijah Newren
2020-08-21 21:03       ` Jeff King
2020-08-21 22:25         ` Elijah Newren
2020-08-28  7:08           ` Jeff King
2020-08-28 17:20             ` Elijah Newren
2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-08-21 20:10   ` Jeff King
2020-08-21 20:51     ` Elijah Newren
2020-08-21 21:05       ` Jeff King
2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King
2020-08-21 21:33   ` Elijah Newren
2020-08-21 22:28     ` Elijah Newren
2020-08-28  7:03     ` Jeff King
2020-08-28 15:29       ` Elijah Newren
2020-09-01  9:27         ` Jeff King
2020-10-13  0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
2020-10-13  0:40   ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-10-30 12:50     ` Jeff King
2020-10-30 19:55       ` Elijah Newren
2020-11-03 16:26         ` Jeff King
2020-11-03 16:48           ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-10-30 12:51     ` Jeff King
2020-10-13  0:40   ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-10-30 13:35     ` Jeff King
2020-10-30 15:37       ` Elijah Newren
2020-11-03 16:08         ` Jeff King
2020-11-03 16:16           ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-10-30 13:41     ` Jeff King
2020-10-30 16:03       ` Elijah Newren
2020-11-03 16:10         ` Jeff King
2020-10-13  0:40   ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget
2020-10-30 14:12     ` Jeff King
2020-10-30 16:26       ` Elijah Newren [this message]
2020-10-13  0:40   ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget
2020-10-30 14:23     ` Jeff King
2020-10-30 16:43       ` Elijah Newren
2020-11-03 16:12         ` Jeff King
2020-10-13  0:40   ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-10-30 14:27     ` Jeff King
2020-10-13  0:40   ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-10-30 14:39     ` Jeff King
2020-10-30 17:28       ` Elijah Newren
2020-11-03 16:20         ` Jeff King
2020-11-03 16:46           ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-10-30 14:44     ` Jeff King
2020-10-30 18:02       ` Elijah Newren
2020-10-13  0:40   ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-10-30 14:56     ` Jeff King
2020-10-30 19:31       ` Elijah Newren
2020-11-03 16:24         ` Jeff King
2020-11-02 18:55   ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget
2020-11-04 20:13       ` Jeff King
2020-11-04 20:24         ` Elijah Newren
2020-11-02 18:55     ` [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-04 20:21       ` Jeff King
2020-11-02 18:55     ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-04 20:31       ` Jeff King
2020-11-02 18:55     ` [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-02 18:55     ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-04 20:43       ` Jeff King
2020-11-02 18:55     ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-04 20:48       ` Jeff King
2020-11-04 20:52     ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King
2020-11-04 22:20       ` Elijah Newren
2020-11-05  0:22     ` [PATCH v4 " Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 07/13] strmap: add more " Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-05  0:22       ` [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-05 13:29       ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King
2020-11-05 20:25         ` Junio C Hamano
2020-11-05 21:17           ` Jeff King
2020-11-05 21:22           ` Elijah Newren
2020-11-05 22:15             ` Junio C Hamano
2020-11-06  0:24       ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 07/15] strmap: add more " Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-11 17:33           ` Phillip Wood
2020-11-11 18:49             ` Elijah Newren
2020-11-11 19:01             ` Jeff King
2020-11-11 20:34               ` Chris Torek
2020-11-06  0:24         ` [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-06  0:24         ` [PATCH v5 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
2020-11-06  2:00         ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano
2020-11-06  2:42           ` Elijah Newren
2020-11-06  2:48             ` Jeff King
2020-11-06 17:32               ` Junio C Hamano
2020-11-11 20:02         ` [PATCH v6 " Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 07/15] strmap: add more " Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-11 20:02           ` [PATCH v6 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
2020-11-11 20:07           ` [PATCH v6 00/15] Add struct strmap and associated utility functions Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGTjUdgmiEneVNsLWyODZtH-d+x6uTFfwbOtyba3K76HQ@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git