All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Slavica Djukic via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Slavica Djukic <slawica92@hotmail.com>
Subject: Re: [PATCH 07/11] Add a function to determine unique prefixes for a list of strings
Date: Mon, 13 May 2019 14:48:18 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1905131405210.44@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <1be9b470-6daa-a50e-ba3a-432520721b0f@jeffhostetler.com>

Hi Jeff,

On Thu, 18 Apr 2019, Jeff Hostetler wrote:

> On 4/10/2019 1:37 PM, Slavica Djukic via GitGitGadget wrote:
> > From: Slavica Djukic <slawica92@hotmail.com>
> >
> > In the `git add -i` command, we show unique prefixes of the commands and
> > files, to give an indication what prefix would select them.
> >
> > Naturally, the C implementation looks a lot different than the Perl
> > implementation: in Perl, a trie is much easier implemented, while we
> > already have a pretty neat hashmap implementation in C that we use for
> > the purpose of storing (not necessarily unique) prefixes.
> >
> > The idea: for each item that we add, we generate prefixes starting with
> > the first letter, then the first two letters, then three, etc, until we
> > find a prefix that is unique (or until the prefix length would be
> > longer than we want). If we encounter a previously-unique prefix on the
> > way, we adjust that item's prefix to make it unique again (or we mark it
> > as having no unique prefix if we failed to find one). These partial
> > prefixes are stored in a hash map (for quick lookup times).
> >
> > To make sure that this function works as expected, we add a test using a
> > special-purpose test helper that was added for that purpose.
> >
> > Note: We expect the list of prefix items to be passed in as a list of
> > pointers rather than as regular list to avoid having to copy information
> > (the actual items will most likely contain more information than just
> > the name and the length of the unique prefix, but passing in `struct
> > prefix_item *` would not allow for that).
> >
> > Signed-off-by: Slavica Djukic <slawica92@hotmail.com>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/prefix-map.c b/prefix-map.c
> > new file mode 100644
> > index 0000000000..3c5ae4ae0a
> > --- /dev/null
> > +++ b/prefix-map.c
> > @@ -0,0 +1,111 @@
> > +#include "cache.h"
> > +#include "prefix-map.h"
> > +
> > +static int map_cmp(const void *unused_cmp_data,
> > +		   const void *entry,
> > +		   const void *entry_or_key,
> > +		   const void *unused_keydata)
> > +{
> > +	const struct prefix_map_entry *a = entry;
> > +	const struct prefix_map_entry *b = entry_or_key;
> > +
> > +	return a->prefix_length != b->prefix_length ||
> > +		strncmp(a->name, b->name, a->prefix_length);
> > +}
> > +
> > +static void add_prefix_entry(struct hashmap *map, const char *name,
> > +			     size_t prefix_length, struct prefix_item *item)
> > +{
> > +	struct prefix_map_entry *result = xmalloc(sizeof(*result));
> > +	result->name = name;
> > +	result->prefix_length = prefix_length;
> > +	result->item = item;
> > +	hashmap_entry_init(result, memhash(name, prefix_length));
> > +	hashmap_add(map, result);
> > +}
> > +
> > +static void init_prefix_map(struct prefix_map *prefix_map,
> > +			    int min_prefix_length, int max_prefix_length)
> > +{
> > +	hashmap_init(&prefix_map->map, map_cmp, NULL, 0);
> > +	prefix_map->min_length = min_prefix_length;
> > +	prefix_map->max_length = max_prefix_length;
> > +}
> > +
> > +static void add_prefix_item(struct prefix_map *prefix_map,
> > +			    struct prefix_item *item)
> > +{
> > +	struct prefix_map_entry *e = xmalloc(sizeof(*e)), *e2;
> > +	int j;
> > +
> > +	e->item = item;
> > +	e->name = e->item->name;
> > +
> > +	for (j = prefix_map->min_length; j <= prefix_map->max_length; j++) {
> > +		if (!isascii(e->name[j])) {
>
> This feels odd, if I understand the intent.
>
> First, why "isascii()" rather than just non-zero?

That's to imitate `git-add--interactive.perl`'s

	if (ord($letters[0]) > 127 ||
	    ($soft_limit && $j + 1 > $soft_limit))

See https://github.com/git/git/blob/v2.21.0/git-add--interactive.perl#L410
for more complete context.

I think the main benefit here is that we avoid running into the trap of
using incomplete UTF-8 multi-byte sequences in prefixes.

I guess we could throw in an extra safety on the C side by excluding
control characters, too. But that would be a deviation from Perl, and I
actually do not even feel strongly about excluding, say, a HT (horizontal
tab) from the prefixes.

> But mainly, can we walk off the end of the array and read
> potentially uninitialized memory?  Shouldn't we have something
> at the top of the function like:
>
>     len = strlen(item->name);
>     if (len < prefix_map->min_length)
>         return;

Ooops, you're right. But I would not use `strlen() here, we can easily
just add `&& e->name[j]` to the loop condition.

> (And maybe avoid the xmalloc() too?)

Hmm. At first, I thought: no, we use `*e` *both* for lookup and for adding
a new item once we did not find any existing for the current prefix
length.

But it does indeed become a lot clearer when I separate those. It's not
even performance or memory critical a code path.

> And maybe do " j <= min(len, max_length) " in the loop?
> But I see you're modifying "j" down in the body of the loop,
> so I'll wait on suggesting that.
>
> > +			free(e);
> > +			break;
> > +		}
> > +
> > +		e->prefix_length = j;
> > +		hashmap_entry_init(e, memhash(e->name, j));
> > +		e2 = hashmap_get(&prefix_map->map, e, NULL);
> > +		if (!e2) {
> > +			/* prefix is unique so far */
> > +			e->item->prefix_length = j;
> > +			hashmap_add(&prefix_map->map, e);
> > +			break;
> > +		}
> > +
> > +		if (!e2->item)
> > +			continue; /* non-unique prefix */
> > +
> > +		if (j != e2->item->prefix_length)
> > +			BUG("unexpected prefix length: %d != %d",
> > +			    (int)j, (int)e2->item->prefix_length);
>
> IIUC, this assurance comes directly from map_cmp(), right?
> We could strengthen this to
>      (j != e2->item->prefix_length || strncmp(...))
> if we wanted to, right?

Right, I'll actually go for `memcmp()` here, but the idea is the same.

> > +
> > +		/* skip common prefix */
> > +		for (; j < prefix_map->max_length && e->name[j]; j++) {
> > +			if (e->item->name[j] != e2->item->name[j])
> > +				break;
>
> Same comment here about walking off of the defined end of both arrays.

Actually, no, not here, as I already test for `e->name[j]` in the loop
condition. If we reach the end of `e2->item->name`, the inner condition
will break out of the loop.

> I'm going to stop here.  I'm getting confused.

Oh no ;-)

Thank you for your helpful comments!

Ciao,
Dscho

  reply	other threads:[~2019-05-13 12:48 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-10 17:37 [PATCH 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-04-18 14:31   ` Jeff Hostetler
2019-04-18 16:06     ` Jeff King
2019-04-30 23:40       ` Johannes Schindelin
2019-05-01  2:21         ` Jeff King
2019-05-13 11:14           ` Johannes Schindelin
2019-04-10 17:37 ` [PATCH 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-04-10 17:37 ` [PATCH 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-04-10 17:37 ` [PATCH 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-04-18 16:49   ` Jeff Hostetler
2019-05-13 12:04     ` Johannes Schindelin
2019-04-10 17:37 ` [PATCH 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-04-18 17:57   ` Jeff Hostetler
2019-05-13 12:48     ` Johannes Schindelin [this message]
2019-04-10 17:37 ` [PATCH 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-04-10 17:37 ` [PATCH 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-04-10 17:37 ` [PATCH 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-05-13 17:28   ` [PATCH v2 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-05-13 17:28   ` [PATCH v2 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-05-13 17:28   ` [PATCH v2 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-07-16 14:58   ` [PATCH v3 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-07-31 17:52       ` Junio C Hamano
2019-08-26 21:26         ` Johannes Schindelin
2019-08-27 22:25           ` Junio C Hamano
2019-08-28 15:06             ` Johannes Schindelin
2019-08-28 15:37               ` Junio C Hamano
2019-07-16 14:58     ` [PATCH v3 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-07-31 17:59       ` Junio C Hamano
2019-08-27  9:22         ` Johannes Schindelin
2019-07-16 14:58     ` [PATCH v3 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-07-31 18:12       ` Junio C Hamano
2019-08-27 10:04         ` Johannes Schindelin
2019-07-16 14:58     ` [PATCH v3 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-07-31 18:14       ` Junio C Hamano
2019-07-16 14:58     ` [PATCH v3 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-07-31 18:39       ` Junio C Hamano
2019-08-24 12:38       ` SZEDER Gábor
2019-08-27 12:14         ` Johannes Schindelin
2019-08-28 16:30           ` SZEDER Gábor
2019-08-28 16:34             ` [PATCH] [PoC] A simpler find_unique_prefixes() implementation SZEDER Gábor
2019-08-30 20:12               ` Johannes Schindelin
2019-07-16 14:58     ` [PATCH v3 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-08-02 21:04       ` Junio C Hamano
2019-08-02 22:26         ` Jeff King
2019-07-16 18:38     ` [PATCH v3 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin
2019-08-02 21:06       ` Junio C Hamano
2019-08-27 12:57     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 05/11] built-in add -i: color the header in " Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-08-27 12:58       ` [PATCH v4 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-08-27 12:58       ` [PATCH v4 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-08-27 12:58       ` [PATCH v4 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-11-04 12:15       ` [PATCH v5 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-08  4:49           ` Junio C Hamano
2019-11-09 11:06             ` Johannes Schindelin
2019-11-10  7:18               ` Junio C Hamano
2019-11-11  9:15                 ` Johannes Schindelin
2019-11-11 12:09                   ` Junio C Hamano
2019-11-12 15:03                     ` Johannes Schindelin
2019-11-13  3:54                       ` Junio C Hamano
2019-11-13 12:30                         ` Johannes Schindelin
2019-11-13 14:01                           ` Junio C Hamano
2019-11-04 12:15         ` [PATCH v5 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-08  4:56           ` Junio C Hamano
2019-11-04 12:15         ` [PATCH v5 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-08  5:01           ` Junio C Hamano
2019-11-04 12:15         ` [PATCH v5 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-08  5:17           ` Junio C Hamano
2019-11-09 11:21             ` Johannes Schindelin
2019-11-04 12:15         ` [PATCH v5 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
2019-11-13 12:40         ` [PATCH v6 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-11-13 12:40           ` [PATCH v6 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-14  2:15             ` Junio C Hamano
2019-11-14 15:07               ` Johannes Schindelin
2019-11-15  4:35                 ` Junio C Hamano
2019-11-13 12:40           ` [PATCH v6 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-13 12:40           ` [PATCH v6 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
2019-11-13 12:46           ` [PATCH v6 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin
2019-11-15 11:11           ` [PATCH v7 " Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1905131405210.44@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=slawica92@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.