All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH v2 05/13] midx: check both pack and index names for containment
Date: Fri, 5 Apr 2019 22:18:17 +0200	[thread overview]
Message-ID: <8f188559-c1a9-7717-4b2c-a7397cfaa6bc@web.de> (raw)
In-Reply-To: <20190405180604.GE32243@sigill.intra.peff.net>

Am 05.04.2019 um 20:06 schrieb Jeff King:
> A midx file (and the struct we parse from it) contains a list of all of
> the covered packfiles, mentioned by their ".idx" names (e.g.,
> "pack-1234.idx", etc). And thus calls to midx_contains_pack() expect
> callers to provide the idx name.
>
> This works for most of the calls, but the one in open_packed_git_1()
> tries to feed a packed_git->pack_name, which is the ".pack" name,
> meaning we'll never find a match (even if the pack is covered by the
> midx).
>
> We can fix this by converting the ".pack" to ".idx" in the caller.
> However, that requires allocating a new string. Instead, let's make
> midx_contains_pack() a bit friendlier, and allow it take _either_ the
> .pack or .idx variant.
>
> All cleverness in the matching code is credited to René. Bugs are mine.

I didn't consider it to be particularly tricky -- but then kept the dots
in both filename extension strings, which is not going to fly, as they
are skipped by the common-prefix loop..  Thanks for fixing that.

> There's no test here, because while this does fix _a_ bug, it's masked
> by another bug in that same caller. That will be covered (with a test)
> in the next patch.
>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I was tempted to suggest that the midx struct just store the base name
> without ".idx" at all, but having callers pass that is no less tricky
> than passing ".idx" (they still have to allocate a new string).

The middle part of the comparison function would become:

	if (!*idx_name && (!strcmp(idx_or_pack_name, ".idx") ||
			   !strcmp(idx_or_pack_name, ".pack"))
		return 0;

No allocations needed -- except when building the list.

(And this time we'd actually need the dots.)

>
>  midx.c | 36 ++++++++++++++++++++++++++++++++++--
>  midx.h |  2 +-
>  2 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 8a505fd423..0ceca1938f 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -307,7 +307,39 @@ int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct mu
>  	return nth_midxed_pack_entry(m, e, pos);
>  }
>
> -int midx_contains_pack(struct multi_pack_index *m, const char *idx_name)
> +/* Match "foo.idx" against either "foo.pack" _or_ "foo.idx". */
> +static int cmp_idx_or_pack_name(const char *idx_or_pack_name,
> +				const char *idx_name)
> +{
> +	/* Skip past any initial matching prefix. */
> +	while (*idx_name && *idx_name == *idx_or_pack_name) {
> +		idx_name++;
> +		idx_or_pack_name++;
> +	}
> +
> +	/*
> +	 * If we didn't match completely, we may have matched "pack-1234." and
> +	 * be left with "idx" and "pack" respectively, which is also OK. We do
> +	 * not have to check for "idx" and "idx", because that would have been
> +	 * a complete match (and in that case these strcmps will be false, but
> +	 * we'll correctly return 0 from the final strcmp() below.
> +	 *
> +	 * Technically this matches "fooidx" and "foopack", but we'd never have
> +	 * such names in the first place.
> +	 */
> +	if (!strcmp(idx_name, "idx") && !strcmp(idx_or_pack_name, "pack"))
> +		return 0;

This is asymmetric, and thus the function should not be used for sorting,
where it would be called for random pairs of values. For the binary search
it's fine, assuming the list contains only index filenames (ending with
".idx"), as the search string is always passed in as idx_or_pack_name.

And an extension-insensitive lookup works fine in a strcmp()-sorted list
because the order wouldn't change when sorting it again with a looser
stable extension-sensitive sort.

> +
> +	/*
> +	 * This not only checks for a complete match, but also orders based on
> +	 * the first non-identical character, which means our ordering will
> +	 * match a raw strcmp(). That makes it OK to use this to binary search
> +	 * a naively-sorted list.
> +	 */
> +	return strcmp(idx_or_pack_name, idx_name);

At this point we could also compare the chars like this:

	return (unsigned char)(*idx_or_pack_name) - (unsigned char)(*idx_name);

This avoids a function call, but doesn't look very pretty.  And I'm not
fully sure the casts are correct.

> +}
> +
> +int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
>  {
>  	uint32_t first = 0, last = m->num_packs;
>
> @@ -317,7 +349,7 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_name)
>  		int cmp;
>
>  		current = m->pack_names[mid];
> -		cmp = strcmp(idx_name, current);
> +		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
>  		if (!cmp)
>  			return 1;
>  		if (cmp > 0) {
> diff --git a/midx.h b/midx.h
> index 774f652530..26dd042d63 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -43,7 +43,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
>  					struct multi_pack_index *m,
>  					uint32_t n);
>  int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m);
> -int midx_contains_pack(struct multi_pack_index *m, const char *idx_name);
> +int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
>  int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
>
>  int write_midx_file(const char *object_dir);
>


  reply	other threads:[~2019-04-05 20:18 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-04 23:21 [PATCH 0/12] a rabbit hole of update-server-info fixes Jeff King
2019-04-04 23:22 ` [PATCH 01/12] t5319: fix bogus cat-file argument Jeff King
2019-04-05  0:44   ` Ramsay Jones
2019-04-05  1:41     ` Jeff King
2019-04-05  1:46       ` Jeff King
2019-04-04 23:22 ` [PATCH 02/12] t5319: drop useless --buffer from cat-file Jeff King
2019-04-04 23:22 ` [PATCH 03/12] packfile: factor out .pack to .idx name conversion Jeff King
2019-04-04 23:25 ` [PATCH 04/12] packfile: check midx coverage with .idx rather than .pack Jeff King
2019-04-05  8:05   ` René Scharfe
2019-04-05 13:21     ` Jeff King
2019-04-05 13:30       ` Jeff King
2019-04-05 12:01   ` SZEDER Gábor
2019-04-05 13:40     ` Jeff King
2019-04-04 23:27 ` [PATCH 05/12] http: simplify parsing of remote objects/info/packs Jeff King
2019-04-05 10:41   ` René Scharfe
2019-04-05 18:11     ` Jeff King
2019-04-05 20:17       ` René Scharfe
2019-04-04 23:27 ` [PATCH 06/12] server-info: fix blind pointer arithmetic Jeff King
2019-04-04 23:28 ` [PATCH 07/12] server-info: simplify cleanup in parse_pack_def() Jeff King
2019-04-04 23:28 ` [PATCH 08/12] server-info: use strbuf to read old info/packs file Jeff King
2019-04-04 23:29 ` [PATCH 09/12] server-info: drop nr_alloc struct member Jeff King
2019-04-04 23:30 ` [PATCH 10/12] packfile.h: drop extern from function declarations Jeff King
2019-04-04 23:30 ` [PATCH 11/12] server-info: drop objdirlen pointer arithmetic Jeff King
2019-04-04 23:31 ` [PATCH 12/12] update_info_refs(): drop unused force parameter Jeff King
2019-04-05  5:46 ` [PATCH 0/12] a rabbit hole of update-server-info fixes Junio C Hamano
2019-04-05 18:03 ` [PATCH v2 0/13] a rabbit hole of update-server-info (and midx!) fixes Jeff King
2019-04-05 18:03   ` [PATCH v2 01/13] packfile.h: drop extern from function declarations Jeff King
2019-04-05 19:19     ` Ramsay Jones
2019-04-05 19:25       ` Jeff King
2019-04-08  5:13         ` Junio C Hamano
2019-04-08 20:32           ` Jeff King
2019-04-09 15:08         ` Junio C Hamano
2019-04-09 15:15           ` Jeff King
2019-04-05 18:04   ` [PATCH v2 02/13] pack-revindex: open index if necessary Jeff King
2019-04-05 18:04   ` [PATCH v2 03/13] t5319: fix bogus cat-file argument Jeff King
2019-04-05 18:05   ` [PATCH v2 04/13] t5319: drop useless --buffer from cat-file Jeff King
2019-04-05 18:06   ` [PATCH v2 05/13] midx: check both pack and index names for containment Jeff King
2019-04-05 20:18     ` René Scharfe [this message]
2019-04-05 18:06   ` [PATCH v2 06/13] packfile: fix pack basename computation Jeff King
2019-04-05 18:12   ` [PATCH v2 07/13] http: simplify parsing of remote objects/info/packs Jeff King
2019-04-05 18:13   ` [PATCH v2 08/13] server-info: fix blind pointer arithmetic Jeff King
2019-04-05 18:13   ` [PATCH v2 09/13] server-info: simplify cleanup in parse_pack_def() Jeff King
2019-04-05 18:16     ` Jeff King
2019-04-05 18:13   ` [PATCH v2 10/13] server-info: use strbuf to read old info/packs file Jeff King
2019-04-05 18:14   ` [PATCH v2 11/13] server-info: drop nr_alloc struct member Jeff King
2019-04-05 18:14   ` [PATCH v2 12/13] server-info: drop objdirlen pointer arithmetic Jeff King
2019-04-05 18:14   ` [PATCH v2 13/13] update_info_refs(): drop unused force parameter Jeff King
2019-04-05 18:19   ` [PATCH v2 0/13] a rabbit hole of update-server-info (and midx!) fixes Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f188559-c1a9-7717-4b2c-a7397cfaa6bc@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.