All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/1] ls-refs.c: minimize number of refs visited
Date: Tue, 19 Jan 2021 11:12:45 -0500	[thread overview]
Message-ID: <YAcE/dTuOB9PbGQU@nand.local> (raw)
In-Reply-To: <20210119144251.27924-2-jacob@gitlab.com>

Hi Jacob,

On Tue, Jan 19, 2021 at 03:42:51PM +0100, Jacob Vosmaer wrote:
> The previous implementation of ls-refs would perform exactly one ref
> walk, matching each ref against the prefixes (if any) provided by the
> user. This can be expensive if there are a lot of refs and the user
> only cares about a small subset of them.
>
> In this patch we analyze the prefixes provided by the user and build a
> minimal set of disjoint prefixes that contains all of them. We then do
> a ref walk for each of these minimal prefixes.

This reminds me of b31e2680c4 (ref-filter.c: find disjoint pattern
prefixes, 2019-06-26), where we solved a very similar problem for 'git
for-each-ref'. The difference here is that we are operating on a set of
prefixes, not a set of refs.

But, I think that we could get pretty far by treating the prefixes as
refs so that we can call ref-filter.c:find_longest_prefixes(). For its
purposes, it doesn't really care about whether or not the arguments
actually are references. It simply returns the longest common prefix
among all of its arguments (delimited by '/' characters).

> It is tempting to have just one strvec for the prefixes and use it
> both for matching and for iterating. But every time I tried that, it
> made things more complicated. I settled on leaving the existing ref
> matching (using &data.prefixes) alone, and I added a new layer around
> it for the ref walk optimization (using &iter_prefixes).

I think the implementation in b31e2680c4 deals with this nicely: it
takes a pointer to a strvec and dumps prefixes in there.

> This commit also fixes a bug in ls-refs.c that was not triggered
> before: we were using a strvec set to zero, which is not how you are
> supposed to initialize a strvec. We now call strvec_init after zeroing.

Good.

> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  ls-refs.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index a1e0b473e4..6d5f0c769a 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -84,12 +84,44 @@ static int ls_refs_config(const char *var, const char *value, void *data)
>  	return parse_hide_refs_config(var, value, "uploadpack");
>  }
>
> +static int cmp_prefix(const void *a_, const void *b_){
> +	const char *a = *(const char **)a_;
> +	const char *b = *(const char **)b_;
> +	return strcmp(a, b);
> +}
> +
> +static void deduplicate_prefixes(struct strvec *prefixes) {
> +	int i;
> +
> +	QSORT(prefixes->v, prefixes->nr, cmp_prefix);
> +
> +	for (i = 1; i < prefixes->nr;) {
> +		const char *p = prefixes->v[i];
> +
> +		/*
> +		 * If p is "refs/foobar" and its predecessor is "refs/foo" then we should
> +		 * drop p, both to avoid sending duplicate refs to the user, and to avoid
> +		 * doing unnecessary work.
> +		 */
> +		if (starts_with(p, prefixes->v[i - 1])) {
> +			MOVE_ARRAY(&prefixes->v[i], &prefixes->v[i + 1], prefixes->nr - (i + 1));
> +			prefixes->v[prefixes->nr - 1] = p;
> +			strvec_pop(prefixes);
> +		} else {
> +			i++;
> +		}
> +	}
> +}
> +

Indeed, this and the below code are very reminiscent of b31e2680c4. So,
I wonder if it's possible to use the existing implementation rather than
implement what is roughly the same thing twice.

Below is a completely untested patch to try and reuse the code from
b31e2680c4. (It compiles, but that's the extent of my guarantees about
it ;-).) It's all smashed into one huge patch, so if you're happy with
the direction I'll take care of cleaning it up. The new function in
ref-filter.h really belongs in refs.h, but I left the implementation in
ref-filter.c to avoid creating more noise in the diff.

Let me know what you think.

Thanks,
Taylor

--- >8 ---

Subject: [PATCH] ls-refs: iterate longest common refs prefix

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c    |  4 +++-
 ref-filter.c | 46 ++++++++++++++++++++++++++++++++--------------
 ref-filter.h | 10 ++++++++++
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index a1e0b473e4..6a3e11d45c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -6,6 +6,7 @@
 #include "ls-refs.h"
 #include "pkt-line.h"
 #include "config.h"
+#include "ref-filter.h"

 /*
  * Check if one of the prefixes is a prefix of the ref.
@@ -109,7 +110,8 @@ int ls_refs(struct repository *r, struct strvec *keys,
 		die(_("expected flush after ls-refs arguments"));

 	head_ref_namespaced(send_ref, &data);
-	for_each_namespaced_ref(send_ref, &data);
+	for_each_fullref_in_prefixes(get_git_namespace(), data.prefixes.v,
+				     send_ref, &data, 0);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
 	return 0;
diff --git a/ref-filter.c b/ref-filter.c
index aa260bfd09..c34bf34d06 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1987,6 +1987,36 @@ static void find_longest_prefixes(struct string_list *out,
 	strbuf_release(&prefix);
 }

+int for_each_fullref_in_prefixes(const char *namespace,
+				 const char **patterns,
+				 each_ref_fn cb,
+				 void *cb_data,
+				 int broken)
+{
+	struct string_list prefixes = STRING_LIST_INIT_DUP;
+	struct string_list_item *prefix;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = 0, namespace_len;
+
+	find_longest_prefixes(&prefixes, patterns);
+
+	if (namespace)
+		strbuf_addstr(&buf, namespace);
+	namespace_len = buf.len;
+
+	for_each_string_list_item(prefix, &prefixes) {
+		strbuf_addf(&buf, prefix->string);
+		ret = for_each_fullref_in(buf.buf, cb, cb_data, broken);
+		if (ret)
+			break;
+		strbuf_setlen(&buf, namespace_len);
+	}
+
+	string_list_clear(&prefixes, 0);
+	strbuf_release(&buf);
+	return ret;
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -1997,10 +2027,6 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       void *cb_data,
 				       int broken)
 {
-	struct string_list prefixes = STRING_LIST_INIT_DUP;
-	struct string_list_item *prefix;
-	int ret;
-
 	if (!filter->match_as_path) {
 		/*
 		 * in this case, the patterns are applied after
@@ -2024,16 +2050,8 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		return for_each_fullref_in("", cb, cb_data, broken);
 	}

-	find_longest_prefixes(&prefixes, filter->name_patterns);
-
-	for_each_string_list_item(prefix, &prefixes) {
-		ret = for_each_fullref_in(prefix->string, cb, cb_data, broken);
-		if (ret)
-			break;
-	}
-
-	string_list_clear(&prefixes, 0);
-	return ret;
+	return for_each_fullref_in_prefixes(NULL, filter->name_patterns,
+					    cb, cb_data, broken);
 }

 /*
diff --git a/ref-filter.h b/ref-filter.h
index feaef4a8fd..f666a0fb49 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -146,4 +146,14 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid);

+/**
+ * iterate all refs which descend from the longest common prefix among
+ * "patterns".
+ */
+int for_each_fullref_in_prefixes(const char *namespace,
+				 const char **patterns,
+				 each_ref_fn cb,
+				 void *cb_data,
+				 int broken);
+
 #endif /*  REF_FILTER_H  */
--
2.30.0.138.g6d7191ea01


  reply	other threads:[~2021-01-19 18:27 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 14:42 [PATCH 0/1] ls-refs.c: minimize number of refs visited Jacob Vosmaer
2021-01-19 14:42 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 16:12   ` Taylor Blau [this message]
2021-01-19 17:42     ` Jacob Vosmaer
2021-01-19 18:19       ` [PATCH 0/2] ls-refs: only traverse through longest common ref prefix Taylor Blau
2021-01-19 18:19         ` [PATCH 1/2] refs: expose 'for_each_fullref_in_prefixes' Taylor Blau
2021-01-19 18:19         ` [PATCH 2/2] ls-refs.c: traverse longest common ref prefix Taylor Blau
2021-01-19 23:09           ` Jeff King
2021-01-19 23:52             ` Taylor Blau
2021-01-20  0:08               ` Jeff King
2021-01-20 11:00           ` Jacob Vosmaer
2021-01-20 16:04         ` [PATCH v2 0/3] ls-refs: traverse prefixes of disjoint "ref-prefix" sets Taylor Blau
2021-01-20 16:04           ` [PATCH v2 1/3] refs: expose 'for_each_fullref_in_prefixes' Taylor Blau
2021-01-20 19:56             ` Jeff King
2021-01-20 20:12               ` Taylor Blau
2021-01-23  2:59             ` Junio C Hamano
2021-01-25  1:35               ` Taylor Blau
2021-01-20 16:04           ` [PATCH v2 2/3] ls-refs.c: initialize 'prefixes' before using it Taylor Blau
2021-01-20 19:58             ` Jeff King
2021-01-20 20:13               ` Taylor Blau
2021-01-20 21:50             ` Jacob Vosmaer
2021-01-20 16:04           ` [PATCH v2 3/3] ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets Taylor Blau
2021-01-23 17:55           ` [PATCH v2 0/3] ls-refs: " Junio C Hamano
2021-01-19 19:09       ` [PATCH 1/1] ls-refs.c: minimize number of refs visited Taylor Blau
2021-01-19 21:59         ` Jeff King
2021-01-19 22:15           ` Jeff King
2021-01-19 22:23             ` Taylor Blau
2021-01-19 22:52               ` Jeff King
2021-01-19 22:59                 ` Jeff King
2021-01-19 23:02                   ` Taylor Blau
2021-01-19 22:53   ` Jeff King
2021-01-19 23:00     ` Taylor Blau
2021-01-19 23:11       ` Jeff King
2021-01-20 10:40         ` Jacob Vosmaer
2021-01-20 10:44           ` Jacob Vosmaer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YAcE/dTuOB9PbGQU@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.