git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: ZheNing Hu <adlternative@gmail.com>
Subject: [PATCH 1/2] ref-filter: hacky "streaming" mode
Date: Sat, 4 Sep 2021 08:41:28 -0400	[thread overview]
Message-ID: <YTNpeH+jO0zQgAVT@coredump.intra.peff.net> (raw)
In-Reply-To: <YTNpQ7Od1U/5i0R7@coredump.intra.peff.net>

The ref-filter code is very keen to collect all of the refs in an array
before writing any output. This makes things slower than necessary when
using the default sort order (which is already sorted by refname when we
call for_each_ref()), and when no filtering options require it.

This commit adds a mildly-ugly interface to detect this case and stream
directly from filter_refs(). The caller just needs to call the
"maybe_stream" function. Either way, they can call the usual sort/print
functions; they'll just be noops if we did stream instead of collecting.

Here are some timings on a fully-packed 500k-ref repo:

  Benchmark #1: ./git.orig for-each-ref --format='%(objectname) %(refname)'
    Time (mean ± σ):     340.2 ms ±   5.3 ms    [User: 300.5 ms, System: 39.6 ms]
    Range (min … max):   332.9 ms … 347.0 ms    10 runs

  Benchmark #2: ./git.stream for-each-ref --format='%(objectname) %(refname)'
    Time (mean ± σ):     224.0 ms ±   3.4 ms    [User: 222.7 ms, System: 1.3 ms]
    Range (min … max):   218.1 ms … 228.5 ms    13 runs

  Summary
    './git.stream for-each-ref --format='%(objectname) %(refname)'' ran
      1.52 ± 0.03 times faster than './git.orig for-each-ref --format='%(objectname) %(refname)''

So we definitely shave off some time, though we're still _much_ slower
than a simple `wc -l <packed-refs` (which is around 10ms, though of
course it isn't actually doing as much work).

The point here is to reduce overall effort, though of course the time to
first output is much improved in the streaming version, too (about 250ms
versus 4ms).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/for-each-ref.c |  7 ++++++
 ref-filter.c           | 50 ++++++++++++++++++++++++++++++++++--------
 ref-filter.h           |  8 +++++++
 3 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 89cb6307d4..fe0b92443f 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -70,6 +70,13 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	if (verify_ref_format(&format))
 		usage_with_options(for_each_ref_usage, opts);
 
+	/*
+	 * try streaming, but only without maxcount; in theory the ref-filter
+	 * code could learn about maxcount, but for now it is our own thing
+	 */
+	if (!maxcount)
+		ref_filter_maybe_stream(&filter, sorting, icase, &format);
+
 	if (!sorting)
 		sorting = ref_default_sorting();
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
diff --git a/ref-filter.c b/ref-filter.c
index 93ce2a6ef2..17b78b1d30 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2283,15 +2283,19 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 			return 0;
 	}
 
-	/*
-	 * We do not open the object yet; sort may only need refname
-	 * to do its job and the resulting list may yet to be pruned
-	 * by maxcount logic.
-	 */
-	ref = ref_array_push(ref_cbdata->array, refname, oid);
-	ref->commit = commit;
-	ref->flag = flag;
-	ref->kind = kind;
+	if (ref_cbdata->filter->streaming_format) {
+		pretty_print_ref(refname, oid, ref_cbdata->filter->streaming_format);
+	} else {
+		/*
+		 * We do not open the object yet; sort may only need refname
+		 * to do its job and the resulting list may yet to be pruned
+		 * by maxcount logic.
+		 */
+		ref = ref_array_push(ref_cbdata->array, refname, oid);
+		ref->commit = commit;
+		ref->flag = flag;
+		ref->kind = kind;
+	}
 
 	return 0;
 }
@@ -2563,6 +2567,34 @@ void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
 	QSORT_S(array->items, array->nr, compare_refs, sorting);
 }
 
+void ref_filter_maybe_stream(struct ref_filter *filter,
+			     const struct ref_sorting *sort, int icase,
+			     struct ref_format *format)
+{
+	/* streaming only works with default for_each_ref sort order */
+	if (sort || icase)
+		return;
+
+	/* these filters want to see all candidates up front */
+	if (filter->reachable_from || filter->unreachable_from)
+		return;
+
+	/*
+	 * the %(symref) placeholder is broken with pretty_print_ref(),
+	 * which our streaming code uses. I suspect this is a sign of breakage
+	 * in other callers like verify_tag(), which should be fixed. But for
+	 * now just disable streaming.
+	 *
+	 * Note that this implies we've parsed the format already with
+	 * verify_ref_format().
+	 */
+	if (need_symref)
+		return;
+
+	/* OK to stream */
+	filter->streaming_format = format;
+}
+
 static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state)
 {
 	struct strbuf *s = &state->stack->output;
diff --git a/ref-filter.h b/ref-filter.h
index c15dee8d6b..ecea1837a2 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -69,6 +69,9 @@ struct ref_filter {
 		lines;
 	int abbrev,
 		verbose;
+
+	/* if non-NULL, streaming output during filter_refs() is enabled */
+	struct ref_format *streaming_format;
 };
 
 struct ref_format {
@@ -135,6 +138,11 @@ char *get_head_description(void);
 /*  Set up translated strings in the output. */
 void setup_ref_filter_porcelain_msg(void);
 
+/* enable streaming during filter_refs() if options allow it */
+void ref_filter_maybe_stream(struct ref_filter *filter,
+			     const struct ref_sorting *sort, int icase,
+			     struct ref_format *format);
+
 /*
  * Print a single ref, outside of any ref-filter. Note that the
  * name must be a fully qualified refname.
-- 
2.33.0.618.g5b11852304


  reply	other threads:[~2021-09-04 12:41 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-04 12:40 [hacky PATCH 0/2] speeding up trivial for-each-ref invocations Jeff King
2021-09-04 12:41 ` Jeff King [this message]
2021-09-05  8:20   ` [PATCH 1/2] ref-filter: hacky "streaming" mode ZheNing Hu
2021-09-05 13:04     ` Jeff King
2021-09-07  5:28       ` ZheNing Hu
2021-09-07 18:01         ` Jeff King
2021-09-09 14:45           ` ZheNing Hu
2021-09-10 14:26             ` Jeff King
2021-09-15 12:27               ` ZheNing Hu
2021-09-15 14:23                 ` ZheNing Hu
2021-09-16 21:45                   ` Jeff King
2021-09-20  7:42                     ` ZheNing Hu
2021-09-16 21:31                 ` Jeff King
2021-09-05 13:15     ` Jeff King
2021-09-07  5:42       ` ZheNing Hu
2021-09-04 12:42 ` [PATCH 2/2] ref-filter: implement "quick" formats Jeff King
2021-09-05  8:20   ` ZheNing Hu
2021-09-05 13:07     ` Jeff King
2021-09-06 13:34       ` ZheNing Hu
2021-09-07 20:06       ` Junio C Hamano
2021-09-05  8:19 ` [hacky PATCH 0/2] speeding up trivial for-each-ref invocations ZheNing Hu
2021-09-05 12:49   ` Jeff King
2021-09-06 13:30     ` ZheNing Hu
2021-09-07 17:28       ` Jeff King
2021-09-09 13:20         ` ZheNing Hu
2021-09-06  6:54 ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YTNpeH+jO0zQgAVT@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=adlternative@gmail.com \
    --cc=git@vger.kernel.org \
    --subject='Re: [PATCH 1/2] ref-filter: hacky "streaming" mode' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).