All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Emily Shaffer <emilyshaffer@google.com>,
	git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 1/2] progress: create progress struct in 'verbose' mode
Date: Thu, 9 Jul 2020 22:17:40 -0400	[thread overview]
Message-ID: <20200710021740.GA39052@syl.lan> (raw)
In-Reply-To: <ed6b2339-83cd-a7b9-c653-bc86d8686680@gmail.com>

Hi both,

On Thu, Jul 09, 2020 at 10:00:23PM -0400, Derrick Stolee wrote:
> On 7/9/2020 9:42 PM, Emily Shaffer wrote:
> > Before now, the progress API is used by conditionally calling
> > start_progress() or a similar call, and then unconditionally calling
> > display_progress() and stop_progress(), both of which are tolerant of
> > NULL or uninitialized inputs. However, in
> > 98a136474082cdc7228d7e0e45672c5274fab701 (trace: log progress time and
> > throughput), the progress library learned to log traces during expensive
> > operations. In cases where progress should not be displayed to the user
> > - such as when Git is called by a script - no traces will be logged,
> > because the progress object is never created.

This is such a fantastic idea. Just the other day, I was thinking of
getting your (for clarification, Emily, since I'm responding to
Stolee's mail) progress-emits-trace2-events work hooked into GitHub's
trace2 pipeline.

There were two unfortunate things that prevented this from working:

  1. GitHub filters which trace2 categories are of interest to us (these
     interesting ones get logged, and the uninteresting ones get
     discarded) using an environment variable of comma-separated
     categories. Since all of the trace2 metrics generated by the
     progress API don't have categories, taking in one interesting
     metric meant taking them all in, which is a non-starter for us.

  2. On top of that, we don't even _generate_ these progress events most
     of the time, since we're often running without a tty, and so we
     never end up hitting those 'if (progress) start_progress()'
     conditionals in the first place.

If we had something like this, it would reduce the problem to only (1),
which would make a lot of my headaches go away. (It would also give me a
good excuse to convert many of our custom trace2 regions into patches on
the list, and get rid of a non-trivial amount of code that generates
merge conflicts often).

> > Instead, to allow us to collect traces from scripted Git commands, teach
> > a progress->verbose flag, which is specified via a new argument to
> > start_progress() and friends. display_progress() also learns to filter
> > for that flag. With these changes, start_progress() can be called
> > unconditionally but with a conditional as an argument to determine
> > whether to report progress to the user.
> >
> > Since this changes the API, also modify callers of start_progress() and
> > friends to drop their conditional and pass a new argument in instead.

I don't think that this is why I was CC'd, but could you perhaps talk a
little bit about why this is all in the same patch? I don't think this
change needs to be broken out by the area affected per-se, but the
current form is a little unruly to review all at once.

> This is a worthwhile change. Thanks! I was hoping that we would
> get some of these regions for free, which extends what we can get
> out of trace2 events.
>
> CC'ing Taylor because he had some thoughts on adding a possible
> trace2 category to make it easier to reason about the regions,
> when appropriate. Not sure if he's ready to apply that change
> on top of this series.

This is what I was talking about above. It would be nice if we could
somehow teach the 'start_*_progress()' API about a trace2 category.

Unfortunately, this would need to be a new parameter, since we need to
know the category when we enter the region. So, the API changes might be
far-reaching. It would be nice if there was a way to limit the blast
radius (i.e., 'start_progress_trace2(..., category)'), but I haven't
thought deeply about it.

I don't want to delay this patch series with that. I'd be happy to build
it in myself on top after this graduates.

> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index f520111eda..f64cad8390 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -620,8 +620,13 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >  	if (bisect_list)
> >  		revs.limited = 1;
> >
> > -	if (show_progress)
> > -		progress = start_delayed_progress(show_progress, 0);
> > +	/*
> > +	 * When progress is not printed to the user, we still want to be able to
> > +	 * classify the progress during tracing. So, use a placeholder name.
> > +	 */
> > +	progress = start_delayed_progress(
> > +			show_progress ? show_progress : _("Quiet rev-list operation"),
> > +			0, show_progress != NULL)
>
> This is so strange, how we let the command-lines specify a progress
> indicator. I guess it is necessary when we use rev-list as a
> subcommand instead of in-process. One such case is check_connected()
> in connected.c.
>
> It's stranger still that "show_progress" is actually a string here,
> as opposed to being an int in most other places.
>
> Your transformation is correct, here, though. Thanks for calling it
> out in the commit message.
>
> >
> >  	if (use_bitmap_index) {
> >  		if (!try_bitmap_count(&revs, &filter_options))
> > diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> > index dd4a75e030..719d446916 100644
> > --- a/builtin/unpack-objects.c
> > +++ b/builtin/unpack-objects.c
> > @@ -498,8 +498,7 @@ static void unpack_all(void)
> >  			ntohl(hdr->hdr_version));
> >  	use(sizeof(struct pack_header));
> >
> > -	if (!quiet)
> > -		progress = start_progress(_("Unpacking objects"), nr_objects);
> > +	progress = start_progress(_("Unpacking objects"), nr_objects, !quiet);
> >  	obj_list = xcalloc(nr_objects, sizeof(*obj_list));
> >  	for (i = 0; i < nr_objects; i++) {
> >  		unpack_one(i);
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 328ab06fd4..b9a784fece 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -1152,10 +1152,10 @@ static void write_graph_chunk_bloom_indexes(struct hashfile *f,
> >  	struct progress *progress = NULL;
> >  	int i = 0;
> >
> > -	if (ctx->report_progress)
> > -		progress = start_delayed_progress(
> > -			_("Writing changed paths Bloom filters index"),
> > -			ctx->commits.nr);
> > +	progress = start_delayed_progress(
> > +		_("Writing changed paths Bloom filters index"),
> > +		ctx->commits.nr,
> > +		ctx->report_progress);
>
> There are a lot of blocks like this, where the progress string is long enough to
> require the first param to be after the method name. Since we are changing the
> API and every caller, would the resulting code be cleaner if the string value
> was the last parameter? That would allow this code pattern in most cases:
>
> 	progress = start_delayed_progress(count, show_progress,
> 					  _("My special string!"));
>
> Just a thought. Not super-important.
>
> The rest of the changes look to be correct.
>
> > diff --git a/t/helper/test-progress.c b/t/helper/test-progress.c
> > index 5d05cbe789..19b874f9cd 100644
> > --- a/t/helper/test-progress.c
> > +++ b/t/helper/test-progress.c
> > @@ -23,16 +23,18 @@
> >  int cmd__progress(int argc, const char **argv)
> >  {
> >  	int total = 0;
> > +	int quiet = 0;
> >  	const char *title;
> >  	struct strbuf line = STRBUF_INIT;
> >  	struct progress *progress;
> >
> >  	const char *usage[] = {
> > -		"test-tool progress [--total=<n>] <progress-title>",
> > +		"test-tool progress [--total=<n>] [--quiet] <progress-title>",
> >  		NULL
> >  	};
> >  	struct option options[] = {
> >  		OPT_INTEGER(0, "total", &total, "total number of items"),
> > +		OPT_BOOL(0, "quiet", &quiet, "suppress stderr"),
> >  		OPT_END(),
> >  	};
> >
> > @@ -42,7 +44,7 @@ int cmd__progress(int argc, const char **argv)
> >  	title = argv[0];
> >
> >  	progress_testing = 1;
> > -	progress = start_progress(title, total);
> > +	progress = start_progress(title, total, !quiet);
> >  	while (strbuf_getline(&line, stdin) != EOF) {
> >  		char *end;
> >
> > diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> > index 1ed1df351c..9d6e6274ad 100755
> > --- a/t/t0500-progress-display.sh
> > +++ b/t/t0500-progress-display.sh
> > @@ -309,4 +309,31 @@ test_expect_success 'progress generates traces' '
> >  	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
> >  '
> >
> > +test_expect_success 'progress generates traces even quietly' '
> > +	cat >in <<-\EOF &&
> > +	throughput 102400 1000
> > +	update
> > +	progress 10
> > +	throughput 204800 2000
> > +	update
> > +	progress 20
> > +	throughput 307200 3000
> > +	update
> > +	progress 30
> > +	throughput 409600 4000
> > +	update
> > +	progress 40
> > +	EOF
> > +
> > +	GIT_TRACE2_EVENT="$(pwd)/trace.event" test-tool progress --total=40 \
> > +		--quiet "Working hard" <in 2>stderr &&
> > +
> > +	# t0212/parse_events.perl intentionally omits regions and data.
> > +	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> > +	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> > +	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
> > +	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
> > +'
>
> Thanks for adding a test, including the resulting trace events!
>
> The patch of Taylor's that I mentioned earlier changes the "category"
> to something a bit more specific than "progress", when appropriate.
>
> > +
> > +
> >  test_done
>
> nit: extra empty line
>
> Thanks!
> -Stolee

Thanks,
Taylor

  reply	other threads:[~2020-07-10  2:17 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-10  1:42 [PATCH 0/2] enable progress traces even when quiet Emily Shaffer
2020-07-10  1:42 ` [PATCH 1/2] progress: create progress struct in 'verbose' mode Emily Shaffer
2020-07-10  2:00   ` Derrick Stolee
2020-07-10  2:17     ` Taylor Blau [this message]
2020-07-10 19:21       ` Emily Shaffer
2020-07-10  2:14   ` brian m. carlson
2020-07-10 19:24     ` Emily Shaffer
2020-07-10 21:09     ` Junio C Hamano
2020-07-10 22:00       ` Emily Shaffer
2020-07-10 22:40   ` Junio C Hamano
2020-07-14  0:15     ` Emily Shaffer
2020-08-17 22:19       ` Emily Shaffer
2020-08-17 22:44         ` Junio C Hamano
2020-08-17 22:49           ` Junio C Hamano
2020-09-09 22:42             ` Jonathan Tan
2020-09-09 22:36     ` Jonathan Tan
2020-09-09 23:06       ` Junio C Hamano
2020-09-10  0:31   ` Jonathan Nieder
2020-09-10  5:09     ` Junio C Hamano
2020-07-10  1:42 ` [PATCH 2/2] progress: remove redundant null-checking Emily Shaffer
2020-07-10  2:01   ` Derrick Stolee
2020-07-10  2:20     ` Taylor Blau
2020-07-10 18:50       ` Junio C Hamano
2020-07-10 19:27         ` Emily Shaffer
2020-07-10 19:58           ` Junio C Hamano
2020-07-10 20:29             ` Emily Shaffer
2020-07-10 23:03               ` Emily Shaffer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200710021740.GA39052@syl.lan \
    --to=me@ttaylorr.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.