All of lore.kernel.org
 help / color / mirror / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/2] progress: create progress struct in 'verbose' mode
Date: Mon, 17 Aug 2020 15:19:38 -0700	[thread overview]
Message-ID: <20200817221938.GA331156@google.com> (raw)
In-Reply-To: <20200714001504.GI3189386@google.com>

On Mon, Jul 13, 2020 at 05:15:04PM -0700, Emily Shaffer wrote:
> 
> On Fri, Jul 10, 2020 at 03:40:25PM -0700, Junio C Hamano wrote:
> > > diff --git a/builtin/blame.c b/builtin/blame.c
> > > index 94ef57c1cc..1d72b27fda 100644
> > > --- a/builtin/blame.c
> > > +++ b/builtin/blame.c
> > > @@ -1129,8 +1129,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
> > >  
> > >  	sb.found_guilty_entry = &found_guilty_entry;
> > >  	sb.found_guilty_entry_data = &pi;
> > > -	if (show_progress)
> > > -		pi.progress = start_delayed_progress(_("Blaming lines"), sb.num_lines);
> > > +	pi.progress = start_delayed_progress(_("Blaming lines"), sb.num_lines,
> > > +					     show_progress);
> > >  
> > >  	assign_blame(&sb, opt);
> > 
> > But there are others that look a bit problematic.  In this example
> > taken from fsck, we open all the pack index, only because it is
> > needed to show the progress, and the existing conditionals are ways
> > to avoid spending unneeded cycles.
> 
> That info is still used, when performing trace log during
> stop_progress():
> 
>   if (p_progress && *p_progress) {
>           trace2_data_intmax("progress", the_repository, "total_objects",
>                              (*p_progress)->total);
> 
>           if ((*p_progress)->throughput)
>                   trace2_data_intmax("progress", the_repository,
>                                      "total_bytes",
>                                      (*p_progress)->throughput->curr_total);
> 
>           trace2_region_leave("progress", (*p_progress)->title, the_repository);
>   }
> 
> Because the progress struct is no longer NULL, we can write down the
> number of objects calculated to the trace2 log file. So the expense of
> calculating it is not wasted, at least in scenarios where someone cares
> about the output of the trace log.
> 
> But yes, lots of users don't care about trace log - they are using their
> home computer and don't care about metrics like how many objects were
> transferred in every single push. Now it becomes a little strange - is
> it better to instead wrap each 'struct progress' constructor in "if
> (should_i_log || should_i_trace)"? Have we got a simple way to look up
> 'should_i_trace'?
> 
> Some later comments have the same point - "why am I calculating the
> totals if the user won't see and doesn't care?" - so I won't respond to
> each other one; my reply is the same.
> 
> > 
> > > @@ -836,16 +836,15 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> > >  			uint32_t total = 0, count = 0;
> > >  			struct progress *progress = NULL;
> > >  
> > > -			if (show_progress) {
> > > -				for (p = get_all_packs(the_repository); p;
> > > -				     p = p->next) {
> > > -					if (open_pack_index(p))
> > > -						continue;
> > > -					total += p->num_objects;
> > > -				}
> > > -
> > > -				progress = start_progress(_("Checking objects"), total);
> > > +			for (p = get_all_packs(the_repository); p;
> > > +			     p = p->next) {
> > > +				if (open_pack_index(p))
> > > +					continue;
> > > +				total += p->num_objects;
> > >  			}
> > > +
> > > +			progress = start_progress(_("Checking objects"), total,
> > > +						  show_progress);
> > >  			for (p = get_all_packs(the_repository); p;
> > >  			     p = p->next) {
> > >  				/* verify gives error messages itself */
> > 
> > Likewise, we do not even have to be scanning the index entries
> > upfront if we are not showing the progress report (and more
> > importantly, the user likely has chosen the speed/cycles over eye
> > candy progress meter) while checking out paths from the index.
> > 
> > > diff --git a/unpack-trees.c b/unpack-trees.c
> > > index 4be5fc3075..fa84244466 100644
> > > --- a/unpack-trees.c
> > > +++ b/unpack-trees.c
> > > @@ -338,16 +338,14 @@ static struct progress *get_progress(struct unpack_trees_options *o,
> > >  {
> > >  	unsigned cnt = 0, total = 0;
> > >  
> > > -	if (!o->update || !o->verbose_update)
> > > -		return NULL;
> > > -
> > >  	for (; cnt < index->cache_nr; cnt++) {
> > >  		const struct cache_entry *ce = index->cache[cnt];
> > >  		if (ce->ce_flags & (CE_UPDATE | CE_WT_REMOVE))
> > >  			total++;
> > >  	}
> > >  
> > > -	return start_delayed_progress(_("Updating files"), total);
> > > +	return start_delayed_progress(_("Updating files"), total,
> > > +				      (o->update && o->verbose_update));
> > >  }
> > >  
> > >  static void setup_collided_checkout_detection(struct checkout *state,
> > 
> > 
> > And there are cases where the "is progress enabled?" parameter (by
> > the way, I am not sure 'verbose' is quite the right name for the
> 
> I struggled some with this; I also considered using 'quiet' but it
> required inverting the logic everywhere when all callers already were
> asking "should I print?" instead of "should I not print?". Maybe "print"
> is better?
> 
> > parameter and the field name, as that word implies there can be
> > multiple verbosity levels---what I see you've done is that you added
> > an "is this enabled?" flag) is not a simple variable, like these.
> > Some are simple enough, but some are not.  Some of the worst appear
> > in preload-index.c and read-cache.c---they may become easier to read
> > if we introduced a variable to be passed as the "enable?"  parameter
> > and compute it before makng a call, and that might be enough, though.
> 
> Sure, makes sense.
> 
> > > diff --git a/read-cache.c b/read-cache.c
> > > index aa427c5c17..2ddc422dbd 100644
> > > --- a/read-cache.c
> > > +++ b/read-cache.c
> > > @@ -1532,9 +1532,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
> > >  	const char *unmerged_fmt;
> > >  	struct progress *progress = NULL;
> > >  
> > > -	if (flags & REFRESH_PROGRESS && isatty(2))
> > > -		progress = start_delayed_progress(_("Refresh index"),
> > > -						  istate->cache_nr);
> > > +	progress = start_delayed_progress(_("Refresh index"),
> > > +					  istate->cache_nr,
> > > +					  (flags & REFRESH_PROGRESS && isatty(2)));
> > >  
> > >  	trace_performance_enter();
> > >  	modified_fmt   = in_porcelain ? "M\t%s\n" : "%s: needs update\n";
> > 
> > 
> > And here is the only one that may benefit from this change, which
> > cares that the codepath prepared in rev-list, primarily to support
> > the progress meter, is exercised because those pieces of code
> > happens to be used by the tracing output as well.  Perhaps we
> > shouldn't have tied the "when showing progress, we can also trace"
> > earlier and we didn't have to touch this many codepath only to spend
> > cycles when progress output is not desired?
> 
> I guess I don't see how this is the only caller that benefits. The
> tracing output should appear in all the places where I touched the code;
> that was the point of the earlier change. Since the object total is provided in
> start_progress() and friends, and the runtime is logged by
> region_enter() in start_progress() and region_leave() in
> stop_progress(), then every caller is now unconditionally leaving traces
> to give some impression of how much work and how long it took. So I am
> confused - can you explain how the other callers don't benefit?
> 
> > 
> > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > > index f520111eda..f64cad8390 100644
> > > --- a/builtin/rev-list.c
> > > +++ b/builtin/rev-list.c
> > > @@ -620,8 +620,13 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> > >  	if (bisect_list)
> > >  		revs.limited = 1;
> > >  
> > > -	if (show_progress)
> > > -		progress = start_delayed_progress(show_progress, 0);
> > > +	/*
> > > +	 * When progress is not printed to the user, we still want to be able to
> > > +	 * classify the progress during tracing. So, use a placeholder name.
> > > +	 */
> > > +	progress = start_delayed_progress(
> > > +			show_progress ? show_progress : _("Quiet rev-list operation"),
> > > +			0, show_progress != NULL);
> > 
> > I've cut out the hunks that exhibits similar looking patterns as the
> > ones I covered in the above discussion.  You are right to say that
> > these are simple and similar changes all over the place.  I am not
> > yet convinced that the simplicity supports that this change is the
> > right thing to do, though.  The only saving grace might be that
> > there aren't that many hunks that explicitly do unnecessary things
> > like the ones in unpack-trees.c and builtin/fsck.c even when we are
> > not showing progress.  
> > 
> > But the other codepaths may be doing conditional computation not
> > based on "if (show_progress)" but on "if (progress)", in which case
> 
> Hm, good point. For what it's worth, I performed the changes manually:
> 
>   git grep "start_progress()"
>   for each caller, open in vim
>   /progress
>   delete or move if necessary
>   (repeat until all matches for /progress look right)
> 
> and I did not see cases where 'if (progress)' was being used. That is
> hard to express in a diff; I can try to audit it better. (This process
> is how I also noticed the things to change in 2/2 in this series, even
> though they are unrelated to start_progress()).
> 
> > with this patch, we may be paying a lot more overhead even if we
> > know progress meter won't be shown and the worse part from
> > reviewability point of view is that this patch does not explicitly
> > do anything to make it happen because start_delayed_progress() now
> > unconditionally give a non-NULL progress structure to enable them.
> > 
> > So I dunno.  It certainly is not for the upcoming release, but we do
> > want to make sure that trace requested even when the command is quiet
> > gets collected in the longer term (i.e. during the next cycle).
> 
> Ok. It sounds like you are saying you want the spirit of the change,
> even if not in this exact form - that is good to know.

A month later, are we still interested in this change? I'd like to
pursue it - we are trying to use this progress tracing internally to
generate some metrics and not having this change (or some version of it)
means those metrics aren't accurate.

The main concern I saw here was "we are doing a lot of work that isn't
used if the user doesn't want to log traces" - should I approach a
reroll of this topic by trying to be smarter about whether to set
'quiet' or 'print' or 'verbose' or whatever it is renamed to, based on
whether there is a trace destination? Then for systems which are logging
traces the extra work is worth it, but for everyone else it can function
as before.

I don't love it from a design perspective - it feels a little like
progress module is looking a little too closely at trace module
internals - but from a performance perspective I see the appeal.

On the other hand, progress operations are showing progress because they
are slow - is a little more work on something that's already
human-noticeable slow such a big deal?

 - Emily

  reply	other threads:[~2020-08-17 22:20 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-10  1:42 [PATCH 0/2] enable progress traces even when quiet Emily Shaffer
2020-07-10  1:42 ` [PATCH 1/2] progress: create progress struct in 'verbose' mode Emily Shaffer
2020-07-10  2:00   ` Derrick Stolee
2020-07-10  2:17     ` Taylor Blau
2020-07-10 19:21       ` Emily Shaffer
2020-07-10  2:14   ` brian m. carlson
2020-07-10 19:24     ` Emily Shaffer
2020-07-10 21:09     ` Junio C Hamano
2020-07-10 22:00       ` Emily Shaffer
2020-07-10 22:40   ` Junio C Hamano
2020-07-14  0:15     ` Emily Shaffer
2020-08-17 22:19       ` Emily Shaffer [this message]
2020-08-17 22:44         ` Junio C Hamano
2020-08-17 22:49           ` Junio C Hamano
2020-09-09 22:42             ` Jonathan Tan
2020-09-09 22:36     ` Jonathan Tan
2020-09-09 23:06       ` Junio C Hamano
2020-09-10  0:31   ` Jonathan Nieder
2020-09-10  5:09     ` Junio C Hamano
2020-07-10  1:42 ` [PATCH 2/2] progress: remove redundant null-checking Emily Shaffer
2020-07-10  2:01   ` Derrick Stolee
2020-07-10  2:20     ` Taylor Blau
2020-07-10 18:50       ` Junio C Hamano
2020-07-10 19:27         ` Emily Shaffer
2020-07-10 19:58           ` Junio C Hamano
2020-07-10 20:29             ` Emily Shaffer
2020-07-10 23:03               ` Emily Shaffer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200817221938.GA331156@google.com \
    --to=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.