All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Hamza Mahfooz <someguy@effective-light.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] pretty: colorize pattern matches in commit messages
Date: Wed, 01 Sep 2021 16:26:37 -0700	[thread overview]
Message-ID: <xmqqwnnznahe.fsf@gitster.g> (raw)
In-Reply-To: <20210901121616.2109658-1-someguy@effective-light.com> (Hamza Mahfooz's message of "Wed, 1 Sep 2021 08:16:16 -0400")

Hamza Mahfooz <someguy@effective-light.com> writes:

> Currently, for example when
>
>   git log --grep=pattern
>
> is executed, the outputted commits that are matched by the pattern do not
> have the relevant substring matches highlighted.

A proposed log message that stops after a description of the current
status alone invites a "so what?" response.  While it may be so
obvious to you why the current state is undesirable (after all, it
motivated you enough to write a patch to improve the situation), the
job of the proposed message is to explain and convince others to
agree what is wrong in the current state and what is a reasonable
design to improve it.  Among these three ingredients, the latter two
are missing from the above.

Because it is our convention to talk about the current status in
present tense first, you do not have to start the description with
"Currently,".

Taking the above two paragraphs together, perhaps:

    The "git log" command limits its output to the commits that
    contain strings that matched the "--grep=<pattern>" option, but
    unlike output from "git grep -e <pattern>", the matches are not
    highlighted, making them harder to spot.

    Teach the pretty-printer code to highlight matches from the
    "--grep=<pattern>", "--author=<pattern>" and
    "--committer=<pattern>" options (to view the last one, you may
    have to ask for --pretty=fuller).

Or something like that.

>  pretty.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 98 insertions(+), 11 deletions(-)

This new feature deserves to be tested.  I am surprised that we do
not have "git log --grep" tests whose expected output we need to
adjust for this change.  Perhaps it is becuase we are not otherwise
showing colors at all in "git log" tests?

It also needs a documentation update, if only to tell readers how to
customize the color used to paint the hits.

> +static void append_matched_line(struct grep_opt *opt, const char *line,
> +				size_t linelen, enum grep_pat_token token,
> +				int field, struct strbuf *sb)
> +{
> +	struct grep_pat *pat;
> +	struct strbuf tmp_sb;
> +	regmatch_t tmp_match, match;
> +	char *buf, *eol, *color;
> +	int cflags = 0;
> +
> +	strbuf_init(&tmp_sb, linelen + 1);
> +	strbuf_add(&tmp_sb, line, linelen);
> +	buf = tmp_sb.buf;
> +	eol = buf + linelen;

This copy of the whole line is wasted when ...

> +	if (!opt || !want_color(opt->color))
> +		goto skip;

... we are not doing any coloring.  Can we avoid it?

> +	color = opt->colors[GREP_COLOR_MATCH_CONTEXT];

Why is the context (as opposed to selected) color used here?  

In general, when you allow the foreground color to be customized,
you must also make the background color customizable, so that the
end user can avoid low contrast combinations.  If we look at how
opt->color is handled in grep.c::show_line(), we can tell how both
match_color (foregroud) and line_color (background) are taken from
the palette that the end user can customize.  We should do the same
here, without assuming that 'color' here will have a good contrast
against the COLOR_RESET backdrop.

> +	for (;;) {
> +		match.rm_so = match.rm_eo = -1;
> +
> +		for (pat = (token == GREP_PATTERN_HEAD ?
> +			    opt->header_list : opt->pattern_list);
> +		     pat; pat = pat->next) {
> +			if (pat->token == token &&
> +			    (field == -1 || pat->field == field) &&
> +			    !regexec(&pat->regexp, buf, 1, &tmp_match,
> +				     cflags)) {
> +
> +				if ((match.rm_so >= 0 && match.rm_eo >= 0) &&
> +				    (tmp_match.rm_so > match.rm_so ||
> +				     (tmp_match.rm_so == match.rm_so &&
> +				      tmp_match.rm_eo < match.rm_eo)))
> +					continue;
> +
> +				match.rm_so = tmp_match.rm_so;
> +				match.rm_eo = tmp_match.rm_eo;
> +			}
> +		}

For the current commit to come this far, "git log --grep=<pattern>"
must have done the above exact regexec() to decide if we need to
call this function in the first place, right?

We must be redoing the same computation here, which is unfortunate
for two reasons.  Performance and maintainability.

How much extra cycles are we looking at with this additional code?
Depending on how inefficient this code makes, we may need to make it
an optional feature, turned off by default.

Worse yet, this can easily become a source of future bugs, since the
above matching logic must be kept in sync with the existing matching
code elsewhere.  I would not be surprised if the above logic is
already broken when various options that affects how pattern
matching works with "log --grep" (e.g. "--invert-grep", "-E", "-i")
are in use.

Also, this function is misnamed.  It is not a function that appends
matched line.  From the caller's point of view, it is used to append
each and every line it wants to add to sb [*], and they do not even
need or want to know how the callee decides which parts of the line
to paint in different colors.  Perhaps append_line_with_color() or
something?

	Side note.  By the way, why is the sb the last parameter to
	this function?  Usually functions that operate on a strbuf
	take it as the first argument.

> +		if (match.rm_so == match.rm_eo)
> +			break;
> +
> +		strbuf_grow(sb, strlen(color) + strlen(GIT_COLOR_RESET));
> +		strbuf_add(sb, buf, match.rm_so);
> +		strbuf_add(sb, color, strlen(color));
> +		strbuf_add(sb, buf + match.rm_so,
> +			   match.rm_eo - match.rm_so);
> +		strbuf_add(sb, GIT_COLOR_RESET,
> +			   strlen(GIT_COLOR_RESET));
> +		buf += match.rm_eo;
> +		cflags = REG_NOTBOL;
> +	}
> +
> +skip:
> +	strbuf_add(sb, buf, eol - buf);
> +
> +	strbuf_release(&tmp_sb);
> +}
> +

I like what the new feature wants to do, but I am not sure if this
is the best execution of the idea (yet).

Thanks.

  parent reply	other threads:[~2021-09-01 23:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-01 12:16 [PATCH] pretty: colorize pattern matches in commit messages Hamza Mahfooz
2021-09-01 17:17 ` Felipe Contreras
2021-09-01 23:26 ` Junio C Hamano [this message]
2021-09-05 12:05 Hamza Mahfooz
2021-09-05 18:01 ` Eric Sunshine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqwnnznahe.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=someguy@effective-light.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.