git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Zoeller <rtzoeller@rtzoeller.com>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC 0/1] Leading whitespace as a function identification heuristic?
Date: Thu, 24 Sep 2020 22:01:48 +0000	[thread overview]
Message-ID: <c759b472-c889-ef85-bcf2-6d9cbc588b51@rtzoeller.com> (raw)
In-Reply-To: <20200924211725.GA3103003@coredump.intra.peff.net>

On 9/24/20 4:17 PM, Jeff King wrote:
> 
> 
> If I understand the problem correctly, I don't think a static regex can
> accomplish this, because you need context from the original line. E.g.,
> consider something like this:
> 
>    void foo() {
> 	void bar() {
> 		...code 1...
> 	}
> 	...code 2...
>    }
> 
> If we change one of the lines of code, then we find the function header
> by walking backwards up the lines, evaluating a regex for each line. But
> for "code 2", how do we know to keep walking past bar() and up to foo()?
> Or more specifically, what is different when evaluating a change from
> "code 2" that is different than when we would evaluate "code 1"?
> 
> If the only input to the question of "is this line a function header" is
> the regex from the config, then changes to either line of code must
> produce the same answer (either bar() if we allow leading whitespace, or
> foo() if we do not).
> 
> So I think Ryan's proposal is to give that search an extra piece of
> information: the indentation of the original changed line. Which is
> enough to differentiate the two cases.

You've explained this better than I could have. Thanks.

> If I understand the patch correctly, it is always picking the first line
> where indentation is lessened (and which matches the funcname pattern).
> That works out of the box with existing funcname patterns, which is
> nice. Though I wonder if there are cases where the funcname regex could
> use the information more flexibly (i.e., some marker in the regex that
> means "match less than this many spaces" or something).

My original intent was to work with existing funcname expressions 
without modifications. Some of the funcname regexes are rather 
impenetrable at first glance, and not requiring modifications seemed 
like an easy win.

Especially for funcname patterns specified by a user, I assumed it
would be easier to set an additional configuration option than
rewrite an existing regex to handle this complexity.

> I do agree that this should not be on automatically for all languages.
> Some file formats may want to show a header that's at the same
> indentation as the change. Adding a diff.foo.funcnameIndentation config
> option would be one solution. But requiring the funcname pattern to make
> explicit use of the information is another (and would allow a format to
> match some things at one indentation level and some at another; but
> again, I'm hand-waving a bit on whether this level of flexibility is
> even useful)
If the configuration option is implemented correctly (i.e. as an enum 
rather than a binary toggle), I think we could leave the door open for
a more flexible approach in the future, without needing to answer how 
useful that flexibility is now. I couldn't think of any situations 
requiring this flexibility, but that doesn't mean they don't exist.

Ryan


  reply	other threads:[~2020-09-24 22:01 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-23 21:59 Ryan Zoeller
2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller
2020-09-24  6:45 ` [RFC 0/1] Leading whitespace as a function identification heuristic? Junio C Hamano
2020-09-24 21:17   ` Jeff King
2020-09-24 22:01     ` Ryan Zoeller [this message]
2020-09-25  9:11       ` Phillip Wood
2020-09-25 18:43         ` Jeff King
2020-09-25 19:01           ` Phillip Wood
2020-09-25 19:05             ` Jeff King
2020-09-25 18:12 ` Johannes Sixt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c759b472-c889-ef85-bcf2-6d9cbc588b51@rtzoeller.com \
    --to=rtzoeller@rtzoeller.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --subject='Re: [RFC 0/1] Leading whitespace as a function identification heuristic?' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).