git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/1] Leading whitespace as a function identification heuristic?
@ 2020-09-23 21:59 Ryan Zoeller
  2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ryan Zoeller @ 2020-09-23 21:59 UTC (permalink / raw)
  To: git; +Cc: Ryan Zoeller

I've recently been annoyed by git's choice of lines when invoking
`git diff --function-context`. git isn't doing a _bad_ job per se,
given that it's using regular expressions to parse nonregular languages,
but in my opinion it could do better.

The issue I'm seeing is that constructs like nested functions cause
`git diff --function-context` to stop taking lines prematurely; a line
containing a function declaration has been reached, but it's not the one
I'm expecting git to stop at. This results in some of the function being
omitted from the output of `git diff --function-context`, and also
manifests itself as the incorrect function being listed in the hunk header
in general.

A trivial example: in a Python file containing the following contents,
modifying the indicated line will result in git picking `bar()` for the
function header, rather than `foo()`. This is obviously worse for longer
functions, or code with multiple inner functions or classes.

```
def foo(xs, i):
    def bar(x):
        return x + i

    # Intentionally not using a lambda,
    # because the issue occurs for inner functions.
    # Making this comment extra long so the function name
    # doesn't get automatically included in the context.

    return map(bar, xs) # Modify this line
```

Even without properly parsing the file as code, git could use
indentation as a heuristic to determine which lines containing functions
are relevant. I've used Python as an example here, but I think the
issue and heuristic apply to many languages.

This patch implements a proof of concept for this heuristic: only
lines which are indented less than the hunk are considered eligible
to contain function definitions. It only supports `git diff`,
ignoring things like `git grep` for simplicity (and introducing
`-1`s to the diff in leiu of handling them).

While I welcome feedback on the existing implementation, I'm really
requesting commentary on the following:

1. Is this indentation-aware function identification useful, and
    generally worth pursuing further?
2. What level of granularity would be desirable for enabling this?
    Something akin to xfuncname in .gitattributes, where it can be
    enabled per path?
3. How do we handle files with a mix of tabs and spaces?

Best,
Ryan

Ryan Zoeller (1):
  xdiff: use leading whitespace in function heuristic

 grep.c            |  2 +-
 line-range.c      |  2 +-
 xdiff-interface.c | 14 +++++++++++++-
 xdiff/xdiff.h     |  2 +-
 xdiff/xemit.c     | 23 +++++++++++++++++------
 5 files changed, 33 insertions(+), 10 deletions(-)

-- 
2.28.0.586.g47c91ef7fe



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-09-25 20:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 21:59 [RFC 0/1] Leading whitespace as a function identification heuristic? Ryan Zoeller
2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller
2020-09-24  6:45 ` [RFC 0/1] Leading whitespace as a function identification heuristic? Junio C Hamano
2020-09-24 21:17   ` Jeff King
2020-09-24 22:01     ` Ryan Zoeller
2020-09-25  9:11       ` Phillip Wood
2020-09-25 18:43         ` Jeff King
2020-09-25 19:01           ` Phillip Wood
2020-09-25 19:05             ` Jeff King
2020-09-25 18:12 ` Johannes Sixt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).