* [RFC 0/1] Leading whitespace as a function identification heuristic? @ 2020-09-23 21:59 Ryan Zoeller 2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Ryan Zoeller @ 2020-09-23 21:59 UTC (permalink / raw) To: git; +Cc: Ryan Zoeller I've recently been annoyed by git's choice of lines when invoking `git diff --function-context`. git isn't doing a _bad_ job per se, given that it's using regular expressions to parse nonregular languages, but in my opinion it could do better. The issue I'm seeing is that constructs like nested functions cause `git diff --function-context` to stop taking lines prematurely; a line containing a function declaration has been reached, but it's not the one I'm expecting git to stop at. This results in some of the function being omitted from the output of `git diff --function-context`, and also manifests itself as the incorrect function being listed in the hunk header in general. A trivial example: in a Python file containing the following contents, modifying the indicated line will result in git picking `bar()` for the function header, rather than `foo()`. This is obviously worse for longer functions, or code with multiple inner functions or classes. ``` def foo(xs, i): def bar(x): return x + i # Intentionally not using a lambda, # because the issue occurs for inner functions. # Making this comment extra long so the function name # doesn't get automatically included in the context. return map(bar, xs) # Modify this line ``` Even without properly parsing the file as code, git could use indentation as a heuristic to determine which lines containing functions are relevant. I've used Python as an example here, but I think the issue and heuristic apply to many languages. This patch implements a proof of concept for this heuristic: only lines which are indented less than the hunk are considered eligible to contain function definitions. It only supports `git diff`, ignoring things like `git grep` for simplicity (and introducing `-1`s to the diff in leiu of handling them). While I welcome feedback on the existing implementation, I'm really requesting commentary on the following: 1. Is this indentation-aware function identification useful, and generally worth pursuing further? 2. What level of granularity would be desirable for enabling this? Something akin to xfuncname in .gitattributes, where it can be enabled per path? 3. How do we handle files with a mix of tabs and spaces? Best, Ryan Ryan Zoeller (1): xdiff: use leading whitespace in function heuristic grep.c | 2 +- line-range.c | 2 +- xdiff-interface.c | 14 +++++++++++++- xdiff/xdiff.h | 2 +- xdiff/xemit.c | 23 +++++++++++++++++------ 5 files changed, 33 insertions(+), 10 deletions(-) -- 2.28.0.586.g47c91ef7fe ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 1/1] xdiff: use leading whitespace in function heuristic 2020-09-23 21:59 [RFC 0/1] Leading whitespace as a function identification heuristic? Ryan Zoeller @ 2020-09-23 21:59 ` Ryan Zoeller 2020-09-24 6:45 ` [RFC 0/1] Leading whitespace as a function identification heuristic? Junio C Hamano 2020-09-25 18:12 ` Johannes Sixt 2 siblings, 0 replies; 10+ messages in thread From: Ryan Zoeller @ 2020-09-23 21:59 UTC (permalink / raw) To: git; +Cc: Ryan Zoeller The regular expressions specified in userdiff.c, as well as user-defined expressions, allow git to detect which lines of code which declare functions (as well as other notable lines, such as class declarations). Although useful, these regular expressions can't identify which function a line of code belongs to, only the closest function to it. Languages which allow for nested functions -- or functions inside of classes -- can trip this mechanism up. Since many languages use indentation to associate lines of code with a function (either semantically or cosmetically), we can use indentation as an additional heuristic for identifying the owning function. Specifically, this assumes code belongs to a function which is less indented than it. Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com> --- grep.c | 2 +- line-range.c | 2 +- xdiff-interface.c | 14 +++++++++++++- xdiff/xdiff.h | 2 +- xdiff/xemit.c | 23 +++++++++++++++++------ 5 files changed, 33 insertions(+), 10 deletions(-) diff --git a/grep.c b/grep.c index 54af9f813e..3281f19977 100644 --- a/grep.c +++ b/grep.c @@ -1555,7 +1555,7 @@ static int match_funcname(struct grep_opt *opt, struct grep_source *gs, char *bo if (xecfg) { char buf[1]; - return xecfg->find_func(bol, eol - bol, buf, 1, + return xecfg->find_func(bol, eol - bol, buf, 1, -1, xecfg->find_func_priv) >= 0; } diff --git a/line-range.c b/line-range.c index 9b50583dc0..eb9540bc76 100644 --- a/line-range.c +++ b/line-range.c @@ -119,7 +119,7 @@ static int match_funcname(xdemitconf_t *xecfg, const char *bol, const char *eol) { if (xecfg) { char buf[1]; - return xecfg->find_func(bol, eol - bol, buf, 1, + return xecfg->find_func(bol, eol - bol, buf, 1, -1, xecfg->find_func_priv) >= 0; } diff --git a/xdiff-interface.c b/xdiff-interface.c index 4d20069302..d93cb5c72e 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -201,7 +201,7 @@ struct ff_regs { }; static long ff_regexp(const char *line, long len, - char *buffer, long buffer_size, void *priv) + char *buffer, long buffer_size, long max_leading_spaces, void *priv) { struct ff_regs *regs = priv; regmatch_t pmatch[2]; @@ -216,6 +216,18 @@ static long ff_regexp(const char *line, long len, len--; } + // TODO: Is it faster to check whitespace only after matching the regex? + if (max_leading_spaces >= 0) { + long leading_spaces; + for (leading_spaces = 0; leading_spaces < len + && leading_spaces <= max_leading_spaces + && isspace(line[leading_spaces]); leading_spaces++) + ; + + if (leading_spaces > max_leading_spaces) + return -1; + } + for (i = 0; i < regs->nr; i++) { struct ff_reg *reg = regs->array + i; if (!regexec_buf(®->re, line, len, 2, pmatch, 0)) { diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 032e3a9f41..f78c30c527 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -93,7 +93,7 @@ typedef struct s_xdemitcb { int (*out_line)(void *, mmbuffer_t *, int); } xdemitcb_t; -typedef long (*find_func_t)(const char *line, long line_len, char *buffer, long buffer_size, void *priv); +typedef long (*find_func_t)(const char *line, long line_len, char *buffer, long buffer_size, long max_leading_spaces, void *priv); typedef int (*xdl_emit_hunk_consume_func_t)(long start_a, long count_a, long start_b, long count_b, diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 9d7d6c5087..1de68008f9 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -95,7 +95,7 @@ xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg) } -static long def_ff(const char *rec, long len, char *buf, long sz, void *priv) +static long def_ff(const char *rec, long len, char *buf, long sz, long max_leading_spaces, void *priv) { if (len > 0 && (isalpha((unsigned char)*rec) || /* identifier? */ @@ -112,19 +112,19 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv) } static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri, - char *buf, long sz) + char *buf, long sz, long max_leading_spaces) { const char *rec; long len = xdl_get_rec(xdf, ri, &rec); if (!xecfg->find_func) - return def_ff(rec, len, buf, sz, xecfg->find_func_priv); - return xecfg->find_func(rec, len, buf, sz, xecfg->find_func_priv); + return def_ff(rec, len, buf, sz, max_leading_spaces, xecfg->find_func_priv); + return xecfg->find_func(rec, len, buf, sz, max_leading_spaces, xecfg->find_func_priv); } static int is_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri) { char dummy[1]; - return match_func_rec(xdf, xecfg, ri, dummy, sizeof(dummy)) >= 0; + return match_func_rec(xdf, xecfg, ri, dummy, -1, sizeof(dummy)) >= 0; } struct func_line { @@ -137,12 +137,23 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg, { long l, size, step = (start > limit) ? -1 : 1; char *buf, dummy[1]; + long leading_spaces; + + if (start - step >= 0 && start - step < xe->xdf1.nrec) { + xrecord_t *first_line = xe->xdf1.recs[start - step]; + + for (leading_spaces = 0; first_line->ptr[leading_spaces] + && isspace(first_line->ptr[leading_spaces]); leading_spaces++) + ; + } else { + leading_spaces = 0; + } buf = func_line ? func_line->buf : dummy; size = func_line ? sizeof(func_line->buf) : sizeof(dummy); for (l = start; l != limit && 0 <= l && l < xe->xdf1.nrec; l += step) { - long len = match_func_rec(&xe->xdf1, xecfg, l, buf, size); + long len = match_func_rec(&xe->xdf1, xecfg, l, buf, size, leading_spaces - 1); if (len >= 0) { if (func_line) func_line->len = len; -- 2.28.0.586.g47c91ef7fe ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-23 21:59 [RFC 0/1] Leading whitespace as a function identification heuristic? Ryan Zoeller 2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller @ 2020-09-24 6:45 ` Junio C Hamano 2020-09-24 21:17 ` Jeff King 2020-09-25 18:12 ` Johannes Sixt 2 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2020-09-24 6:45 UTC (permalink / raw) To: Ryan Zoeller; +Cc: git Ryan Zoeller <rtzoeller@rtzoeller.com> writes: > 1. Is this indentation-aware function identification useful, and > generally worth pursuing further? I cannot shake the feeling off that this is being overly generous and inviting for misidentification for languages whose usual convention is not to nest and indent the definitions freely. IOW, why can't the "we allow leading whitespaces" a per-language thing? IOW, why do we even need any new code---shouldn't it be just the matter of defining xfuncname patterns for such a language with nested and indented definitions? So, a mild Meh from me at this point. I may change my mind in the morning, though ;-) Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-24 6:45 ` [RFC 0/1] Leading whitespace as a function identification heuristic? Junio C Hamano @ 2020-09-24 21:17 ` Jeff King 2020-09-24 22:01 ` Ryan Zoeller 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2020-09-24 21:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ryan Zoeller, git On Wed, Sep 23, 2020 at 11:45:55PM -0700, Junio C Hamano wrote: > Ryan Zoeller <rtzoeller@rtzoeller.com> writes: > > > 1. Is this indentation-aware function identification useful, and > > generally worth pursuing further? > > I cannot shake the feeling off that this is being overly generous > and inviting for misidentification for languages whose usual > convention is not to nest and indent the definitions freely. > > IOW, why can't the "we allow leading whitespaces" a per-language > thing? IOW, why do we even need any new code---shouldn't it be just > the matter of defining xfuncname patterns for such a language with > nested and indented definitions? If I understand the problem correctly, I don't think a static regex can accomplish this, because you need context from the original line. E.g., consider something like this: void foo() { void bar() { ...code 1... } ...code 2... } If we change one of the lines of code, then we find the function header by walking backwards up the lines, evaluating a regex for each line. But for "code 2", how do we know to keep walking past bar() and up to foo()? Or more specifically, what is different when evaluating a change from "code 2" that is different than when we would evaluate "code 1"? If the only input to the question of "is this line a function header" is the regex from the config, then changes to either line of code must produce the same answer (either bar() if we allow leading whitespace, or foo() if we do not). So I think Ryan's proposal is to give that search an extra piece of information: the indentation of the original changed line. Which is enough to differentiate the two cases. If I understand the patch correctly, it is always picking the first line where indentation is lessened (and which matches the funcname pattern). That works out of the box with existing funcname patterns, which is nice. Though I wonder if there are cases where the funcname regex could use the information more flexibly (i.e., some marker in the regex that means "match less than this many spaces" or something). I do agree that this should not be on automatically for all languages. Some file formats may want to show a header that's at the same indentation as the change. Adding a diff.foo.funcnameIndentation config option would be one solution. But requiring the funcname pattern to make explicit use of the information is another (and would allow a format to match some things at one indentation level and some at another; but again, I'm hand-waving a bit on whether this level of flexibility is even useful). -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-24 21:17 ` Jeff King @ 2020-09-24 22:01 ` Ryan Zoeller 2020-09-25 9:11 ` Phillip Wood 0 siblings, 1 reply; 10+ messages in thread From: Ryan Zoeller @ 2020-09-24 22:01 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: git On 9/24/20 4:17 PM, Jeff King wrote: > > > If I understand the problem correctly, I don't think a static regex can > accomplish this, because you need context from the original line. E.g., > consider something like this: > > void foo() { > void bar() { > ...code 1... > } > ...code 2... > } > > If we change one of the lines of code, then we find the function header > by walking backwards up the lines, evaluating a regex for each line. But > for "code 2", how do we know to keep walking past bar() and up to foo()? > Or more specifically, what is different when evaluating a change from > "code 2" that is different than when we would evaluate "code 1"? > > If the only input to the question of "is this line a function header" is > the regex from the config, then changes to either line of code must > produce the same answer (either bar() if we allow leading whitespace, or > foo() if we do not). > > So I think Ryan's proposal is to give that search an extra piece of > information: the indentation of the original changed line. Which is > enough to differentiate the two cases. You've explained this better than I could have. Thanks. > If I understand the patch correctly, it is always picking the first line > where indentation is lessened (and which matches the funcname pattern). > That works out of the box with existing funcname patterns, which is > nice. Though I wonder if there are cases where the funcname regex could > use the information more flexibly (i.e., some marker in the regex that > means "match less than this many spaces" or something). My original intent was to work with existing funcname expressions without modifications. Some of the funcname regexes are rather impenetrable at first glance, and not requiring modifications seemed like an easy win. Especially for funcname patterns specified by a user, I assumed it would be easier to set an additional configuration option than rewrite an existing regex to handle this complexity. > I do agree that this should not be on automatically for all languages. > Some file formats may want to show a header that's at the same > indentation as the change. Adding a diff.foo.funcnameIndentation config > option would be one solution. But requiring the funcname pattern to make > explicit use of the information is another (and would allow a format to > match some things at one indentation level and some at another; but > again, I'm hand-waving a bit on whether this level of flexibility is > even useful) If the configuration option is implemented correctly (i.e. as an enum rather than a binary toggle), I think we could leave the door open for a more flexible approach in the future, without needing to answer how useful that flexibility is now. I couldn't think of any situations requiring this flexibility, but that doesn't mean they don't exist. Ryan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-24 22:01 ` Ryan Zoeller @ 2020-09-25 9:11 ` Phillip Wood 2020-09-25 18:43 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Phillip Wood @ 2020-09-25 9:11 UTC (permalink / raw) To: Ryan Zoeller, Jeff King, Junio C Hamano; +Cc: git On 24/09/2020 23:01, Ryan Zoeller wrote: > On 9/24/20 4:17 PM, Jeff King wrote: >> >> >> If I understand the problem correctly, I don't think a static regex can >> accomplish this, because you need context from the original line. E.g., >> consider something like this: >> >> void foo() { >> void bar() { >> ...code 1... >> } >> ...code 2... >> } >> If I've understood correctly when ...code 2... contains changes that are themselves indented then we'll pick the wrong function header for example void foo() { void bar() { ...code 1... } for (...) { // if this line is changed we pick bar rather // than foo because it is the first function // header with a smaller indentation than the // first changed line. } } Unfortunately I suspect code like that is not uncommon and the diff would regress with this simple heuristic. It might be possible to recalculate the required indentation as we walk backwards up the file though, so when we hit the "for" line we reduce the maximum indentation allowed for a match and so skip "bar" as a function header. Best Wishes Phillip >> If we change one of the lines of code, then we find the function header >> by walking backwards up the lines, evaluating a regex for each line. But >> for "code 2", how do we know to keep walking past bar() and up to foo()? >> Or more specifically, what is different when evaluating a change from >> "code 2" that is different than when we would evaluate "code 1"? >> >> If the only input to the question of "is this line a function header" is >> the regex from the config, then changes to either line of code must >> produce the same answer (either bar() if we allow leading whitespace, or >> foo() if we do not). >> >> So I think Ryan's proposal is to give that search an extra piece of >> information: the indentation of the original changed line. Which is >> enough to differentiate the two cases. > > You've explained this better than I could have. Thanks. > >> If I understand the patch correctly, it is always picking the first line >> where indentation is lessened (and which matches the funcname pattern). >> That works out of the box with existing funcname patterns, which is >> nice. Though I wonder if there are cases where the funcname regex could >> use the information more flexibly (i.e., some marker in the regex that >> means "match less than this many spaces" or something). > > My original intent was to work with existing funcname expressions > without modifications. Some of the funcname regexes are rather > impenetrable at first glance, and not requiring modifications seemed > like an easy win. > > Especially for funcname patterns specified by a user, I assumed it > would be easier to set an additional configuration option than > rewrite an existing regex to handle this complexity. > >> I do agree that this should not be on automatically for all languages. >> Some file formats may want to show a header that's at the same >> indentation as the change. Adding a diff.foo.funcnameIndentation config >> option would be one solution. But requiring the funcname pattern to make >> explicit use of the information is another (and would allow a format to >> match some things at one indentation level and some at another; but >> again, I'm hand-waving a bit on whether this level of flexibility is >> even useful) > If the configuration option is implemented correctly (i.e. as an enum > rather than a binary toggle), I think we could leave the door open for > a more flexible approach in the future, without needing to answer how > useful that flexibility is now. I couldn't think of any situations > requiring this flexibility, but that doesn't mean they don't exist. > > Ryan > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-25 9:11 ` Phillip Wood @ 2020-09-25 18:43 ` Jeff King 2020-09-25 19:01 ` Phillip Wood 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2020-09-25 18:43 UTC (permalink / raw) To: phillip.wood; +Cc: Ryan Zoeller, Junio C Hamano, git On Fri, Sep 25, 2020 at 10:11:33AM +0100, Phillip Wood wrote: > If I've understood correctly when ...code 2... contains changes that are > themselves indented then we'll pick the wrong function header for example > > void foo() { > void bar() { > ...code 1... > } > for (...) { > // if this line is changed we pick bar rather > // than foo because it is the first function > // header with a smaller indentation than the > // first changed line. > } > } > > Unfortunately I suspect code like that is not uncommon and the diff would > regress with this simple heuristic. It might be possible to recalculate the > required indentation as we walk backwards up the file though, so when we hit > the "for" line we reduce the maximum indentation allowed for a match and so > skip "bar" as a function header. Thanks, that's a great counter-example I hadn't considered. Yes, I agree that adjusting the desired indentation as we walk backwards would work. That's assuming indentation is hierarchical, but I think that's implied by the existence of this feature at all. Another possible corner case: tabs vs spaces. If I have: <space><space><space><space><space><space><space><space>foo <tab><tab>bar which is more indented? Counting isspace(), it is the first one. But visually, it would _usually_ be the second one. But of course it would depend on your tabstops. The above example is obviously stupid and contrived, but I wonder if there are legitimate confusing cases where people mix tabs and spaces (e.g., mixed tabs and spaces to align function parameters, etc). -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-25 18:43 ` Jeff King @ 2020-09-25 19:01 ` Phillip Wood 2020-09-25 19:05 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Phillip Wood @ 2020-09-25 19:01 UTC (permalink / raw) To: Jeff King, phillip.wood; +Cc: Ryan Zoeller, Junio C Hamano, git Hi Peff On 25/09/2020 19:43, Jeff King wrote: > On Fri, Sep 25, 2020 at 10:11:33AM +0100, Phillip Wood wrote: > >> If I've understood correctly when ...code 2... contains changes that are >> themselves indented then we'll pick the wrong function header for example >> >> void foo() { >> void bar() { >> ...code 1... >> } >> for (...) { >> // if this line is changed we pick bar rather >> // than foo because it is the first function >> // header with a smaller indentation than the >> // first changed line. >> } >> } >> >> Unfortunately I suspect code like that is not uncommon and the diff would >> regress with this simple heuristic. It might be possible to recalculate the >> required indentation as we walk backwards up the file though, so when we hit >> the "for" line we reduce the maximum indentation allowed for a match and so >> skip "bar" as a function header. > > Thanks, that's a great counter-example I hadn't considered. > > Yes, I agree that adjusting the desired indentation as we walk backwards > would work. That's assuming indentation is hierarchical, but I think > that's implied by the existence of this feature at all. > > Another possible corner case: tabs vs spaces. If I have: > > <space><space><space><space><space><space><space><space>foo > <tab><tab>bar > > which is more indented? Counting isspace(), it is the first one. But > visually, it would _usually_ be the second one. But of course it would > depend on your tabstops. > > The above example is obviously stupid and contrived, but I wonder if > there are legitimate confusing cases where people mix tabs and spaces > (e.g., mixed tabs and spaces to align function parameters, etc). To calculate the indentation for diff --color-moved-ws=allow-indentation-change in fill_es_indent_data() we use the tabwidth whitespace attribute to calculate the width of a tab in spaces int tab_width = es->flags & WS_TAB_WIDTH_MASK; /* calculate the visual width of indentation */ while(1) { if (s[off] == ' ') { width++; off++; } else if (s[off] == '\t') { width += tab_width - (width % tab_width); while (s[++off] == '\t') width += tab_width; } else { break; } } I think we could probably do something similar here assuming it is the visual width of the indentation that we care about. Best Wishes Phillip > -Peff > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-25 19:01 ` Phillip Wood @ 2020-09-25 19:05 ` Jeff King 0 siblings, 0 replies; 10+ messages in thread From: Jeff King @ 2020-09-25 19:05 UTC (permalink / raw) To: phillip.wood; +Cc: Ryan Zoeller, Junio C Hamano, git On Fri, Sep 25, 2020 at 08:01:07PM +0100, Phillip Wood wrote: > > Another possible corner case: tabs vs spaces. If I have: > > > > <space><space><space><space><space><space><space><space>foo > > <tab><tab>bar > > > > which is more indented? Counting isspace(), it is the first one. But > > visually, it would _usually_ be the second one. But of course it would > > depend on your tabstops. > > > > The above example is obviously stupid and contrived, but I wonder if > > there are legitimate confusing cases where people mix tabs and spaces > > (e.g., mixed tabs and spaces to align function parameters, etc). > > To calculate the indentation for diff > --color-moved-ws=allow-indentation-change in fill_es_indent_data() we use > the tabwidth whitespace attribute to calculate the width of a tab in spaces > [...] Ah, right, I forgot we already had "tabwidth" config to deal with this. I agree we could make use of that same technique here. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/1] Leading whitespace as a function identification heuristic? 2020-09-23 21:59 [RFC 0/1] Leading whitespace as a function identification heuristic? Ryan Zoeller 2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller 2020-09-24 6:45 ` [RFC 0/1] Leading whitespace as a function identification heuristic? Junio C Hamano @ 2020-09-25 18:12 ` Johannes Sixt 2 siblings, 0 replies; 10+ messages in thread From: Johannes Sixt @ 2020-09-25 18:12 UTC (permalink / raw) To: Ryan Zoeller; +Cc: git Am 23.09.20 um 23:59 schrieb Ryan Zoeller: > I've recently been annoyed by git's choice of lines when invoking > `git diff --function-context`. git isn't doing a _bad_ job per se, > given that it's using regular expressions to parse nonregular languages, > but in my opinion it could do better. > > The issue I'm seeing is that constructs like nested functions cause > `git diff --function-context` to stop taking lines prematurely; a line > containing a function declaration has been reached, but it's not the one > I'm expecting git to stop at. This results in some of the function being > omitted from the output of `git diff --function-context`, and also > manifests itself as the incorrect function being listed in the hunk header > in general. There are two mechanisms at play here: The first is that the context above and below a point of interest is extended to the next function. This is relevant for both `git diff` and `git grep` when --function-context is specified. The second is the determination of the hunk header. It is only relevant for `git diff`. The purpose of the hunk header is to get an estimation of where the hunk is located. It is most helpful in this case when the closest anchor point is listed; it is not important to find where the semantically surrounding nesting context begins. Therefore, I suggest that you aim only for the --function-context case, because there it makes sense to find the correctly nested surrounding context if possible. -- Hannes ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-09-25 20:36 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-23 21:59 [RFC 0/1] Leading whitespace as a function identification heuristic? Ryan Zoeller 2020-09-23 21:59 ` [RFC 1/1] xdiff: use leading whitespace in function heuristic Ryan Zoeller 2020-09-24 6:45 ` [RFC 0/1] Leading whitespace as a function identification heuristic? Junio C Hamano 2020-09-24 21:17 ` Jeff King 2020-09-24 22:01 ` Ryan Zoeller 2020-09-25 9:11 ` Phillip Wood 2020-09-25 18:43 ` Jeff King 2020-09-25 19:01 ` Phillip Wood 2020-09-25 19:05 ` Jeff King 2020-09-25 18:12 ` Johannes Sixt
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.