git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
@ 2013-03-22 13:43 Vadim Zeitlin
  2013-03-22 15:02 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Vadim Zeitlin @ 2013-03-22 13:43 UTC (permalink / raw)
  To: git

A C++ method start such as

        void
        foo::bar()

wasn't recognized by cpp diff driver as it mistakenly included "foo::bar" as a
label. However the colon in a label can't be followed by another colon, so
recognize this case specially to correctly detect C++ methods using this style.

Signed-off-by: Vadim Zeitlin <vz-git@zeitlins.org>
---
 userdiff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/userdiff.c b/userdiff.c
index ea43a03..9415586 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -125,7 +125,7 @@ PATTERNS("tex",
"^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
         "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
 PATTERNS("cpp",
         /* Jump targets or access declarations */
-        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
+        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:].*$|$)\n"
         /* C/++ functions/methods at top level */
         "^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[
\t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
         /* compound type at top level */
--
1.8.2.135.g7b592fa

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-22 13:43 [PATCH] Avoid false positives in label detection in cpp diff hunk header regex Vadim Zeitlin
@ 2013-03-22 15:02 ` Junio C Hamano
  2013-03-22 17:27   ` Vadim Zeitlin
  2013-03-22 21:55   ` Johannes Sixt
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2013-03-22 15:02 UTC (permalink / raw)
  To: Vadim Zeitlin; +Cc: git

Vadim Zeitlin <vz-git@zeitlins.org> writes:

> A C++ method start such as
>
>         void
>         foo::bar()
>
> wasn't recognized by cpp diff driver as it mistakenly included "foo::bar" as a
> label. However the colon in a label can't be followed by another colon, so
> recognize this case specially to correctly detect C++ methods using this style.
>
> Signed-off-by: Vadim Zeitlin <vz-git@zeitlins.org>
> ---
>  userdiff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/userdiff.c b/userdiff.c
> index ea43a03..9415586 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -125,7 +125,7 @@ PATTERNS("tex",
> "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
>          "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
>  PATTERNS("cpp",
>          /* Jump targets or access declarations */
> -        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
> +        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:].*$|$)\n"

Hmm.  Wouldn't "find a word (possibly after indentation), colon and
then either a non-colon or end of line" be sufficient and simpler?
iow, something like...

       "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:]|$)"

>          /* C/++ functions/methods at top level */
>          "^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[
> \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
>          /* compound type at top level */
> --
> 1.8.2.135.g7b592fa

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-22 15:02 ` Junio C Hamano
@ 2013-03-22 17:27   ` Vadim Zeitlin
  2013-03-22 21:55   ` Johannes Sixt
  1 sibling, 0 replies; 9+ messages in thread
From: Vadim Zeitlin @ 2013-03-22 17:27 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster <at> pobox.com> writes:

> 
> Vadim Zeitlin <vz-git <at> zeitlins.org> writes:
... 
> > diff --git a/userdiff.c b/userdiff.c
> > index ea43a03..9415586 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -125,7 +125,7 @@ PATTERNS("tex",
> > "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
> >          "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
> >  PATTERNS("cpp",
> >          /* Jump targets or access declarations */
> > -        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
> > +        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:].*$|$)\n"
> 
> Hmm.  Wouldn't "find a word (possibly after indentation), colon and
> then either a non-colon or end of line" be sufficient and simpler?
> iow, something like...
> 
>        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:]|$)"

 This works too, of course. I didn't know why did the original regex
contain ".*$" part so I decided to keep it but your version is indeed
how I would have written it myself if I were doing it from scratch.

 Should I resubmit an updated patch or could you please just apply
your version?

 TIA!
VZ

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-22 15:02 ` Junio C Hamano
  2013-03-22 17:27   ` Vadim Zeitlin
@ 2013-03-22 21:55   ` Johannes Sixt
  2013-03-22 22:32     ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2013-03-22 21:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Vadim Zeitlin, git

Am 22.03.2013 16:02, schrieb Junio C Hamano:
> Vadim Zeitlin <vz-git@zeitlins.org> writes:
> 
>> A C++ method start such as
>>
>>         void
>>         foo::bar()
>>
>> wasn't recognized by cpp diff driver as it mistakenly included "foo::bar" as a
>> label. However the colon in a label can't be followed by another colon, so
>> recognize this case specially to correctly detect C++ methods using this style.

Much appreciated!

>>  PATTERNS("cpp",
>>          /* Jump targets or access declarations */
>> -        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
>> +        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:].*$|$)\n"
> 
> Hmm.  Wouldn't "find a word (possibly after indentation), colon and
> then either a non-colon or end of line" be sufficient and simpler?
> iow, something like...
> 
>        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:]|$)"

Yes, indeed. We don't need to match more than necessary in a negative
pattern. The \n must still remain, though.

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-22 21:55   ` Johannes Sixt
@ 2013-03-22 22:32     ` Junio C Hamano
  2013-03-22 23:11       ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-03-22 22:32 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Vadim Zeitlin, git

Johannes Sixt <j6t@kdbg.org> writes:

> Am 22.03.2013 16:02, schrieb Junio C Hamano:
>> Vadim Zeitlin <vz-git@zeitlins.org> writes:
>> 
>>> A C++ method start such as
>>>
>>>         void
>>>         foo::bar()
>>>
>>> wasn't recognized by cpp diff driver as it mistakenly included "foo::bar" as a
>>> label. However the colon in a label can't be followed by another colon, so
>>> recognize this case specially to correctly detect C++ methods using this style.
>
> Much appreciated!
>
>>>  PATTERNS("cpp",
>>>          /* Jump targets or access declarations */
>>> -        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
>>> +        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:].*$|$)\n"
>> 
>> Hmm.  Wouldn't "find a word (possibly after indentation), colon and
>> then either a non-colon or end of line" be sufficient and simpler?
>> iow, something like...
>> 
>>        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:]|$)"
>
> Yes, indeed. We don't need to match more than necessary in a negative
> pattern. The \n must still remain, though.

... because \n is not for matching against the text, but merely to
separate the regular expressions, right?

I also wonder if 

	label :

should also be caught, or is it too weird format to be worth
supporting?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-22 22:32     ` Junio C Hamano
@ 2013-03-22 23:11       ` Johannes Sixt
  2013-03-23  0:38         ` Vadim Zeitlin
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2013-03-22 23:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Vadim Zeitlin, git

Am 22.03.2013 23:32, schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> Am 22.03.2013 16:02, schrieb Junio C Hamano:
>>> Vadim Zeitlin <vz-git@zeitlins.org> writes:
>>>
>>>> A C++ method start such as
>>>>
>>>>         void
>>>>         foo::bar()
>>>>
>>>> wasn't recognized by cpp diff driver as it mistakenly included "foo::bar" as a
>>>> label. However the colon in a label can't be followed by another colon, so
>>>> recognize this case specially to correctly detect C++ methods using this style.
>>
>> Much appreciated!
>>
>>>>  PATTERNS("cpp",
>>>>          /* Jump targets or access declarations */
>>>> -        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
>>>> +        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:].*$|$)\n"
>>>
>>> Hmm.  Wouldn't "find a word (possibly after indentation), colon and
>>> then either a non-colon or end of line" be sufficient and simpler?
>>> iow, something like...
>>>
>>>        "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:([^:]|$)"
>>
>> Yes, indeed. We don't need to match more than necessary in a negative
>> pattern. The \n must still remain, though.
> 
> ... because \n is not for matching against the text, but merely to
> separate the regular expressions, right?

Correct.

> I also wonder if 
> 
> 	label :
> 
> should also be caught, or is it too weird format to be worth
> supporting?

It's easy to support, by inserting another [ \t] before the first colon.
So, why not?

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-22 23:11       ` Johannes Sixt
@ 2013-03-23  0:38         ` Vadim Zeitlin
  2013-03-23  8:31           ` Andreas Schwab
  0 siblings, 1 reply; 9+ messages in thread
From: Vadim Zeitlin @ 2013-03-23  0:38 UTC (permalink / raw)
  To: git

Johannes Sixt <j6t <at> kdbg.org> writes:

> > I also wonder if 
> > 
> > 	label :
> > 
> > should also be caught, or is it too weird format to be worth
> > supporting?
> 
> It's easy to support, by inserting another [ \t] before the first colon.
> So, why not?

 This is really nitpicking, but if we do it, then it should be "[ \t]*". And the
"*" after the label should actually be a "+". So the full line becomes


  "!^[ \t]*[A-Za-z_][A-Za-z_0-9]+[ \t]*:([^:]|$)\n"


 But then I've never actually seen git putting labels incorrectly into the hunk
headers while I did see the problem this patch tries to fix, with wrong method
appearing in the header because the correct one was skipped due to this ignore
regex, quite a few times in the past.

 Regards,
VZ

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-23  0:38         ` Vadim Zeitlin
@ 2013-03-23  8:31           ` Andreas Schwab
  2013-03-23  9:48             ` Vadim Zeitlin
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Schwab @ 2013-03-23  8:31 UTC (permalink / raw)
  To: Vadim Zeitlin; +Cc: git

Vadim Zeitlin <vz-git@zeitlins.org> writes:

>   "!^[ \t]*[A-Za-z_][A-Za-z_0-9]+[ \t]*:([^:]|$)\n"

That would fail to match single-character identifiers.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Avoid false positives in label detection in cpp diff hunk header regex.
  2013-03-23  8:31           ` Andreas Schwab
@ 2013-03-23  9:48             ` Vadim Zeitlin
  0 siblings, 0 replies; 9+ messages in thread
From: Vadim Zeitlin @ 2013-03-23  9:48 UTC (permalink / raw)
  To: git

Andreas Schwab <schwab <at> linux-m68k.org> writes:

> Vadim Zeitlin <vz-git <at> zeitlins.org> writes:
> 
> >   "!^[ \t]*[A-Za-z_][A-Za-z_0-9]+[ \t]*:([^:]|$)\n"
> 
> That would fail to match single-character identifiers.

 Oops, yes, you're right, of course, sorry. I have no idea why did I write
that we needed to change this "*" to "+", the only explanation I see is that
it was simply too late at night when I did it. So the final version of the
exclusion regex is

	"!^[ \t]*[A-Za-z_][A-Za-z_0-9]*[ \t]*:([^:]|$)\n"


 But I feel like I'm still missing something about what is going on here.
Because after looking carefully at the (positive) regex for matching function
and method names, which is

	"^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*"
	"([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"

(split over 2 lines for readability), I actually don't understand how does it
manage to match my declaration. Yet match it does, I do get

@@ -438,6 +438,10 @@ firebird_statement_backend::execute(int number)

in my diff. But how is this possible? The "[ \t*]+" part has nowhere to match
but between "int" and "number" but it can't match there because there must be
only alphanumeric characters before it. Yet, not only it does match but if I
test with GNU grep -E, it matches too (after replacing "\\(" with just "\("
and removing "\n"). However if I test with perl or "sed -r", it does *not*
match. Can anyone see what's going on here?


 FWIW I've started looking into this because I thought that the current
regex wouldn't detect something like

	foo::nested_type foo::method()

as a start of a method. However it does detect this just fine as well which
I can't understand at all. I'm out of lame excuses (it's not too late here
yet...) so I just hope that I'm missing something about the way Git creates
hunk headers and not some obvious problem with the regex itself because
I've been staring at it for half an hour but still can't see how does it
manage to match here. Could anyone who does see it please explain?

 Thanks in advance,
VZ

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-03-23  9:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-22 13:43 [PATCH] Avoid false positives in label detection in cpp diff hunk header regex Vadim Zeitlin
2013-03-22 15:02 ` Junio C Hamano
2013-03-22 17:27   ` Vadim Zeitlin
2013-03-22 21:55   ` Johannes Sixt
2013-03-22 22:32     ` Junio C Hamano
2013-03-22 23:11       ` Johannes Sixt
2013-03-23  0:38         ` Vadim Zeitlin
2013-03-23  8:31           ` Andreas Schwab
2013-03-23  9:48             ` Vadim Zeitlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).