git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Sixt <j.sixt@viscovery.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Thomas Rast <tr@thomasrast.ch>
Subject: Re: [RFC/PATCH] diff: simplify cpp funcname regex
Date: Fri, 07 Mar 2014 08:23:05 +0100	[thread overview]
Message-ID: <531973D9.9070803@viscovery.net> (raw)
In-Reply-To: <20140306212835.GA11743@sigill.intra.peff.net>

Am 3/6/2014 22:28, schrieb Jeff King:
> On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote:
>> The pattern I chose also catches variable definition, not just
>> functions. That is what I need, but it hurts grep --function-context
>> That's the reason I didn't forward the patch, yet.
> 
> If by variable definition you mean:
> 
>    struct foo bar = {
>   -       old
>   +       new
>    };
> 
> I'd think that would be covered by the existing "struct|class|enum".
> Though I think we'd want to also allow keywords in front of it, like
> "static". I suspect the original was more meant to find:
> 
>    struct foo {
>   -old
>   +new
>    };

No, I meant lines like

    static double var;
   -static int old;
   +static int new;

The motivation is to show hints where in a file the change is located:
Anything that could be of significance for the author should be picked up.

But that does not necessarily help grep --function-context. For example,
when there is a longish block of global variable definitions and there is
a match in the middle, then --function-context would provide no context
because the line itself would be regarded as the beginning of a
"function", i.e., the context, and the next line (which also matches the
pattern) would be the beginning of the *next* function, and would not be
in the context anymore.

> 
>> The parts of the pattern have the following flaws:
>>
>> - The first part matches an identifier followed immediately by a colon and
>>   arbitrary text and is intended to reject goto labels and C++ access
>>   specifiers (public, private, protected). But this pattern also rejects
>>   C++ constructs, which look like this:
>>
>>     MyClass::MyClass()
>>     MyClass::~MyClass()
>>     MyClass::Item MyClass::Find(...
> 
> Makes sense. I noticed your fix is to look for end-of-line or comments
> afterwards.  Would it be simpler to just check for a non-colon, like:
> 
>   !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])

I want to match [[:space:]] after the label's colon, because I have lot's
of C++ files with CRLF, and I need to match the CR. Your more liberal
pattern would fit as well, although it would pick up a bit field as in

   struct foo {
      unsigned
        flag: 1;
   -old
   +new

I would not mind ignoring this case ("garbage in, garbage out" ;-).

>> - The second part matches an identifier followed by a list of qualified
>>   names (i.e. identifiers separated by the C++ scope operator '::')
>> [...]
> 
> A tried to keep the "looks like a function definition" bit in mine, and
> yours loosens this quite a bit more. I think that may be OK. That is, I
> do not think there is any reason for somebody to do:
> 
>     void foo() {
>     call_to_bar();
>    -old
>    +new
>     }
> 
> That is, nobody would put a function _call_ without indentation. If
> something has alphanumerics at the left-most column, then it is probably
> interesting no matter what.
> 
>> - The third part of the pattern finally matches compound definitions. But
>>   it forgets about unions and namespaces, and also skips single-line
>>   definitions
>>
>>     struct random_iterator_tag {};
>>
>>   because no semicolon can occur on the line.
> 
> I don't see how that is an interesting line. The point is to find a
> block that is surrounding the changes, but that is not surrounding
> the lines below.

I more often than not want to have an answer to the question "where?", not
"wherein?" Then anything that helps locate a hunk is useful.

(The particular example, an empty struct, looks strange for C programmers,
of course, but it's a common idiom in C++ when it comes to template
meta-programming.)

>> Notice that all interesting anchor points begin with an identifier or
>> keyword. But since there is a large variety of syntactical constructs after
>> the first "word", the simplest is to require only this word and accept
>> everything else. Therefore, this boils down to a line that begins with a
>> letter or underscore (optionally preceded by the C++ scope operator '::'
>> to accept functions returning a type anchored at the global namespace).
>> Replace the second and third part by a single pattern that picks such a
>> line.
> 
> Yeah, this bit makes sense to me.
> 
> Both yours and mine will find the first line here in things like:
> 
>    void foo(void);
>   -void bar(void);
>   +void bar(int arg);
> 
> but I think that is OK. There _isn't_ any interesting surrounding
> context here. The current code will sometimes come up with an empty
> funcline (which is good), but it may just as easily come up with a
> totally bogus funcline in a case like:
> 
>    void unrelated(void)
>    {
>    }
> 
>    void foo(void);
>   -void bar(void);
>   +void bar(int arg);
> 
> So trying to be very restrictive and say "that doesn't look like a
> function" does not really buy us anything (and it creates tons of false
> negatives, as you documented, because C++ syntax has all kinds of crazy
> stuff).
> 
> _If_ the backwards search learned to terminate (e.g., seeing the "^}"
> line and saying "well, we can't be inside a function"), then such
> negative lines might be useful for coming up with an empty funcname
> rather than the bogus "void foo(void);". But we do not do that
> currently, and I do not think it is that useful (the funcname above is
> arguably just as or more useful than an empty one).

As I said, my motivation is not so much to find a "container", but rather
a clue to help locate a change while reading the patch text. I can speak
for myself, but I have no idea what is more important for the majority.

-- Hannes

  reply	other threads:[~2014-03-07  7:23 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-05  0:36 [RFC/PATCH] diff: simplify cpp funcname regex Jeff King
2014-03-05  7:58 ` Johannes Sixt
2014-03-06 21:28   ` Jeff King
2014-03-07  7:23     ` Johannes Sixt [this message]
2014-03-14  3:54       ` Jeff King
2014-03-14  6:56         ` Johannes Sixt
2014-03-18  5:24           ` Jeff King
2014-03-18  8:02             ` Johannes Sixt
2014-03-18 11:00               ` René Scharfe
2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
2014-03-21 21:07                   ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
2014-03-21 22:00                     ` Junio C Hamano
2014-03-22  6:56                       ` Johannes Sixt
2014-03-23 19:36                         ` Junio C Hamano
2014-03-24 21:36                     ` Jeff King
2014-03-24 21:39                       ` Jeff King
2014-03-25 20:07                         ` Johannes Sixt
2014-03-25 21:42                           ` Jeff King
2014-03-21 21:07                   ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
2014-03-21 21:07                   ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
2014-03-21 21:07                   ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
2014-03-21 21:07                   ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
2014-03-21 21:07                   ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
2014-03-21 22:25                   ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
2014-03-24 21:49                   ` Jeff King
2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=531973D9.9070803@viscovery.net \
    --to=j.sixt@viscovery.net \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).