From: Jeff King <peff@peff.net>
To: Johannes Sixt <j.sixt@viscovery.net>
Cc: git@vger.kernel.org, Thomas Rast <tr@thomasrast.ch>
Subject: Re: [RFC/PATCH] diff: simplify cpp funcname regex
Date: Thu, 6 Mar 2014 16:28:36 -0500 [thread overview]
Message-ID: <20140306212835.GA11743@sigill.intra.peff.net> (raw)
In-Reply-To: <5316D922.9010501@viscovery.net>
On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote:
> Here is a patch that I'm carrying around since... a while.
> What do you think?
>
> The pattern I chose also catches variable definition, not just
> functions. That is what I need, but it hurts grep --function-context
> That's the reason I didn't forward the patch, yet.
If by variable definition you mean:
struct foo bar = {
- old
+ new
};
I'd think that would be covered by the existing "struct|class|enum".
Though I think we'd want to also allow keywords in front of it, like
"static". I suspect the original was more meant to find:
struct foo {
-old
+new
};
> The parts of the pattern have the following flaws:
>
> - The first part matches an identifier followed immediately by a colon and
> arbitrary text and is intended to reject goto labels and C++ access
> specifiers (public, private, protected). But this pattern also rejects
> C++ constructs, which look like this:
>
> MyClass::MyClass()
> MyClass::~MyClass()
> MyClass::Item MyClass::Find(...
Makes sense. I noticed your fix is to look for end-of-line or comments
afterwards. Would it be simpler to just check for a non-colon, like:
!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])
> - The second part matches an identifier followed by a list of qualified
> names (i.e. identifiers separated by the C++ scope operator '::')
> [...]
A tried to keep the "looks like a function definition" bit in mine, and
yours loosens this quite a bit more. I think that may be OK. That is, I
do not think there is any reason for somebody to do:
void foo() {
call_to_bar();
-old
+new
}
That is, nobody would put a function _call_ without indentation. If
something has alphanumerics at the left-most column, then it is probably
interesting no matter what.
> - The third part of the pattern finally matches compound definitions. But
> it forgets about unions and namespaces, and also skips single-line
> definitions
>
> struct random_iterator_tag {};
>
> because no semicolon can occur on the line.
I don't see how that is an interesting line. The point is to find a
block that is surrounding the changes, but that is not surrounding
the lines below.
> Notice that all interesting anchor points begin with an identifier or
> keyword. But since there is a large variety of syntactical constructs after
> the first "word", the simplest is to require only this word and accept
> everything else. Therefore, this boils down to a line that begins with a
> letter or underscore (optionally preceded by the C++ scope operator '::'
> to accept functions returning a type anchored at the global namespace).
> Replace the second and third part by a single pattern that picks such a
> line.
Yeah, this bit makes sense to me.
Both yours and mine will find the first line here in things like:
void foo(void);
-void bar(void);
+void bar(int arg);
but I think that is OK. There _isn't_ any interesting surrounding
context here. The current code will sometimes come up with an empty
funcline (which is good), but it may just as easily come up with a
totally bogus funcline in a case like:
void unrelated(void)
{
}
void foo(void);
-void bar(void);
+void bar(int arg);
So trying to be very restrictive and say "that doesn't look like a
function" does not really buy us anything (and it creates tons of false
negatives, as you documented, because C++ syntax has all kinds of crazy
stuff).
_If_ the backwards search learned to terminate (e.g., seeing the "^}"
line and saying "well, we can't be inside a function"), then such
negative lines might be useful for coming up with an empty funcname
rather than the bogus "void foo(void);". But we do not do that
currently, and I do not think it is that useful (the funcname above is
arguably just as or more useful than an empty one).
-Peff
next prev parent reply other threads:[~2014-03-06 21:28 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-05 0:36 [RFC/PATCH] diff: simplify cpp funcname regex Jeff King
2014-03-05 7:58 ` Johannes Sixt
2014-03-06 21:28 ` Jeff King [this message]
2014-03-07 7:23 ` Johannes Sixt
2014-03-14 3:54 ` Jeff King
2014-03-14 6:56 ` Johannes Sixt
2014-03-18 5:24 ` Jeff King
2014-03-18 8:02 ` Johannes Sixt
2014-03-18 11:00 ` René Scharfe
2014-03-21 21:07 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
2014-03-21 21:07 ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
2014-03-21 21:07 ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
2014-03-21 21:07 ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
2014-03-21 22:00 ` Junio C Hamano
2014-03-22 6:56 ` Johannes Sixt
2014-03-23 19:36 ` Junio C Hamano
2014-03-24 21:36 ` Jeff King
2014-03-24 21:39 ` Jeff King
2014-03-25 20:07 ` Johannes Sixt
2014-03-25 21:42 ` Jeff King
2014-03-21 21:07 ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
2014-03-21 21:07 ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
2014-03-21 21:07 ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
2014-03-21 21:07 ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
2014-03-21 21:07 ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
2014-03-21 21:07 ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
2014-03-21 21:07 ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
2014-03-21 22:25 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
2014-03-24 21:49 ` Jeff King
2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140306212835.GA11743@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=j.sixt@viscovery.net \
--cc=tr@thomasrast.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).