git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Thomas Rast <tr@thomasrast.ch>
Subject: [RFC/PATCH] diff: simplify cpp funcname regex
Date: Tue, 4 Mar 2014 19:36:40 -0500	[thread overview]
Message-ID: <20140305003639.GA9474@sigill.intra.peff.net> (raw)

The current funcname matcher for C files requires one or
more words before the function name, like:

  static int foo(int arg)
  {

However, some coding styles look like this:

  static int
  foo(int arg)
  {

and we do not match, even though the default regex would.

This patch simplifies the regex significantly. Rather than
trying to handle all the variants of keywords and return
types, we simply look for an identifier at the start of the
line that contains a "(", meaning it is either a function
definition or a function call, and then not containing ";"
which would indicate it is a call or declaration.

This can be fooled by something like:

    if (foo)

but it is unlikely to find that at the very start of a line
with no identation (and without having a complete set of
keywords, we cannot distinguish that from a function called
"if" taking "foo" anyway).

We had no tests at all for .c files, so this attempts to add
a few obvious cases (including the one we are fixing here).

Signed-off-by: Jeff King <peff@peff.net>
---
I tried accommodating this one case in the current regex, but it just
kept getting more complicated and unreadable. Maybe I am being naive to
think that this much simpler version can work. We don't have a lot of
tests or a known-good data set to check.

I did try "git log -1000 -p" before and after on git.git, diff'd the
results and manually inspected. The results were a mix of strict
improvement (mostly instances of the style I was trying to fix here) and
cases where there is no good answer. For example, for top-level changes
outside functions, we might find:

  N_("some text that is long"

that is part of:

  const char *foo =
  N_("some text that is long"
  "and spans multiple lines");

Before this change, we would skip past it (using the cpp regex, that is;
the default one tends to find the same line) and either report nothing,
or whatever random function was before us. So it's a behavior change,
but the existing behavior is really no better.

 t/t4018-diff-funcname.sh | 36 ++++++++++++++++++++++++++++++++++++
 userdiff.c               |  2 +-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 38a092a..1e80b15 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -93,6 +93,29 @@ sed -e '
 	s/song;/song();/
 ' <Beer.perl >Beer-correct.perl
 
+cat >Beer.c <<\EOF
+static int foo(void)
+{
+label:
+	int x = old;
+}
+
+struct foo; /* catch failure below */
+static int
+gnu(int arg)
+{
+	int x = old;
+}
+
+struct foo; /* catch failure below */
+int multiline(int arg,
+	      char *arg2)
+{
+	int x = old;
+}
+EOF
+sed s/old/new/ <Beer.c >Beer-correct.c
+
 test_expect_funcname () {
 	lang=${2-java}
 	test_expect_code 1 git diff --no-index -U1 \
@@ -127,6 +150,7 @@ test_expect_success 'set up .gitattributes declaring drivers to test' '
 	cat >.gitattributes <<-\EOF
 	*.java diff=java
 	*.perl diff=perl
+	*.c diff=cpp
 	EOF
 '
 
@@ -158,6 +182,18 @@ test_expect_success 'perl pattern is not distracted by forward declaration' '
 	test_expect_funcname "package Beer;\$" perl
 '
 
+test_expect_success 'c pattern skips labels' '
+	test_expect_funcname "static int foo(void)" c
+'
+
+test_expect_success 'c pattern matches GNU-style functions' '
+	test_expect_funcname "gnu(int arg)\$" c
+'
+
+test_expect_success 'c pattern matches multiline functions' '
+	test_expect_funcname "int multiline(int arg,\$" c
+'
+
 test_expect_success 'custom pattern' '
 	test_config diff.java.funcname "!static
 !String
diff --git a/userdiff.c b/userdiff.c
index 10b61ec..b9d52b7 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -127,7 +127,7 @@
 	 /* Jump targets or access declarations */
 	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
 	 /* C/++ functions/methods at top level */
-	 "^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
+	 "^([A-Za-z_].*\\([^;]*)$\n"
 	 /* compound type at top level */
 	 "^((struct|class|enum)[^;]*)$",
 	 /* -- */
-- 
1.8.5.2.500.g8060133

             reply	other threads:[~2014-03-05  0:36 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-05  0:36 Jeff King [this message]
2014-03-05  7:58 ` [RFC/PATCH] diff: simplify cpp funcname regex Johannes Sixt
2014-03-06 21:28   ` Jeff King
2014-03-07  7:23     ` Johannes Sixt
2014-03-14  3:54       ` Jeff King
2014-03-14  6:56         ` Johannes Sixt
2014-03-18  5:24           ` Jeff King
2014-03-18  8:02             ` Johannes Sixt
2014-03-18 11:00               ` René Scharfe
2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
2014-03-21 21:07                   ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
2014-03-21 22:00                     ` Junio C Hamano
2014-03-22  6:56                       ` Johannes Sixt
2014-03-23 19:36                         ` Junio C Hamano
2014-03-24 21:36                     ` Jeff King
2014-03-24 21:39                       ` Jeff King
2014-03-25 20:07                         ` Johannes Sixt
2014-03-25 21:42                           ` Jeff King
2014-03-21 21:07                   ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
2014-03-21 21:07                   ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
2014-03-21 21:07                   ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
2014-03-21 21:07                   ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
2014-03-21 21:07                   ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
2014-03-21 22:25                   ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
2014-03-24 21:49                   ` Jeff King
2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140305003639.GA9474@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).