git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] diff: simplify cpp funcname regex
@ 2014-03-05  0:36 Jeff King
  2014-03-05  7:58 ` Johannes Sixt
  2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano
  0 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2014-03-05  0:36 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast

The current funcname matcher for C files requires one or
more words before the function name, like:

  static int foo(int arg)
  {

However, some coding styles look like this:

  static int
  foo(int arg)
  {

and we do not match, even though the default regex would.

This patch simplifies the regex significantly. Rather than
trying to handle all the variants of keywords and return
types, we simply look for an identifier at the start of the
line that contains a "(", meaning it is either a function
definition or a function call, and then not containing ";"
which would indicate it is a call or declaration.

This can be fooled by something like:

    if (foo)

but it is unlikely to find that at the very start of a line
with no identation (and without having a complete set of
keywords, we cannot distinguish that from a function called
"if" taking "foo" anyway).

We had no tests at all for .c files, so this attempts to add
a few obvious cases (including the one we are fixing here).

Signed-off-by: Jeff King <peff@peff.net>
---
I tried accommodating this one case in the current regex, but it just
kept getting more complicated and unreadable. Maybe I am being naive to
think that this much simpler version can work. We don't have a lot of
tests or a known-good data set to check.

I did try "git log -1000 -p" before and after on git.git, diff'd the
results and manually inspected. The results were a mix of strict
improvement (mostly instances of the style I was trying to fix here) and
cases where there is no good answer. For example, for top-level changes
outside functions, we might find:

  N_("some text that is long"

that is part of:

  const char *foo =
  N_("some text that is long"
  "and spans multiple lines");

Before this change, we would skip past it (using the cpp regex, that is;
the default one tends to find the same line) and either report nothing,
or whatever random function was before us. So it's a behavior change,
but the existing behavior is really no better.

 t/t4018-diff-funcname.sh | 36 ++++++++++++++++++++++++++++++++++++
 userdiff.c               |  2 +-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 38a092a..1e80b15 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -93,6 +93,29 @@ sed -e '
 	s/song;/song();/
 ' <Beer.perl >Beer-correct.perl
 
+cat >Beer.c <<\EOF
+static int foo(void)
+{
+label:
+	int x = old;
+}
+
+struct foo; /* catch failure below */
+static int
+gnu(int arg)
+{
+	int x = old;
+}
+
+struct foo; /* catch failure below */
+int multiline(int arg,
+	      char *arg2)
+{
+	int x = old;
+}
+EOF
+sed s/old/new/ <Beer.c >Beer-correct.c
+
 test_expect_funcname () {
 	lang=${2-java}
 	test_expect_code 1 git diff --no-index -U1 \
@@ -127,6 +150,7 @@ test_expect_success 'set up .gitattributes declaring drivers to test' '
 	cat >.gitattributes <<-\EOF
 	*.java diff=java
 	*.perl diff=perl
+	*.c diff=cpp
 	EOF
 '
 
@@ -158,6 +182,18 @@ test_expect_success 'perl pattern is not distracted by forward declaration' '
 	test_expect_funcname "package Beer;\$" perl
 '
 
+test_expect_success 'c pattern skips labels' '
+	test_expect_funcname "static int foo(void)" c
+'
+
+test_expect_success 'c pattern matches GNU-style functions' '
+	test_expect_funcname "gnu(int arg)\$" c
+'
+
+test_expect_success 'c pattern matches multiline functions' '
+	test_expect_funcname "int multiline(int arg,\$" c
+'
+
 test_expect_success 'custom pattern' '
 	test_config diff.java.funcname "!static
 !String
diff --git a/userdiff.c b/userdiff.c
index 10b61ec..b9d52b7 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -127,7 +127,7 @@
 	 /* Jump targets or access declarations */
 	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
 	 /* C/++ functions/methods at top level */
-	 "^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
+	 "^([A-Za-z_].*\\([^;]*)$\n"
 	 /* compound type at top level */
 	 "^((struct|class|enum)[^;]*)$",
 	 /* -- */
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-05  0:36 [RFC/PATCH] diff: simplify cpp funcname regex Jeff King
@ 2014-03-05  7:58 ` Johannes Sixt
  2014-03-06 21:28   ` Jeff King
  2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2014-03-05  7:58 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Thomas Rast

Am 3/5/2014 1:36, schrieb Jeff King:
> The current funcname matcher for C files requires one or
> more words before the function name, like:
> 
>   static int foo(int arg)
>   {
> 
> However, some coding styles look like this:
> 
>   static int
>   foo(int arg)
>   {
> 
> and we do not match, even though the default regex would.
> 
> This patch simplifies the regex significantly. Rather than
> trying to handle all the variants of keywords and return
> types, we simply look for an identifier at the start of the
> line that contains a "(", meaning it is either a function
> definition or a function call, and then not containing ";"
> which would indicate it is a call or declaration.

Here is a patch that I'm carrying around since... a while.
What do you think?

The pattern I chose also catches variable definition, not just
functions. That is what I need, but it hurts grep --function-context
That's the reason I didn't forward the patch, yet.

--- 8< ---
From: Johannes Sixt <j6t@kdbg.org>
Date: Tue, 25 Sep 2012 14:08:02 +0200
Subject: [PATCH] userdiff: have 'cpp' hunk header pattern catch more C++ anchor points

The hunk header pattern 'cpp' is intended for C and C++ source code, but
it is actually not very useful for the latter, and even hurts some
use-cases for the former.

The parts of the pattern have the following flaws:

- The first part matches an identifier followed immediately by a colon and
  arbitrary text and is intended to reject goto labels and C++ access
  specifiers (public, private, protected). But this pattern also rejects
  C++ constructs, which look like this:

    MyClass::MyClass()
    MyClass::~MyClass()
    MyClass::Item MyClass::Find(...

- The second part matches an identifier followed by a list of qualified
  names (i.e. identifiers separated by the C++ scope operator '::')
  separated by space or '*' followed by an opening parenthesis (with space
  between the tokens). It matches function declarations like

    struct item* get_head(...
    int Outer::Inner::Func(...

  Since the pattern requires at least two identifiers, GNU-style function
  definitions are ignored:

    void
    func(...

  Moreover, since the pattern does not allow punctuation other than '*',
  the following C++ constructs are not recognized:

  . template definitions:
      template<class T> int func(T arg)

  . functions returning references:
      const string& get_message()

  . functions returning templated types:
      vector<int> foo()

  . operator definitions:
      Value operator+(Value l, Value r)

- The third part of the pattern finally matches compound definitions. But
  it forgets about unions and namespaces, and also skips single-line
  definitions

    struct random_iterator_tag {};

  because no semicolon can occur on the line.

Change the first pattern to require a colon at the end of the line (except
for trailing space and comments), so that it does not reject constructor
or destructor definitions.

Notice that all interesting anchor points begin with an identifier or
keyword. But since there is a large variety of syntactical constructs after
the first "word", the simplest is to require only this word and accept
everything else. Therefore, this boils down to a line that begins with a
letter or underscore (optionally preceded by the C++ scope operator '::'
to accept functions returning a type anchored at the global namespace).
Replace the second and third part by a single pattern that picks such a
line.

This has the following desirable consequence:

- All constructs mentioned above are recognized.

and the following likely desirable consequences:

- Definitions of global variables and typedefs are recognized:

    int num_entries = 0;
    extern const char* help_text;
    typedef basic_string<wchar_t> wstring;

- Commonly used marco-ized boilerplate code is recognized:

    BEGIN_MESSAGE_MAP(CCanvas,CWnd)
    Q_DECLARE_METATYPE(MyStruct)
    PATTERNS("tex",...)

  (The last one is from this very patch.)

but also the following possibly undesirable consequence:

- When a label is not on a line by itself (except for a comment) it is no
  longer rejected, but can appear as a hunk header if it occurs at the
  beginning of a line:

    next:;

IMO, the benefits of the change outweigh the (possible) regressions by a
large margin.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 userdiff.c                                 | 8 +++-----
 13 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index ed958ef..49b2094 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -125,11 +125,9 @@ PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
 PATTERNS("cpp",
 	 /* Jump targets or access declarations */
-	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
-	 /* C/++ functions/methods at top level */
-	 "^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
-	 /* compound type at top level */
-	 "^((struct|class|enum)[^;]*)$",
+	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])\n"
+	 /* functions/methods, variables, and compounds at top level */
+	 "^((::[[:space:]]*)?[A-Za-z_].*)$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
-- 
1.9.0.1398.g59a4f1b

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-05  0:36 [RFC/PATCH] diff: simplify cpp funcname regex Jeff King
  2014-03-05  7:58 ` Johannes Sixt
@ 2014-03-05 20:31 ` Junio C Hamano
  1 sibling, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2014-03-05 20:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

> types, we simply look for an identifier at the start of the
> line that contains a "(", meaning it is either a function
> definition or a function call, and then not containing ";"
> which would indicate it is a call or declaration.

It is not worth worrying about:

foo(arg,
    another);

that is not indented, so I think that simplicity is good.

> For example, for top-level changes
> outside functions, we might find:
>
>   N_("some text that is long"
>
> that is part of:
>
>   const char *foo =
>   N_("some text that is long"
>   "and spans multiple lines");

Unfortunate, but cannot be avoided.

>
> Before this change, we would skip past it (using the cpp regex, that is;
> the default one tends to find the same line) and either report nothing,
> or whatever random function was before us. So it's a behavior change,
> but the existing behavior is really no better.

True.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-05  7:58 ` Johannes Sixt
@ 2014-03-06 21:28   ` Jeff King
  2014-03-07  7:23     ` Johannes Sixt
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-03-06 21:28 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Thomas Rast

On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote:

> Here is a patch that I'm carrying around since... a while.
> What do you think?
> 
> The pattern I chose also catches variable definition, not just
> functions. That is what I need, but it hurts grep --function-context
> That's the reason I didn't forward the patch, yet.

If by variable definition you mean:

   struct foo bar = {
  -       old
  +       new
   };

I'd think that would be covered by the existing "struct|class|enum".
Though I think we'd want to also allow keywords in front of it, like
"static". I suspect the original was more meant to find:

   struct foo {
  -old
  +new
   };

> The parts of the pattern have the following flaws:
> 
> - The first part matches an identifier followed immediately by a colon and
>   arbitrary text and is intended to reject goto labels and C++ access
>   specifiers (public, private, protected). But this pattern also rejects
>   C++ constructs, which look like this:
> 
>     MyClass::MyClass()
>     MyClass::~MyClass()
>     MyClass::Item MyClass::Find(...

Makes sense. I noticed your fix is to look for end-of-line or comments
afterwards.  Would it be simpler to just check for a non-colon, like:

  !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])

> - The second part matches an identifier followed by a list of qualified
>   names (i.e. identifiers separated by the C++ scope operator '::')
> [...]

A tried to keep the "looks like a function definition" bit in mine, and
yours loosens this quite a bit more. I think that may be OK. That is, I
do not think there is any reason for somebody to do:

    void foo() {
    call_to_bar();
   -old
   +new
    }

That is, nobody would put a function _call_ without indentation. If
something has alphanumerics at the left-most column, then it is probably
interesting no matter what.

> - The third part of the pattern finally matches compound definitions. But
>   it forgets about unions and namespaces, and also skips single-line
>   definitions
> 
>     struct random_iterator_tag {};
> 
>   because no semicolon can occur on the line.

I don't see how that is an interesting line. The point is to find a
block that is surrounding the changes, but that is not surrounding
the lines below.

> Notice that all interesting anchor points begin with an identifier or
> keyword. But since there is a large variety of syntactical constructs after
> the first "word", the simplest is to require only this word and accept
> everything else. Therefore, this boils down to a line that begins with a
> letter or underscore (optionally preceded by the C++ scope operator '::'
> to accept functions returning a type anchored at the global namespace).
> Replace the second and third part by a single pattern that picks such a
> line.

Yeah, this bit makes sense to me.

Both yours and mine will find the first line here in things like:

   void foo(void);
  -void bar(void);
  +void bar(int arg);

but I think that is OK. There _isn't_ any interesting surrounding
context here. The current code will sometimes come up with an empty
funcline (which is good), but it may just as easily come up with a
totally bogus funcline in a case like:

   void unrelated(void)
   {
   }

   void foo(void);
  -void bar(void);
  +void bar(int arg);

So trying to be very restrictive and say "that doesn't look like a
function" does not really buy us anything (and it creates tons of false
negatives, as you documented, because C++ syntax has all kinds of crazy
stuff).

_If_ the backwards search learned to terminate (e.g., seeing the "^}"
line and saying "well, we can't be inside a function"), then such
negative lines might be useful for coming up with an empty funcname
rather than the bogus "void foo(void);". But we do not do that
currently, and I do not think it is that useful (the funcname above is
arguably just as or more useful than an empty one).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-06 21:28   ` Jeff King
@ 2014-03-07  7:23     ` Johannes Sixt
  2014-03-14  3:54       ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2014-03-07  7:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Am 3/6/2014 22:28, schrieb Jeff King:
> On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote:
>> The pattern I chose also catches variable definition, not just
>> functions. That is what I need, but it hurts grep --function-context
>> That's the reason I didn't forward the patch, yet.
> 
> If by variable definition you mean:
> 
>    struct foo bar = {
>   -       old
>   +       new
>    };
> 
> I'd think that would be covered by the existing "struct|class|enum".
> Though I think we'd want to also allow keywords in front of it, like
> "static". I suspect the original was more meant to find:
> 
>    struct foo {
>   -old
>   +new
>    };

No, I meant lines like

    static double var;
   -static int old;
   +static int new;

The motivation is to show hints where in a file the change is located:
Anything that could be of significance for the author should be picked up.

But that does not necessarily help grep --function-context. For example,
when there is a longish block of global variable definitions and there is
a match in the middle, then --function-context would provide no context
because the line itself would be regarded as the beginning of a
"function", i.e., the context, and the next line (which also matches the
pattern) would be the beginning of the *next* function, and would not be
in the context anymore.

> 
>> The parts of the pattern have the following flaws:
>>
>> - The first part matches an identifier followed immediately by a colon and
>>   arbitrary text and is intended to reject goto labels and C++ access
>>   specifiers (public, private, protected). But this pattern also rejects
>>   C++ constructs, which look like this:
>>
>>     MyClass::MyClass()
>>     MyClass::~MyClass()
>>     MyClass::Item MyClass::Find(...
> 
> Makes sense. I noticed your fix is to look for end-of-line or comments
> afterwards.  Would it be simpler to just check for a non-colon, like:
> 
>   !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])

I want to match [[:space:]] after the label's colon, because I have lot's
of C++ files with CRLF, and I need to match the CR. Your more liberal
pattern would fit as well, although it would pick up a bit field as in

   struct foo {
      unsigned
        flag: 1;
   -old
   +new

I would not mind ignoring this case ("garbage in, garbage out" ;-).

>> - The second part matches an identifier followed by a list of qualified
>>   names (i.e. identifiers separated by the C++ scope operator '::')
>> [...]
> 
> A tried to keep the "looks like a function definition" bit in mine, and
> yours loosens this quite a bit more. I think that may be OK. That is, I
> do not think there is any reason for somebody to do:
> 
>     void foo() {
>     call_to_bar();
>    -old
>    +new
>     }
> 
> That is, nobody would put a function _call_ without indentation. If
> something has alphanumerics at the left-most column, then it is probably
> interesting no matter what.
> 
>> - The third part of the pattern finally matches compound definitions. But
>>   it forgets about unions and namespaces, and also skips single-line
>>   definitions
>>
>>     struct random_iterator_tag {};
>>
>>   because no semicolon can occur on the line.
> 
> I don't see how that is an interesting line. The point is to find a
> block that is surrounding the changes, but that is not surrounding
> the lines below.

I more often than not want to have an answer to the question "where?", not
"wherein?" Then anything that helps locate a hunk is useful.

(The particular example, an empty struct, looks strange for C programmers,
of course, but it's a common idiom in C++ when it comes to template
meta-programming.)

>> Notice that all interesting anchor points begin with an identifier or
>> keyword. But since there is a large variety of syntactical constructs after
>> the first "word", the simplest is to require only this word and accept
>> everything else. Therefore, this boils down to a line that begins with a
>> letter or underscore (optionally preceded by the C++ scope operator '::'
>> to accept functions returning a type anchored at the global namespace).
>> Replace the second and third part by a single pattern that picks such a
>> line.
> 
> Yeah, this bit makes sense to me.
> 
> Both yours and mine will find the first line here in things like:
> 
>    void foo(void);
>   -void bar(void);
>   +void bar(int arg);
> 
> but I think that is OK. There _isn't_ any interesting surrounding
> context here. The current code will sometimes come up with an empty
> funcline (which is good), but it may just as easily come up with a
> totally bogus funcline in a case like:
> 
>    void unrelated(void)
>    {
>    }
> 
>    void foo(void);
>   -void bar(void);
>   +void bar(int arg);
> 
> So trying to be very restrictive and say "that doesn't look like a
> function" does not really buy us anything (and it creates tons of false
> negatives, as you documented, because C++ syntax has all kinds of crazy
> stuff).
> 
> _If_ the backwards search learned to terminate (e.g., seeing the "^}"
> line and saying "well, we can't be inside a function"), then such
> negative lines might be useful for coming up with an empty funcname
> rather than the bogus "void foo(void);". But we do not do that
> currently, and I do not think it is that useful (the funcname above is
> arguably just as or more useful than an empty one).

As I said, my motivation is not so much to find a "container", but rather
a clue to help locate a change while reading the patch text. I can speak
for myself, but I have no idea what is more important for the majority.

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-07  7:23     ` Johannes Sixt
@ 2014-03-14  3:54       ` Jeff King
  2014-03-14  6:56         ` Johannes Sixt
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-03-14  3:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Thomas Rast

On Fri, Mar 07, 2014 at 08:23:05AM +0100, Johannes Sixt wrote:

> No, I meant lines like
> 
>     static double var;
>    -static int old;
>    +static int new;
> 
> The motivation is to show hints where in a file the change is located:
> Anything that could be of significance for the author should be picked up.

I see. Coupled with what you said below:

> As I said, my motivation is not so much to find a "container", but rather
> a clue to help locate a change while reading the patch text. I can speak
> for myself, but I have no idea what is more important for the majority.

your proposal makes a lot more sense to me, and I think that is really
at the center of our discussion. I do not have a gut feeling for which
is "more right" (i.e., "container" versus "context").

But given that most of the cases we are discussing are ones where the
"diff -p" default regex gets it right (or at least better than) compared
to the C regex, I am tempted to say that we should be erring in the
direction of simplicity, and just finding interesting lines without
worrying about containers (i.e., it argues for your patch).

> > Makes sense. I noticed your fix is to look for end-of-line or comments
> > afterwards.  Would it be simpler to just check for a non-colon, like:
> > 
> >   !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])
> 
> I want to match [[:space:]] after the label's colon, because I have lot's
> of C++ files with CRLF, and I need to match the CR. Your more liberal
> pattern would fit as well, although it would pick up a bit field as in
> 
>    struct foo {
>       unsigned
>         flag: 1;
>    -old
>    +new

Thanks, I was having trouble thinking of another good use of a colon,
but bitfields are what I was missing. Your pattern is probably better
here.

So I am leaning towards your patch, but I'm not sure if I understand all
of the implications for "git grep". Can you give some concrete examples?

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-14  3:54       ` Jeff King
@ 2014-03-14  6:56         ` Johannes Sixt
  2014-03-18  5:24           ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2014-03-14  6:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Am 3/14/2014 4:54, schrieb Jeff King:
> On Fri, Mar 07, 2014 at 08:23:05AM +0100, Johannes Sixt wrote:
> 
>> No, I meant lines like
>>
>>     static double var;
>>    -static int old;
>>    +static int new;
>>
>> The motivation is to show hints where in a file the change is located:
>> Anything that could be of significance for the author should be picked up.
> 
> I see. Coupled with what you said below:
> 
>> As I said, my motivation is not so much to find a "container", but rather
>> a clue to help locate a change while reading the patch text. I can speak
>> for myself, but I have no idea what is more important for the majority.
> 
> your proposal makes a lot more sense to me, and I think that is really
> at the center of our discussion. I do not have a gut feeling for which
> is "more right" (i.e., "container" versus "context").
> 
> But given that most of the cases we are discussing are ones where the
> "diff -p" default regex gets it right (or at least better than) compared
> to the C regex, I am tempted to say that we should be erring in the
> direction of simplicity, and just finding interesting lines without
> worrying about containers (i.e., it argues for your patch).

We are in agreement here. So, let's base a decision on the implications on
git grep.

> ... but I'm not sure if I understand all
> of the implications for "git grep". Can you give some concrete examples?

Consider this code:

  void above()
  {}
  static int Y;
  static int A;
  int bar()
  {
    return X;
  }
  void below()
  {}

When you 'git grep --function-context X', then you get this output with
the current pattern, you proposal, and my proposal (file name etc omitted
for brevity):

  int bar()
  {
    return X;
  }

because we start the context at the last hunk header pattern above *or
including the matching line* (and write it in the output), and we stop at
the next hunk header pattern below the match (and do not write it in the
output).

When you 'git grep --function-context Y', what do you want to see? With
the current pattern, and with your pattern that forbids semicolon we get:

  void above()
  {}
  static int Y;
  static int A;

and with my simple pattern, which allows semicolon, we get merely

  static int Y;

because the line itself is a hunk header (and we do not look back any
further) and the next line is as well. That is not exactly "function
context", and that is what I'm a bit worried about.

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-14  6:56         ` Johannes Sixt
@ 2014-03-18  5:24           ` Jeff King
  2014-03-18  8:02             ` Johannes Sixt
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-03-18  5:24 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Thomas Rast

On Fri, Mar 14, 2014 at 07:56:46AM +0100, Johannes Sixt wrote:

> Consider this code:
> 
>   void above()
>   {}
>   static int Y;
>   static int A;
>   int bar()
>   {
>     return X;
>   }
>   void below()
>   {}

Thanks, this example is very helpful.

> When you 'git grep --function-context X', then you get this output with
> the current pattern, you proposal, and my proposal (file name etc omitted
> for brevity):
> 
>   int bar()
>   {
>     return X;
>   }

Right, that makes sense to me.

> When you 'git grep --function-context Y', what do you want to see? With
> the current pattern, and with your pattern that forbids semicolon we get:
> 
>   void above()
>   {}
>   static int Y;
>   static int A;
> 
> and with my simple pattern, which allows semicolon, we get merely
> 
>   static int Y;
> 
> because the line itself is a hunk header (and we do not look back any
> further) and the next line is as well. That is not exactly "function
> context", and that is what I'm a bit worried about.

Hmm. To be honest, I do not see yours as all that bad. Is "above()" or
"A" actually interesting here? I'm not sure that they are. But then I do
not use --function-context myself.

I guess it violates the "show things that are vaguely nearby, rather
than a container" view of context that we discussed earlier. But somehow
that seems less important to me with "--function-context".

So I dunno. I kind of like your version.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-18  5:24           ` Jeff King
@ 2014-03-18  8:02             ` Johannes Sixt
  2014-03-18 11:00               ` René Scharfe
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2014-03-18  8:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast, René Scharfe

Cc René; do you have any comments regarding grep --function-context?

Am 3/18/2014 6:24, schrieb Jeff King:
> On Fri, Mar 14, 2014 at 07:56:46AM +0100, Johannes Sixt wrote:
> 
>> Consider this code:
>>
>>   void above()
>>   {}
>>   static int Y;
>>   static int A;
>>   int bar()
>>   {
>>     return X;
>>   }
>>   void below()
>>   {}
> 
> Thanks, this example is very helpful.
> 
>> When you 'git grep --function-context X', then you get this output with
>> the current pattern, you proposal, and my proposal (file name etc omitted
>> for brevity):
>>
>>   int bar()
>>   {
>>     return X;
>>   }
> 
> Right, that makes sense to me.
> 
>> When you 'git grep --function-context Y', what do you want to see? With
>> the current pattern, and with your pattern that forbids semicolon we get:
>>
>>   void above()
>>   {}
>>   static int Y;
>>   static int A;
>>
>> and with my simple pattern, which allows semicolon, we get merely
>>
>>   static int Y;
>>
>> because the line itself is a hunk header (and we do not look back any
>> further) and the next line is as well. That is not exactly "function
>> context", and that is what I'm a bit worried about.
> 
> Hmm. To be honest, I do not see yours as all that bad. Is "above()" or
> "A" actually interesting here? I'm not sure that they are. But then I do
> not use --function-context myself.
> 
> I guess it violates the "show things that are vaguely nearby, rather
> than a container" view of context that we discussed earlier. But somehow
> that seems less important to me with "--function-context".
> 
> So I dunno. I kind of like your version.

Then I'll polish my patch series (it also rewrites the test framework) and
post it.

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH] diff: simplify cpp funcname regex
  2014-03-18  8:02             ` Johannes Sixt
@ 2014-03-18 11:00               ` René Scharfe
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
  0 siblings, 1 reply; 30+ messages in thread
From: René Scharfe @ 2014-03-18 11:00 UTC (permalink / raw)
  To: Johannes Sixt, Jeff King; +Cc: git, Thomas Rast, l.s.r

Am 18.03.2014 09:02, schrieb Johannes Sixt:
> Cc René; do you have any comments regarding grep --function-context?
>
> Am 3/18/2014 6:24, schrieb Jeff King:
>> On Fri, Mar 14, 2014 at 07:56:46AM +0100, Johannes Sixt wrote:
>>
>>> Consider this code:
>>>
>>>    void above()
>>>    {}
>>>    static int Y;
>>>    static int A;
>>>    int bar()
>>>    {
>>>      return X;
>>>    }
>>>    void below()
>>>    {}
>>
>> Thanks, this example is very helpful.
>>
>>> When you 'git grep --function-context X', then you get this output with
>>> the current pattern, you proposal, and my proposal (file name etc omitted
>>> for brevity):
>>>
>>>    int bar()
>>>    {
>>>      return X;
>>>    }
>>
>> Right, that makes sense to me.
>>
>>> When you 'git grep --function-context Y', what do you want to see? With
>>> the current pattern, and with your pattern that forbids semicolon we get:
>>>
>>>    void above()
>>>    {}
>>>    static int Y;
>>>    static int A;
>>>
>>> and with my simple pattern, which allows semicolon, we get merely
>>>
>>>    static int Y;
>>>
>>> because the line itself is a hunk header (and we do not look back any
>>> further) and the next line is as well. That is not exactly "function
>>> context", and that is what I'm a bit worried about.

In global context there is no "function context", of course, so the 
latter makes sense.

"grep --function-context" is about useful context and its implementation 
piggy-backs on the hunk header definitions.  If those are useful then 
the grep output should be fine as well.  IAW: No worries, go ahead. :)

However, I only use the defaults heuristic (which shows just the Y-line 
as well) and don't know C++, so I my opinion on this matter isn't worth 
that much.

René

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 00/10] userdiff: cpp pattern simplification and test framework
  2014-03-18 11:00               ` René Scharfe
@ 2014-03-21 21:07                 ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
                                     ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

Here is a series that makes the hunk header pattern for C and C++ even
simpler than suggested by Peff in [1] to catch a lot more C++ functions
and two more C patterns.

As a preparatory work, the test cases are totally rewritten to make it
a lot simpler to drop in new tests. There was an earlier attempt to
change the infrastructure [2], and it is the reason for the widened Cc
list.

Two patches also extend the word regexp.

[1] http://article.gmane.org/gmane.comp.version-control.git/243408
[2] http://thread.gmane.org/gmane.comp.version-control.git/187269/focus=187497

Johannes Sixt (10):
  userdiff: support C++ ->* and .* operators in the word regexp
  userdiff: support unsigned and long long suffixes of integer constants
  t4018: an infrastructure to test hunk headers
  t4018: convert perl pattern tests to the new infrastructure
  t4018: convert java pattern test to the new infrastructure
  t4018: convert custom pattern test to the new infrastructure
  t4018: reduce test files for pattern compilation tests
  t4018: test cases for the built-in cpp pattern
  t4018: test cases showing that the cpp pattern misses many anchor
    points
  userdiff: have 'cpp' hunk header pattern catch more C++ anchor points

 t/t4018-diff-funcname.sh                   | 230 ++++++++++-------------------
 t/t4018/README                             |  18 +++
 t/t4018/cpp-c++-function                   |   4 +
 t/t4018/cpp-class-constructor              |   4 +
 t/t4018/cpp-class-constructor-mem-init     |   5 +
 t/t4018/cpp-class-definition               |   4 +
 t/t4018/cpp-class-definition-derived       |   5 +
 t/t4018/cpp-class-destructor               |   4 +
 t/t4018/cpp-function-returning-global-type |   4 +
 t/t4018/cpp-function-returning-nested      |   5 +
 t/t4018/cpp-function-returning-pointer     |   4 +
 t/t4018/cpp-function-returning-reference   |   4 +
 t/t4018/cpp-gnu-style-function             |   5 +
 t/t4018/cpp-namespace-definition           |   4 +
 t/t4018/cpp-operator-definition            |   4 +
 t/t4018/cpp-skip-access-specifiers         |   8 +
 t/t4018/cpp-skip-comment-block             |   9 ++
 t/t4018/cpp-skip-labels                    |   8 +
 t/t4018/cpp-struct-definition              |   9 ++
 t/t4018/cpp-struct-single-line             |   7 +
 t/t4018/cpp-template-function-definition   |   4 +
 t/t4018/cpp-union-definition               |   4 +
 t/t4018/cpp-void-c-function                |   4 +
 t/t4018/custom1-pattern                    |  17 +++
 t/t4018/custom2-match-to-end-of-line       |   8 +
 t/t4018/custom3-alternation-in-pattern     |  17 +++
 t/t4018/java-class-member-function         |   8 +
 t/t4018/perl-skip-end-of-heredoc           |   8 +
 t/t4018/perl-skip-forward-decl             |  10 ++
 t/t4018/perl-skip-sub-in-pod               |  18 +++
 t/t4018/perl-sub-definition                |   4 +
 t/t4018/perl-sub-definition-kr-brace       |   4 +
 userdiff.c                                 |  12 +-
 33 files changed, 303 insertions(+), 160 deletions(-)
 create mode 100644 t/t4018/README
 create mode 100644 t/t4018/cpp-c++-function
 create mode 100644 t/t4018/cpp-class-constructor
 create mode 100644 t/t4018/cpp-class-constructor-mem-init
 create mode 100644 t/t4018/cpp-class-definition
 create mode 100644 t/t4018/cpp-class-definition-derived
 create mode 100644 t/t4018/cpp-class-destructor
 create mode 100644 t/t4018/cpp-function-returning-global-type
 create mode 100644 t/t4018/cpp-function-returning-nested
 create mode 100644 t/t4018/cpp-function-returning-pointer
 create mode 100644 t/t4018/cpp-function-returning-reference
 create mode 100644 t/t4018/cpp-gnu-style-function
 create mode 100644 t/t4018/cpp-namespace-definition
 create mode 100644 t/t4018/cpp-operator-definition
 create mode 100644 t/t4018/cpp-skip-access-specifiers
 create mode 100644 t/t4018/cpp-skip-comment-block
 create mode 100644 t/t4018/cpp-skip-labels
 create mode 100644 t/t4018/cpp-struct-definition
 create mode 100644 t/t4018/cpp-struct-single-line
 create mode 100644 t/t4018/cpp-template-function-definition
 create mode 100644 t/t4018/cpp-union-definition
 create mode 100644 t/t4018/cpp-void-c-function
 create mode 100644 t/t4018/custom1-pattern
 create mode 100644 t/t4018/custom2-match-to-end-of-line
 create mode 100644 t/t4018/custom3-alternation-in-pattern
 create mode 100644 t/t4018/java-class-member-function
 create mode 100644 t/t4018/perl-skip-end-of-heredoc
 create mode 100644 t/t4018/perl-skip-forward-decl
 create mode 100644 t/t4018/perl-skip-sub-in-pod
 create mode 100644 t/t4018/perl-sub-definition
 create mode 100644 t/t4018/perl-sub-definition-kr-brace

-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
                                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

The character sequences ->* and .* are valid C++ operators. Keep them
together in --word-diff mode.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 userdiff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/userdiff.c b/userdiff.c
index 10b61ec..434535b 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -133,7 +133,7 @@
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 	 /* Keywords */
 	 "!^[ \t]*(do|while|for|if|else|instanceof|new|return|switch|case|throw|catch|using)\n"
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
                                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

Do not split constants such as 123U, 456ll, 789UL at the first U or
second L.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 userdiff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/userdiff.c b/userdiff.c
index 434535b..8830417 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -132,7 +132,7 @@
 	 "^((struct|class|enum)[^;]*)$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
-	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 	 /* Keywords */
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 22:00                     ` Junio C Hamano
  2014-03-24 21:36                     ` Jeff King
  2014-03-21 21:07                   ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
                                     ` (8 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

Add an infrastructure that simplifies adding new tests of the hunk
header regular expressions.

To add new tests, a file with the syntax to test can be dropped in the
directory t4018. The README file explains how a test file must contain;
the README itself tests the default behavior.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018-diff-funcname.sh | 60 +++++++++++++++++++++++++++++++++++++++++++-----
 t/t4018/README           | 18 +++++++++++++++
 2 files changed, 72 insertions(+), 6 deletions(-)
 create mode 100644 t/t4018/README

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 38a092a..b467d9e 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -100,7 +100,25 @@ test_expect_funcname () {
 	grep "^@@.*@@ $1" diff
 }
 
-for p in ada bibtex cpp csharp fortran html java matlab objc pascal perl php python ruby tex
+diffpatterns="
+	ada
+	bibtex
+	cpp
+	csharp
+	fortran
+	html
+	java
+	matlab
+	objc
+	pascal
+	perl
+	php
+	python
+	ruby
+	tex
+"
+
+for p in $diffpatterns
 do
 	test_expect_success "builtin $p pattern compiles" '
 		echo "*.java diff=$p" >.gitattributes &&
@@ -118,11 +136,6 @@ do
 	'
 done
 
-test_expect_success 'default behaviour' '
-	rm -f .gitattributes &&
-	test_expect_funcname "public class Beer\$"
-'
-
 test_expect_success 'set up .gitattributes declaring drivers to test' '
 	cat >.gitattributes <<-\EOF
 	*.java diff=java
@@ -182,4 +195,39 @@ test_expect_success 'alternation in pattern' '
 	test_expect_funcname "public static void main("
 '
 
+test_expect_success 'setup hunk header tests' '
+	for i in $diffpatterns
+	do
+		echo "$i-* diff=$i"
+	done > .gitattributes &&
+
+	# add all test files to the index
+	(
+		cd "$TEST_DIRECTORY"/t4018 &&
+		git --git-dir="$TRASH_DIRECTORY/.git" add .
+	) &&
+
+	# place modified files in the worktree
+	for i in $(git ls-files)
+	do
+		sed -e "s/ChangeMe/IWasChanged/" <"$TEST_DIRECTORY/t4018/$i" >"$i" || return 1
+	done
+'
+
+# check each individual file
+for i in $(git ls-files)
+do
+	if grep broken "$i" >/dev/null 2>&1
+	then
+		result=failure
+	else
+		result=success
+	fi
+	test_expect_$result "hunk header: $i" "
+		test_when_finished 'cat actual' &&	# for debugging only
+		git diff -U1 $i >actual &&
+		grep '@@ .* @@.*RIGHT' actual
+	"
+done
+
 test_done
diff --git a/t/t4018/README b/t/t4018/README
new file mode 100644
index 0000000..283e01cc
--- /dev/null
+++ b/t/t4018/README
@@ -0,0 +1,18 @@
+How to write RIGHT test cases
+=============================
+
+Insert the word "ChangeMe" (exactly this form) at a distance of
+at least two lines from the line that must appear in the hunk header.
+
+The text that must appear in the hunk header must contain the word
+"right", but in all upper-case, like in the title above.
+
+To mark a test case that highlights a malfunction, insert the word
+BROKEN in all lower-case somewhere in the file.
+
+This text is a bit twisted and out of order, but it is itself a
+test case for the default hunk header pattern. Know what you are doing
+if you change it.
+
+BTW, this tests that the head line goes to the hunk header, not the line
+of equal signs.
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (2 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
                                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

There is one subtlety: The old test case 'perl pattern gets full line of
POD header' does not have its own new test case, but the feature is
tested nevertheless by placing the RIGHT tag at the end of the expected
hunk header in t4018/perl-skip-sub-in-pod.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018-diff-funcname.sh             | 88 ------------------------------------
 t/t4018/perl-skip-end-of-heredoc     |  8 ++++
 t/t4018/perl-skip-forward-decl       | 10 ++++
 t/t4018/perl-skip-sub-in-pod         | 18 ++++++++
 t/t4018/perl-sub-definition          |  4 ++
 t/t4018/perl-sub-definition-kr-brace |  4 ++
 6 files changed, 44 insertions(+), 88 deletions(-)
 create mode 100644 t/t4018/perl-skip-end-of-heredoc
 create mode 100644 t/t4018/perl-skip-forward-decl
 create mode 100644 t/t4018/perl-skip-sub-in-pod
 create mode 100644 t/t4018/perl-sub-definition
 create mode 100644 t/t4018/perl-sub-definition-kr-brace

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index b467d9e..c94a5f4 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -29,69 +29,6 @@ public class Beer
 }
 EOF
 sed 's/beer\\/beer,\\/' <Beer.java >Beer-correct.java
-cat >Beer.perl <<\EOT
-package Beer;
-
-use strict;
-use warnings;
-use parent qw(Exporter);
-our @EXPORT_OK = qw(round finalround);
-
-sub other; # forward declaration
-
-# hello
-
-sub round {
-	my ($n) = @_;
-	print "$n bottles of beer on the wall ";
-	print "$n bottles of beer\n";
-	print "Take one down, pass it around, ";
-	$n = $n - 1;
-	print "$n bottles of beer on the wall.\n";
-}
-
-sub finalround
-{
-	print "Go to the store, buy some more\n";
-	print "99 bottles of beer on the wall.\n");
-}
-
-sub withheredocument {
-	print <<"EOF"
-decoy here-doc
-EOF
-	# some lines of context
-	# to pad it out
-	print "hello\n";
-}
-
-__END__
-
-=head1 NAME
-
-Beer - subroutine to output fragment of a drinking song
-
-=head1 SYNOPSIS
-
-	use Beer qw(round finalround);
-
-	sub song {
-		for (my $i = 99; $i > 0; $i--) {
-			round $i;
-		}
-		finalround;
-	}
-
-	song;
-
-=cut
-EOT
-sed -e '
-	s/hello/goodbye/
-	s/beer\\/beer,\\/
-	s/more\\/more,\\/
-	s/song;/song();/
-' <Beer.perl >Beer-correct.perl
 
 test_expect_funcname () {
 	lang=${2-java}
@@ -139,7 +76,6 @@ done
 test_expect_success 'set up .gitattributes declaring drivers to test' '
 	cat >.gitattributes <<-\EOF
 	*.java diff=java
-	*.perl diff=perl
 	EOF
 '
 
@@ -147,30 +83,6 @@ test_expect_success 'preset java pattern' '
 	test_expect_funcname "public static void main("
 '
 
-test_expect_success 'preset perl pattern' '
-	test_expect_funcname "sub round {\$" perl
-'
-
-test_expect_success 'perl pattern accepts K&R style brace placement, too' '
-	test_expect_funcname "sub finalround\$" perl
-'
-
-test_expect_success 'but is not distracted by end of <<here document' '
-	test_expect_funcname "sub withheredocument {\$" perl
-'
-
-test_expect_success 'perl pattern is not distracted by sub within POD' '
-	test_expect_funcname "=head" perl
-'
-
-test_expect_success 'perl pattern gets full line of POD header' '
-	test_expect_funcname "=head1 SYNOPSIS\$" perl
-'
-
-test_expect_success 'perl pattern is not distracted by forward declaration' '
-	test_expect_funcname "package Beer;\$" perl
-'
-
 test_expect_success 'custom pattern' '
 	test_config diff.java.funcname "!static
 !String
diff --git a/t/t4018/perl-skip-end-of-heredoc b/t/t4018/perl-skip-end-of-heredoc
new file mode 100644
index 0000000..c22d39b
--- /dev/null
+++ b/t/t4018/perl-skip-end-of-heredoc
@@ -0,0 +1,8 @@
+sub RIGHTwithheredocument {
+	print <<"EOF"
+decoy here-doc
+EOF
+	# some lines of context
+	# to pad it out
+	print "ChangeMe\n";
+}
diff --git a/t/t4018/perl-skip-forward-decl b/t/t4018/perl-skip-forward-decl
new file mode 100644
index 0000000..a98cb8b
--- /dev/null
+++ b/t/t4018/perl-skip-forward-decl
@@ -0,0 +1,10 @@
+package RIGHT;
+
+use strict;
+use warnings;
+use parent qw(Exporter);
+our @EXPORT_OK = qw(round finalround);
+
+sub other; # forward declaration
+
+# ChangeMe
diff --git a/t/t4018/perl-skip-sub-in-pod b/t/t4018/perl-skip-sub-in-pod
new file mode 100644
index 0000000..e39f024
--- /dev/null
+++ b/t/t4018/perl-skip-sub-in-pod
@@ -0,0 +1,18 @@
+=head1 NAME
+
+Beer - subroutine to output fragment of a drinking song
+
+=head1 SYNOPSIS_RIGHT
+
+	use Beer qw(round finalround);
+
+	sub song {
+		for (my $i = 99; $i > 0; $i--) {
+			round $i;
+		}
+		finalround;
+	}
+
+	ChangeMe;
+
+=cut
diff --git a/t/t4018/perl-sub-definition b/t/t4018/perl-sub-definition
new file mode 100644
index 0000000..a507d1f
--- /dev/null
+++ b/t/t4018/perl-sub-definition
@@ -0,0 +1,4 @@
+sub RIGHT {
+	my ($n) = @_;
+	print "ChangeMe";
+}
diff --git a/t/t4018/perl-sub-definition-kr-brace b/t/t4018/perl-sub-definition-kr-brace
new file mode 100644
index 0000000..330b3df
--- /dev/null
+++ b/t/t4018/perl-sub-definition-kr-brace
@@ -0,0 +1,4 @@
+sub RIGHT
+{
+	print "ChangeMe\n";
+}
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/10] t4018: convert java pattern test to the new infrastructure
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (3 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
                                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018-diff-funcname.sh           | 4 ----
 t/t4018/java-class-member-function | 8 ++++++++
 2 files changed, 8 insertions(+), 4 deletions(-)
 create mode 100644 t/t4018/java-class-member-function

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index c94a5f4..008325f 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -79,10 +79,6 @@ test_expect_success 'set up .gitattributes declaring drivers to test' '
 	EOF
 '
 
-test_expect_success 'preset java pattern' '
-	test_expect_funcname "public static void main("
-'
-
 test_expect_success 'custom pattern' '
 	test_config diff.java.funcname "!static
 !String
diff --git a/t/t4018/java-class-member-function b/t/t4018/java-class-member-function
new file mode 100644
index 0000000..298bc7a
--- /dev/null
+++ b/t/t4018/java-class-member-function
@@ -0,0 +1,8 @@
+public class Beer
+{
+	int special;
+	public static void main(String RIGHT[])
+	{
+		System.out.print("ChangeMe");
+	}
+}
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/10] t4018: convert custom pattern test to the new infrastructure
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (4 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
                                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

For the test case "matches to end of line", extend the pattern by a few
wildcards so that the pattern captures the "RIGHT" token, which is needed
for verification, without mentioning it in the pattern.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018-diff-funcname.sh               | 40 ++++++++++++++--------------------
 t/t4018/custom1-pattern                | 17 +++++++++++++++
 t/t4018/custom2-match-to-end-of-line   |  8 +++++++
 t/t4018/custom3-alternation-in-pattern | 17 +++++++++++++++
 4 files changed, 58 insertions(+), 24 deletions(-)
 create mode 100644 t/t4018/custom1-pattern
 create mode 100644 t/t4018/custom2-match-to-end-of-line
 create mode 100644 t/t4018/custom3-alternation-in-pattern

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 008325f..5ac744f 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,12 +30,19 @@ public class Beer
 EOF
 sed 's/beer\\/beer,\\/' <Beer.java >Beer-correct.java
 
-test_expect_funcname () {
-	lang=${2-java}
-	test_expect_code 1 git diff --no-index -U1 \
-		"Beer.$lang" "Beer-correct.$lang" >diff &&
-	grep "^@@.*@@ $1" diff
-}
+test_expect_success 'setup' '
+	# a non-trivial custom pattern
+	git config diff.custom1.funcname "!static
+!String
+[^ 	].*s.*" &&
+
+	# a custom pattern which matches to end of line
+	git config diff.custom2.funcname "......Beer\$" &&
+
+	# alternation in pattern
+	git config diff.custom3.funcname "Beer$" &&
+	git config diff.custom3.xfuncname "^[ 	]*((public|static).*)$"
+'
 
 diffpatterns="
 	ada
@@ -53,6 +60,9 @@ diffpatterns="
 	python
 	ruby
 	tex
+	custom1
+	custom2
+	custom3
 "
 
 for p in $diffpatterns
@@ -79,30 +89,12 @@ test_expect_success 'set up .gitattributes declaring drivers to test' '
 	EOF
 '
 
-test_expect_success 'custom pattern' '
-	test_config diff.java.funcname "!static
-!String
-[^ 	].*s.*" &&
-	test_expect_funcname "int special;\$"
-'
-
 test_expect_success 'last regexp must not be negated' '
 	test_config diff.java.funcname "!static" &&
 	test_expect_code 128 git diff --no-index Beer.java Beer-correct.java 2>msg &&
 	grep ": Last expression must not be negated:" msg
 '
 
-test_expect_success 'pattern which matches to end of line' '
-	test_config diff.java.funcname "Beer\$" &&
-	test_expect_funcname "Beer\$"
-'
-
-test_expect_success 'alternation in pattern' '
-	test_config diff.java.funcname "Beer$" &&
-	test_config diff.java.xfuncname "^[ 	]*((public|static).*)$" &&
-	test_expect_funcname "public static void main("
-'
-
 test_expect_success 'setup hunk header tests' '
 	for i in $diffpatterns
 	do
diff --git a/t/t4018/custom1-pattern b/t/t4018/custom1-pattern
new file mode 100644
index 0000000..e8fd59f
--- /dev/null
+++ b/t/t4018/custom1-pattern
@@ -0,0 +1,17 @@
+public class Beer
+{
+	int special, RIGHT;
+	public static void main(String args[])
+	{
+		String s=" ";
+		for(int x = 99; x > 0; x--)
+		{
+			System.out.print(x + " bottles of beer on the wall "
+				+ x + " bottles of beer\n" // ChangeMe
+				+ "Take one down, pass it around, " + (x - 1)
+				+ " bottles of beer on the wall.\n");
+		}
+		System.out.print("Go to the store, buy some more,\n"
+			+ "99 bottles of beer on the wall.\n");
+	}
+}
diff --git a/t/t4018/custom2-match-to-end-of-line b/t/t4018/custom2-match-to-end-of-line
new file mode 100644
index 0000000..f88ac31
--- /dev/null
+++ b/t/t4018/custom2-match-to-end-of-line
@@ -0,0 +1,8 @@
+public class RIGHT_Beer
+{
+	int special;
+	public static void main(String args[])
+	{
+		System.out.print("ChangeMe");
+	}
+}
diff --git a/t/t4018/custom3-alternation-in-pattern b/t/t4018/custom3-alternation-in-pattern
new file mode 100644
index 0000000..5f3769c
--- /dev/null
+++ b/t/t4018/custom3-alternation-in-pattern
@@ -0,0 +1,17 @@
+public class Beer
+{
+	int special;
+	public static void main(String RIGHT[])
+	{
+		String s=" ";
+		for(int x = 99; x > 0; x--)
+		{
+			System.out.print(x + " bottles of beer on the wall "
+				+ x + " bottles of beer\n" // ChangeMe
+				+ "Take one down, pass it around, " + (x - 1)
+				+ " bottles of beer on the wall.\n");
+		}
+		System.out.print("Go to the store, buy some more,\n"
+			+ "99 bottles of beer on the wall.\n");
+	}
+}
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/10] t4018: reduce test files for pattern compilation tests
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (5 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
                                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

All test cases that need a file with specific text patterns have been
converted to utilize texts in the t4018/ directory. The remaining tests
in the test script deal only with the validity of the regular
expressions. These tests do not depend on the contents of files that
'git diff' is invoked on. Remove the largish here-document and use only
tiny files.

While we are touching these tests, convert grep to test_i18ngrep as the
texts checked for may undergo translation in the future.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018-diff-funcname.sh | 52 +++++++++++++-----------------------------------
 1 file changed, 14 insertions(+), 38 deletions(-)

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 5ac744f..34591c2 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -7,29 +7,6 @@ test_description='Test custom diff function name patterns'
 
 . ./test-lib.sh
 
-LF='
-'
-cat >Beer.java <<\EOF
-public class Beer
-{
-	int special;
-	public static void main(String args[])
-	{
-		String s=" ";
-		for(int x = 99; x > 0; x--)
-		{
-			System.out.print(x + " bottles of beer on the wall "
-				+ x + " bottles of beer\n"
-				+ "Take one down, pass it around, " + (x - 1)
-				+ " bottles of beer on the wall.\n");
-		}
-		System.out.print("Go to the store, buy some more,\n"
-			+ "99 bottles of beer on the wall.\n");
-	}
-}
-EOF
-sed 's/beer\\/beer,\\/' <Beer.java >Beer-correct.java
-
 test_expect_success 'setup' '
 	# a non-trivial custom pattern
 	git config diff.custom1.funcname "!static
@@ -41,7 +18,11 @@ test_expect_success 'setup' '
 
 	# alternation in pattern
 	git config diff.custom3.funcname "Beer$" &&
-	git config diff.custom3.xfuncname "^[ 	]*((public|static).*)$"
+	git config diff.custom3.xfuncname "^[ 	]*((public|static).*)$" &&
+
+	# for regexp compilation tests
+	echo A >A.java &&
+	echo B >B.java
 '
 
 diffpatterns="
@@ -70,29 +51,24 @@ do
 	test_expect_success "builtin $p pattern compiles" '
 		echo "*.java diff=$p" >.gitattributes &&
 		test_expect_code 1 git diff --no-index \
-			Beer.java Beer-correct.java 2>msg &&
-		! grep fatal msg &&
-		! grep error msg
+			A.java B.java 2>msg &&
+		! test_i18ngrep fatal msg &&
+		! test_i18ngrep error msg
 	'
 	test_expect_success "builtin $p wordRegex pattern compiles" '
 		echo "*.java diff=$p" >.gitattributes &&
 		test_expect_code 1 git diff --no-index --word-diff \
-			Beer.java Beer-correct.java 2>msg &&
-		! grep fatal msg &&
-		! grep error msg
+			A.java B.java 2>msg &&
+		! test_i18ngrep fatal msg &&
+		! test_i18ngrep error msg
 	'
 done
 
-test_expect_success 'set up .gitattributes declaring drivers to test' '
-	cat >.gitattributes <<-\EOF
-	*.java diff=java
-	EOF
-'
-
 test_expect_success 'last regexp must not be negated' '
+	echo "*.java diff=java" >.gitattributes &&
 	test_config diff.java.funcname "!static" &&
-	test_expect_code 128 git diff --no-index Beer.java Beer-correct.java 2>msg &&
-	grep ": Last expression must not be negated:" msg
+	test_expect_code 128 git diff --no-index A.java B.java 2>msg &&
+	test_i18ngrep ": Last expression must not be negated:" msg
 '
 
 test_expect_success 'setup hunk header tests' '
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/10] t4018: test cases for the built-in cpp pattern
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (6 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
                                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

A later patch changes the built-in cpp pattern. These test cases
demonstrate aspects of the pattern that we do not want to change.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018/cpp-c++-function               | 4 ++++
 t/t4018/cpp-class-definition           | 4 ++++
 t/t4018/cpp-class-definition-derived   | 5 +++++
 t/t4018/cpp-function-returning-pointer | 4 ++++
 t/t4018/cpp-skip-access-specifiers     | 8 ++++++++
 t/t4018/cpp-skip-comment-block         | 9 +++++++++
 t/t4018/cpp-skip-labels                | 8 ++++++++
 t/t4018/cpp-struct-definition          | 9 +++++++++
 t/t4018/cpp-void-c-function            | 4 ++++
 9 files changed, 55 insertions(+)
 create mode 100644 t/t4018/cpp-c++-function
 create mode 100644 t/t4018/cpp-class-definition
 create mode 100644 t/t4018/cpp-class-definition-derived
 create mode 100644 t/t4018/cpp-function-returning-pointer
 create mode 100644 t/t4018/cpp-skip-access-specifiers
 create mode 100644 t/t4018/cpp-skip-comment-block
 create mode 100644 t/t4018/cpp-skip-labels
 create mode 100644 t/t4018/cpp-struct-definition
 create mode 100644 t/t4018/cpp-void-c-function

diff --git a/t/t4018/cpp-c++-function b/t/t4018/cpp-c++-function
new file mode 100644
index 0000000..9ee6bbe
--- /dev/null
+++ b/t/t4018/cpp-c++-function
@@ -0,0 +1,4 @@
+Item RIGHT::DoSomething( Args with_spaces )
+{
+	ChangeMe;
+}
diff --git a/t/t4018/cpp-class-definition b/t/t4018/cpp-class-definition
new file mode 100644
index 0000000..11b61da
--- /dev/null
+++ b/t/t4018/cpp-class-definition
@@ -0,0 +1,4 @@
+class RIGHT
+{
+	int ChangeMe;
+};
diff --git a/t/t4018/cpp-class-definition-derived b/t/t4018/cpp-class-definition-derived
new file mode 100644
index 0000000..3b98cd0
--- /dev/null
+++ b/t/t4018/cpp-class-definition-derived
@@ -0,0 +1,5 @@
+class RIGHT :
+	public Baseclass
+{
+	int ChangeMe;
+};
diff --git a/t/t4018/cpp-function-returning-pointer b/t/t4018/cpp-function-returning-pointer
new file mode 100644
index 0000000..ef15657
--- /dev/null
+++ b/t/t4018/cpp-function-returning-pointer
@@ -0,0 +1,4 @@
+const char *get_it_RIGHT(char *ptr)
+{
+	ChangeMe;
+}
diff --git a/t/t4018/cpp-skip-access-specifiers b/t/t4018/cpp-skip-access-specifiers
new file mode 100644
index 0000000..4d4a9db
--- /dev/null
+++ b/t/t4018/cpp-skip-access-specifiers
@@ -0,0 +1,8 @@
+class RIGHT : public Baseclass
+{
+public:
+protected:
+private:
+	void DoSomething();
+	int ChangeMe;
+};
diff --git a/t/t4018/cpp-skip-comment-block b/t/t4018/cpp-skip-comment-block
new file mode 100644
index 0000000..3800b99
--- /dev/null
+++ b/t/t4018/cpp-skip-comment-block
@@ -0,0 +1,9 @@
+struct item RIGHT(int i)
+// Do not
+// pick up
+/* these
+** comments.
+*/
+{
+	ChangeMe;
+}
diff --git a/t/t4018/cpp-skip-labels b/t/t4018/cpp-skip-labels
new file mode 100644
index 0000000..b9c10ab
--- /dev/null
+++ b/t/t4018/cpp-skip-labels
@@ -0,0 +1,8 @@
+void RIGHT (void)
+{
+repeat:		// C++ comment
+next:		/* C comment */
+	do_something();
+
+	ChangeMe;
+}
diff --git a/t/t4018/cpp-struct-definition b/t/t4018/cpp-struct-definition
new file mode 100644
index 0000000..521c59f
--- /dev/null
+++ b/t/t4018/cpp-struct-definition
@@ -0,0 +1,9 @@
+struct RIGHT {
+	unsigned
+	/* this bit field looks like a label and should not be picked up */
+		decoy_bitfield: 2,
+		more : 1;
+	int filler;
+
+	int ChangeMe;
+};
diff --git a/t/t4018/cpp-void-c-function b/t/t4018/cpp-void-c-function
new file mode 100644
index 0000000..153081e
--- /dev/null
+++ b/t/t4018/cpp-void-c-function
@@ -0,0 +1,4 @@
+void RIGHT (void)
+{
+	ChangeMe;
+}
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (7 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 21:07                   ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
                                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

Most of the tests show C++ code, but there is also a union definition and
a GNU style function definition that are not recognized.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018/cpp-class-constructor              | 5 +++++
 t/t4018/cpp-class-constructor-mem-init     | 6 ++++++
 t/t4018/cpp-class-destructor               | 5 +++++
 t/t4018/cpp-function-returning-global-type | 5 +++++
 t/t4018/cpp-function-returning-nested      | 6 ++++++
 t/t4018/cpp-function-returning-reference   | 5 +++++
 t/t4018/cpp-gnu-style-function             | 6 ++++++
 t/t4018/cpp-namespace-definition           | 5 +++++
 t/t4018/cpp-operator-definition            | 5 +++++
 t/t4018/cpp-struct-single-line             | 8 ++++++++
 t/t4018/cpp-template-function-definition   | 5 +++++
 t/t4018/cpp-union-definition               | 5 +++++
 12 files changed, 66 insertions(+)
 create mode 100644 t/t4018/cpp-class-constructor
 create mode 100644 t/t4018/cpp-class-constructor-mem-init
 create mode 100644 t/t4018/cpp-class-destructor
 create mode 100644 t/t4018/cpp-function-returning-global-type
 create mode 100644 t/t4018/cpp-function-returning-nested
 create mode 100644 t/t4018/cpp-function-returning-reference
 create mode 100644 t/t4018/cpp-gnu-style-function
 create mode 100644 t/t4018/cpp-namespace-definition
 create mode 100644 t/t4018/cpp-operator-definition
 create mode 100644 t/t4018/cpp-struct-single-line
 create mode 100644 t/t4018/cpp-template-function-definition
 create mode 100644 t/t4018/cpp-union-definition

diff --git a/t/t4018/cpp-class-constructor b/t/t4018/cpp-class-constructor
new file mode 100644
index 0000000..4c4925c
--- /dev/null
+++ b/t/t4018/cpp-class-constructor
@@ -0,0 +1,5 @@
+Item::Item(int RIGHT)
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-class-constructor-mem-init b/t/t4018/cpp-class-constructor-mem-init
new file mode 100644
index 0000000..eec1d7c
--- /dev/null
+++ b/t/t4018/cpp-class-constructor-mem-init
@@ -0,0 +1,6 @@
+Item::Item(int RIGHT) :
+	member(0)
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-class-destructor b/t/t4018/cpp-class-destructor
new file mode 100644
index 0000000..03aa51c
--- /dev/null
+++ b/t/t4018/cpp-class-destructor
@@ -0,0 +1,5 @@
+RIGHT::~RIGHT()
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-function-returning-global-type b/t/t4018/cpp-function-returning-global-type
new file mode 100644
index 0000000..bff3e5f
--- /dev/null
+++ b/t/t4018/cpp-function-returning-global-type
@@ -0,0 +1,5 @@
+::Item get::it::RIGHT()
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-function-returning-nested b/t/t4018/cpp-function-returning-nested
new file mode 100644
index 0000000..41700f2
--- /dev/null
+++ b/t/t4018/cpp-function-returning-nested
@@ -0,0 +1,6 @@
+get::Item get::it::RIGHT()
+{
+	ChangeMe;
+	broken;
+}
+
diff --git a/t/t4018/cpp-function-returning-reference b/t/t4018/cpp-function-returning-reference
new file mode 100644
index 0000000..29e2bd4
--- /dev/null
+++ b/t/t4018/cpp-function-returning-reference
@@ -0,0 +1,5 @@
+string& get::it::RIGHT(char *ptr)
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-gnu-style-function b/t/t4018/cpp-gnu-style-function
new file mode 100644
index 0000000..d65fc74
--- /dev/null
+++ b/t/t4018/cpp-gnu-style-function
@@ -0,0 +1,6 @@
+const char *
+RIGHT(int arg)
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-namespace-definition b/t/t4018/cpp-namespace-definition
new file mode 100644
index 0000000..6b88dd9
--- /dev/null
+++ b/t/t4018/cpp-namespace-definition
@@ -0,0 +1,5 @@
+namespace RIGHT
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-operator-definition b/t/t4018/cpp-operator-definition
new file mode 100644
index 0000000..f2bd167
--- /dev/null
+++ b/t/t4018/cpp-operator-definition
@@ -0,0 +1,5 @@
+Value operator+(Value LEFT, Value RIGHT)
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-struct-single-line b/t/t4018/cpp-struct-single-line
new file mode 100644
index 0000000..ad6fa8b
--- /dev/null
+++ b/t/t4018/cpp-struct-single-line
@@ -0,0 +1,8 @@
+void wrong()
+{
+}
+
+struct RIGHT_iterator_tag {};
+
+int ChangeMe;
+// broken
diff --git a/t/t4018/cpp-template-function-definition b/t/t4018/cpp-template-function-definition
new file mode 100644
index 0000000..a410298
--- /dev/null
+++ b/t/t4018/cpp-template-function-definition
@@ -0,0 +1,5 @@
+template<class T> int RIGHT(T arg)
+{
+	ChangeMe;
+	broken;
+}
diff --git a/t/t4018/cpp-union-definition b/t/t4018/cpp-union-definition
new file mode 100644
index 0000000..133b662
--- /dev/null
+++ b/t/t4018/cpp-union-definition
@@ -0,0 +1,5 @@
+union RIGHT {
+	double v;
+	int ChangeMe;
+	broken;
+};
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ anchor points
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (8 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
@ 2014-03-21 21:07                   ` Johannes Sixt
  2014-03-21 22:25                   ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
  2014-03-24 21:49                   ` Jeff King
  11 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2014-03-21 21:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin,
	Johannes Sixt

The hunk header pattern 'cpp' is intended for C and C++ source code, but
it is actually not particularly useful for the latter, and even misses
some use-cases for the former.

The parts of the pattern have the following flaws:

- The first part matches an identifier followed immediately by a colon
  and arbitrary text and is intended to reject goto labels and C++
  access specifiers (public, private, protected). But this pattern also
  rejects C++ constructs, which look like this:

    MyClass::MyClass()
    MyClass::~MyClass()
    MyClass::Item MyClass::Find(...

- The second part matches an identifier followed by a list of qualified
  names (i.e. identifiers separated by the C++ scope operator '::')
  separated by space or '*' followed by an opening parenthesis (with
  space between the tokens). It matches function declarations like

    struct item* get_head(...
    int Outer::Inner::Func(...

  Since the pattern requires at least two identifiers, GNU-style
  function definitions are ignored:

    void
    func(...

  Moreover, since the pattern does not allow punctuation other than '*',
  the following C++ constructs are not recognized:

  . template definitions:
      template<class T> int func(T arg)

  . functions returning references:
      const string& get_message()

  . functions returning templated types:
      vector<int> foo()

  . operator definitions:
      Value operator+(Value l, Value r)

- The third part of the pattern finally matches compound definitions.
  But it forgets about unions and namespaces, and also skips single-line
  definitions

    struct random_iterator_tag {};

  because no semicolon can occur on the line.

Change the first pattern to require a colon at the end of the line
(except for trailing space and comments), so that it does not reject
constructor or destructor definitions.

Notice that all interesting anchor points begin with an identifier or
keyword. But since there is a large variety of syntactical constructs
after the first "word", the simplest is to require only this word and
accept everything else. Therefore, this boils down to a line that begins
with a letter or underscore (optionally preceded by the C++ scope
operator '::' to accept functions returning a type anchored at the
global namespace). Replace the second and third part by a single pattern
that picks such a line.

This has the following desirable consequence:

- All constructs mentioned above are recognized.

and the following likely desirable consequences:

- Definitions of global variables and typedefs are recognized:

    int num_entries = 0;
    extern const char* help_text;
    typedef basic_string<wchar_t> wstring;

- Commonly used marco-ized boilerplate code is recognized:

    BEGIN_MESSAGE_MAP(CCanvas,CWnd)
    Q_DECLARE_METATYPE(MyStruct)
    PATTERNS("tex",...)

  (The last one is from this very patch.)

but also the following possibly undesirable consequence:

- When a label is not on a line by itself (except for a comment) it is
  no longer rejected, but can appear as a hunk header if it occurs at
  the beginning of a line:

    next:;

IMO, the benefits of the change outweigh the (possible) regressions by a
large margin.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4018/cpp-class-constructor              | 1 -
 t/t4018/cpp-class-constructor-mem-init     | 1 -
 t/t4018/cpp-class-destructor               | 1 -
 t/t4018/cpp-function-returning-global-type | 1 -
 t/t4018/cpp-function-returning-nested      | 1 -
 t/t4018/cpp-function-returning-reference   | 1 -
 t/t4018/cpp-gnu-style-function             | 1 -
 t/t4018/cpp-namespace-definition           | 1 -
 t/t4018/cpp-operator-definition            | 1 -
 t/t4018/cpp-struct-single-line             | 1 -
 t/t4018/cpp-template-function-definition   | 1 -
 t/t4018/cpp-union-definition               | 1 -
 userdiff.c                                 | 8 +++-----
 13 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/t/t4018/cpp-class-constructor b/t/t4018/cpp-class-constructor
index 4c4925c..ec4f115 100644
--- a/t/t4018/cpp-class-constructor
+++ b/t/t4018/cpp-class-constructor
@@ -1,5 +1,4 @@
 Item::Item(int RIGHT)
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-class-constructor-mem-init b/t/t4018/cpp-class-constructor-mem-init
index eec1d7c..49a69f3 100644
--- a/t/t4018/cpp-class-constructor-mem-init
+++ b/t/t4018/cpp-class-constructor-mem-init
@@ -2,5 +2,4 @@ Item::Item(int RIGHT) :
 	member(0)
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-class-destructor b/t/t4018/cpp-class-destructor
index 03aa51c..5487665 100644
--- a/t/t4018/cpp-class-destructor
+++ b/t/t4018/cpp-class-destructor
@@ -1,5 +1,4 @@
 RIGHT::~RIGHT()
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-function-returning-global-type b/t/t4018/cpp-function-returning-global-type
index bff3e5f..1084d59 100644
--- a/t/t4018/cpp-function-returning-global-type
+++ b/t/t4018/cpp-function-returning-global-type
@@ -1,5 +1,4 @@
 ::Item get::it::RIGHT()
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-function-returning-nested b/t/t4018/cpp-function-returning-nested
index 41700f2..d9750aa 100644
--- a/t/t4018/cpp-function-returning-nested
+++ b/t/t4018/cpp-function-returning-nested
@@ -1,6 +1,5 @@
 get::Item get::it::RIGHT()
 {
 	ChangeMe;
-	broken;
 }
 
diff --git a/t/t4018/cpp-function-returning-reference b/t/t4018/cpp-function-returning-reference
index 29e2bd4..01b051d 100644
--- a/t/t4018/cpp-function-returning-reference
+++ b/t/t4018/cpp-function-returning-reference
@@ -1,5 +1,4 @@
 string& get::it::RIGHT(char *ptr)
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-gnu-style-function b/t/t4018/cpp-gnu-style-function
index d65fc74..08c7c75 100644
--- a/t/t4018/cpp-gnu-style-function
+++ b/t/t4018/cpp-gnu-style-function
@@ -2,5 +2,4 @@ const char *
 RIGHT(int arg)
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-namespace-definition b/t/t4018/cpp-namespace-definition
index 6b88dd9..6749980 100644
--- a/t/t4018/cpp-namespace-definition
+++ b/t/t4018/cpp-namespace-definition
@@ -1,5 +1,4 @@
 namespace RIGHT
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-operator-definition b/t/t4018/cpp-operator-definition
index f2bd167..1acd827 100644
--- a/t/t4018/cpp-operator-definition
+++ b/t/t4018/cpp-operator-definition
@@ -1,5 +1,4 @@
 Value operator+(Value LEFT, Value RIGHT)
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-struct-single-line b/t/t4018/cpp-struct-single-line
index ad6fa8b..a0de5fb 100644
--- a/t/t4018/cpp-struct-single-line
+++ b/t/t4018/cpp-struct-single-line
@@ -5,4 +5,3 @@ void wrong()
 struct RIGHT_iterator_tag {};
 
 int ChangeMe;
-// broken
diff --git a/t/t4018/cpp-template-function-definition b/t/t4018/cpp-template-function-definition
index a410298..0cdf5ba 100644
--- a/t/t4018/cpp-template-function-definition
+++ b/t/t4018/cpp-template-function-definition
@@ -1,5 +1,4 @@
 template<class T> int RIGHT(T arg)
 {
 	ChangeMe;
-	broken;
 }
diff --git a/t/t4018/cpp-union-definition b/t/t4018/cpp-union-definition
index 133b662..7ec94df 100644
--- a/t/t4018/cpp-union-definition
+++ b/t/t4018/cpp-union-definition
@@ -1,5 +1,4 @@
 union RIGHT {
 	double v;
 	int ChangeMe;
-	broken;
 };
diff --git a/userdiff.c b/userdiff.c
index 8830417..fad52d6 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -125,11 +125,9 @@
 	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
 PATTERNS("cpp",
 	 /* Jump targets or access declarations */
-	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
-	 /* C/++ functions/methods at top level */
-	 "^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
-	 /* compound type at top level */
-	 "^((struct|class|enum)[^;]*)$",
+	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])\n"
+	 /* functions/methods, variables, and compounds at top level */
+	 "^((::[[:space:]]*)?[A-Za-z_].*)$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*"
-- 
1.8.5.2.244.g9fb3fb1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
@ 2014-03-21 22:00                     ` Junio C Hamano
  2014-03-22  6:56                       ` Johannes Sixt
  2014-03-24 21:36                     ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2014-03-21 22:00 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

Johannes Sixt <j6t@kdbg.org> writes:

> Add an infrastructure that simplifies adding new tests of the hunk
> header regular expressions.
>
> To add new tests, a file with the syntax to test can be dropped in the
> directory t4018. The README file explains how a test file must contain;

s/how/what/, or "how a test file must be written" you mean?

> the README itself tests the default behavior.

Thanks.  Looks like a reasonable way to mark what must be found.

> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  t/t4018-diff-funcname.sh | 60 +++++++++++++++++++++++++++++++++++++++++++-----
>  t/t4018/README           | 18 +++++++++++++++
>  2 files changed, 72 insertions(+), 6 deletions(-)
>  create mode 100644 t/t4018/README
>
> diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
> index 38a092a..b467d9e 100755
> --- a/t/t4018-diff-funcname.sh
> +++ b/t/t4018-diff-funcname.sh
> @@ -100,7 +100,25 @@ test_expect_funcname () {
>  	grep "^@@.*@@ $1" diff
>  }
>  
> -for p in ada bibtex cpp csharp fortran html java matlab objc pascal perl php python ruby tex
> +diffpatterns="
> +	ada
> +	bibtex
> +	cpp
> +	csharp
> +	fortran
> +	html
> +	java
> +	matlab
> +	objc
> +	pascal
> +	perl
> +	php
> +	python
> +	ruby
> +	tex
> +"
> +
> +for p in $diffpatterns
>  do
>  	test_expect_success "builtin $p pattern compiles" '
>  		echo "*.java diff=$p" >.gitattributes &&

I always found this "Let's apply rules for language $p to these
*.java files" strange.  I have wonder if it makes sense to further
change the framework to read the name of the rule to be applied from
the file in t/t4018/ directory, instead of using filename that is
the same as the name of the rule?  That way, you can list the files
in t/t4018/ directory to come up with the above list, without having
to maintain the list of rules separately like the above.

> @@ -118,11 +136,6 @@ do
>  	'
>  done
>  
> -test_expect_success 'default behaviour' '
> -	rm -f .gitattributes &&
> -	test_expect_funcname "public class Beer\$"
> -'
> -
>  test_expect_success 'set up .gitattributes declaring drivers to test' '
>  	cat >.gitattributes <<-\EOF
>  	*.java diff=java
> @@ -182,4 +195,39 @@ test_expect_success 'alternation in pattern' '
>  	test_expect_funcname "public static void main("
>  '
>  
> +test_expect_success 'setup hunk header tests' '
> +	for i in $diffpatterns
> +	do
> +		echo "$i-* diff=$i"
> +	done > .gitattributes &&

I like that you can have more than one test for each language/rule
this way, allowing you to test one kind of breakage without getting
affected by lines prepared for other tests in the same file.

> +	# add all test files to the index
> +	(
> +		cd "$TEST_DIRECTORY"/t4018 &&
> +		git --git-dir="$TRASH_DIRECTORY/.git" add .
> +	) &&
> +
> +	# place modified files in the worktree
> +	for i in $(git ls-files)
> +	do
> +		sed -e "s/ChangeMe/IWasChanged/" <"$TEST_DIRECTORY/t4018/$i" >"$i" || return 1
> +	done
> +'
> +
> +# check each individual file
> +for i in $(git ls-files)
> +do
> +	if grep broken "$i" >/dev/null 2>&1
> +	then
> +		result=failure
> +	else
> +		result=success
> +	fi
> +	test_expect_$result "hunk header: $i" "
> +		test_when_finished 'cat actual' &&	# for debugging only
> +		git diff -U1 $i >actual &&
> +		grep '@@ .* @@.*RIGHT' actual
> +	"
> +done
> +
>  test_done
> diff --git a/t/t4018/README b/t/t4018/README
> new file mode 100644
> index 0000000..283e01cc
> --- /dev/null
> +++ b/t/t4018/README
> @@ -0,0 +1,18 @@
> +How to write RIGHT test cases
> +=============================
> +
> +Insert the word "ChangeMe" (exactly this form) at a distance of
> +at least two lines from the line that must appear in the hunk header.
> +
> +The text that must appear in the hunk header must contain the word
> +"right", but in all upper-case, like in the title above.
> +
> +To mark a test case that highlights a malfunction, insert the word
> +BROKEN in all lower-case somewhere in the file.
> +
> +This text is a bit twisted and out of order, but it is itself a
> +test case for the default hunk header pattern. Know what you are doing
> +if you change it.
> +
> +BTW, this tests that the head line goes to the hunk header, not the line
> +of equal signs.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/10] userdiff: cpp pattern simplification and test framework
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (9 preceding siblings ...)
  2014-03-21 21:07                   ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
@ 2014-03-21 22:25                   ` Junio C Hamano
  2014-03-24 21:49                   ` Jeff King
  11 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2014-03-21 22:25 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

Thanks; will replace jk/diff-funcname-cpp-regex with this series.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-21 22:00                     ` Junio C Hamano
@ 2014-03-22  6:56                       ` Johannes Sixt
  2014-03-23 19:36                         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2014-03-22  6:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

Am 21.03.2014 23:00, schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> Add an infrastructure that simplifies adding new tests of the hunk
>> header regular expressions.
>>
>> To add new tests, a file with the syntax to test can be dropped in the
>> directory t4018. The README file explains how a test file must contain;
> 
> s/how/what/, or "how a test file must be written" you mean?

I do mean one of those, preferably the latter ;)

>> -for p in ada bibtex cpp csharp fortran html java matlab objc pascal perl php python ruby tex
>> +diffpatterns="
>> +	ada
>> +	bibtex
>> +	cpp
>> +	csharp
>> +	fortran
>> +	html
>> +	java
>> +	matlab
>> +	objc
>> +	pascal
>> +	perl
>> +	php
>> +	python
>> +	ruby
>> +	tex
>> +"
>> +
>> +for p in $diffpatterns
>>  do
>>  	test_expect_success "builtin $p pattern compiles" '
>>  		echo "*.java diff=$p" >.gitattributes &&
> 
> I always found this "Let's apply rules for language $p to these
> *.java files" strange.  I have wonder if it makes sense to further
> change the framework to read the name of the rule to be applied from
> the file in t/t4018/ directory, instead of using filename that is
> the same as the name of the rule?  That way, you can list the files
> in t/t4018/ directory to come up with the above list, without having
> to maintain the list of rules separately like the above.

I see what you mean. Notice, however, that we also test whether the
regular expressions can be compiled successfully, and for this it is
desirable to have the complete list of userdiff drivers. Until we have
at least one test-case for each driver, we wouldn't get the complete
list. In summary, I do agree that a bit more automation is also
desirable, but we are not there, yet.

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-22  6:56                       ` Johannes Sixt
@ 2014-03-23 19:36                         ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2014-03-23 19:36 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

Johannes Sixt <j6t@kdbg.org> writes:

> I see what you mean. Notice, however, that we also test whether
> the regular expressions can be compiled successfully, and for this
> it is desirable to have the complete list of userdiff drivers.
> Until we have at least one test-case for each driver, we wouldn't
> get the complete list.

I missed the fact that we lack "here is a pair of test files in this
lang $p; go find diff between them and make sure we have the right
hunk header" for many $p.

So the kind of improvements I wondered about are allowed to come
later, but shouldn't be done in this series, especiallly not at this
step in the series.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
  2014-03-21 22:00                     ` Junio C Hamano
@ 2014-03-24 21:36                     ` Jeff King
  2014-03-24 21:39                       ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-03-24 21:36 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

On Fri, Mar 21, 2014 at 10:07:15PM +0100, Johannes Sixt wrote:

> Add an infrastructure that simplifies adding new tests of the hunk
> header regular expressions.
> 
> To add new tests, a file with the syntax to test can be dropped in the
> directory t4018. The README file explains how a test file must contain;
> the README itself tests the default behavior.

I really like the cleanups you've done in t4018. I noticed how messy it
was when I modified it recently, but I didn't take the time to clean it.

> diff --git a/t/t4018/README b/t/t4018/README
> new file mode 100644
> index 0000000..283e01cc
> --- /dev/null
> +++ b/t/t4018/README
> @@ -0,0 +1,18 @@
> +How to write RIGHT test cases
> +=============================
> +
> +Insert the word "ChangeMe" (exactly this form) at a distance of
> +at least two lines from the line that must appear in the hunk header.

The existing tests use -U1 to make writing cases simpler. Is there a
reason not to continue that (or if you found that porting the existing
cases was not a chore with -U3, I can buy that argument, too)?

> +The text that must appear in the hunk header must contain the word
> +"right", but in all upper-case, like in the title above.
> +
> +To mark a test case that highlights a malfunction, insert the word
> +BROKEN in all lower-case somewhere in the file.

I wondered why you wouldn't write them in the case you are indicating,
when...

> +This text is a bit twisted and out of order, but it is itself a
> +test case for the default hunk header pattern. Know what you are doing
> +if you change it.

Ah. Clever. :)

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-24 21:36                     ` Jeff King
@ 2014-03-24 21:39                       ` Jeff King
  2014-03-25 20:07                         ` Johannes Sixt
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2014-03-24 21:39 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

On Mon, Mar 24, 2014 at 05:36:59PM -0400, Jeff King wrote:

> > +How to write RIGHT test cases
> > +=============================
> > +
> > +Insert the word "ChangeMe" (exactly this form) at a distance of
> > +at least two lines from the line that must appear in the hunk header.
> 
> The existing tests use -U1 to make writing cases simpler. Is there a
> reason not to continue that (or if you found that porting the existing
> cases was not a chore with -U3, I can buy that argument, too)?

I take it back. You did keep "-U1" in the result. Is this "two lines"
rule necessary, then?

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/10] userdiff: cpp pattern simplification and test framework
  2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
                                     ` (10 preceding siblings ...)
  2014-03-21 22:25                   ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
@ 2014-03-24 21:49                   ` Jeff King
  11 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2014-03-24 21:49 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

On Fri, Mar 21, 2014 at 10:07:12PM +0100, Johannes Sixt wrote:

> Here is a series that makes the hunk header pattern for C and C++ even
> simpler than suggested by Peff in [1] to catch a lot more C++ functions
> and two more C patterns.
> 
> As a preparatory work, the test cases are totally rewritten to make it
> a lot simpler to drop in new tests. There was an earlier attempt to
> change the infrastructure [2], and it is the reason for the widened Cc
> list.

Thanks. This looks sane overall, and I am especially happy about the
increased test coverage. I do not know if your 10/10 misses some cases,
but at least now we can collect and reason about a set of examples going
forward.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-24 21:39                       ` Jeff King
@ 2014-03-25 20:07                         ` Johannes Sixt
  2014-03-25 21:42                           ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2014-03-25 20:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

Am 24.03.2014 22:39, schrieb Jeff King:
> On Mon, Mar 24, 2014 at 05:36:59PM -0400, Jeff King wrote:
> 
>>> +How to write RIGHT test cases
>>> +=============================
>>> +
>>> +Insert the word "ChangeMe" (exactly this form) at a distance of
>>> +at least two lines from the line that must appear in the hunk header.
>>
>> The existing tests use -U1 to make writing cases simpler. Is there a
>> reason not to continue that (or if you found that porting the existing
>> cases was not a chore with -U3, I can buy that argument, too)?
> 
> I take it back. You did keep "-U1" in the result. Is this "two lines"
> rule necessary, then?

When we have

   one
   two
   three

how would you describe the distance between "one" and "three"?

Or do you have a wrong expectation how we determine the hunk header? The
hunk header is searched in the text *before* the pre-context. Therefore,
we need at least as many lines between the intended hunk header line
(with tag RIGHT) and the changed line (ChangeMe) as we request with -U
for the context.

So: yes, we need the rule.

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] t4018: an infrastructure to test hunk headers
  2014-03-25 20:07                         ` Johannes Sixt
@ 2014-03-25 21:42                           ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2014-03-25 21:42 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Brandon Casey, git, Thomas Rast, l.s.r, Johannes Schindelin

On Tue, Mar 25, 2014 at 09:07:18PM +0100, Johannes Sixt wrote:

> Am 24.03.2014 22:39, schrieb Jeff King:
> > On Mon, Mar 24, 2014 at 05:36:59PM -0400, Jeff King wrote:
> > 
> >>> +How to write RIGHT test cases
> >>> +=============================
> >>> +
> >>> +Insert the word "ChangeMe" (exactly this form) at a distance of
> >>> +at least two lines from the line that must appear in the hunk header.
> >>
> >> The existing tests use -U1 to make writing cases simpler. Is there a
> >> reason not to continue that (or if you found that porting the existing
> >> cases was not a chore with -U3, I can buy that argument, too)?
> > 
> > I take it back. You did keep "-U1" in the result. Is this "two lines"
> > rule necessary, then?
> 
> When we have
> 
>    one
>    two
>    three
> 
> how would you describe the distance between "one" and "three"?

I read it as "two lines between". Though that would be appropriate for
-U2, so I think I probably had an off-by-one in my head, then, too.

So it is probably fine as-is, and this was mostly just me confusing
myself. Sorry for the noise.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-03-25 21:42 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-05  0:36 [RFC/PATCH] diff: simplify cpp funcname regex Jeff King
2014-03-05  7:58 ` Johannes Sixt
2014-03-06 21:28   ` Jeff King
2014-03-07  7:23     ` Johannes Sixt
2014-03-14  3:54       ` Jeff King
2014-03-14  6:56         ` Johannes Sixt
2014-03-18  5:24           ` Jeff King
2014-03-18  8:02             ` Johannes Sixt
2014-03-18 11:00               ` René Scharfe
2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
2014-03-21 21:07                   ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
2014-03-21 22:00                     ` Junio C Hamano
2014-03-22  6:56                       ` Johannes Sixt
2014-03-23 19:36                         ` Junio C Hamano
2014-03-24 21:36                     ` Jeff King
2014-03-24 21:39                       ` Jeff King
2014-03-25 20:07                         ` Johannes Sixt
2014-03-25 21:42                           ` Jeff King
2014-03-21 21:07                   ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
2014-03-21 21:07                   ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
2014-03-21 21:07                   ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
2014-03-21 21:07                   ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
2014-03-21 21:07                   ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
2014-03-21 22:25                   ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
2014-03-24 21:49                   ` Jeff King
2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).