* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 20:40 ` [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions Jeff King
@ 2011-08-25 21:00 ` Eric Sunshine
2011-08-25 21:06 ` Jeff King
2011-08-25 22:29 ` Brandon Casey
` (3 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Eric Sunshine @ 2011-08-25 21:00 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
On Thu, Aug 25, 2011 at 4:40 PM, Jeff King <peff@peff.net> wrote:
> We already provide sane hunk-header patterns for specific
> languages. However, the user has to manually map common
> extensions to use them. It's not that hard to do, but it's
> an extra step that the user might not even know is an
> option. Let's be nice and do it automatically.
>
> Also, any other extensions that would go into such a list? I have no
> idea what the common extension is for something like pascal or csharp.
C# uses extension ".cs".
".cpp" is common, in fact often required, by Windows compilers.
What about ".h" and ".hpp"?
-- ES
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 21:00 ` Eric Sunshine
@ 2011-08-25 21:06 ` Jeff King
2011-08-25 22:01 ` Boaz Harrosh
2011-08-25 23:44 ` Eric Sunshine
0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2011-08-25 21:06 UTC (permalink / raw)
To: Eric Sunshine; +Cc: Boaz Harrosh, git
On Thu, Aug 25, 2011 at 05:00:51PM -0400, Eric Sunshine wrote:
> > Also, any other extensions that would go into such a list? I have no
> > idea what the common extension is for something like pascal or csharp.
>
> C# uses extension ".cs".
>
> ".cpp" is common, in fact often required, by Windows compilers.
Thanks, added both to my list.
> What about ".h" and ".hpp"?
How well do our cpp patterns do with header files? I imagine they're
better than the default, but I don't think I've ever really tried
anything tricky.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 21:06 ` Jeff King
@ 2011-08-25 22:01 ` Boaz Harrosh
2011-08-25 23:44 ` Eric Sunshine
1 sibling, 0 replies; 23+ messages in thread
From: Boaz Harrosh @ 2011-08-25 22:01 UTC (permalink / raw)
To: Jeff King; +Cc: Eric Sunshine, git
On 08/25/2011 02:06 PM, Jeff King wrote:
> On Thu, Aug 25, 2011 at 05:00:51PM -0400, Eric Sunshine wrote:
>
>>> Also, any other extensions that would go into such a list? I have no
>>> idea what the common extension is for something like pascal or csharp.
>>
>> C# uses extension ".cs".
>>
>> ".cpp" is common, in fact often required, by Windows compilers.
>
> Thanks, added both to my list.
>
>> What about ".h" and ".hpp"?
>
> How well do our cpp patterns do with header files? I imagine they're
> better than the default, but I don't think I've ever really tried
> anything tricky.
>
> -Peff
Thanks Jeff, thanks everyone! This looks very promising. Specially that
it's all already there and I don't have to code it up.
RTFM time for me now
Boaz
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 21:06 ` Jeff King
2011-08-25 22:01 ` Boaz Harrosh
@ 2011-08-25 23:44 ` Eric Sunshine
2011-08-26 2:39 ` Jeff King
1 sibling, 1 reply; 23+ messages in thread
From: Eric Sunshine @ 2011-08-25 23:44 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
On 08/25/2011 05:06 PM, Jeff King wrote:
> On Thu, Aug 25, 2011 at 05:00:51PM -0400, Eric Sunshine wrote:
>
>>> Also, any other extensions that would go into such a list? I have no
>>> idea what the common extension is for something like pascal or csharp.
>>
>> C# uses extension ".cs".
>>
>> ".cpp" is common, in fact often required, by Windows compilers.
>
> Thanks, added both to my list.
To clarify, I meant to say that for C++, .cpp is common/required on Windows.
>> What about ".h" and ".hpp"?
>
> How well do our cpp patterns do with header files? I imagine they're
> better than the default, but I don't think I've ever really tried
> anything tricky.
I scanned through a number of revisions for one of my long-running C++
projects comparing the diff of header files with and without "*.h
diff=cpp". In some header files in this project, the oft-used C++
keywords public:, protected:, and private: appear at start-of-line. In
such cases, the default diff emits a less-than-useful hunk header:
@@ -19,8 +19,8 @@ public:
whereas, "diff=cpp" emits:
@@ -19,8 +19,8 @@ class Foobar
-- ES
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 23:44 ` Eric Sunshine
@ 2011-08-26 2:39 ` Jeff King
0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2011-08-26 2:39 UTC (permalink / raw)
To: Eric Sunshine; +Cc: Boaz Harrosh, git
On Thu, Aug 25, 2011 at 07:44:25PM -0400, Eric Sunshine wrote:
> >How well do our cpp patterns do with header files? I imagine they're
> >better than the default, but I don't think I've ever really tried
> >anything tricky.
>
> I scanned through a number of revisions for one of my long-running
> C++ projects comparing the diff of header files with and without "*.h
> diff=cpp". In some header files in this project, the oft-used C++
> keywords public:, protected:, and private: appear at start-of-line.
> In such cases, the default diff emits a less-than-useful hunk header:
>
> @@ -19,8 +19,8 @@ public:
>
> whereas, "diff=cpp" emits:
>
> @@ -19,8 +19,8 @@ class Foobar
Thanks. My C++ is so rusty that I didn't think immediately of how often
those keywords appear in header files. Also, code in inline
functions in either C or C++ will be found in header files. So I think
defaulting *.h and *.hpp to cpp is sensible.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 20:40 ` [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions Jeff King
2011-08-25 21:00 ` Eric Sunshine
@ 2011-08-25 22:29 ` Brandon Casey
2011-08-26 2:45 ` Jeff King
2011-08-25 22:57 ` Junio C Hamano
` (2 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Brandon Casey @ 2011-08-25 22:29 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
On 08/25/2011 03:40 PM, Jeff King wrote:
> We already provide sane hunk-header patterns for specific
> languages. However, the user has to manually map common
> extensions to use them. It's not that hard to do, but it's
> an extra step that the user might not even know is an
> option. Let's be nice and do it automatically.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I tried to think of negative side effects.
That's what I worried about when I last touched this code. Now I'm
thinking "what took us so long to do this!!??".
> Also, any other extensions that would go into such a list?
*.bib diff=bibtex
*.tex diff=tex
*.[Ff] diff=fortran
*.[Ff][0-9][0-9] diff=fortran
GNU fortran currently recognizes .fXX where XX is 90, 95, 03 and 08
and probably enables/disables features based on the respective standard.
[0-9][0-9] would future proof against fortran f13 and f25 as long as
there aren't other extensions that would conflict.
Wikipedia says that .for is an extension for fortran, but I've never
seen that in the wild. Maybe it's a windows thing (3-char ext).
-Brandon
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 22:29 ` Brandon Casey
@ 2011-08-26 2:45 ` Jeff King
2011-08-26 3:58 ` Eric Sunshine
2011-08-26 15:33 ` Brandon Casey
0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2011-08-26 2:45 UTC (permalink / raw)
To: Brandon Casey; +Cc: Boaz Harrosh, git
On Thu, Aug 25, 2011 at 05:29:36PM -0500, Brandon Casey wrote:
> > Also, any other extensions that would go into such a list?
>
> *.bib diff=bibtex
> *.tex diff=tex
I had those ones already. ;P
> *.[Ff] diff=fortran
> *.[Ff][0-9][0-9] diff=fortran
Thanks, I'll add those. I don't see a big problem with generalizing
f[0-9][0-9] to always be fortran, even though many of those numbers
aren't used. I don't think I've ever seen one used for anything else.
Should all of our matches be case-insensitive? That is, should we be
matching both .HTML and .html? Clearly lowercase is the One True Way,
but I don't know what kind of junk people with case-insensitive
filesystems have, or whether we should even worry about it.
> Wikipedia says that .for is an extension for fortran, but I've never
> seen that in the wild. Maybe it's a windows thing (3-char ext).
We can leave it out. It's easy enough for somebody to add their own
gitattribute if they want, or to even complain that it should be in the
default set. I was trying to keep this list to the utterly common, not
become a catalogue of obscure fortran customs. :)
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-26 2:45 ` Jeff King
@ 2011-08-26 3:58 ` Eric Sunshine
2011-08-26 15:33 ` Brandon Casey
1 sibling, 0 replies; 23+ messages in thread
From: Eric Sunshine @ 2011-08-26 3:58 UTC (permalink / raw)
To: Jeff King; +Cc: Brandon Casey, Boaz Harrosh, git
On 08/25/2011 10:45 PM, Jeff King wrote:
> Should all of our matches be case-insensitive? That is, should we be
> matching both .HTML and .html? Clearly lowercase is the One True Way,
> but I don't know what kind of junk people with case-insensitive
> filesystems have, or whether we should even worry about it.
In the Windows world, uppercase extensions are common. Also, one often
finds .htm on Windows rather than .html.
Speaking of other platforms, on Mac OS X:
Objective-C is .m
Objective-C++ is .mm (and long-deprecated .M is probably not relevant)
-- ES
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-26 2:45 ` Jeff King
2011-08-26 3:58 ` Eric Sunshine
@ 2011-08-26 15:33 ` Brandon Casey
1 sibling, 0 replies; 23+ messages in thread
From: Brandon Casey @ 2011-08-26 15:33 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
On 08/25/2011 09:45 PM, Jeff King wrote:
> On Thu, Aug 25, 2011 at 05:29:36PM -0500, Brandon Casey wrote:
>
>>> Also, any other extensions that would go into such a list?
>>
>> *.bib diff=bibtex
>> *.tex diff=tex
>
> I had those ones already. ;P
Indeed. I must be blind, I skipped right over them.
>> *.[Ff] diff=fortran
>> *.[Ff][0-9][0-9] diff=fortran
>
> Thanks, I'll add those. I don't see a big problem with generalizing
> f[0-9][0-9] to always be fortran, even though many of those numbers
> aren't used. I don't think I've ever seen one used for anything else.
>
> Should all of our matches be case-insensitive? That is, should we be
> matching both .HTML and .html? Clearly lowercase is the One True Way,
> but I don't know what kind of junk people with case-insensitive
> filesystems have, or whether we should even worry about it.
For the fortran case, Gnu fortran actually processes the files differently
depending on whether the f is capitalized (it preprocesses or not). So
there is a functional reason for using a capital letter.
For the others, I don't know. Do people still create files named .HTML
or is that just a relic of the past? I can't really think of a strong
argument for or against matching insensitively.
-Brandon
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 20:40 ` [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions Jeff King
2011-08-25 21:00 ` Eric Sunshine
2011-08-25 22:29 ` Brandon Casey
@ 2011-08-25 22:57 ` Junio C Hamano
2011-08-26 2:59 ` Jeff King
2011-08-26 9:44 ` Thomas Rast
2011-08-27 5:14 ` Alexey Shumkin
4 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2011-08-25 22:57 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
Jeff King <peff@peff.net> writes:
> If you have any matching attribute line in your own files, it should
> override. So:
>
> foo/* -diff
>
> will still mark foo/bar.c as binary, even with this change.
>
> Can anyone think of other possible side effects?
>
> Also, any other extensions that would go into such a list? I have no
> idea what the common extension is for something like pascal or csharp.
As long as the builtin ones are the lowest priority fallback, we should be
Ok.
Do we say anywhere that "Ah, this has 'diff' attribute defined, so it must
be text"? If so, we should fix _that_. In other words, having this one
extra entry
"* diff=default"
in the builtin_attr[] array should be a no-op, I think.
>
> attr.c | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/attr.c b/attr.c
> index da29c8e..5118a14 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -294,6 +294,18 @@ static void free_attr_elem(struct attr_stack *e)
>
> static const char *builtin_attr[] = {
> "[attr]binary -diff -text",
> + "*.html diff=html",
> + "*.java diff=java",
> + "*.perl diff=perl",
> + "*.pl diff=perl",
> + "*.php diff=php",
> + "*.py diff=python",
> + "*.rb diff=ruby",
> + "*.bib diff=bibtex",
> + "*.tex diff=tex",
> + "*.c diff=cpp",
> + "*.cc diff=cpp",
> + "*.cxx diff=cpp",
> NULL,
> };
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 22:57 ` Junio C Hamano
@ 2011-08-26 2:59 ` Jeff King
2011-08-26 5:52 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-08-26 2:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Boaz Harrosh, git
On Thu, Aug 25, 2011 at 03:57:06PM -0700, Junio C Hamano wrote:
> > If you have any matching attribute line in your own files, it should
> > override. So:
> >
> > foo/* -diff
> >
> > will still mark foo/bar.c as binary, even with this change.
> >
> > Can anyone think of other possible side effects?
> >
> > Also, any other extensions that would go into such a list? I have no
> > idea what the common extension is for something like pascal or csharp.
>
> As long as the builtin ones are the lowest priority fallback, we should be
> Ok.
>
> Do we say anywhere that "Ah, this has 'diff' attribute defined, so it must
> be text"? If so, we should fix _that_. In other words, having this one
> extra entry
>
> "* diff=default"
>
> in the builtin_attr[] array should be a no-op, I think.
No, certainly not since 122aa6f (diff: introduce diff.<driver>.binary,
2008-10-05). That commit's message claims that we did before it, but
looking at the patch, I am not so sure. But I'm not about to start
testing a 3-year-old patch to see if it really was the source of the
fix; the point is that it is correct now. :)
I think it could be a problem in the future if the builtin userdiff
drivers started growing more invasive options, like automatically
claiming to be non-binary (i.e., setting diff.cpp.binary = false by
default). In other words, I think we have two options:
1. Builtin drivers like "cpp" can stay minimal, only setting funcname
and color-words headers that aren't going to produce terrible
results if we are wrong about detecting by extension.
2. We force the user to identify file types manually, so we can't be
wrong. The "cpp" diff driver means "you are a text C file", and if
a user mis-marks a binary file with that diff driver, they are the
one who is wrong.
So if it's an either/or situation, we should decide not only that
extension auto-detection is a good feature, but that it trumps adding
more advanced features to the builtin drivers in the future.
Or we could decide that the extensions really are good enough, and if
you really do have binary files named "foo.c", it's your problem to
override the defaults with "*.c -diff".
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-26 2:59 ` Jeff King
@ 2011-08-26 5:52 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2011-08-26 5:52 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
Jeff King <peff@peff.net> writes:
> No, certainly not since 122aa6f (diff: introduce diff.<driver>.binary,
> 2008-10-05). That commit's message claims that we did before it, but
> looking at the patch, I am not so sure. But I'm not about to start
> testing a 3-year-old patch to see if it really was the source of the
> fix; the point is that it is correct now. :)
Violently agreed ;-)
> I think it could be a problem in the future if the builtin userdiff
> drivers started growing more invasive options, like automatically
> claiming to be non-binary (i.e., setting diff.cpp.binary = false by
> default).
Well, I think we can be careful when we start thinking about doing
something complex like that, then. Mentioning the above consideration in
the commit message of the final version of this patch would probably be a
good idea, I presume.
Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 20:40 ` [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions Jeff King
` (2 preceding siblings ...)
2011-08-25 22:57 ` Junio C Hamano
@ 2011-08-26 9:44 ` Thomas Rast
2011-08-27 5:14 ` Alexey Shumkin
4 siblings, 0 replies; 23+ messages in thread
From: Thomas Rast @ 2011-08-26 9:44 UTC (permalink / raw)
To: Jeff King; +Cc: Boaz Harrosh, git
Jeff King wrote:
> We already provide sane hunk-header patterns for specific
> languages. However, the user has to manually map common
> extensions to use them. It's not that hard to do, but it's
> an extra step that the user might not even know is an
> option. Let's be nice and do it automatically.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I tried to think of negative side effects.
>
> The userdiff drivers we have are pretty conservative; they just specify
> hunk headers.
And word-diff regexes.
In my book this is a plus for your patch, but I'm just saying. It
will trigger the slightly slower mode of word-diffing that splits at
far more places than the default.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions
2011-08-25 20:40 ` [RFC/PATCH] attr: map builtin userdiff drivers to well-known extensions Jeff King
` (3 preceding siblings ...)
2011-08-26 9:44 ` Thomas Rast
@ 2011-08-27 5:14 ` Alexey Shumkin
4 siblings, 0 replies; 23+ messages in thread
From: Alexey Shumkin @ 2011-08-27 5:14 UTC (permalink / raw)
To: git
> Also, any other extensions that would go into such a list? I have no
> idea what the common extension is for something like pascal or csharp.
# [Object] Pascal unit files
*.pas diff=pascal
# + project files (they rarely contain procedures/functions
# but it is not forbidden in specification)
*.dpr diff=pascal
# for Lazarus - pp and .lpr respectivly
*.pp diff=pascal
*.lpr diff=pascal
^ permalink raw reply [flat|nested] 23+ messages in thread