git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Configuring (future) committags support in gitweb
@ 2008-11-08 19:07 Jakub Narebski
  2008-11-08 20:02 ` Francis Galiegue
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Narebski @ 2008-11-08 19:07 UTC (permalink / raw)
  To: git; +Cc: Francis Galiegue

Francis Galiegue <fg@one2team.net> writes
in "Need help for migration from CVS to git in one go..." 

> * third: also Bonsai-related; Bonsai can link to Bugzilla by
> matching (wild guess) /\b(?:#?)(\d+)\b/ and transforming this into
> http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1. Does gitweb have
> this built-in? (haven't looked yet) Is this planned, or has it been
> discussed and been considered not worth the hassle?

Here below there is proposal how the committags support could look like
for gitweb _user_, which means how to configure gitweb to use (or do not
use) committags, how to configure committags, and how to define new
committags.


Committags are "tags" in commit messages, expanded when rendering commit
message, like gitweb now does for (shortened) SHA-1, converting them to
'object' view link.  It should be done in a way to make it easy
configurable, preferably having to configure only variable part, and not
having to write whole replacement rule.

Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n),
Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
of email addresses, "rich text formatting" like *bold* and _underline_,
syntax highlighting of signoff lines.


I think it would be good idea to use repository config file for
setting-up repository-specific committags, and use whatever Perl
structure for global configuration. The config language can be
borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).

So the example configuration could look like this:

  [gitweb]
  	committags = sha1 signoff bugzilla

  [committag "bugzilla"]
  	match = "\\b(?:#?)(\\d+)\\b"
  	link  = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"

where 'sha1' and 'signoff' are built-in committags, committags are
applied in the order they are put in gitweb.committags; possible actions
for committag driver include:
 * link: replace $match by '_<a href="$link">_$match_</a>_'
 * html: replace $match by '_$html_'
 * text: replace $match by '$text'
where '_a_' means that 'a' is treated as HTML, and is not expanded
further, and 'b' means that it can be further expanded by later
committags, and finally is HTML-escaped (esc_html).


What do you think about this?
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb
  2008-11-08 19:07 [RFC] Configuring (future) committags support in gitweb Jakub Narebski
@ 2008-11-08 20:02 ` Francis Galiegue
  2008-11-08 22:35   ` Jakub Narebski
  0 siblings, 1 reply; 33+ messages in thread
From: Francis Galiegue @ 2008-11-08 20:02 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Le Saturday 08 November 2008 20:07:53 Jakub Narebski, vous avez écrit :
> Francis Galiegue <fg@one2team.net> writes
> in "Need help for migration from CVS to git in one go..."
>
> > * third: also Bonsai-related; Bonsai can link to Bugzilla by
> > matching (wild guess) /\b(?:#?)(\d+)\b/ and transforming this into
> > http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1. Does gitweb have
> > this built-in? (haven't looked yet) Is this planned, or has it been
> > discussed and been considered not worth the hassle?
>
> Here below there is proposal how the committags support could look like
> for gitweb _user_, which means how to configure gitweb to use (or do not
> use) committags, how to configure committags, and how to define new
> committags.
>

Your proposal goes much further than my initial question, but I thought I'd 
jump in anyway :p

> Committags are "tags" in commit messages, expanded when rendering commit
> message, like gitweb now does for (shortened) SHA-1, converting them to
> 'object' view link.  It should be done in a way to make it easy
> configurable, preferably having to configure only variable part, and not
> having to write whole replacement rule.
>
> Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n),
> Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
> of email addresses, "rich text formatting" like *bold* and _underline_,
> syntax highlighting of signoff lines.
>

What do you mean with "not having to write whole replacement rule"?

> I think it would be good idea to use repository config file for
> setting-up repository-specific committags, and use whatever Perl
> structure for global configuration. The config language can be
> borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).
>
> So the example configuration could look like this:
>
>   [gitweb]
>   	committags = sha1 signoff bugzilla
>
>   [committag "bugzilla"]
>   	match = "\\b(?:#?)(\\d+)\\b"
>   	link  = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"
>
> where 'sha1' and 'signoff' are built-in committags, committags are
> applied in the order they are put in gitweb.committags;

I don't understand what the "signoff" builtin is : is that a link to see only 
commits "Signed-off-by:" a particular person?

If so, might I suggest that an "alt" tells "Only show commits signed off by 
this person"?

And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a commit, a 
tree, and others... In fact, it points to any of these right now, but how 
would you tell apart these different SHA1s in a commit message? The only 
obvious use I see for it is the builtin "Revert ..." commit message, that the 
commiter _can_ override...

Or would that be:

my $sha1_re = qr/[a-z[0-9]{40}/;

/(?:(?i:commit\s+))?\b($sha1_re)\b/ => [link to commit $1]
/(?:(?i:tree\s+))\b($sha1_re)\b)/ => [link to tree $1]
/(?:(?:tag\s+))\b($sha1_re)\b)/ => [link to tag $1]

Finally, is there any reason to think that a sha1 or signoff committag will 
ever need to be overriden in some way?

> possible actions 
> for committag driver include:
>  * link: replace $match by '_<a href="$link">_$match_</a>_'
>  * html: replace $match by '_$html_'
>  * text: replace $match by '$text'
> where '_a_' means that 'a' is treated as HTML, and is not expanded
> further, and 'b' means that it can be further expanded by later
> committags, and finally is HTML-escaped (esc_html).
>

What use do you see for the html match? Just asking...

And I don't see what you '_a_' and '_b_' are about...

-- 
fge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb
  2008-11-08 20:02 ` Francis Galiegue
@ 2008-11-08 22:35   ` Jakub Narebski
  2008-11-08 23:27     ` Francis Galiegue
  2009-02-17 15:32     ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Marcel M. Cary
  0 siblings, 2 replies; 33+ messages in thread
From: Jakub Narebski @ 2008-11-08 22:35 UTC (permalink / raw)
  To: Francis Galiegue; +Cc: git

Dnia sobota 8. listopada 2008 21:02, Francis Galiegue napisał:
> Le Saturday 08 November 2008 20:07:53 Jakub Narebski, vous avez écrit :
>> Francis Galiegue <fg@one2team.net> writes
>> in "Need help for migration from CVS to git in one go..."
>>
>>> * third: also Bonsai-related; Bonsai can link to Bugzilla by
>>> matching (wild guess) /\b(?:#?)(\d+)\b/ and transforming this into
>>> http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1. Does gitweb have
>>> this built-in? (haven't looked yet) Is this planned, or has it been
>>> discussed and been considered not worth the hassle?
[...]

>> Committags are "tags" in commit messages, expanded when rendering commit
>> message, like gitweb now does for (shortened) SHA-1, converting them to
>> 'object' view link.  It should be done in a way to make it easy
>> configurable, preferably having to configure only variable part, and not
>> having to write whole replacement rule.
>>
>> Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n),
>> Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
>> of email addresses, "rich text formatting" like *bold* and _underline_,
>> syntax highlighting of signoff lines.
>>
> 
> What do you mean with "not having to write whole replacement rule"?

Like in example with 'link' rule, not having to write whole
<a href="http://example.com/bugzilla.php?id=$1">$&</a>
(or something like that).

>> I think it would be good idea to use repository config file for
>> setting-up repository-specific committags, and use whatever Perl
>> structure for global configuration. The config language can be
>> borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).
>>
>> So the example configuration could look like this:
>>
>>   [gitweb]
>>   	committags = sha1 signoff bugzilla
>>
>>   [committag "bugzilla"]
>>   	match = "\\b(?:#?)(\\d+)\\b"
>>   	link  = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"
>>
>> where 'sha1' and 'signoff' are built-in committags, committags are
>> applied in the order they are put in gitweb.committags;
> 
> I don't understand what the "signoff" builtin is : is that a link to see only 
> commits "Signed-off-by:" a particular person?

Committags doesn't need to be replaced by links. In this case I meant
here using 'signoff' class for Signed-off-by: (and the like) lines, by
wrapping it in '<span class="signoff">' ... '</a>'.

> And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a commit, a 
> tree, and others... In fact, it points to any of these right now, but how 
> would you tell apart these different SHA1s in a commit message? The only 
> obvious use I see for it is the builtin "Revert ..." commit message, that the 
> commiter _can_ override...

SHA1 (or shortened SHA1 from 8 charasters to 40 characters, or to be
even more exact something that looks like SHA1) is replaced by link
to 'object' view, which in turn finds type of object and _redirect_
to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.

We could have used instead gitweb link with 'h' (hash) parameter, but
without 'a' (action) parameter, which currently finds type of object
and _uses_ correct view...
 
> Finally, is there any reason to think that a sha1 or signoff committag will 
> ever need to be overriden in some way?

One might not want to link SHA1, for example if there are lots of false
positives because of commit message conventions or something, or refine
'signoff' committag to use different styles for different types of
signoff: Signed-off-by, Acked-by, Tested-by, other.  Having explicit
'signoff' committag allows us also to put some committags _after_ it,
for example SPAM-protection of emails, or add some committag before
'sha1' to filter out some SHA1 match false positives.
 
>> possible actions 
>> for committag driver include:
>>  * link: replace $match by '_<a href="$link">_$match_</a>_'
>>  * html: replace $match by '_$html_'
>>  * text: replace $match by '$text'
>> where '_a_' means that 'a' is treated as HTML, and is not expanded
>> further, and 'b' means that it can be further expanded by later
>> committags, and finally is HTML-escaped (esc_html).
>>
> 
> What use do you see for the html match? Just asking...

For example 'signoff' committag... well, it is not exactly pure "html"
but rather something like template.

  [committag "signoff"]
  	match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
  	templ = "{<span class=\"signoff\">}$1{</span>}"

Or simpler

  [committag "signoff"]
  	match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
  	class = signoff

> And I don't see what you '_a_' and '_b_' are about...

For example in link match, the text of the link can be further refined
by committags later in sequence.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb
  2008-11-08 22:35   ` Jakub Narebski
@ 2008-11-08 23:27     ` Francis Galiegue
  2008-11-09  0:25       ` Jakub Narebski
  2009-02-17 15:32     ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Marcel M. Cary
  1 sibling, 1 reply; 33+ messages in thread
From: Francis Galiegue @ 2008-11-08 23:27 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Le Saturday 08 November 2008 23:35:48 Jakub Narebski, vous avez écrit :

> >
> > What do you mean with "not having to write whole replacement rule"?
>
> Like in example with 'link' rule, not having to write whole
> <a href="http://example.com/bugzilla.php?id=$1">$&</a>
> (or something like that).
>

OK, good one.

> >> I think it would be good idea to use repository config file for
> >> setting-up repository-specific committags, and use whatever Perl
> >> structure for global configuration. The config language can be
> >> borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).
> >>
> >> So the example configuration could look like this:
> >>
> >>   [gitweb]
> >>   	committags = sha1 signoff bugzilla
> >>
> >>   [committag "bugzilla"]
> >>   	match = "\\b(?:#?)(\\d+)\\b"
> >>   	link  = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"
> >>
> >> where 'sha1' and 'signoff' are built-in committags, committags are
> >> applied in the order they are put in gitweb.committags;
> >
> > I don't understand what the "signoff" builtin is : is that a link to see
> > only commits "Signed-off-by:" a particular person?
>
> Committags doesn't need to be replaced by links. In this case I meant
> here using 'signoff' class for Signed-off-by: (and the like) lines, by
> wrapping it in '<span class="signoff">' ... '</a>'.
>

Well, this would also mean to update gitweb.css, wouldn't it?

> > And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a
> > commit, a tree, and others... In fact, it points to any of these right
> > now, but how would you tell apart these different SHA1s in a commit
> > message? The only obvious use I see for it is the builtin "Revert ..."
> > commit message, that the commiter _can_ override...
>
> SHA1 (or shortened SHA1 from 8 charasters to 40 characters, or to be
> even more exact something that looks like SHA1) is replaced by link
> to 'object' view, which in turn finds type of object and _redirect_
> to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.
>
> We could have used instead gitweb link with 'h' (hash) parameter, but
> without 'a' (action) parameter, which currently finds type of object
> and _uses_ correct view...
>

OK, you lost me somewhat.

What I understand is that right now, the SHA1 links are "pre-processed" by 
gitweb so that the 'a' parameter is correct, right?

Out of curiosity, I just went to the kernel git repository (I don't know the 
git version that git.kernel.org uses offhand) and altered the 'a' parameter 
to something which is not even an 'a' command at all: 500...

However, if I try a VALID 'a' command with an "irrelevant" 'h' parameter, it 
acts quite funny: it just looks like it wants to try the closest match, but 
takes some time figuring it out... Sometimes to something relevant, sometimes 
to nothing really relevant. See for instance [1], in which 'a' was 
originally "commit".

Ow.

 [1]
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=tag;h=788a5f3f70e2a9c46020bdd3a195f2a866441c5d

> > Finally, is there any reason to think that a sha1 or signoff committag
> > will ever need to be overriden in some way?
>
> One might not want to link SHA1, for example if there are lots of false
> positives because of commit message conventions or something, or refine
> 'signoff' committag to use different styles for different types of
> signoff: Signed-off-by, Acked-by, Tested-by, other.  Having explicit
> 'signoff' committag allows us also to put some committags _after_ it,
> for example SPAM-protection of emails, or add some committag before
> 'sha1' to filter out some SHA1 match false positives.
>

Hmmm, so this means you'd want to make styles customizable somewhat (signoff). 
In fact, what you really want is span for CSS! Then why not, just, making a 
document to say "This is what you can do with CSS for gitweb", and say "these 
are the available CSS tags", and then be done with it?

I mean, when comes the day that someone will WANT other spans to be defined, 
badly, it's not like it will be unheard of, won't it? 

>
> > And I don't see what you '_a_' and '_b_' are about...
>
> For example in link match, the text of the link can be further refined
> by committags later in sequence.

I still don't get it. Can you give an example?

[personal thoughts: it would be really, really nice if, somewhat, gitweb.perl 
were splitted somewhat into different modules, and ideally use more 
of "what's out there on CPAN". I'm convinced that some CPAN modules would be 
of GREAT help to gitweb, as well as I'm convinced that not many people out 
there use Windows to run gitweb anyway :p]

-- 
fge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb
  2008-11-08 23:27     ` Francis Galiegue
@ 2008-11-09  0:25       ` Jakub Narebski
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Narebski @ 2008-11-09  0:25 UTC (permalink / raw)
  To: Francis Galiegue; +Cc: git

Dnia niedziela 9. listopada 2008 00:27, Francis Galiegue napisał:
> Le Saturday 08 November 2008 23:35:48 Jakub Narebski, vous avez écrit :

>>> I don't understand what the "signoff" builtin is : is that a link to see
>>> only commits "Signed-off-by:" a particular person?
>>
>> Committags doesn't need to be replaced by links. In this case I meant
>> here using 'signoff' class for Signed-off-by: (and the like) lines, by
>> wrapping it in '<span class="signoff">' ... '</a>'.
> 
> Well, this would also mean to update gitweb.css, wouldn't it?

Not necessary. Please remember that you can configure gitweb to either
use _alternate_ stylesheet (instead of provided gitweb.css), or use
_additional_ stylesheet (for example gitweb-commit.css in addition to
gitweb.css)
 
>>> And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a
>>> commit, a tree, and others... In fact, it points to any of these right
>>> now, but how would you tell apart these different SHA1s in a commit
>>> message? The only obvious use I see for it is the builtin "Revert ..."
>>> commit message, that the commiter _can_ override...
>>
>> SHA1 (or shortened SHA1 from 8 charasters to 40 characters, or to be
>> even more exact something that looks like SHA1) is replaced by link
>> to 'object' view, which in turn finds type of object and _redirect_
>> to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.
>>
>> We could have used instead gitweb link with 'h' (hash) parameter, but
>> without 'a' (action) parameter, which currently finds type of object
>> and _uses_ correct view...
> 
> OK, you lost me somewhat.

I'm sorry about that. Perhaps I should use only one mechanism.
 
> What I understand is that right now, the SHA1 links are "pre-processed" by 
> gitweb so that the 'a' parameter is correct, right?

No, they are 'post-processed': finding correct action is left to
_after_ you click on the link (it is more natural, and helps
performance).

If you don't know action for given SHA-1 you can use either

  http://example.com/gitweb.cgi?a=object;h=deadbeef

which finds correct type (for example 'commit') and redirects using
HTTP 302 Found redirection to

  http://example.com/gitweb.cgi?a=commit;h=deadbeef

This way is used bu SHA-1 committag in commit messages.


Alternate solution (meant for bugtrackers), done by another author,
is to simply skip action ('a') parameter:

  http://example.com/gitweb.cgi?h=deadbeef

and then gitweb finds type of object and act accordingly (without
redirect to correct view).

[...]
>>> Finally, is there any reason to think that a sha1 or signoff committag
>>> will ever need to be overriden in some way?
>>
>> One might not want to link SHA1, for example if there are lots of false
>> positives because of commit message conventions or something, or refine
>> 'signoff' committag to use different styles for different types of
>> signoff: Signed-off-by, Acked-by, Tested-by, other.  Having explicit
>> 'signoff' committag allows us also to put some committags _after_ it,
>> for example SPAM-protection of emails, or add some committag before
>> 'sha1' to filter out some SHA1 match false positives.
> 
> Hmmm, so this means you'd want to make styles customizable somewhat (signoff). 
> In fact, what you really want is span for CSS! Then why not, just, making a 
> document to say "This is what you can do with CSS for gitweb", and say "these 
> are the available CSS tags", and then be done with it?
> 
> I mean, when comes the day that someone will WANT other spans to be defined, 
> badly, it's not like it will be unheard of, won't it? 

Errr... I don't understand.

The examples perhaps are not the best. One would be for example to use
different class (different style) for Signed-off-by signoff (which
denotes signing Certificate of Origin), and the rest of (informative)
signoff:

  [gitweb]
  	committags = sha1 signoff_signed signoff

Another example would be to add SPAM-protection of emails, for example
replacing 'user@example.com' by 'user AT example DOT com', or something
more advanced. This have to be used _after_ signoff, because otherwise
regexp could have difficulties matching mangled email

  [gitweb]
  	committags = sha1 signoff mailto

>>
>>> And I don't see what you '_a_' and '_b_' are about...
>>
>> For example in link match, the text of the link can be further refined
>> by committags later in sequence.
> 
> I still don't get it. Can you give an example?

For example signoff line

  Signed-off-by: A U Thor <author@example.com>

would be replaced by

  {<span class="signoff">}Signed-off-by: A U Thor <author@example.com>{</a>}
 
where parts inside {...} is HTML code, and should be not expanded
further, and parts outside it could be expanded further by following
lower priority committags (like anti-SPAM for emails), and have to be
finally HTML escaped (like '<' and '>' in email in signoff).


.....................................................................

> [personal thoughts: it would be really, really nice if, somewhat, gitweb.perl 
> were splitted somewhat into different modules, and ideally use more 
> of "what's out there on CPAN". I'm convinced that some CPAN modules would be 
> of GREAT help to gitweb, as well as I'm convinced that not many people out 
> there use Windows to run gitweb anyway :p]

First, having gitweb in (almost) one piece makes for easier installation.
But there are plans to have gitweb use Git.pm or future Got::Repo and
friends. I'm not sure about further splitting...

Second, we cannot in good faith use CPAN modules which cannot be found
in standard Perl distributions, or at least in some trusted extras
package (application) repositories, as gitweb is sometimes run on
machines with tight security (think kernel.org for example) where you
cannot simply ask admin to install some third-party alpha-version CPAN
module.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb, especially bug linking
  2008-11-08 22:35   ` Jakub Narebski
  2008-11-08 23:27     ` Francis Galiegue
@ 2009-02-17 15:32     ` Marcel M. Cary
  2009-02-18  3:00       ` [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override Marcel M. Cary
                         ` (2 more replies)
  1 sibling, 3 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-02-17 15:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Francis Galiegue, git, Petr Baudis

I'm interested in cross-linking bug references in commit messages to a
bug tracking system.  I started tinkering a couple weeks ago and am
finally understanding that committags encompass this functionality.
(From the subject line I first understood "tags" to mean git tags rather
than commit message munging.)

Is the committags idea still under active development?

I'd be happy to share what I have, which is for bug linking only, but
very little code would apply to the more general concept of committags.
 Here are some ideas that might apply...


Two regexes would make it easier to configure a driver without needing
look-ahead and look-behind assertions.  For example, if you want to
match non-negative integers but only in the context of a Resolves-bug
header:

    Resolves-bug: 1234, 1235

With two regexes you can write:

    /^Resolves-bug: \d+(, \d+)*/
    /\d+/

But with only one, you'd have to write:

    /(?<=^Resolves-bug: (\d+, )*)(\d+)/

The need for a lookbehind assertion means I need to stop at perlreref to
lookup syntax.  Hrm... and with testing I see that it's worse than that:

    $ perl -wpe 's/(?<=^Resolves-bug: (\d+, )*)(\d+)/[$2]/g'
    Variable length lookbehind not implemented in regex; marked by <--
    HERE in m/(?<=^Resolves-bug: (\d+, )*)(\d+) <-- HERE / at -e line 1.

I guess it can't be done even with the look-behind assertion.


I got the two-regex idea from a spec I ran across while evaluating
Subversion:

http://guest:@tortoisesvn.tigris.org/svn/tortoisesvn/trunk/doc/issuetrackers.txt

If there is interest in BTS integration beyond gitweb, for example in
git-gui, gitk, or the Windows UIs, then perhaps this spec would be worth
considering.  It covers more than just hyperlinking.  It also considers
issues like how to draw the form field for a bug ID as part of a commit
message form, how to validate that form field, and then how to munge the
log message to include the entered ID.  Some details, like using a
newline to separate the two regexes, might be more awkward for Git than
Subversion.


I like the idea of allowing a regex writer -- a gitweb admin or a
repository owner -- to ignore issues regarding HTML escaping.  For
example, I'd rather not have &nbsp; in the regex.  And I don't want the
replacement to have to escape "&" in a query string.  That's a strength
of not having to write the whole link replacement rule.  And I think
hyperlinking will be one of the most common uses of this committag
feature, so it's worth special support.

In the case of false positives, it might also be helpful to have a title
attribute that explains the committag's interpretation of the text.

I also like the idea of giving the admin full control to specify a Perl
function of some sort, which might go as far as looking up bug summaries
for the "title" attribute or adding JS to fetch it via AJAX on
mouse-over.  But I doubt I would bother with that myself.


Appealing as it is, the use of '$1' in my replacements didn't work for me:

    $ perl -wpe '$reg = "(\\d+)"; $rep = ".\$1."; s/$reg/$rep/g'
    123
    .$1.

I think usage of capturing parenthesis is important, even with two
regexes, because it makes it easier to specify link text that's broader
than the data that goes in the URL.  Specifically, I wanted to be able
to produce HTML like this, with the hash mark hyperlinked but not used
in the URL:

    <a href="...bug=123">#123</a>

I guess that's just my aesthetic.  To support that, my code calls
sprintf with $&, $1, $2, ... $9, and that particualr replacement URL
uses %2$s.


I'm concerned about the composition of these committag drivers.  In
other words, will it be hard for the configurer to manage interactions
between committag drivers?  To choose a sane order, will I have to
understand the implementation details of each committag driver?

Perhaps a simpler alternative would be to let at most one driver process
a given snippet of text, forbidding nesting of replacements.  (If I
understand Junio's suggestion to use a list of strings and refs,
non-nesting overlaps are already not supported.)  If all replacements
were hyperlinks -- and I expect that to be the common case -- they
wouldn't be nestable anyway.  I wouldn't see it as a huge loss for the
nesting examples I can think of:  Separate rules for span around S-o-b
and linking or obfuscation of email could be combined into one...  A
rule to shade text quoted email-style with leading angle brackets could
just clobber any further processing of that text.  And it might simplify
the code and testing of it quite a bit.

If committags do turn out to support nesting, perhaps it would make
sense to stratify the ordering so that it's clear whether a particular
driver takes as input HTML vs. text and outputs HTML vs. text.  (For
example, weak email obfuscation might be text -> text.)  I guess to
strictly honor the input and output types of a driver, the text -> html
drivers still have to be expanded in a single pass.


A few ideas for drivers that I don't think have been mentioned yet:

* Wiki page names, like to [[Feature Documentation]].  These are notable
because they tend to contain punctuation that get HTML-escaped, like
quotes and ampersands.

* Links to gitweb itself, such as 123abc:file.txt and HEAD:file.txt.  I
guess the current hash linking sort of does the first case except that
you have to get the hash of the blob instead of using the commit hash,
and the current hash linking wouldn't reveal the filename until after
you click, nor when viewing textual log messages.  I'm not sure whether
special support for linking to multi-commit diffs or other object types
would be as helpful.

Marcel


Jakub Narebski wrote:
> Dnia sobota 8. listopada 2008 21:02, Francis Galiegue napisał:
>> Le Saturday 08 November 2008 20:07:53 Jakub Narebski, vous avez écrit :
>>> Francis Galiegue <fg@one2team.net> writes
>>> in "Need help for migration from CVS to git in one go..."
>>>
>>>> * third: also Bonsai-related; Bonsai can link to Bugzilla by
>>>> matching (wild guess) /\b(?:#?)(\d+)\b/ and transforming this into
>>>> http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1. Does gitweb have
>>>> this built-in? (haven't looked yet) Is this planned, or has it been
>>>> discussed and been considered not worth the hassle?
> [...]
>
>>> Committags are "tags" in commit messages, expanded when rendering commit
>>> message, like gitweb now does for (shortened) SHA-1, converting them to
>>> 'object' view link.  It should be done in a way to make it easy
>>> configurable, preferably having to configure only variable part, and not
>>> having to write whole replacement rule.
>>>
>>> Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n),
>>> Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
>>> of email addresses, "rich text formatting" like *bold* and _underline_,
>>> syntax highlighting of signoff lines.
>>>
>> What do you mean with "not having to write whole replacement rule"?
>
> Like in example with 'link' rule, not having to write whole
> <a href="http://example.com/bugzilla.php?id=$1">$&</a>
> (or something like that).
>
>>> I think it would be good idea to use repository config file for
>>> setting-up repository-specific committags, and use whatever Perl
>>> structure for global configuration. The config language can be
>>> borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).
>>>
>>> So the example configuration could look like this:
>>>
>>>   [gitweb]
>>>      committags = sha1 signoff bugzilla
>>>
>>>   [committag "bugzilla"]
>>>      match = "\\b(?:#?)(\\d+)\\b"
>>>      link  = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"
>>>
>>> where 'sha1' and 'signoff' are built-in committags, committags are
>>> applied in the order they are put in gitweb.committags;
>> I don't understand what the "signoff" builtin is : is that a link to see only
>> commits "Signed-off-by:" a particular person?
>
> Committags doesn't need to be replaced by links. In this case I meant
> here using 'signoff' class for Signed-off-by: (and the like) lines, by
> wrapping it in '<span class="signoff">' ... '</a>'.
>
>> And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a commit, a
>> tree, and others... In fact, it points to any of these right now, but how
>> would you tell apart these different SHA1s in a commit message? The only
>> obvious use I see for it is the builtin "Revert ..." commit message, that the
>> commiter _can_ override...
>
> SHA1 (or shortened SHA1 from 8 charasters to 40 characters, or to be
> even more exact something that looks like SHA1) is replaced by link
> to 'object' view, which in turn finds type of object and _redirect_
> to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.
>
> We could have used instead gitweb link with 'h' (hash) parameter, but
> without 'a' (action) parameter, which currently finds type of object
> and _uses_ correct view...
>
>> Finally, is there any reason to think that a sha1 or signoff committag will
>> ever need to be overriden in some way?
>
> One might not want to link SHA1, for example if there are lots of false
> positives because of commit message conventions or something, or refine
> 'signoff' committag to use different styles for different types of
> signoff: Signed-off-by, Acked-by, Tested-by, other.  Having explicit
> 'signoff' committag allows us also to put some committags _after_ it,
> for example SPAM-protection of emails, or add some committag before
> 'sha1' to filter out some SHA1 match false positives.
>
>>> possible actions
>>> for committag driver include:
>>>  * link: replace $match by '_<a href="$link">_$match_</a>_'
>>>  * html: replace $match by '_$html_'
>>>  * text: replace $match by '$text'
>>> where '_a_' means that 'a' is treated as HTML, and is not expanded
>>> further, and 'b' means that it can be further expanded by later
>>> committags, and finally is HTML-escaped (esc_html).
>>>
>> What use do you see for the html match? Just asking...
>
> For example 'signoff' committag... well, it is not exactly pure "html"
> but rather something like template.
>
>   [committag "signoff"]
>         match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
>         templ = "{<span class=\"signoff\">}$1{</span>}"
>
> Or simpler
>
>   [committag "signoff"]
>         match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
>         class = signoff
>
>> And I don't see what you '_a_' and '_b_' are about...
>
> For example in link match, the text of the link can be further refined
> by committags later in sequence.
>
> --
> Jakub Narebski
> Poland
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override
  2009-02-17 15:32     ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Marcel M. Cary
@ 2009-02-18  3:00       ` Marcel M. Cary
  2009-02-18  7:41         ` Giuseppe Bilotta
  2009-02-18  8:40         ` Junio C Hamano
  2009-02-18  3:00       ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
  2009-02-18  3:38       ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Jakub Narebski
  2 siblings, 2 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-02-18  3:00 UTC (permalink / raw)
  To: git; +Cc: jnareb, fg, giuseppe.bilotta, pasky, Marcel M. Cary

When a feature like "blame" is permitted to be overridden in the
repository configuration but it is not actually set in the
repository, a warning is emitted due to the undefined value
of the repository configuration, even though it's a perfectly
normal condition.

The warning is grounds for test failure in the gitweb test script,
so it causes some new feature tests of mine to fail.

This patch prevents warning and adds a test case to exercise it.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---

Here's a small patch I put together while tinkering with bug hyperlinking.
Does this look reasonable?

Marcel


 gitweb/gitweb.perl                     |    8 +++++---
 t/t9500-gitweb-standalone-no-errors.sh |    5 +++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7c48181..653f0be 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -402,13 +402,13 @@ sub feature_bool {
 	my $key = shift;
 	my ($val) = git_get_project_config($key, '--bool');
 
-	if ($val eq 'true') {
+	if (!defined $val) {
+		return ($_[0]);
+	} elsif ($val eq 'true') {
 		return (1);
 	} elsif ($val eq 'false') {
 		return (0);
 	}
-
-	return ($_[0]);
 }
 
 sub feature_snapshot {
@@ -1978,6 +1978,8 @@ sub git_get_project_config {
 		$config_file = "$git_dir/config";
 	}
 
+	return undef if (!defined $config{"gitweb.$key"});
+
 	# ensure given type
 	if (!defined $type) {
 		return $config{"gitweb.$key"};
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 7c6f70b..559045e 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -662,6 +662,11 @@ cat >>gitweb_config.perl <<EOF
 EOF
 
 test_expect_success \
+	'config override: tree view, features not overridden in repo config' \
+	'gitweb_run "p=.git;a=tree"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
 	'config override: tree view, features disabled in repo config' \
 	'git config gitweb.blame no &&
 	 git config gitweb.snapshot none &&
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line
  2009-02-17 15:32     ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Marcel M. Cary
  2009-02-18  3:00       ` [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override Marcel M. Cary
@ 2009-02-18  3:00       ` Marcel M. Cary
  2009-02-18 21:55         ` Jakub Narebski
  2009-02-18  3:38       ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Jakub Narebski
  2 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-02-18  3:00 UTC (permalink / raw)
  To: git; +Cc: jnareb, fg, giuseppe.bilotta, pasky, Marcel M. Cary

The current implementation only hyperlinks the first hash on
a given line of the commit message.  It seems sensible to
highlight all of them if there are multiple, and it seems
plausible that there would be multiple even with a tidy line
length limit, because they can be abbreviated as short as 8
characters.

Benchmark:

I wanted to make sure that using the 'e' switch to the Perl regex
wasn't going to kill performance, since this is called once per commit
message line displayed.

In all three A/B scenarios I tried, the A and B yielded the same
results within 2%, where A is the version of code before this patch
and B is the version after.

1: View a commit message containing the last 1000 commit hashes
2: View a commit message containing 1000 lines of 40 dots to avoid
   hyperlinking at the same message length
3: View a short merge commit message with a few lines of text and
   no hashes

All were run in CGI mode on my sub-production hardware on a recent
clone of git.git.  Numbers are the average of 10 reqests per second
with the first request discarded, since I expect this change to affect
primarily CPU usage.  Measured with ApacheBench.

Note that the web page rendered was the same; while the new code
supports multiple hashes per line, there was at most one per line.

The primary purpose of scenarios 2 and 3 were to verify that the
addition of 1000 commit messages had an impact on how much of the time
was spent rendering commit messages.  They were all within 2% of 0.80
requests per second (much faster).

So I think the patch has no noticeable effect on performance.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---

And here's another.

Marcel


 gitweb/gitweb.perl |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 653f0be..51b7f56 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1384,13 +1384,11 @@ sub format_log_line_html {
 	my $line = shift;
 
 	$line = esc_html($line, -nbsp=>1);
-	if ($line =~ m/\b([0-9a-fA-F]{8,40})\b/) {
-		my $hash_text = $1;
-		my $link =
-			$cgi->a({-href => href(action=>"object", hash=>$hash_text),
-			        -class => "text"}, $hash_text);
-		$line =~ s/$hash_text/$link/;
-	}
+	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
+		return $cgi->a({-href => href(action=>"object", hash=>$1),
+					   -class => "text"}, $1);
+	}eg;
+
 	return $line;
 }
 
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb, especially bug linking
  2009-02-17 15:32     ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Marcel M. Cary
  2009-02-18  3:00       ` [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override Marcel M. Cary
  2009-02-18  3:00       ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
@ 2009-02-18  3:38       ` Jakub Narebski
  2009-02-19 17:08         ` Marcel M. Cary
                           ` (2 more replies)
  2 siblings, 3 replies; 33+ messages in thread
From: Jakub Narebski @ 2009-02-18  3:38 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: Francis Galiegue, git, Petr Baudis

On Tue, 17 Feb 2009, Marcel M. Cary wrote:

> I'm interested in cross-linking bug references in commit messages to a
> bug tracking system.  I started tinkering a couple weeks ago and am
> finally understanding that committags encompass this functionality.
> (From the subject line I first understood "tags" to mean git tags rather
> than commit message munging.)

What would you name this feature, then?

> 
> Is the committags idea still under active development?

Well, it is in my todo list, rather further on...

[...]
> Two regexes would make it easier to configure a driver without needing
> look-ahead and look-behind assertions.  For example, if you want to
> match non-negative integers but only in the context of a Resolves-bug
> header:
> 
>     Resolves-bug: 1234, 1235

[...]
> I got the two-regex idea from a spec I ran across while evaluating
> Subversion:
> 
> http://guest:@tortoisesvn.tigris.org/svn/tortoisesvn/trunk/doc/issuetrackers.txt

You don't need multiple regexps for that, and in above example it is
used _single_ regexp; only with more than one catching group.

> I like the idea of allowing a regex writer -- a gitweb admin or a
> repository owner -- to ignore issues regarding HTML escaping.  For
> example, I'd rather not have &nbsp; in the regex.  And I don't want the
> replacement to have to escape "&" in a query string.  That's a strength
> of not having to write the whole link replacement rule.  And I think
> hyperlinking will be one of the most common uses of this committag
> feature, so it's worth special support.

[...]
> I'm concerned about the composition of these committag drivers.  In
> other words, will it be hard for the configurer to manage interactions
> between committag drivers?  To choose a sane order, will I have to
> understand the implementation details of each committag driver?

In current proposal the order of running committags drivers is
specified in configuration...

> 
> Perhaps a simpler alternative would be to let at most one driver process
> a given snippet of text, forbidding nesting of replacements.  (If I
> understand Junio's suggestion to use a list of strings and refs,
> non-nesting overlaps are already not supported.)  If all replacements
> were hyperlinks -- and I expect that to be the common case -- they
> wouldn't be nestable anyway.  I wouldn't see it as a huge loss for the
> nesting examples I can think of:  Separate rules for span around S-o-b
> and linking or obfuscation of email could be combined into one...  A
> rule to shade text quoted email-style with leading angle brackets could
> just clobber any further processing of that text.  And it might simplify
> the code and testing of it quite a bit.

... but I guess that at first attempt we could support non-overlapping
committags only, i.e. replacement is always as whole not passed to
later committags.

Still there is a problem how to specify which parts of replacement for
committags have to be HTML escaped, and which are HTML and should not
be (and which are attributes, and have to be escaped too).

[...]
> A few ideas for drivers that I don't think have been mentioned yet:
> 
> * Wiki page names, like to [[Feature Documentation]].  These are notable
> because they tend to contain punctuation that get HTML-escaped, like
> quotes and ampersands.

Well, I think if it would be supported, it would be a very special
case, so I don't think generic support for this is needed nor required.

> 
> * Links to gitweb itself, such as 123abc:file.txt and HEAD:file.txt.  I
> guess the current hash linking sort of does the first case except that
> you have to get the hash of the blob instead of using the commit hash,
> and the current hash linking wouldn't reveal the filename until after
> you click, nor when viewing textual log messages.  I'm not sure whether
> special support for linking to multi-commit diffs or other object types
> would be as helpful.

Also 'v1.5.4' etc linking to tag; both would be a good idea. At this
point I think we have already list of all references (for ref markers)
so it wouldn't require additional call to git command.


P.S. I understand that this post is an exception (send after long, long
time), but please do not toppost in replies. It goes against natural
reading order.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but  no repo override
  2009-02-18  3:00       ` [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override Marcel M. Cary
@ 2009-02-18  7:41         ` Giuseppe Bilotta
  2009-02-18  8:40         ` Junio C Hamano
  1 sibling, 0 replies; 33+ messages in thread
From: Giuseppe Bilotta @ 2009-02-18  7:41 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: git, jnareb, fg, pasky

On Wed, Feb 18, 2009 at 4:00 AM, Marcel M. Cary <marcel@oak.homeunix.org> wrote:
> When a feature like "blame" is permitted to be overridden in the
> repository configuration but it is not actually set in the
> repository, a warning is emitted due to the undefined value
> of the repository configuration, even though it's a perfectly
> normal condition.
>
> The warning is grounds for test failure in the gitweb test script,
> so it causes some new feature tests of mine to fail.
>
> This patch prevents warning and adds a test case to exercise it.
>
> Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
> ---
>
> Here's a small patch I put together while tinkering with bug hyperlinking.
> Does this look reasonable?

My only perplexity is about this:

> @@ -1978,6 +1978,8 @@ sub git_get_project_config {
>                $config_file = "$git_dir/config";
>        }
>
> +       return undef if (!defined $config{"gitweb.$key"});
> +

I'm no Perl expert, so I have no idea: how do non-bool config checks
(which expect arrays) cope with an undef? Also, you may want to add a
non-bool override test in the test suite.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override
  2009-02-18  3:00       ` [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override Marcel M. Cary
  2009-02-18  7:41         ` Giuseppe Bilotta
@ 2009-02-18  8:40         ` Junio C Hamano
  2009-02-18 13:09           ` Jakub Narebski
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-02-18  8:40 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: git, jnareb, fg, giuseppe.bilotta, pasky

"Marcel M. Cary" <marcel@oak.homeunix.org> writes:

> When a feature like "blame" is permitted to be overridden in the
> repository configuration but it is not actually set in the
> repository, a warning is emitted due to the undefined value
> of the repository configuration, even though it's a perfectly
> normal condition.
>
> The warning is grounds for test failure in the gitweb test script,
> so it causes some new feature tests of mine to fail.
>
> This patch prevents warning and adds a test case to exercise it.
>
> Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
> ---
>
> Here's a small patch I put together while tinkering with bug hyperlinking.
> Does this look reasonable?

To my cursory look, it doesn't, and it is not entirely your fault, but if
we applied this patch, it would not improve things very much.  It just
would shift the same problem around.

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7c48181..653f0be 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -402,13 +402,13 @@ sub feature_bool {
>  	my $key = shift;
>  	my ($val) = git_get_project_config($key, '--bool');
>  
> -	if ($val eq 'true') {
> +	if (!defined $val) {
> +		return ($_[0]);
> +	} elsif ($val eq 'true') {
>  		return (1);
>  	} elsif ($val eq 'false') {
>  		return (0);
>  	}

I think the warning you are talking about is to compare $val with 'true'
with 'eq' operator when $val could be undef.  The check to see if $val is
undefined ato avoid that 'eq' comparison is fine, and the intent to return
false is also good, but I think feature_bool is meant to say "yes" or
"no", and existing code for 'false' is returning (0).  I'd rather see your
new codepath normalize incoming undef the same way string 'false' is
normalized to return (0).  Granted, the caller should be prepared to take
the answer as boolean and treat the usual Perl false values (numeric zero,
a string with single "0", or an undef) without barfing, so returning (undef)
from here ought to be safe (otherwise the callers are broken), but I'd
rather see this function play safe.

But it certainly is not my main complaint.

>  sub feature_snapshot {
> @@ -1978,6 +1978,8 @@ sub git_get_project_config {
>  		$config_file = "$git_dir/config";
>  	}
>  
> +	return undef if (!defined $config{"gitweb.$key"});
> +

I think this change is missing a lot of necessary fixes associated with
it.  Have you actually audited all the callers of this function you are
modifying?  For example, feature_bool does this:

        sub feature_bool {
                my $key = shift;
                my ($val) = git_get_project_config($key, '--bool');

                if ($val eq 'true') {
                        return (1);
                } elsif ($val eq 'false') {
	...

With your above change, I think a missing configuration variable will
stuff undef in $val, and trigger the same "$val eq 'true'" comparison
warning here.

Granted, without your change the existing code triggers the same error in
another way, by calling config_to_bool sub with undef here:

	# ensure given type
	if (!defined $type) {
		return $config{"gitweb.$key"};
	} elsif ($type eq 'bool') {
		# backward compatibility: 'git config --bool' returns true/false
		return config_to_bool($config{"gitweb.$key"}) ? 'true' : 'false';

and config_to_bool sub is written in the same carelessness like so:

        sub config_to_bool {
                my $val = shift;

                # strip leading and trailing whitespace
                $val =~ s/^\s+//;

and triggers the same error here in the s/// operation.  I think the right
fix for this part would look like this:

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7c48181..2b140cc 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1920,6 +1920,8 @@ sub git_parse_project_config {
 sub config_to_bool {
 	my $val = shift;
 
+	return 1 if (!defined $val);
+
 	# strip leading and trailing whitespace
 	$val =~ s/^\s+//;
 	$val =~ s/\s+$//;

Because

	[gitweb]
        	variable

parsed by git_parse_project_config('gitweb') should return a hash that
maps "gitweb.variable" to undef it must be fed as undef to
config_to_bool.  This variable should be reported as "true".

There are tons of undef unsafeness in this file from a very cursory look.

Unrelated to any of these, I think the following is wrong:

        sub feature_patches {
                my @val = (git_get_project_config('patches', '--int'));

                if (@val) {
                        return @val;
                }

                return ($_[0]);
        }

As git_get_project_config() always returns something, hence "if (@val)"
can never be false.

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override
  2009-02-18  8:40         ` Junio C Hamano
@ 2009-02-18 13:09           ` Jakub Narebski
  2009-02-18 19:02             ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Narebski @ 2009-02-18 13:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Marcel M. Cary, git, Giuseppe Bilotta, Petr Baudis, Francis Galiegue

Fixed up patch at the bottom.

Junio C Hamano wrote:
> "Marcel M. Cary" <marcel@oak.homeunix.org> writes:
> 
> > When a feature like "blame" is permitted to be overridden in the
> > repository configuration but it is not actually set in the
> > repository, a warning is emitted due to the undefined value
> > of the repository configuration, even though it's a perfectly
> > normal condition.
> >
> > The warning is grounds for test failure in the gitweb test script,
> > so it causes some new feature tests of mine to fail.
> >
> > This patch prevents warning and adds a test case to exercise it.
> >
> > Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
> > ---
> >
> > Here's a small patch I put together while tinkering with bug hyperlinking.
> > Does this look reasonable?

Somewhat.

Corrected patch at the bottom.

> 
> To my cursory look, it doesn't, and it is not entirely your fault, but if
> we applied this patch, it would not improve things very much.  It just
> would shift the same problem around.
> 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 7c48181..653f0be 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -402,13 +402,13 @@ sub feature_bool {
> >  	my $key = shift;
> >  	my ($val) = git_get_project_config($key, '--bool');
> >  
> > -	if ($val eq 'true') {
> > +	if (!defined $val) {
> > +		return ($_[0]);
> > +	} elsif ($val eq 'true') {
> >  		return (1);
> >  	} elsif ($val eq 'false') {
> >  		return (0);
> >  	}
> 
> I think the warning you are talking about is to compare $val with 'true'
> with 'eq' operator when $val could be undef.  The check to see if $val is
> undefined to avoid that 'eq' comparison is fine, and the intent to return
> false is also good, but I think feature_bool is meant to say "yes" or
> "no", and existing code for 'false' is returning (0).  I'd rather see your
> new codepath normalize incoming undef the same way string 'false' is
> normalized to return (0).

Actually git_get_project_config($key, '--bool') can return only three
values: 
 * 'true' if gitweb.$key config variable is 'true', 'yes', 1, or (after
   fixes in the fixed up patch at the bottom) is undefined, i.e.
     [gitweb]
     	blame
   case
 * 'false' if gitweb.$key exists and has other value (that includes
   empty string value: "[section] val" is git-config --bool true, while
   "[section] val = " is --bool false).
 * undef if gitweb.$key does not exist in the config;
   earlier version which used "git config --bool <variable>" returned
   empty string ('') here.

We want to fall back to %feature value (i.e. do not override) if
variable is not set in config.  Variable not set was '', and now is undef,
therefore need for this (correct) change.

> Granted, the caller should be prepared to take 
> the answer as boolean and treat the usual Perl false values (numeric zero,
> a string with single "0", or an undef) without barfing, so returning (undef)
> from here ought to be safe (otherwise the callers are broken), but I'd
> rather see this function play safe.
> 
> But it certainly is not my main complaint.

Well, I think we can now get rid of backwards compatibility (which is
not complete anyway: '' for not existent variable for old version, undef
for new version) with the old version which ran git-config once for each
variable, and do not go through 'true'/'false' to imitate calling
"git config --bool", which has to be converted back to Perl boolean
anyway.

> 
> >  sub feature_snapshot {
> > @@ -1978,6 +1978,8 @@ sub git_get_project_config {
> >  		$config_file = "$git_dir/config";
> >  	}
> >  
> > +	return undef if (!defined $config{"gitweb.$key"});
> > +

It should be !exists, and not !defined here, see my fixed up patch
below.

> 
> I think this change is missing a lot of necessary fixes associated with
> it.  Have you actually audited all the callers of this function you are
> modifying?  For example, feature_bool does this:
> 
>         sub feature_bool {
>                 my $key = shift;
>                 my ($val) = git_get_project_config($key, '--bool');
> 
>                 if ($val eq 'true') {
>                         return (1);
>                 } elsif ($val eq 'false') {
> 	...
> 
> With your above change, I think a missing configuration variable will
> stuff undef in $val, and trigger the same "$val eq 'true'" comparison
> warning here.

Errr... Junio, Marcel DID fix feature_bool, see above:

> > @@ -402,13 +402,13 @@ sub feature_bool {
> >  	my $key = shift;
> >  	my ($val) = git_get_project_config($key, '--bool');
> >  
> > -	if ($val eq 'true') {
> > +	if (!defined $val) {
> > +		return ($_[0]);
> > +	} elsif ($val eq 'true') {
> >  		return (1);
> >  	} elsif ($val eq 'false') {
> >  		return (0);
> >  	}


> 
> Granted, without your change the existing code triggers the same error in
> another way, by calling config_to_bool sub with undef here:
> 
> 	# ensure given type
> 	if (!defined $type) {
> 		return $config{"gitweb.$key"};
> 	} elsif ($type eq 'bool') {
> 		# backward compatibility: 'git config --bool' returns true/false
> 		return config_to_bool($config{"gitweb.$key"}) ? 'true' : 'false';
> 
> and config_to_bool sub is written in the same carelessness like so:
> 
>         sub config_to_bool {
>                 my $val = shift;
> 
>                 # strip leading and trailing whitespace
>                 $val =~ s/^\s+//;
> 
> and triggers the same error here in the s/// operation.  I think the right
> fix for this part would look like this:
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7c48181..2b140cc 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1920,6 +1920,8 @@ sub git_parse_project_config {
>  sub config_to_bool {
>  	my $val = shift;
>  
> +	return 1 if (!defined $val);
> +
>  	# strip leading and trailing whitespace
>  	$val =~ s/^\s+//;
>  	$val =~ s/\s+$//;
> 
> Because
> 
> 	[gitweb]
>         	variable
> 
> parsed by git_parse_project_config('gitweb') should return a hash that
> maps "gitweb.variable" to undef it must be fed as undef to
> config_to_bool.  This variable should be reported as "true".

Right (but not complete).

We do check !defined $val, but too late: _after_ trying to strip leading
and trailing whitespace.  When going through various versions of
config_to_bool I have somehow forgot about this issue...

> 
> There are tons of undef unsafeness in this file from a very cursory look.
> 
> Unrelated to any of these, I think the following is wrong:
> 
>         sub feature_patches {
>                 my @val = (git_get_project_config('patches', '--int'));
> 
>                 if (@val) {
>                         return @val;
>                 }
> 
>                 return ($_[0]);
>         }
> 
> As git_get_project_config() always returns something, hence "if (@val)"
> can never be false.

Actually after the patch below I think that git_get_project_config
returns empty list () in the list (array) context now if variable
does not exist in the config thanks to "return ;" magic.  And empty
list evaluates to false in scalar context: "if (@val)" would be false
if variable does not exist in the config... in which case we would not
override 'default' value in %feature.

Alternate solution would be to use

         sub feature_patches {
                 my $val = git_get_project_config('patches', '--int');
 
                 if (defined $val) {
                         return ($val);
                 }
 
                 return ($_[0]);
         }



-- >8 --
From: Marcel M. Cary <marcel@oak.homeunix.org>
Subject: [PATCH] gitweb: Fix warnings with override permitted but no repo override

When a feature like "blame" is permitted to be overridden in the
repository configuration but it is not actually set in the repository,
a warning is emitted due to the undefined value of the repository
configuration, even though it's a perfectly normal condition.
Emitting warning is grounds for test failure in the gitweb test
script.

This error was caused by rewrite of git_get_project_config from using
"git config [<type>] <name>" for each individual configuration
variable checked to parsing "git config --list --null" output in
commit b201927 (gitweb: Read repo config using 'git config -z -l').
Earlier version of git_get_project_config was returning empty string
if variable do not exist in config; newer version is meant to return
undef in this case, therefore change in feature_bool was needed.

Additionally config_to_* subroutines were meant to be invoked only if
configuration variable exists; therefore we added early return to
git_get_project_config: it now returns no value if variable does not
exists in config.  Otherwise config_to_* subroutines (config_to_bool
in paryicular) wouldn't be able to distinguish between the case where
variable does not exist and the case where variable doesn't have value
(the "[section] noval" case, which evaluates to true for boolean).

While at it fix bug in config_to_bool, where checking if $val is
defined (if config variable has value) was done _after_ stripping
leading and trailing whitespace, which lead to 'Use of uninitialized
value' warning.

Add test case for features overridable but not overriden in repo
config, and case for no value boolean configuration variable.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                     |   16 ++++++++++------
 t/t9500-gitweb-standalone-no-errors.sh |   18 +++++++++++++++++-
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7c48181..83858fb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -402,13 +402,13 @@ sub feature_bool {
 	my $key = shift;
 	my ($val) = git_get_project_config($key, '--bool');
 
-	if ($val eq 'true') {
+	if (!defined $val) {
+		return ($_[0]);
+	} elsif ($val eq 'true') {
 		return (1);
 	} elsif ($val eq 'false') {
 		return (0);
 	}
-
-	return ($_[0]);
 }
 
 sub feature_snapshot {
@@ -1914,18 +1914,19 @@ sub git_parse_project_config {
 	return %config;
 }
 
-# convert config value to boolean, 'true' or 'false'
+# convert config value to boolean: 'true' or 'false'
 # no value, number > 0, 'true' and 'yes' values are true
 # rest of values are treated as false (never as error)
 sub config_to_bool {
 	my $val = shift;
 
+	return 1 if !defined $val;             # section.key
+
 	# strip leading and trailing whitespace
 	$val =~ s/^\s+//;
 	$val =~ s/\s+$//;
 
-	return (!defined $val ||               # section.key
-	        ($val =~ /^\d+$/ && $val) ||   # section.key = 1
+	return (($val =~ /^\d+$/ && $val) ||   # section.key = 1
 	        ($val =~ /^(?:true|yes)$/i));  # section.key = true
 }
 
@@ -1978,6 +1979,9 @@ sub git_get_project_config {
 		$config_file = "$git_dir/config";
 	}
 
+	# check if config variable (key) exists
+	return unless exists $config{"gitweb.$key"};
+
 	# ensure given type
 	if (!defined $type) {
 		return $config{"gitweb.$key"};
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 7c6f70b..6ed10d0 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -662,6 +662,11 @@ cat >>gitweb_config.perl <<EOF
 EOF
 
 test_expect_success \
+	'config override: tree view, features not overridden in repo config' \
+	'gitweb_run "p=.git;a=tree"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
 	'config override: tree view, features disabled in repo config' \
 	'git config gitweb.blame no &&
 	 git config gitweb.snapshot none &&
@@ -669,12 +674,23 @@ test_expect_success \
 test_debug 'cat gitweb.log'
 
 test_expect_success \
-	'config override: tree view, features enabled in repo config' \
+	'config override: tree view, features enabled in repo config (1)' \
 	'git config gitweb.blame yes &&
 	 git config gitweb.snapshot "zip,tgz, tbz2" &&
 	 gitweb_run "p=.git;a=tree"'
 test_debug 'cat gitweb.log'
 
+cat >.git/config <<\EOF
+# testing noval and alternate separator
+[gitweb]
+	blame
+	snapshot = zip tgz
+EOF
+test_expect_success \
+	'config override: tree view, features enabled in repo config (2)' \
+	'gitweb_run "p=.git;a=tree"'
+test_debug 'cat gitweb.log'
+
 # ----------------------------------------------------------------------
 # non-ASCII in README.html
 
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override
  2009-02-18 13:09           ` Jakub Narebski
@ 2009-02-18 19:02             ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2009-02-18 19:02 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Marcel M. Cary, git, Giuseppe Bilotta, Petr Baudis, Francis Galiegue

Jakub Narebski <jnareb@gmail.com> writes:

> Fixed up patch at the bottom.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line
  2009-02-18  3:00       ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
@ 2009-02-18 21:55         ` Jakub Narebski
  2009-02-20  8:35           ` Junio C Hamano
                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jakub Narebski @ 2009-02-18 21:55 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: git, fg, giuseppe.bilotta, pasky

On Wed, 18 Feb 2009, Marcel M. Cary wrote:

> The current implementation only hyperlinks the first hash on
> a given line of the commit message.  It seems sensible to
> highlight all of them if there are multiple, and it seems
> plausible that there would be multiple even with a tidy line
> length limit, because they can be abbreviated as short as 8
> characters.

That is a good catch. Code simply was not modified since we required
fill-length 40-characters SHA-1 id.

> 
> Benchmark:
> 
> I wanted to make sure that using the 'e' switch to the Perl regex
> wasn't going to kill performance, since this is called once per commit
> message line displayed.
> 
> In all three A/B scenarios I tried, the A and B yielded the same
> results within 2%, where A is the version of code before this patch
> and B is the version after.
> 
> 1: View a commit message containing the last 1000 commit hashes
> 2: View a commit message containing 1000 lines of 40 dots to avoid
>    hyperlinking at the same message length
> 3: View a short merge commit message with a few lines of text and
>    no hashes

I don't think we should worry about that; after all esc_path and
unescape subroutines also use 'e' switch to Perl regexp.

So the benchmark is nice addition, but I don't think it is really
necessary, especially that the change results in shorter and easier
(I think) to maintain code.

[...]
> So I think the patch has no noticeable effect on performance.
> 
> Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
> ---
> 
> And here's another.
> 
> Marcel

Do I understand correctly that those patches are not related at all
semantically or textually, only in that you have them one after other
(and blob sha-1 in the index line reflects state after former), isn't
it?

> 
> 
>  gitweb/gitweb.perl |   12 +++++-------
>  1 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 653f0be..51b7f56 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1384,13 +1384,11 @@ sub format_log_line_html {
>  	my $line = shift;
>  
>  	$line = esc_html($line, -nbsp=>1);
> -	if ($line =~ m/\b([0-9a-fA-F]{8,40})\b/) {
> -		my $hash_text = $1;
> -		my $link =
> -			$cgi->a({-href => href(action=>"object", hash=>$hash_text),
> -			        -class => "text"}, $hash_text);
> -		$line =~ s/$hash_text/$link/;
> -	}
> +	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> +		return $cgi->a({-href => href(action=>"object", hash=>$1),
> +					   -class => "text"}, $1);
> +	}eg;
> +

Almost correct... but for this unnecessary 'return' statement.
Without it: ACK.

>  	return $line;
>  }
>  
> -- 
> 1.6.1
> 
> 

P.S. Why bare emails (without user names), e.g. "pasky@suse.cz"
and not "Petr Baudis <pasky@suse.cz>"? Just curious...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Configuring (future) committags support in gitweb, especially bug linking
  2009-02-18  3:38       ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Jakub Narebski
@ 2009-02-19 17:08         ` Marcel M. Cary
  2009-06-19 14:13         ` [RFC PATCH 1/2] gitweb: Hyperlink various committags in commit message with regex Marcel M. Cary
  2009-06-19 14:13         ` [RFC PATCH 2/2] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
  2 siblings, 0 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-02-19 17:08 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Francis Galiegue, git, Petr Baudis

Jakub Narebski wrote:
> On Tue, 17 Feb 2009, Marcel M. Cary wrote:
>
>> I'm interested in cross-linking bug references in commit messages to
>> a bug tracking system.  I started tinkering a couple weeks ago and am
>> finally understanding that committags encompass this functionality.
>> (From the subject line I first understood "tags" to mean git tags
>> rather than commit message munging.)
>
> What would you name this feature, then?

Heh, I'm not sure.  It's like a filter in the unix pipeline sense, but
"commit message filter" sounds to me like some messages might be
rejected.  Most of the drivers markup static text with HTML tags, but
not all of them.  Maybe "commit message embellishment".

Perhaps a more important question is: will people find the feature once
it's implemented?  I think that won't be a problem provided that it's
listed in the gitweb docs like the other configs.

>> Is the committags idea still under active development?
>
> Well, it is in my todo list, rather further on...

Is any code for it published in a repository anywhere?  I see a branch
jn/gitweb-committag merged into master that looks relevant, but it only
has the sha1 regex improvement.

>> Two regexes would make it easier to configure a driver without
>> needing look-ahead and look-behind assertions.  For example, if you
>> want to match non-negative integers but only in the context of a
>> Resolves-bug header:
>>
>>     Resolves-bug: 1234, 1235
>
> [...]
>> I got the two-regex idea from a spec I ran across while evaluating
>> Subversion:
>>
>>
http://guest:@tortoisesvn.tigris.org/svn/tortoisesvn/trunk/doc/issuetrackers.txt
>
> You don't need multiple regexps for that, and in above example it is
> used _single_ regexp; only with more than one catching group.

I'm not sure what exactly you propose.  In the second example in the
bugtraq spec, there are two regexes.  Maybe you mean something like
this, but it breaks with three bugs:

    $ perl -MData::Dumper -wne '
        m/^Resolves-bug: (\d+)(?:, (\d+))*/;
        print Dumper([$1, $2, $3, $4]);
    '
    Resolves-bug: 123
    $VAR1 = [
          '123',
          undef,
          undef,
          undef
        ];
    Resolves-bug: 123, 124
    $VAR1 = [
          '123',
          '124',
          undef,
          undef
        ];
    Resolves-bug: 123, 124, 125
    $VAR1 = [
          '123',
          '125',
          undef,
          undef
        ];

Maybe something like this?  But it's limited to an arbitrary number of
bug matches.  Maybe it's good enough for pratical purposes, but it's
prone to unexpected breakage when the user exceeds the threshold of, in
this case, four bugs.

    /^Resolves-bug: (\d+)(?:, (\d+))?(?:, (\d+))?(?:, (\d+))?/


Marcel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line
  2009-02-18 21:55         ` Jakub Narebski
@ 2009-02-20  8:35           ` Junio C Hamano
  2009-02-20 11:46             ` Jakub Narebski
  2009-02-24 15:38           ` Addresses with full names in patch emails Marcel M. Cary
  2009-02-24 16:33           ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
  2 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-02-20  8:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Marcel M. Cary, git, fg, giuseppe.bilotta, pasky

Jakub Narebski <jnareb@gmail.com> writes:

>> +	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
>> +		return $cgi->a({-href => href(action=>"object", hash=>$1),
>> +					   -class => "text"}, $1);
>> +	}eg;
>> +
>
> Almost correct... but for this unnecessary 'return' statement.
> Without it: ACK.

I've applied this directly on 'master' without the return from inside s///e
with your Ack.  Please check the result.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line
  2009-02-20  8:35           ` Junio C Hamano
@ 2009-02-20 11:46             ` Jakub Narebski
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Narebski @ 2009-02-20 11:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marcel M. Cary, git, fg, giuseppe.bilotta, pasky

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> >> +	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> >> +		return $cgi->a({-href => href(action=>"object", hash=>$1),
> >> +					   -class => "text"}, $1);
> >> +	}eg;
> >> +
> >
> > Almost correct... but for this unnecessary 'return' statement.
> > Without it: ACK.
> 
> I've applied this directly on 'master' without the return from inside s///e
> with your Ack.  Please check the result.

I did quick test by installing newest gitweb (with above commit applied),
doing in gitweb searching commit message for '[a-f0-9]{8,40}' with regexp
search; everything looks all right.  But I didn't do extensive tests.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Addresses with full names in patch emails
  2009-02-18 21:55         ` Jakub Narebski
  2009-02-20  8:35           ` Junio C Hamano
@ 2009-02-24 15:38           ` Marcel M. Cary
  2009-02-24 15:58             ` Jakub Narebski
  2009-02-24 16:33           ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
  2 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-02-24 15:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Marcel M. Cary, git, fg, giuseppe.bilotta, pasky

Thanks for the two patch tweaks.

Jakub Narebski wrote:
> P.S. Why bare emails (without user names), e.g. "pasky@suse.cz"
> and not "Petr Baudis <pasky@suse.cz>"? Just curious...

I've been using "git send-email" for patches, and have Thunderbird as my
MUA otherwise.  (I'd use (al)pine if I could make it work with
Exchange/NTLM at work, but that's another story...)  I've been
transfering recipients (--to and --cc) from Thunderbird to the
commandline with copy/paste.

In Thurderbird, copying an email address from a message only gets you
the user@domain part, not the "Full Name" <user@domain>.  To get the
"Full Name" <user@domain> I would have to View Message Source and
pickout the CC line, which is marginally harder.

And on the commandline, instead of just pasting an email as a shell
word, I'd have to add single quotes (I think) to keep the whole "Full
Name" <user@domain> as one word and quote the shell meta characters.

Neither piece is all that onerous, I guess.  Sounds like you would see
some value in the full names.  Maybe I'll try including them on my next
patch.  Looks like --cc-cmd or sendemail.aliasesfile might make it
easier, but I'd have to set them up.

Marcel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Addresses with full names in patch emails
  2009-02-24 15:38           ` Addresses with full names in patch emails Marcel M. Cary
@ 2009-02-24 15:58             ` Jakub Narebski
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Narebski @ 2009-02-24 15:58 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: git, fg, giuseppe.bilotta, pasky

On Tue, 24 Feb 2009, Marcel M. Cary wrote:
> Thanks for the two patch tweaks.
> 
> Jakub Narebski wrote:
> > P.S. Why bare emails (without user names), e.g. "pasky@suse.cz"
> > and not "Petr Baudis <pasky@suse.cz>"? Just curious...
> 
> I've been using "git send-email" for patches, and have Thunderbird as my
> MUA otherwise.  (I'd use (al)pine if I could make it work with
> Exchange/NTLM at work, but that's another story...)  I've been
> transfering recipients (--to and --cc) from Thunderbird to the
> commandline with copy/paste.
[...]

Well, 'technical reasons with copy'n'paste' would be enough for me.

P.S. But '"pasky@suse.cz" <pasky@suse.cz>' looks very silly...
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line
  2009-02-18 21:55         ` Jakub Narebski
  2009-02-20  8:35           ` Junio C Hamano
  2009-02-24 15:38           ` Addresses with full names in patch emails Marcel M. Cary
@ 2009-02-24 16:33           ` Marcel M. Cary
  2 siblings, 0 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-02-24 16:33 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Marcel M. Cary, git, fg, giuseppe.bilotta, pasky

Jakub Narebski wrote:
> Do I understand correctly that those patches are not related at all
> semantically or textually, only in that you have them one after other
> (and blob sha-1 in the index line reflects state after former), isn't
> it?

Yes, I think they were independent.  They are related only in that they
are small tweaks to gitweb and products of my bug hyperlinking patch
(not submitted).  I submitted them as a series mainly to group them.
Are you suggesting that I reserve threading patches together for when
one builds upon the previous one?

Marcel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 1/2] gitweb: Hyperlink various committags in commit message with regex
  2009-02-18  3:38       ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Jakub Narebski
  2009-02-19 17:08         ` Marcel M. Cary
@ 2009-06-19 14:13         ` Marcel M. Cary
  2009-06-22 11:18           ` Jakub Narebski
  2009-06-19 14:13         ` [RFC PATCH 2/2] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
  2 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-06-19 14:13 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

I want gitweb to hyperlink commits to my bug tracking system so that
information regarding the current status of a commit can be easily
cross-referenced.  For example, the QA and release status of a commit
cannot be inserted into the comment.  Maybe someday a "git notes"
feature will help with this, but for now, my organization has a
separate bug tracking system.  Other repository browsers such as
unfuddle and websvn support similar features.

Since the bug hyperlinking feature was previously discussed as part of
"committags," a more general mechanism to embellish commit messages,
implement the more general mechanism instead, including the following
capabilities:

* Hyperlinking mentions of bug IDs to Bugzilla
* Hyperlinking URLs
* Hyperlinking Message-Ids to a mailing list archive
* Hyperlinking commit hashes as before by default, now with a
  configurable regex
* Defining new committags per gitweb installation

Since different repositories may use different bug tracking systems or
mailing list archives, the URL parameter may be configured
per-repository without reiterating the regexes.  To accomodate
different conventions, regexes may also be configured per-project.

This patch is heavily based on discussions and code samples from the
Git list:

	[RFC/PATCH] gitweb: Add committags support, Sep 2006
	http://thread.gmane.org/gmane.comp.version-control.git/27504

	[RFC] gitweb: Add committags support (take 2), Dec 2006
	http://thread.gmane.org/gmane.comp.version-control.git/33150

	[RFC] Configuring (future) committags support in gitweb, Nov 2008
	http://thread.gmane.org/gmane.comp.version-control.git/100415

Some issues I considered but punted:

* Should this configuration try to follow the bugtraq spec?

  As far as I know, only subversion implements it.  Separation of
  regexes by a newline would be a little awkward in the git config.
  And it is broader than just hyperlinking bugs: it also encompasses
  GUI bug ID form fields.  So gitweb would only implement a subset.
  The gitweb configuration mechanism currently only reads
  keys starting with "gitweb.", but these parameters would be more
  broadly applicable, potentially to git-gui, for example.

  However, it *would* be useful for Git tools to standardize on
  config keys and interpretations of regexes and url formats.  For
  example, git-gui might be able to hyperlink the same text as gitweb,
  and even show a separate bugID field when composing a commit
  message.

* I would prefer the regex match against the whole commit message.

  This would allow the regex to insist that a bug reference occur
  on the first line or non-first line of the commit message.  However,
  even if we concatenated the log lines for the first committag,
  subsequent committags would see the text broken up.

  Also, it would allow the regex to match a phrase split across a
  line boundary, as dicussed at some length in the first thread,
  but again, only if no prior committags had interfered.

  This could happen in a later patch.

* I would prefer the site admin have a way to let a repository
  owner define new committags, which means having a way to specify
  the 'sub' key from the repo config or having a flexible default.

The bugtraq and some of the regex questions must be decided now to
avoid breaking gitweb configs later.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---
 gitweb/INSTALL                         |    4 +
 gitweb/gitweb.perl                     |  221 +++++++++++++++++++++++++++++++-
 t/t9500-gitweb-standalone-no-errors.sh |  150 +++++++++++++++++++++-
 3 files changed, 367 insertions(+), 8 deletions(-)

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 18c9ce3..223e39e 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -123,6 +123,10 @@ GITWEB_CONFIG file:
 	$feature{'snapshot'}{'default'} = ['zip', 'tgz'];
 	$feature{'snapshot'}{'override'} = 1;
 
+	$feature{'committags'}{'default'} = ['sha1', 'url', 'bugzilla'];
+	$feature{'committags'}{'override'} = 1;
+
+
 
 Gitweb repositories
 -------------------
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1e7e2d8..c66fdf3 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -195,6 +195,81 @@ our %known_snapshot_format_aliases = (
 	'x-zip' => undef, '' => undef,
 );
 
+# Could call these something else besides committags... embellishments,
+# patterns, rewrite rules, ?
+#
+# In general, the site admin can enable/disable per-project configuration
+# of each committag.  Only the 'options' part of the committag is configurable
+# per-project.
+#
+# The site admin can of course add new tags to this hash or override the
+# 'sub' key if necessary.  But such changes may be fragile; this is not
+# designed as a full-blown plugin architecture.
+our %committags = (
+	# Link Git-style hashes to this gitweb
+	'sha1' => {
+		'options' => {
+			'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
+		},
+		'override' => 0,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			\$cgi->a({-href => href(action=>"object", hash=>$match[1]),
+			          -class => "text"}, esc_html($match[0], -nbsp=>1));
+		},
+	},
+	# Link bug/features to Mantis bug tracker using Mantis-style contextual cues
+	'mantis' => {
+		'options' => {
+			'pattern' => qr/(?:BUG|FEATURE)\((\d+)\)/,
+			'url' => 'http://bugs.xmms2.xmms.se/view.php?id=',
+		},
+		'override' => 0,
+		'sub' => \&hyperlink_committag,
+	},
+	# Link mentions of bug IDs to bugzilla
+	'bugzilla' => {
+		'options' => {
+			'pattern' => qr/bug\s+(\d+)/,
+			'url' => 'http://bugzilla.kernel.org/show_bug.cgi?id=',
+		},
+		'override' => 0,
+		'sub' => \&hyperlink_committag,
+	},
+	# Link URLs
+	'url' => {
+		'options' => {
+			# Avoid matching punctuation that might immediately follow
+			# a url, is not part of the url, and is allowed in urls,
+			# like a full-stop ('.').
+			'pattern' => qr!(http|ftp)s?://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
+			                               [-_a-zA-Z0-9\@/&=+~#<>]!x,
+		},
+		'override' => 0,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			return
+				\$cgi->a({-href => $match[0],
+				          -class => "text"},
+				         esc_html($match[0], -nbsp=>1));
+		},
+	},
+	# Link Message-Id to mailing list archive
+	'messageid' => {
+		'options' => {
+			# The original pattern, which I don't really understand
+			#'pattern' => qr!(?:message|msg)-id:?\s+<([^>]+)>;!i,
+			'pattern' => qr!(?:message|msg)-?id:?\s+(<[^>]+>)!i,
+			'url' => 'http://news.gmane.org/find-root.php?message_id=',
+		},
+		'override' => 0,
+		# The original version didn't include the "msg-id" text in the
+		# link text, but this does.  In general, I think a little more
+		# context makes for better link text.
+		'sub' => \&hyperlink_committag,
+	},
+);
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -365,6 +440,21 @@ our %feature = (
 		'sub' => \&feature_patches,
 		'override' => 0,
 		'default' => [16]},
+
+	# The selection and ordering of committags that are enabled.
+	# Committag transformations will be applied to commit log messages
+	# in this order if listed here.
+
+	# To disable system wide have in $GITWEB_CONFIG
+	# $feature{'committags'}{'default'} = [];
+	# To have project specific config enable override in $GITWEB_CONFIG
+	# $feature{'committags'}{'override'} = 1;
+	# and in project config gitweb.committags = sha1, url, bugzilla
+	# to enable those three committags for that project
+	'committags' => {
+		'sub' => \&feature_committags,
+		'override' => 0,
+		'default' => ['sha1']},
 );
 
 sub gitweb_get_feature {
@@ -433,6 +523,18 @@ sub feature_patches {
 	return ($_[0]);
 }
 
+sub feature_committags {
+	my (@defaults) = @_;
+
+	my ($cfg) = git_get_project_config('committags');
+
+	if ($cfg) {
+		return ($cfg eq 'none' ? () : split(/\s*[,\s]\s*/, $cfg));
+	}
+
+	return @defaults;
+}
+
 # checking HEAD file with -e is fragile if the repository was
 # initialized long time ago (i.e. symlink HEAD) and was pack-ref'ed
 # and then pruned.
@@ -814,6 +916,34 @@ $git_dir = "$projectroot/$project" if $project;
 our @snapshot_fmts = gitweb_get_feature('snapshot');
 @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
 
+# ordering of committags
+our @committags = gitweb_get_feature('committags');
+
+# Merge project configs with default committag definitions
+gitweb_load_project_committags();
+
+# Load committag configs from the repository config file and and
+# incorporate them into the gitweb defaults where permitted by the
+# site administrator.
+sub gitweb_load_project_committags {
+	return if (!$git_dir);
+	my %project_config = ();
+	my %raw_config = git_parse_project_config('gitweb\.committag');
+	foreach my $key (keys(%raw_config)) {
+		next if ($key !~ /gitweb\.committag\.[^.]+\.[^.]/);
+		my ($gitweb_prefix, $committag_prefix, $ctname, $option) =
+			split(/\./, $key, 4);
+		$project_config{$ctname}{$option} = $raw_config{$key};
+	}
+	foreach my $ctname (keys(%committags)) {
+		next if (!$committags{$ctname}{'override'});
+		foreach my $optname (keys %{$project_config{$ctname}}) {
+			$committags{$ctname}{'options'}{$optname} =
+				$project_config{$ctname}{$optname};
+		}
+	}
+}
+
 # dispatch
 if (!defined $action) {
 	if (defined $hash) {
@@ -1384,13 +1514,92 @@ sub file_type_long {
 sub format_log_line_html {
 	my $line = shift;
 
-	$line = esc_html($line, -nbsp=>1);
-	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
-		$cgi->a({-href => href(action=>"object", hash=>$1),
-					-class => "text"}, $1);
-	}eg;
+	# In this list of log message fragments, a string ref indicates HTML,
+	# and a string indicates plain text
+	my @list = ( $line );
 
-	return $line;
+COMMITTAG:
+	foreach my $ctname (@committags) {
+		next COMMITTAG unless exists $committags{$ctname};
+		my $committag = $committags{$ctname};
+
+		next COMMITTAG unless exists $committag->{'options'};
+		my $opts = $committag->{'options'};
+
+		next COMMITTAG unless exists $opts->{'pattern'};
+		my $pattern = $opts->{'pattern'};
+
+		my @newlist = ();
+
+	PART:
+		foreach my $part (@list) {
+			next PART if $part eq "";
+			if (ref($part)) {
+				push @newlist, $part;
+				next PART;
+			}
+
+			my $oldpos = 0;
+
+		MATCH:
+			while ($part =~ m/$pattern/gc) {
+				my ($prepos, $postpos) = ($-[0], $+[0]);
+				my $repl = $committag->{'sub'}->($opts, $&, $1);
+				$repl = "" if (!defined $repl);
+
+				my $pre = substr($part, $oldpos, $prepos - $oldpos);
+				push_or_append(\@newlist, $pre);
+				push_or_append(\@newlist, $repl);
+
+				$oldpos = $postpos;
+			} # end while [regexp matches]
+
+			my $rest = substr($part, $oldpos);
+			push_or_append(\@newlist, $rest);
+
+		} # end foreach (@list)
+
+		@list = @newlist;
+	} # end foreach (@committags)
+
+	# Escape any remaining plain text and concatenate
+	my $html = '';
+	for my $part (@list) {
+		if (ref($part)) {
+			$html .= $$part;
+		} else {
+			$html .= esc_html($part, -nbsp=>1);
+		}
+	}
+
+	return $html;
+}
+
+# Returns a ref to an HTML snippet that links the second
+# parameter to a URL formed from the first and last parameters.
+# This is a helper function used in %committags.
+sub hyperlink_committag {
+	my ($opts, @match) = @_;
+	return
+		\$cgi->a({-href => $opts->{url} . CGI::escape($match[1]),
+				  -class => "text"},
+				 esc_html($match[0], -nbsp=>1));
+}
+
+
+sub push_or_append (\@@) {
+	my $list = shift;
+
+	if (ref $_[0] || ! @$list || ref $list->[-1]) {
+		push @$list, @_;
+	} else {
+		my $a = pop @$list;
+		my $b = shift @_;
+
+		push @$list, $a . $b, @_;
+	}
+	# imitate push
+	return scalar @$list;
 }
 
 # format marker of refs pointing to given object
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index d539619..37a127c 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -55,9 +55,9 @@ gitweb_run () {
 	# some of git commands write to STDERR on error, but this is not
 	# written to web server logs, so we are not interested in that:
 	# we are interested only in properly formatted errors/warnings
-	rm -f gitweb.log &&
+	rm -f resp.http gitweb.log &&
 	perl -- "$SCRIPT_NAME" \
-		>/dev/null 2>gitweb.log &&
+		> resp.http 2>gitweb.log &&
 	if grep "^[[]" gitweb.log >/dev/null 2>&1; then false; else true; fi
 
 	# gitweb.log is left for debugging
@@ -702,4 +702,150 @@ test_expect_success \
 	 gitweb_run "p=.git;a=summary"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# sha1 linking
+#
+echo hi > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Summary
+
+See also commit 567890ab
+END
+test_expect_success 'sha1 link: enabled by default' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -q \
+		"commit&nbsp;<a class=\"text\" href=\".*\">567890ab</a>" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 567890ab resp.http'
+
+# ----------------------------------------------------------------------
+# bugzilla commit tag
+#
+
+echo foo > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Fix foo
+
+Fixes bug 1234 involving foo.
+END
+git config gitweb.committags 'sha1, bugzilla'
+test_expect_success 'bugzilla: enabled but not permitted' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -F -q \
+		"Fixes&nbsp;bug&nbsp;1234&nbsp;involving" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 resp.http'
+
+echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
+test_expect_success 'bugzilla: enabled' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -F -q \
+		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.kernel.org/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 resp.http'
+
+git config gitweb.committag.bugzilla.url 'http://bts.example.com?bug='
+test_expect_success 'bugzilla: url overridden but not permitted' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -F -q \
+		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.kernel.org/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 resp.http'
+
+echo '$committags{"bugzilla"}{"override"} = 1;' >> gitweb_config.perl
+test_expect_success 'bugzilla: url overridden' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -F -q \
+		"Fixes&nbsp;<a class=\"text\" href=\"http://bts.example.com?bug=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 resp.http'
+
+git config gitweb.committag.bugzilla.pattern 'Fixes bug (\d+)'
+test_expect_success 'bugzilla: pattern overridden' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -F -q \
+		"<a class=\"text\" href=\"http://bts.example.com?bug=1234\">Fixes&nbsp;bug&nbsp;1234</a>&nbsp;involving" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 resp.http'
+git config --unset gitweb.committag.bugzilla.pattern
+
+test_expect_success 'bugzilla: affects log view too' '
+	gitweb_run "p=.git;a=log" &&
+	grep -F -q \
+		"<a class=\"text\" href=\"http://bts.example.com?bug=1234\">bug&nbsp;1234</a>" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 resp.http'
+
+# ----------------------------------------------------------------------
+# url linking
+#
+echo url_test > file.txt
+git add file.txt
+url='http://user@pass:example.com/foo.html?u=v&x=y#z'
+url_esc="$(echo "$url" | sed 's/&/&amp;/g')"
+git commit -q -F - file.txt <<END
+Summary
+
+See also $url.
+END
+echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
+git config gitweb.committags 'sha1, url'
+test_expect_success 'url link: links when enabled' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -q -F \
+		"See&nbsp;also&nbsp;<a class=\"text\" href=\"$url_esc\">$url_esc</a>." \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "$url" resp.http'
+
+# ----------------------------------------------------------------------
+# message id linking
+#
+echo msgid_test > file.txt
+git add file.txt
+url='http://news.gmane.org/find-root.php?message_id='
+msgid='<x@y.z>'
+msgid_esc="$(echo "$msgid" | sed 's/</\&lt;/g; s/>/\&gt;/g')"
+msgid_url="$url$(echo "$msgid" | sed 's/</%3C/g; s/@/%40/g; s/>/%3E/g')"
+git commit -q -F - file.txt <<END
+Summary
+
+See msg-id $msgid.
+END
+echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
+git config gitweb.committags 'sha1, messageid'
+test_expect_success 'msgid link: linked when enabled' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -q -F \
+		"See&nbsp;<a class=\"text\" href=\"$msgid_url\">msg-id&nbsp;$msgid_esc</a>." \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "y.z" resp.http'
+
 test_done
-- 
1.6.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 2/2] gitweb: Add second-stage matching of bug IDs in bugzilla committag
  2009-02-18  3:38       ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Jakub Narebski
  2009-02-19 17:08         ` Marcel M. Cary
  2009-06-19 14:13         ` [RFC PATCH 1/2] gitweb: Hyperlink various committags in commit message with regex Marcel M. Cary
@ 2009-06-19 14:13         ` Marcel M. Cary
  2 siblings, 0 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-06-19 14:13 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

Match Bugzilla bug IDs with two regexes instead of one.  The first is
a pre-filter that allows easy matching of multiple bug IDs on the same
line, and the second easily picks out the individual big IDs for
hyperlinking in the context of the first regex.

For example, it would help in matching these and hyperlinking each ID
individually.

	[#1234, #1235]
	Resolves-bug: 1234, 1235
	bugs 1234, 1235, and 1236

Maybe there's a better naming scheme for the two patterns?

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---
 gitweb/gitweb.perl                     |   61 ++++++++++++++++++++++----------
 t/t9500-gitweb-standalone-no-errors.sh |   18 +++++++++
 2 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c66fdf3..47c8cd5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -227,14 +227,28 @@ our %committags = (
 		'override' => 0,
 		'sub' => \&hyperlink_committag,
 	},
-	# Link mentions of bug IDs to bugzilla
+	# Link mentions of bugs to bugzilla, allowing for separate outer
+	# and inner regexes (see unit test for example)
 	'bugzilla' => {
 		'options' => {
 			'pattern' => qr/bug\s+(\d+)/,
+			'innerpattern' => undef,
 			'url' => 'http://bugzilla.kernel.org/show_bug.cgi?id=',
 		},
 		'override' => 0,
-		'sub' => \&hyperlink_committag,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			if (defined($opts->{'innerpattern'})) {
+				my @list = ();
+				push_or_append_replacements(\@list, $opts->{innerpattern},
+				                            $match[0], sub {
+						return hyperlink_committag($opts, @_);
+					});
+				return @list;
+			} else {
+				return hyperlink_committag(@_);
+			}
+		},
 	},
 	# Link URLs
 	'url' => {
@@ -1539,23 +1553,9 @@ COMMITTAG:
 				next PART;
 			}
 
-			my $oldpos = 0;
-
-		MATCH:
-			while ($part =~ m/$pattern/gc) {
-				my ($prepos, $postpos) = ($-[0], $+[0]);
-				my $repl = $committag->{'sub'}->($opts, $&, $1);
-				$repl = "" if (!defined $repl);
-
-				my $pre = substr($part, $oldpos, $prepos - $oldpos);
-				push_or_append(\@newlist, $pre);
-				push_or_append(\@newlist, $repl);
-
-				$oldpos = $postpos;
-			} # end while [regexp matches]
-
-			my $rest = substr($part, $oldpos);
-			push_or_append(\@newlist, $rest);
+			push_or_append_replacements(\@newlist, $opts->{'pattern'}, $part, sub {
+					$committag->{'sub'}->($opts, @_);
+				});
 
 		} # end foreach (@list)
 
@@ -1586,6 +1586,29 @@ sub hyperlink_committag {
 				 esc_html($match[0], -nbsp=>1));
 }
 
+# Find $pattern in string $part, and push_or_append the parts between
+# matches and the result of calling $sub with matched text to $newlist.
+sub push_or_append_replacements {
+	my ($newlist, $pattern, $part, $sub) = @_;
+
+	my $oldpos = 0;
+
+MATCH:
+	while ($part =~ m/$pattern/gc) {
+		my ($prepos, $postpos) = ($-[0], $+[0]);
+
+		my @repl = $sub->($&, $1);
+
+		my $pre = substr($part, $oldpos, $prepos - $oldpos);
+		push_or_append($newlist, $pre);
+		push_or_append($newlist, @repl);
+
+		$oldpos = $postpos;
+	} # end while [regexp matches]
+
+	my $rest = substr($part, $oldpos);
+	push_or_append($newlist, $rest);
+}
 
 sub push_or_append (\@@) {
 	my $list = shift;
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 37a127c..573f03c 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -798,6 +798,24 @@ test_expect_success 'bugzilla: affects log view too' '
 test_debug 'cat gitweb.log'
 test_debug 'grep 1234 resp.http'
 
+echo hello > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+[#123,#45] This commit fixes two bugs involving bar and baz.
+END
+git config gitweb.committag.bugzilla.pattern       '^\[#\d+(,(&nbsp;)?#\d+)\]'
+git config gitweb.committag.bugzilla.innerpattern  '#(\d+)'
+git config gitweb.committag.bugzilla.url           'http://bugs/'
+test_expect_success 'bugzilla: override everything, use fancier url format' '
+	h=$(git rev-parse --verify HEAD) &&
+	gitweb_run "p=.git;a=commit;h=$h" &&
+	grep -F -q \
+		"[<a class=\"text\" href=\"http://bugs/123\">#123</a>,<a class=\"text\" href=\"http://bugs/45\">#45</a>]" \
+		resp.http
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 123 resp.http'
+
 # ----------------------------------------------------------------------
 # url linking
 #
-- 
1.6.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] gitweb: Hyperlink various committags in commit message with regex
  2009-06-19 14:13         ` [RFC PATCH 1/2] gitweb: Hyperlink various committags in commit message with regex Marcel M. Cary
@ 2009-06-22 11:18           ` Jakub Narebski
  2009-11-18  6:22             ` [RFC PATCH 0/6] Second round of committag series Marcel M. Cary
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Narebski @ 2009-06-22 11:18 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: git, Petr Baudis, Giuseppe Bilotta, Francis Galiegue

On Fri, 19 June 2009, Marcel M. Cary wrote:

Thanks for diligently working on this issue.  Good work!

I see that it is an RFC, and not final submission, but just in case
I'd like to remind you that some of information below should go into
commit message, but some of it should I think go to comments (between
"---" and diffstat).

> I want gitweb to hyperlink commits to my bug tracking system so that
> information regarding the current status of a commit can be easily
> cross-referenced.  For example, the QA and release status of a commit
> cannot be inserted into the comment.  Maybe someday a "git notes"
> feature will help with this, but for now, my organization has a
> separate bug tracking system.  Other repository browsers such as
> unfuddle and websvn support similar features.

The paragraph above should, I think, be made more clear.  You don't
need to mention what you don't do; the comment about "git notes" should
be not in commit message but in comments section.

What you want to have is to have some markers in commit message 
(committags) hyperlinked; namely you want notifications about bug/issue
numbers in the commit message hyperlinked to appropriate bugtracker/issue
tracker URL.  Do I understand this correctly?

I tried here to reword what you said, to come up with better commit
message for a future final submission.

> 
> Since the bug hyperlinking feature was previously discussed as part of
> "committags," a more general mechanism to embellish commit messages,
> implement the more general mechanism instead, including the following
> capabilities:

Well, I think that the fact that it would be not much harder to create
general mechanism for commit message transformation, than to add 
suitably generic and well customizable support for bugtracker 
'committags'.

> 
> * Hyperlinking mentions of bug IDs to Bugzilla
> * Hyperlinking URLs
> * Hyperlinking Message-Ids to a mailing list archive
> * Hyperlinking commit hashes as before by default, now with a
>   configurable regex
> * Defining new committags per gitweb installation

Well, there is one _implicit_ (but important) commit message 
transformation (filter) for display, which has to be always present[1],
namely HTML escaping.  We make use of the fact that you can do
HTML escaping before doing the only currently supported committag, 
namely hyperlinking (shortened) SHA-1 to 'object' gitweb URL, but for
other committags like mentioned "Message-Id to mail archive" committag
(filter) it would make them more difficult.

Also one might consider vertical whitespace simplification (removing
leading empty lines, compacting empty lines to single empty line 
between paragraphs), and syntax highlighting signoff lines to be
a kind of commit message filter like mentioned above committags
(see git_print_log() subroutine).

Although probably vertical whitespace simplification should be not
made into commit message filter, as it is used not for all views.

> 
> Since different repositories may use different bug tracking systems or
> mailing list archives, the URL parameter may be configured
> per-repository without reiterating the regexes.  To accomodate
> different conventions, regexes may also be configured per-project.

Also list of supported committags is separated from the list of 
committags used[1]; just like it is done for snapshot formats.
This could be mentioned in final commit message.

[1] well, sequence rather than list in this case, as here
    ordering does matter a bit

> 
> This patch is heavily based on discussions and code samples from the
> Git list:
> 
> 	[RFC/PATCH] gitweb: Add committags support, Sep 2006
> 	http://thread.gmane.org/gmane.comp.version-control.git/27504
> 
> 	[RFC] gitweb: Add committags support (take 2), Dec 2006
> 	http://thread.gmane.org/gmane.comp.version-control.git/33150
> 
> 	[RFC] Configuring (future) committags support in gitweb, Nov 2008
> 	http://thread.gmane.org/gmane.comp.version-control.git/100415

Hmmm... should this be put in final commit message, or only in comment
to the patch (should this be in commit history of git repository)?

> 
> Some issues I considered but punted:
> 
> * Should this configuration try to follow the bugtraq spec?
> 
>   As far as I know, only subversion implements it.  Separation of
>   regexes by a newline would be a little awkward in the git config.
>   And it is broader than just hyperlinking bugs: it also encompasses
>   GUI bug ID form fields.  So gitweb would only implement a subset.

I didn't even know that there is such spec.  Were you talking about
http://tortoisesvn.net/issuetracker_integration or do you have different
URL in mind?

>   The gitweb configuration mechanism currently only reads
>   keys starting with "gitweb.", but these parameters would be more
>   broadly applicable, potentially to git-gui, for example.

Actually the fact that gitweb reads only keys in the 'gitweb' section
from config is just a convention.  There were (are) no config variables
in other places (other sections) which would be of interest to gitweb.

> 
>   However, it *would* be useful for Git tools to standardize on
>   config keys and interpretations of regexes and url formats.  For
>   example, git-gui might be able to hyperlink the same text as gitweb,
>   and even show a separate bugID field when composing a commit
>   message.

This is I think a very good idea... but I think idea which 
implementation can be left for later.

> 
> * I would prefer the regex match against the whole commit message.
> 
>   This would allow the regex to insist that a bug reference occur
>   on the first line or non-first line of the commit message.  However,
>   even if we concatenated the log lines for the first committag,
>   subsequent committags would see the text broken up.
> 
>   Also, it would allow the regex to match a phrase split across a
>   line boundary, as dicussed at some length in the first thread,
>   but again, only if no prior committags had interfered.
> 
>   This could happen in a later patch.

Well, the change should be fairly easy: just concatenate lines before
passing them as single element list to commit message filters.  OTOH
you would have to take care of end of line characters in committags
regexps.

> 
> * I would prefer the site admin have a way to let a repository
>   owner define new committags, which means having a way to specify
>   the 'sub' key from the repo config or having a flexible default.

Perhaps, following your earlier suggestion to make committags supported
also by other tools, gitweb (and e.g. git-gui / gitk) use config 
variable committag.<name>.<key> (where <key> can be 'pattern' or 'url';
although I wonder if we can allow 'pattern' as malicious user can do
a DoS attack against gitweb / server using badly behaved regexp).

> 
> The bugtraq and some of the regex questions must be decided now to
> avoid breaking gitweb configs later.

True.

> 
> Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
> ---
>  gitweb/INSTALL                         |    4 +
>  gitweb/gitweb.perl                     |  221 +++++++++++++++++++++++++++++++-
>  t/t9500-gitweb-standalone-no-errors.sh |  150 +++++++++++++++++++++-
>  3 files changed, 367 insertions(+), 8 deletions(-)
> 
> diff --git a/gitweb/INSTALL b/gitweb/INSTALL
> index 18c9ce3..223e39e 100644
> --- a/gitweb/INSTALL
> +++ b/gitweb/INSTALL
> @@ -123,6 +123,10 @@ GITWEB_CONFIG file:
>  	$feature{'snapshot'}{'default'} = ['zip', 'tgz'];
>  	$feature{'snapshot'}{'override'} = 1;
>  
> +	$feature{'committags'}{'default'} = ['sha1', 'url', 'bugzilla'];
> +	$feature{'committags'}{'override'} = 1;
> +
> +
>  
>  Gitweb repositories
>  -------------------
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 1e7e2d8..c66fdf3 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -195,6 +195,81 @@ our %known_snapshot_format_aliases = (
>  	'x-zip' => undef, '' => undef,
>  );
>  

I understand that comments such as one below would be not present in
a final submission, and they are here to provide running commentary for
code, isn't it?

> +# Could call these something else besides committags... embellishments,
> +# patterns, rewrite rules, ?

They are "commit filters", or "commit message filters" (or 'formatters',
or 'processors'; they are not 'parsers').  

The name 'committag' was first introduced as far as I remember in xmms2
fork of gitweb (in old times when gitweb was separate project, and not
part of git repository).

> +#
> +# In general, the site admin can enable/disable per-project configuration
> +# of each committag.  Only the 'options' part of the committag is configurable
> +# per-project.

See above caveat about allowing to customize 'regexp'/'pattern' part
in untrusted environment; you can construct regexp which has exponential
behavior.

> +#
> +# The site admin can of course add new tags to this hash or override the
> +# 'sub' key if necessary.  But such changes may be fragile; this is not
> +# designed as a full-blown plugin architecture.
> +our %committags = (


You should put the comments here about supported keys, similar to the
one for %known_snapshot_formats and %feature hashes.

> +	# Link Git-style hashes to this gitweb
> +	'sha1' => {
> +		'options' => {
> +			'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
> +		},
> +		'override' => 0,

Shouldn't 'override' key be better last?

> +		'sub' => sub {
> +			my ($opts, @match) = @_;
> +			\$cgi->a({-href => href(action=>"object", hash=>$match[1]),
> +			          -class => "text"}, esc_html($match[0], -nbsp=>1));
> +		},

Style: although there is commonly used idiom to use 'sub { <expr>; }'
for a wrapper subroutines (e.g. 'sub { [] }' in Moose examples), one
should use explicit "return" statement instead of relying on Perl 
behavior of returning last statement in a block.

See Perl::Critic::Policy::Subroutines::RequireFinalReturn policy in
Perl::Critic (perlcritic.com).  "Perl Best Practices" says:

  Subroutines without explicit 'return' statements at their ends can be
  confusing. It can be challenging to deduce what the return value will be.

> +	},
> +	# Link bug/features to Mantis bug tracker using Mantis-style contextual cues
> +	'mantis' => {
> +		'options' => {
> +			'pattern' => qr/(?:BUG|FEATURE)\((\d+)\)/,
> +			'url' => 'http://bugs.xmms2.xmms.se/view.php?id=',

I don't think we want to put such URL here.  Please check if Mantis 
documentation uses some specific links, or follow RFC conventions and
use 'example.com' as hostname (e.g. 'bugs.example.com').

By the way the bugtraq proposal you mentioned uses placeholder in URL
for putting issue number (%BUGID%).  Perhaps gitweb should do the same
here.

> +		},
> +		'override' => 0,
> +		'sub' => \&hyperlink_committag,
> +	},
> +	# Link mentions of bug IDs to bugzilla
> +	'bugzilla' => {
> +		'options' => {
> +			'pattern' => qr/bug\s+(\d+)/,
> +			'url' => 'http://bugzilla.kernel.org/show_bug.cgi?id=',

The same comment as above.

> +		},
> +		'override' => 0,
> +		'sub' => \&hyperlink_committag,
> +	},
> +	# Link URLs
> +	'url' => {
> +		'options' => {
> +			# Avoid matching punctuation that might immediately follow
> +			# a url, is not part of the url, and is allowed in urls,
> +			# like a full-stop ('.').
> +			'pattern' => qr!(http|ftp)s?://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
> +			                               [-_a-zA-Z0-9\@/&=+~#<>]!x,

If you took this regexp from some place (like blog), it would be good
to mention URL here, to be able to check more detailed explanation of
construction of this URL-catching regexp.

Should we also support irc://, nntp:// (pseudo)protocols? What about
git:// ?

> +		},
> +		'override' => 0,
> +		'sub' => sub {
> +			my ($opts, @match) = @_;
> +			return
> +				\$cgi->a({-href => $match[0],
> +				          -class => "text"},
> +				         esc_html($match[0], -nbsp=>1));
> +		},

Here you use explicit return.

> +	},
> +	# Link Message-Id to mailing list archive
> +	'messageid' => {
> +		'options' => {
> +			# The original pattern, which I don't really understand
> +			#'pattern' => qr!(?:message|msg)-id:?\s+<([^>]+)>;!i,
> +			'pattern' => qr!(?:message|msg)-?id:?\s+(<[^>]+>)!i,

Errr... how original patter is different from the one used?  Also above
comment should be removed in final submission.

> +			'url' => 'http://news.gmane.org/find-root.php?message_id=',

Same comment about generic URL... although on the other hand perhaps
having a few examples of mail archive sites which support finding 
messages by Message-Id could be a good idea.

BTW. you can write 'http://mid.gmane.org/' instead...

> +		},
> +		'override' => 0,
> +		# The original version didn't include the "msg-id" text in the
> +		# link text, but this does.  In general, I think a little more
> +		# context makes for better link text.

I guess that is the result of using generic hyperlink_committag() 
subroutine here.  (This comment should be removed or reworded in final
submitted version, I think.)

BTW. it would be much easier with Perl6-ish (or Perl 5.10.x) named
captures (named groups):

	'pattern' => qr!(?:message|msg)-?id:?\s+(?P<query><[^>]+>)!i,

or something like that.

> +		'sub' => \&hyperlink_committag,
> +	},
> +);
> +
>  # You define site-wide feature defaults here; override them with
>  # $GITWEB_CONFIG as necessary.
>  our %feature = (
> @@ -365,6 +440,21 @@ our %feature = (
>  		'sub' => \&feature_patches,
>  		'override' => 0,
>  		'default' => [16]},
> +
> +	# The selection and ordering of committags that are enabled.
> +	# Committag transformations will be applied to commit log messages
> +	# in this order if listed here.

/this/given/

You need to mention somewhere that committag subroutines return a list
of mixed scalar and reference to scalar elements, where using reference
to scalar removes value from the chain of filters (including implicit
final esc_html filter).

> +
> +	# To disable system wide have in $GITWEB_CONFIG
> +	# $feature{'committags'}{'default'} = [];
> +	# To have project specific config enable override in $GITWEB_CONFIG
> +	# $feature{'committags'}{'override'} = 1;
> +	# and in project config gitweb.committags = sha1, url, bugzilla
> +	# to enable those three committags for that project

Just a thought: perhaps we should provide support for 'default' in
config (which would currently be "sha1" or "sha1, url").

See also comment text for 'snapshot' feature, which says:

  and in project config, a comma-separated list of [...] or 
  "none" to disable.

> +	'committags' => {
> +		'sub' => \&feature_committags,
> +		'override' => 0,
> +		'default' => ['sha1']},
>  );
>  
>  sub gitweb_get_feature {
> @@ -433,6 +523,18 @@ sub feature_patches {
>  	return ($_[0]);
>  }
>  
> +sub feature_committags {
> +	my (@defaults) = @_;
> +
> +	my ($cfg) = git_get_project_config('committags');
> +
> +	if ($cfg) {
> +		return ($cfg eq 'none' ? () : split(/\s*[,\s]\s*/, $cfg));
> +	}
> +
> +	return @defaults;
> +}

As this would be second feature which uses comma-separated (or for
backward compatibility space separated) list of options, perhaps
we should factor out this part into common helper subroutine named
for example 'feature_list' or 'feature_multi' (like 'feature_bool').

> +
>  # checking HEAD file with -e is fragile if the repository was
>  # initialized long time ago (i.e. symlink HEAD) and was pack-ref'ed
>  # and then pruned.
> @@ -814,6 +916,34 @@ $git_dir = "$projectroot/$project" if $project;
>  our @snapshot_fmts = gitweb_get_feature('snapshot');
>  @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> +# ordering of committags
> +our @committags = gitweb_get_feature('committags');
> +
> +# Merge project configs with default committag definitions
> +gitweb_load_project_committags();

Good idea... although gitweb first defines and then uses subroutine,
see evaluate_path_info().

> +
> +# Load committag configs from the repository config file and and
> +# incorporate them into the gitweb defaults where permitted by the
> +# site administrator.
> +sub gitweb_load_project_committags {
> +	return if (!$git_dir);
> +	my %project_config = ();
> +	my %raw_config = git_parse_project_config('gitweb\.committag');

Why not do lazy-loading of a whole config here?  We use committag
info only for project-specific actions in gitweb.

> +	foreach my $key (keys(%raw_config)) {
> +		next if ($key !~ /gitweb\.committag\.[^.]+\.[^.]/);
> +		my ($gitweb_prefix, $committag_prefix, $ctname, $option) =
> +			split(/\./, $key, 4);
> +		$project_config{$ctname}{$option} = $raw_config{$key};
> +	}

And use created subroutines to handle config?

> +	foreach my $ctname (keys(%committags)) {
> +		next if (!$committags{$ctname}{'override'});
> +		foreach my $optname (keys %{$project_config{$ctname}}) {
> +			$committags{$ctname}{'options'}{$optname} =
> +				$project_config{$ctname}{$optname};
> +		}
> +	}
> +}
> +
>  # dispatch
>  if (!defined $action) {
>  	if (defined $hash) {
> @@ -1384,13 +1514,92 @@ sub file_type_long {
>  sub format_log_line_html {
>  	my $line = shift;
>  
> -	$line = esc_html($line, -nbsp=>1);
> -	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> -		$cgi->a({-href => href(action=>"object", hash=>$1),
> -					-class => "text"}, $1);
> -	}eg;
> +	# In this list of log message fragments, a string ref indicates HTML,
> +	# and a string indicates plain text
> +	my @list = ( $line );

Well, to be more exact string ref means that the string referenced is
not to be processed by later filters, including final implicit esc_html.

Perhaps it would be better to use less generic name than @list herem
e.g. @process or something?

>  
> -	return $line;
> +COMMITTAG:
> +	foreach my $ctname (@committags) {
> +		next COMMITTAG unless exists $committags{$ctname};
> +		my $committag = $committags{$ctname};
> +
> +		next COMMITTAG unless exists $committag->{'options'};
> +		my $opts = $committag->{'options'};
> +
> +		next COMMITTAG unless exists $opts->{'pattern'};
> +		my $pattern = $opts->{'pattern'};
> +
> +		my @newlist = ();
> +
> +	PART:
> +		foreach my $part (@list) {
> +			next PART if $part eq "";
> +			if (ref($part)) {
> +				push @newlist, $part;
> +				next PART;
> +			}
> +
> +			my $oldpos = 0;
> +
> +		MATCH:
> +			while ($part =~ m/$pattern/gc) {
> +				my ($prepos, $postpos) = ($-[0], $+[0]);
> +				my $repl = $committag->{'sub'}->($opts, $&, $1);
> +				$repl = "" if (!defined $repl);
> +
> +				my $pre = substr($part, $oldpos, $prepos - $oldpos);
> +				push_or_append(\@newlist, $pre);
> +				push_or_append(\@newlist, $repl);
> +
> +				$oldpos = $postpos;
> +			} # end while [regexp matches]
> +
> +			my $rest = substr($part, $oldpos);
> +			push_or_append(\@newlist, $rest);
> +
> +		} # end foreach (@list)
> +
> +		@list = @newlist;
> +	} # end foreach (@committags)
> +
> +	# Escape any remaining plain text and concatenate
> +	my $html = '';
> +	for my $part (@list) {
> +		if (ref($part)) {
> +			$html .= $$part;
> +		} else {
> +			$html .= esc_html($part, -nbsp=>1);
> +		}
> +	}
> +
> +	return $html;
> +}

Nice.

> +
> +# Returns a ref to an HTML snippet that links the second
> +# parameter to a URL formed from the first and last parameters.
> +# This is a helper function used in %committags.
> +sub hyperlink_committag {
> +	my ($opts, @match) = @_;
> +	return
> +		\$cgi->a({-href => $opts->{url} . CGI::escape($match[1]),

$opts->{'url'} not $opts->{url}

'$cgi->escapeHTML' I think, not 'CGI::escape' (but I am not sure here).
besides, we can always import 'escape'.

> +				  -class => "text"},
> +				 esc_html($match[0], -nbsp=>1));
> +}
> +
> +
> +sub push_or_append (\@@) {

Hmmm... this would be first use of Perl subroutine prototypes in gitweb.
But this is made to imitate 'push' built-in, so I think it is O.K.

> +	my $list = shift;
> +
> +	if (ref $_[0] || ! @$list || ref $list->[-1]) {
> +		push @$list, @_;
> +	} else {
> +		my $a = pop @$list;
> +		my $b = shift @_;
> +
> +		push @$list, $a . $b, @_;
> +	}
> +	# imitate push
> +	return scalar @$list;
>  }
>  
>  # format marker of refs pointing to given object
> diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> index d539619..37a127c 100755
> --- a/t/t9500-gitweb-standalone-no-errors.sh
> +++ b/t/t9500-gitweb-standalone-no-errors.sh
> @@ -55,9 +55,9 @@ gitweb_run () {
>  	# some of git commands write to STDERR on error, but this is not
>  	# written to web server logs, so we are not interested in that:
>  	# we are interested only in properly formatted errors/warnings
> -	rm -f gitweb.log &&
> +	rm -f resp.http gitweb.log &&
>  	perl -- "$SCRIPT_NAME" \
> -		>/dev/null 2>gitweb.log &&
> +		> resp.http 2>gitweb.log &&
>  	if grep "^[[]" gitweb.log >/dev/null 2>&1; then false; else true; fi
>  

Well, if you begin to check _output_ of gitweb, then it should be put 
in separate test, not t/t9500-gitweb-standalone-no-errors.sh which is
only about no-errors... or change name of gitweb test.

>  	# gitweb.log is left for debugging
> @@ -702,4 +702,150 @@ test_expect_success \
>  	 gitweb_run "p=.git;a=summary"'
>  test_debug 'cat gitweb.log'
>  
> +# ----------------------------------------------------------------------
> +# sha1 linking
> +#
> +echo hi > file.txt
> +git add file.txt
> +git commit -q -F - file.txt <<END
> +Summary
> +
> +See also commit 567890ab
> +END
> +test_expect_success 'sha1 link: enabled by default' '
> +	h=$(git rev-parse --verify HEAD) &&
> +	gitweb_run "p=.git;a=commit;h=$h" &&

Actually you can just use "h=HEAD" or use query without 'h' parameter
(which defaults to "HEAD") here.

> +	grep -q \
> +		"commit&nbsp;<a class=\"text\" href=\".*\">567890ab</a>" \
> +		resp.http
> +'
> +test_debug 'cat gitweb.log'
> +test_debug 'grep 567890ab resp.http'

I'd rather use Test::* (e.g. Test::WWW::Mechanize::CGI) for that...
but having some output test for gitweb, even in such simple form would
certainly  be nice.

> +
> +# ----------------------------------------------------------------------
> +# bugzilla commit tag
> +#
> +
> +echo foo > file.txt
> +git add file.txt
> +git commit -q -F - file.txt <<END
> +Fix foo
> +
> +Fixes bug 1234 involving foo.
> +END
> +git config gitweb.committags 'sha1, bugzilla'
> +test_expect_success 'bugzilla: enabled but not permitted' '
> +	h=$(git rev-parse --verify HEAD) &&
> +	gitweb_run "p=.git;a=commit;h=$h" &&
> +	grep -F -q \
> +		"Fixes&nbsp;bug&nbsp;1234&nbsp;involving" \
> +		resp.http
> +'
> +test_debug 'cat gitweb.log'
> +test_debug 'grep 1234 resp.http'
> +
> +echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
> +test_expect_success 'bugzilla: enabled' '
> +	h=$(git rev-parse --verify HEAD) &&
> +	gitweb_run "p=.git;a=commit;h=$h" &&
> +	grep -F -q \
> +		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.kernel.org/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
> +		resp.http
> +'

Hmmm...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 0/6] Second round of committag series
  2009-06-22 11:18           ` Jakub Narebski
@ 2009-11-18  6:22             ` Marcel M. Cary
  2009-11-18  6:22               ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Marcel M. Cary
  2009-11-20 23:24               ` [RFC PATCH 0/6] Second round of committag series Jakub Narebski
  0 siblings, 2 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

Thanks for the feedback.  I've added four more patches to the end of
the series and updated the first two.  My replies are below.

On Mon, 22 Jun 2009, Jakub Narebski wrote:
> On Fri, 19 June 2009, Marcel M. Cary wrote:
> 
> Thanks for diligently working on this issue.  Good work!
> 
> I see that it is an RFC, and not final submission, but just in case
> I'd like to remind you that some of information below should go into
> commit message, but some of it should I think go to comments (between
> "---" and diffstat).
> 
> > I want gitweb to hyperlink commits to my bug tracking system so that
> > information regarding the current status of a commit can be easily
> > cross-referenced.  For example, the QA and release status of a commit
> > cannot be inserted into the comment.  Maybe someday a "git notes"
> > feature will help with this, but for now, my organization has a
> > separate bug tracking system.  Other repository browsers such as
> > unfuddle and websvn support similar features.
> 
> The paragraph above should, I think, be made more clear.  You don't
> need to mention what you don't do; the comment about "git notes" should
> be not in commit message but in comments section.
> 
> What you want to have is to have some markers in commit message 
> (committags) hyperlinked; namely you want notifications about bug/issue
> numbers in the commit message hyperlinked to appropriate bugtracker/issue
> tracker URL.  Do I understand this correctly?

I'm not sure I would call the context of the bug numbers
"notifications".  A good example of what I'm looking for is to
hyperlink "Resolves-bug: 1234" to
http://bugzilla.example.com/show_bug.cgi?bug_id=1234.  I suppose this
could be seen as a notification of bug 1234's resolution, except that
I expect the more complex bugs (bugs, issues, and features) to require
several commits to resolve, each of which I'd like tagged with that
bug number, and so the bug would not really be resolved after the
first such commit.

> I tried here to reword what you said, to come up with better commit
> message for a future final submission.

I've rewritten the commit message paragraph as you suggested.  Is it
now more inline with what you'd like to see?

> > Since the bug hyperlinking feature was previously discussed as part of
> > "committags," a more general mechanism to embellish commit messages,
> > implement the more general mechanism instead, including the following
> > capabilities:
> 
> Well, I think that the fact that it would be not much harder to create
> general mechanism for commit message transformation, than to add 
> suitably generic and well customizable support for bugtracker 
> 'committags'.

So far, the biggest difficulty in generalizing the bugzilla hyperlink
feature has been the interaction between two or more commit tag
transformations -- how to structure them so they are composable but
decoupled.

> > * Hyperlinking mentions of bug IDs to Bugzilla
> > * Hyperlinking URLs
> > * Hyperlinking Message-Ids to a mailing list archive
> > * Hyperlinking commit hashes as before by default, now with a
> >   configurable regex
> > * Defining new committags per gitweb installation
> 
> Well, there is one _implicit_ (but important) commit message 
> transformation (filter) for display, which has to be always present[1],
> namely HTML escaping.  We make use of the fact that you can do
> HTML escaping before doing the only currently supported committag, 
> namely hyperlinking (shortened) SHA-1 to 'object' gitweb URL, but for
> other committags like mentioned "Message-Id to mail archive" committag
> (filter) it would make them more difficult.

The original patch still did the HTML escaping.  I don't see much
value in implementing this transformation as a committag since it
wouldn't be useful to configure it, and I don't really see it making
the code more clear or concise.  Did you have any particular reasons?

> Also one might consider vertical whitespace simplification (removing
> leading empty lines, compacting empty lines to single empty line 
> between paragraphs), and syntax highlighting signoff lines to be
> a kind of commit message filter like mentioned above committags
> (see git_print_log() subroutine).
> 
> Although probably vertical whitespace simplification should be not
> made into commit message filter, as it is used not for all views.
> 
> > 
> > Since different repositories may use different bug tracking systems or
> > mailing list archives, the URL parameter may be configured
> > per-repository without reiterating the regexes.  To accomodate
> > different conventions, regexes may also be configured per-project.
> 
> Also list of supported committags is separated from the list of 
> committags used[1]; just like it is done for snapshot formats.
> This could be mentioned in final commit message.
> 
> [1] well, sequence rather than list in this case, as here
>     ordering does matter a bit
> 
> > 
> > This patch is heavily based on discussions and code samples from the
> > Git list:
> > 
> > 	[RFC/PATCH] gitweb: Add committags support, Sep 2006
> > 	http://thread.gmane.org/gmane.comp.version-control.git/27504
> > 
> > 	[RFC] gitweb: Add committags support (take 2), Dec 2006
> > 	http://thread.gmane.org/gmane.comp.version-control.git/33150
> > 
> > 	[RFC] Configuring (future) committags support in gitweb, Nov 2008
> > 	http://thread.gmane.org/gmane.comp.version-control.git/100415
> 
> Hmmm... should this be put in final commit message, or only in comment
> to the patch (should this be in commit history of git repository)?

I'll exclude the mailing list references from the commit message since
they describe how I arrived at this set of changes rather than
describing the changes themselves.

> > Some issues I considered but punted:
> > 
> > * Should this configuration try to follow the bugtraq spec?
> > 
> >   As far as I know, only subversion implements it.  Separation of
> >   regexes by a newline would be a little awkward in the git config.
> >   And it is broader than just hyperlinking bugs: it also encompasses
> >   GUI bug ID form fields.  So gitweb would only implement a subset.
> 
> I didn't even know that there is such spec.  Were you talking about
> http://tortoisesvn.net/issuetracker_integration or do you have different
> URL in mind?

That is the Bugtraq spec, although I had been looking at a text-only
version of it at the time.  I can't seem to find the text-only
document, but I think that one explains it just as well.

> >   The gitweb configuration mechanism currently only reads
> >   keys starting with "gitweb.", but these parameters would be more
> >   broadly applicable, potentially to git-gui, for example.
> 
> Actually the fact that gitweb reads only keys in the 'gitweb' section
> from config is just a convention.  There were (are) no config variables
> in other places (other sections) which would be of interest to gitweb.
> 
> > 
> >   However, it *would* be useful for Git tools to standardize on
> >   config keys and interpretations of regexes and url formats.  For
> >   example, git-gui might be able to hyperlink the same text as gitweb,
> >   and even show a separate bugID field when composing a commit
> >   message.
> 
> This is I think a very good idea... but I think idea which 
> implementation can be left for later.

Very well then, I'll leave everything related to the other git tools
for later.

> > 
> > * I would prefer the regex match against the whole commit message.
> > 
> >   This would allow the regex to insist that a bug reference occur
> >   on the first line or non-first line of the commit message.  However,
> >   even if we concatenated the log lines for the first committag,
> >   subsequent committags would see the text broken up.
> > 
> >   Also, it would allow the regex to match a phrase split across a
> >   line boundary, as dicussed at some length in the first thread,
> >   but again, only if no prior committags had interfered.
> > 
> >   This could happen in a later patch.
> 
> Well, the change should be fairly easy: just concatenate lines before
> passing them as single element list to commit message filters.  OTOH
> you would have to take care of end of line characters in committags
> regexps.

Yes, it seems manageable to process the whole commit message at once
rather than line by line.  I've made this change to support patch 4 of
this series.

> > * I would prefer the site admin have a way to let a repository
> >   owner define new committags, which means having a way to specify
> >   the 'sub' key from the repo config or having a flexible default.
> 
> Perhaps, following your earlier suggestion to make committags supported
> also by other tools, gitweb (and e.g. git-gui / gitk) use config 
> variable committag.<name>.<key> (where <key> can be 'pattern' or 'url';

This is a great example of the kind of decision I'd like to make
up-front to avoid breaking existing configs later:
gitweb.committag.<name>.<key> vs. committag.<name>.<key>.  I'm not
very interested in adding the git-gui / gitk feature myself.  Is
anyone actually interested in this feature?  If not, perhaps it should
stay under gitweb.committag.<name>.<key>.  And even if there is
interest in the git-gui / gitk features, perhaps it would make sense
to start with the gitweb-specific version of the config variable names
and once there is cross-tool support, keep those configs as overrides
for gitweb only?

> although I wonder if we can allow 'pattern' as malicious user can do
> a DoS attack against gitweb / server using badly behaved regexp).

Yes, I imagine there is a risk of abuse in allowing patterns to be
configurable.  That's one reason why the site config can disallow
per-project configuration of regexes.  Are you suggesting there be a
separate flag to allow override of regexes than to allow override of
other parameters?  For example, with a separate flag for regexes,
repo.or.cz could allow you to configure the bug tracker URL but not
the bug regex.  I've added very fine grain support for allowing only
some options to be overridden in patch 3 of this series.

> > The bugtraq and some of the regex questions must be decided now to
> > avoid breaking gitweb configs later.
> 
> True.
> 
> > 
> > Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
> > ---
> >  gitweb/INSTALL                         |    4 +
> >  gitweb/gitweb.perl                     |  221 +++++++++++++++++++++++++++++++-
> >  t/t9500-gitweb-standalone-no-errors.sh |  150 +++++++++++++++++++++-
> >  3 files changed, 367 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gitweb/INSTALL b/gitweb/INSTALL
> > index 18c9ce3..223e39e 100644
> > --- a/gitweb/INSTALL
> > +++ b/gitweb/INSTALL
> > @@ -123,6 +123,10 @@ GITWEB_CONFIG file:
> >  	$feature{'snapshot'}{'default'} = ['zip', 'tgz'];
> >  	$feature{'snapshot'}{'override'} = 1;
> >  
> > +	$feature{'committags'}{'default'} = ['sha1', 'url', 'bugzilla'];
> > +	$feature{'committags'}{'override'} = 1;
> > +
> > +
> >  
> >  Gitweb repositories
> >  -------------------
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 1e7e2d8..c66fdf3 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -195,6 +195,81 @@ our %known_snapshot_format_aliases = (
> >  	'x-zip' => undef, '' => undef,
> >  );
> >  
> 
> I understand that comments such as one below would be not present in
> a final submission, and they are here to provide running commentary for
> code, isn't it?

Yes.  I've removed the running commentary, since it seems to be in the
way.

> > +# Could call these something else besides committags... embellishments,
> > +# patterns, rewrite rules, ?
> 
> They are "commit filters", or "commit message filters" (or 'formatters',
> or 'processors'; they are not 'parsers').  
> 
> The name 'committag' was first introduced as far as I remember in xmms2
> fork of gitweb (in old times when gitweb was separate project, and not
> part of git repository).

Sounds as though there is no interest in changing the name to
something that more clearly describes the feature (asside from my own,
but I'd like a little more validation...).  Let's stick with
committag.

> > +#
> > +# In general, the site admin can enable/disable per-project configuration
> > +# of each committag.  Only the 'options' part of the committag is configurable
> > +# per-project.
> 
> See above caveat about allowing to customize 'regexp'/'pattern' part
> in untrusted environment; you can construct regexp which has exponential
> behavior.
> 
> > +#
> > +# The site admin can of course add new tags to this hash or override the
> > +# 'sub' key if necessary.  But such changes may be fragile; this is not
> > +# designed as a full-blown plugin architecture.
> > +our %committags = (
> 
> 
> You should put the comments here about supported keys, similar to the
> one for %known_snapshot_formats and %feature hashes.

I believe I've addressed your request by adding a comment that
applies to all tags.  Because of the similarity of the tags, I don't
think it would be very useful to do this for each tag.  Does the 'url'
option used on two of the tags need further explanation?

> > +	# Link Git-style hashes to this gitweb
> > +	'sha1' => {
> > +		'options' => {
> > +			'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
> > +		},
> > +		'override' => 0,
> 
> Shouldn't 'override' key be better last?

I liked 'override' close to the 'options' hash because that is what it
controls.  The 'sub' key was not overridable per-project no matter how
you configure gitweb.  Now it is, so maybe now 'override' could go
last.  Or first.  But really, I don't think the order matters much.

> > +		'sub' => sub {
> > +			my ($opts, @match) = @_;
> > +			\$cgi->a({-href => href(action=>"object", hash=>$match[1]),
> > +			          -class => "text"}, esc_html($match[0], -nbsp=>1));
> > +		},
> 
> Style: although there is commonly used idiom to use 'sub { <expr>; }'
> for a wrapper subroutines (e.g. 'sub { [] }' in Moose examples), one
> should use explicit "return" statement instead of relying on Perl 
> behavior of returning last statement in a block.
> 
> See Perl::Critic::Policy::Subroutines::RequireFinalReturn policy in
> Perl::Critic (perlcritic.com).  "Perl Best Practices" says:
> 
>   Subroutines without explicit 'return' statements at their ends can be
>   confusing. It can be challenging to deduce what the return value will be.
> 
> > +	},
> > +	# Link bug/features to Mantis bug tracker using Mantis-style contextual cues
> > +	'mantis' => {
> > +		'options' => {
> > +			'pattern' => qr/(?:BUG|FEATURE)\((\d+)\)/,
> > +			'url' => 'http://bugs.xmms2.xmms.se/view.php?id=',
> 
> I don't think we want to put such URL here.  Please check if Mantis 
> documentation uses some specific links, or follow RFC conventions and
> use 'example.com' as hostname (e.g. 'bugs.example.com').

Ok, I'll use a neutral URL for the mantis default.

> By the way the bugtraq proposal you mentioned uses placeholder in URL
> for putting issue number (%BUGID%).  Perhaps gitweb should do the same
> here.

I'd rather use a regex-style or sprintf-style syntax that could be
used by all committags.  I started with just an opaque string to
prepend to the bug ID because it was the simplest way to fulfill my
requirements, but it's certainly plausible that a BTS would want urls
like "example.com/bug/1234/detail" in which case the prepending
strategy isn't sufficient.  Again, ideally we'd decide now whether to
use '$1' or '%s' so we don't break configs later.  If the bugtraq
feature is implemented at some point, those config parameters can
still use the %BUGID% placeholder.

> > +		},
> > +		'override' => 0,
> > +		'sub' => \&hyperlink_committag,
> > +	},
> > +	# Link mentions of bug IDs to bugzilla
> > +	'bugzilla' => {
> > +		'options' => {
> > +			'pattern' => qr/bug\s+(\d+)/,
> > +			'url' => 'http://bugzilla.kernel.org/show_bug.cgi?id=',
> 
> The same comment as above.
> 
> > +		},
> > +		'override' => 0,
> > +		'sub' => \&hyperlink_committag,
> > +	},
> > +	# Link URLs
> > +	'url' => {
> > +		'options' => {
> > +			# Avoid matching punctuation that might immediately follow
> > +			# a url, is not part of the url, and is allowed in urls,
> > +			# like a full-stop ('.').
> > +			'pattern' => qr!(http|ftp)s?://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
> > +			                               [-_a-zA-Z0-9\@/&=+~#<>]!x,
> 
> If you took this regexp from some place (like blog), it would be good
> to mention URL here, to be able to check more detailed explanation of
> construction of this URL-catching regexp.

Actually, I think I started with a URL regex mentioned on-list and
then tweaked it until I thought it worked well enough.

Most of the regexes I found on the web had issues.  They tended to be
too long because they were trying to precisely validate the URL or
because they were trying to parse all the different parts of the url,
or they did not deal with the trailing punctuation problem: I don't
want the period (".") to be highlighted in this context: "See
http://example.com/foo."  I think it's a valid URL even with the
period, but typically the period will not be part of the URL.  On the
other hand, "http://example.com/foo.html" should be completely
highlighted in spite of the period.

> Should we also support irc://, nntp:// (pseudo)protocols? What about
> git:// ?

I'va added more schemes, but it's also possible to leave those as a
customization.  In my organization, it would be very uncommon for
someone to follow a URL with any of those schemes.  I'd be willing to
allow any scheme that matches "[a-z]+://".  I don't care about stuff
like "data:literal+text", and "mailto:" is also less interesting to
me.  (I'd rather add a committag to match the email address without a
"mailto:".)  And then there's those pesky windows file sharing
thingies like '\\server\directory' that work like URLs in Internet
Explorer.  I'd rather not include them in the URL committag... but a
site admin could certainly configure a committag for it.  Here's a
list schemes for which KDE allegedly supports retreival of metadata:

bluetooth, fish, ftp, imap(s), invitation, iso, ldap(s), mac, mdns,
nfs, nntp(s), nxfish, obex, pop3(s), print, printdb, sdp, service,
sftp, slp, smb, smtp(s), webdav(s)

I wouldn't want to enumerate all of those, but, in addition to the
three you suggest, these look most useful to me: sftp, smb, webdav(s),
nfs.  So really I think it comes down to whether we want to enumerate
them or just match any scheme, and risk matching something that was
not intended as a URL.  And if we enumerate, how many do we want to
list.

What do you think of the new list of schemes in patch 1?  I've
included several more than before.

> > +		},
> > +		'override' => 0,
> > +		'sub' => sub {
> > +			my ($opts, @match) = @_;
> > +			return
> > +				\$cgi->a({-href => $match[0],
> > +				          -class => "text"},
> > +				         esc_html($match[0], -nbsp=>1));
> > +		},
> 
> Here you use explicit return.
> 
> > +	},
> > +	# Link Message-Id to mailing list archive
> > +	'messageid' => {
> > +		'options' => {
> > +			# The original pattern, which I don't really understand
> > +			#'pattern' => qr!(?:message|msg)-id:?\s+<([^>]+)>;!i,
> > +			'pattern' => qr!(?:message|msg)-?id:?\s+(<[^>]+>)!i,
> 
> Errr... how original patter is different from the one used?  Also above
> comment should be removed in final submission.

The most important change is that the semicolon is removed.  I
also made the dash optional.  It wasn't clear to me whether this was
intended to match a header-style reference:

    Message-Id: <asdf@example.com>

Or a casual mention:

    In msgid <asdf@example.com>, John Doe says...

But I like the latter, and would like the former to still be
supported.

> > +			'url' => 'http://news.gmane.org/find-root.php?message_id=',
> 
> Same comment about generic URL... although on the other hand perhaps
> having a few examples of mail archive sites which support finding 
> messages by Message-Id could be a good idea.
> 
> BTW. you can write 'http://mid.gmane.org/' instead...

Thanks for the tip.  Is there another common mail archive site
that allows looking up emails by message-id?  If so, I'd love to
mention the url in the comment for that committag.  I think we need to
strike a balance between having things work with minimal configuration
and avoiding the promotion of specific web sites that won't make sense
for most sites using a particular committag.  So, unless there is a
whole slew of web sites that provide this feature, I'd suggest leaving
the specific URL in.

> > +		},
> > +		'override' => 0,
> > +		# The original version didn't include the "msg-id" text in the
> > +		# link text, but this does.  In general, I think a little more
> > +		# context makes for better link text.
> 
> I guess that is the result of using generic hyperlink_committag() 
> subroutine here.  (This comment should be removed or reworded in final
> submitted version, I think.)

Yes, although if we decided it was sub-optimal link text, I'd be
happy to use a less generic mechanism.

> BTW. it would be much easier with Perl6-ish (or Perl 5.10.x) named
> captures (named groups):
> 
> 	'pattern' => qr!(?:message|msg)-?id:?\s+(?P<query><[^>]+>)!i,
> 
> or something like that.

I like the notion of names rather than numberic indices, but I'm
hesitant to require site admins to use unfamiliar Perl regex syntax to
configure committags.  Of two admins I asked, neither could make sense
of the sample regex.  If wouldn't want a site admin to *have* to learn
new syntax to configure regexes.

> > +		'sub' => \&hyperlink_committag,
> > +	},
> > +);
> > +
> >  # You define site-wide feature defaults here; override them with
> >  # $GITWEB_CONFIG as necessary.
> >  our %feature = (
> > @@ -365,6 +440,21 @@ our %feature = (
> >  		'sub' => \&feature_patches,
> >  		'override' => 0,
> >  		'default' => [16]},
> > +
> > +	# The selection and ordering of committags that are enabled.
> > +	# Committag transformations will be applied to commit log messages
> > +	# in this order if listed here.
> 
> /this/given/
> 
> You need to mention somewhere that committag subroutines return a list
> of mixed scalar and reference to scalar elements, where using reference
> to scalar removes value from the chain of filters (including implicit
> final esc_html filter).

I chose to add that explanation in the section that describes the
configuration of committags rather than in this section that defines
the list of active committags.  Sound ok?

> > +
> > +	# To disable system wide have in $GITWEB_CONFIG
> > +	# $feature{'committags'}{'default'} = [];
> > +	# To have project specific config enable override in $GITWEB_CONFIG
> > +	# $feature{'committags'}{'override'} = 1;
> > +	# and in project config gitweb.committags = sha1, url, bugzilla
> > +	# to enable those three committags for that project
> 
> Just a thought: perhaps we should provide support for 'default' in
> config (which would currently be "sha1" or "sha1, url").

I didn't understand how this would be very useful until I added
the signoff committag.  The use case I see is that it allows the
gitweb distribution to adjust the base functionality, in this case by
pushing some functionality into a committag, without project owners
needing to reconfigure their repositories.  Site admins don't need
this because they can use "push" or "unshift" to preserve the
distributed committags, right?  The project owner should get the
distribution default, not the site default, when they write "default",
right?  Are there other use cases you had in mind?  A potential
implementation is in patch 6.

> See also comment text for 'snapshot' feature, which says:
> 
>   and in project config, a comma-separated list of [...] or 
>   "none" to disable.
> 
> > +	'committags' => {
> > +		'sub' => \&feature_committags,
> > +		'override' => 0,
> > +		'default' => ['sha1']},
> >  );
> >  
> >  sub gitweb_get_feature {
> > @@ -433,6 +523,18 @@ sub feature_patches {
> >  	return ($_[0]);
> >  }
> >  
> > +sub feature_committags {
> > +	my (@defaults) = @_;
> > +
> > +	my ($cfg) = git_get_project_config('committags');
> > +
> > +	if ($cfg) {
> > +		return ($cfg eq 'none' ? () : split(/\s*[,\s]\s*/, $cfg));
> > +	}
> > +
> > +	return @defaults;
> > +}
> 
> As this would be second feature which uses comma-separated (or for
> backward compatibility space separated) list of options, perhaps
> we should factor out this part into common helper subroutine named
> for example 'feature_list' or 'feature_multi' (like 'feature_bool').

Thanks for pointing out the opportunity to fold the feature list
functions together.

> > +
> >  # checking HEAD file with -e is fragile if the repository was
> >  # initialized long time ago (i.e. symlink HEAD) and was pack-ref'ed
> >  # and then pruned.
> > @@ -814,6 +916,34 @@ $git_dir = "$projectroot/$project" if $project;
> >  our @snapshot_fmts = gitweb_get_feature('snapshot');
> >  @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
> >  
> > +# ordering of committags
> > +our @committags = gitweb_get_feature('committags');
> > +
> > +# Merge project configs with default committag definitions
> > +gitweb_load_project_committags();
> 
> Good idea... although gitweb first defines and then uses subroutine,
> see evaluate_path_info().

Sounds like you're asking me to move the function call after the
function definition.  I've made the change, but I'm also curious to
know why you have that preference.  I sometimes find it less readable,
as in this case.

> > +
> > +# Load committag configs from the repository config file and and
> > +# incorporate them into the gitweb defaults where permitted by the
> > +# site administrator.
> > +sub gitweb_load_project_committags {
> > +	return if (!$git_dir);
> > +	my %project_config = ();
> > +	my %raw_config = git_parse_project_config('gitweb\.committag');
> 
> Why not do lazy-loading of a whole config here?  We use committag
> info only for project-specific actions in gitweb.

I thought the check for $git_dir would prevent loading it for
non-project-specific pages.  Even so, I suppose we might load the
config on a page like the shortlog page that doesn't need it.

> > +	foreach my $key (keys(%raw_config)) {
> > +		next if ($key !~ /gitweb\.committag\.[^.]+\.[^.]/);
> > +		my ($gitweb_prefix, $committag_prefix, $ctname, $option) =
> > +			split(/\./, $key, 4);
> > +		$project_config{$ctname}{$option} = $raw_config{$key};
> > +	}
> 
> And use created subroutines to handle config?

Are you saying I should be using other subroutines that already
exist in gitweb rather than implementing my own?  Are you thinking of
git_get_project_config?  If I understand correctly, it requires me to
enumerate all the config keys I want.  I'd rather not require the
committags to have a predetermined set of possible config keys.  I see
now that I could still call git_get_project_config to load data into
%config and access that directly... I didn't do it before because it
seems to violate some kind of abstraction.

> > +	foreach my $ctname (keys(%committags)) {
> > +		next if (!$committags{$ctname}{'override'});
> > +		foreach my $optname (keys %{$project_config{$ctname}}) {
> > +			$committags{$ctname}{'options'}{$optname} =
> > +				$project_config{$ctname}{$optname};
> > +		}
> > +	}
> > +}
> > +
> >  # dispatch
> >  if (!defined $action) {
> >  	if (defined $hash) {
> > @@ -1384,13 +1514,92 @@ sub file_type_long {
> >  sub format_log_line_html {
> >  	my $line = shift;
> >  
> > -	$line = esc_html($line, -nbsp=>1);
> > -	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> > -		$cgi->a({-href => href(action=>"object", hash=>$1),
> > -					-class => "text"}, $1);
> > -	}eg;
> > +	# In this list of log message fragments, a string ref indicates HTML,
> > +	# and a string indicates plain text
> > +	my @list = ( $line );
> 
> Well, to be more exact string ref means that the string referenced is
> not to be processed by later filters, including final implicit esc_html.

Well... it means both not processing and HTML escaping.  The string
refs will also not be escaped in the final processing step.

> Perhaps it would be better to use less generic name than @list herem
> e.g. @process or something?

Sure.  I chose @message_fragments as the less generic name for @list.

> > -	return $line;
> > +COMMITTAG:
> > +	foreach my $ctname (@committags) {
> > +		next COMMITTAG unless exists $committags{$ctname};
> > +		my $committag = $committags{$ctname};
> > +
> > +		next COMMITTAG unless exists $committag->{'options'};
> > +		my $opts = $committag->{'options'};
> > +
> > +		next COMMITTAG unless exists $opts->{'pattern'};
> > +		my $pattern = $opts->{'pattern'};
> > +
> > +		my @newlist = ();
> > +
> > +	PART:
> > +		foreach my $part (@list) {
> > +			next PART if $part eq "";
> > +			if (ref($part)) {
> > +				push @newlist, $part;
> > +				next PART;
> > +			}
> > +
> > +			my $oldpos = 0;
> > +
> > +		MATCH:
> > +			while ($part =~ m/$pattern/gc) {
> > +				my ($prepos, $postpos) = ($-[0], $+[0]);
> > +				my $repl = $committag->{'sub'}->($opts, $&, $1);
> > +				$repl = "" if (!defined $repl);
> > +
> > +				my $pre = substr($part, $oldpos, $prepos - $oldpos);
> > +				push_or_append(\@newlist, $pre);
> > +				push_or_append(\@newlist, $repl);
> > +
> > +				$oldpos = $postpos;
> > +			} # end while [regexp matches]
> > +
> > +			my $rest = substr($part, $oldpos);
> > +			push_or_append(\@newlist, $rest);
> > +
> > +		} # end foreach (@list)
> > +
> > +		@list = @newlist;
> > +	} # end foreach (@committags)
> > +
> > +	# Escape any remaining plain text and concatenate
> > +	my $html = '';
> > +	for my $part (@list) {
> > +		if (ref($part)) {
> > +			$html .= $$part;
> > +		} else {
> > +			$html .= esc_html($part, -nbsp=>1);
> > +		}
> > +	}
> > +
> > +	return $html;
> > +}
> 
> Nice.

Hehe, that bit of code is mostly yours, which you posted on this list
ages ago. (:  I suppose we should try to get your signoff into the
first patch of the series?

> > +
> > +# Returns a ref to an HTML snippet that links the second
> > +# parameter to a URL formed from the first and last parameters.
> > +# This is a helper function used in %committags.
> > +sub hyperlink_committag {
> > +	my ($opts, @match) = @_;
> > +	return
> > +		\$cgi->a({-href => $opts->{url} . CGI::escape($match[1]),
> 
> $opts->{'url'} not $opts->{url}
> 
> '$cgi->escapeHTML' I think, not 'CGI::escape' (but I am not sure here).
> besides, we can always import 'escape'.
> 
> > +				  -class => "text"},
> > +				 esc_html($match[0], -nbsp=>1));
> > +}
> > +
> > +
> > +sub push_or_append (\@@) {
> 
> Hmmm... this would be first use of Perl subroutine prototypes in gitweb.
> But this is made to imitate 'push' built-in, so I think it is O.K.

Ok, I guess I'll leave the subroutine prototype in then.  I don't
understand all the implications, so if you think it'll work well
without the prototype, I'm happy to remove it.  This is code
that I think you posted to the list.

> > +	my $list = shift;
> > +
> > +	if (ref $_[0] || ! @$list || ref $list->[-1]) {
> > +		push @$list, @_;
> > +	} else {
> > +		my $a = pop @$list;
> > +		my $b = shift @_;
> > +
> > +		push @$list, $a . $b, @_;
> > +	}
> > +	# imitate push
> > +	return scalar @$list;
> >  }
> >  
> >  # format marker of refs pointing to given object
> > diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> > index d539619..37a127c 100755
> > --- a/t/t9500-gitweb-standalone-no-errors.sh
> > +++ b/t/t9500-gitweb-standalone-no-errors.sh
> > @@ -55,9 +55,9 @@ gitweb_run () {
> >  	# some of git commands write to STDERR on error, but this is not
> >  	# written to web server logs, so we are not interested in that:
> >  	# we are interested only in properly formatted errors/warnings
> > -	rm -f gitweb.log &&
> > +	rm -f resp.http gitweb.log &&
> >  	perl -- "$SCRIPT_NAME" \
> > -		>/dev/null 2>gitweb.log &&
> > +		> resp.http 2>gitweb.log &&
> >  	if grep "^[[]" gitweb.log >/dev/null 2>&1; then false; else true; fi
> >  
> 
> Well, if you begin to check _output_ of gitweb, then it should be put 
> in separate test, not t/t9500-gitweb-standalone-no-errors.sh which is
> only about no-errors... or change name of gitweb test.

I see there is some precedent for adding lib-foo.sh files.  Perhaps it
would be appropriate to move parts of the no-errors test into a
lib-gitweb.sh?  Like gitweb_init, most of gitweb_run, and maybe the
perl checks (which are that way for lib-git-svn.sh)?  I'm all for
separating the tests.  I don't like having to keep telling the test to
skip the first 85 cases when all I want to run is the committag stuff.

Actually, looks like Mark Rada already made a gitweb-lib.sh which was
exactly the same as the one I'd made except the name of the lib (vs.
lib-gitweb.sh) and the name of the file holding gitweb output.  I've
rebased onto those changes.

> I think I'm gonna leave this and the other feedback
> >  	# gitweb.log is left for debugging
> > @@ -702,4 +702,150 @@ test_expect_success \
> >  	 gitweb_run "p=.git;a=summary"'
> >  test_debug 'cat gitweb.log'
> >  
> > +# ----------------------------------------------------------------------
> > +# sha1 linking
> > +#
> > +echo hi > file.txt
> > +git add file.txt
> > +git commit -q -F - file.txt <<END
> > +Summary
> > +
> > +See also commit 567890ab
> > +	h=$(git rev-parse --verify HEAD) &&
> > +	gitweb_run "p=.git;a=commit;h=$h" &&
> 
> Actually you can just use "h=HEAD" or use query without 'h' parameter
> (which defaults to "HEAD") here.
 
Thanks for the tip, I'll use h=HEAD.
 
> > +	grep -q \
> > +		"commit&nbsp;<a class=\"text\" href=\".*\">567890ab</a>" \
> > +		resp.http
> > +'
> > +test_debug 'cat gitweb.log'
> > +test_debug 'grep 567890ab resp.http'
> 
> I'd rather use Test::* (e.g. Test::WWW::Mechanize::CGI) for that...
> but having some output test for gitweb, even in such simple form would
> certainly  be nice.

Would this mean people need to install some Mechanize stuff before
they can run the tests?  I think I'd rather avoid the dependency as
long as grep seems to do the trick.

> > +
> > +# ----------------------------------------------------------------------
> > +# bugzilla commit tag
> > +#
> > +
> > +echo foo > file.txt
> > +git add file.txt
> > +git commit -q -F - file.txt <<END
> > +Fix foo
> > +
> > +Fixes bug 1234 involving foo.
> > +END
> > +git config gitweb.committags 'sha1, bugzilla'
> > +test_expect_success 'bugzilla: enabled but not permitted' '
> > +	h=$(git rev-parse --verify HEAD) &&
> > +	gitweb_run "p=.git;a=commit;h=$h" &&
> > +	grep -F -q \
> > +		"Fixes&nbsp;bug&nbsp;1234&nbsp;involving" \
> > +		resp.http
> > +'
> > +test_debug 'cat gitweb.log'
> > +test_debug 'grep 1234 resp.http'
> > +
> > +echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
> > +test_expect_success 'bugzilla: enabled' '
> > +	h=$(git rev-parse --verify HEAD) &&
> > +	gitweb_run "p=.git;a=commit;h=$h" &&
> > +	grep -F -q \
> > +		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.kernel.org/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
> > +		resp.http
> > +'
> 
> Hmmm...
> 

?



Marcel M. Cary (6):
  gitweb: Hyperlink various committags in commit message with regex
  gitweb: Add second-stage matching of bug IDs in bugzilla committag
  gitweb: Allow finer-grained override controls for committags
  gitweb: Allow committag pattern matches to span multiple lines
  gitweb: Allow per-repository definition of new committags
  gitweb: Add _defaults_ keyword for feature lists in project config

 gitweb/INSTALL               |   16 ++
 gitweb/gitweb.perl           |  404 +++++++++++++++++++++++++++++++++++++-----
 t/t9502-gitweb-committags.sh |  309 ++++++++++++++++++++++++++++++++
 3 files changed, 681 insertions(+), 48 deletions(-)
 create mode 100755 t/t9502-gitweb-committags.sh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching
  2009-11-18  6:22             ` [RFC PATCH 0/6] Second round of committag series Marcel M. Cary
@ 2009-11-18  6:22               ` Marcel M. Cary
  2009-11-18  6:22                 ` [RFC PATCH 2/6] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
                                   ` (2 more replies)
  2009-11-20 23:24               ` [RFC PATCH 0/6] Second round of committag series Jakub Narebski
  1 sibling, 3 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

I want gitweb to hyperlink commits to my bug tracking system so that
information regarding the current status of a commit can be easily
cross-referenced.  The QA and release status of a commit cannot be
directly inserted into the commit message because they change over
time.  But if the commit message mentions a bug number, gitweb could
detect the bug reference in the message and hyperlink it to the bug
tracking system.  Other repository browsers such as unfuddle and
websvn support similar features.

Currently only commit hashes are hyperlinked in this manner.

Since the bug hyperlinking feature was previously discussed as part of
"committags," a more general mechanism to embellish commit messages,
implement the more general mechanism instead, including the following
capabilities:

* Hyperlinking mentions of bug IDs to Bugzilla
* Hyperlinking URLs
* Hyperlinking Message-Ids to a mailing list archive
* Hyperlinking commit hashes, as before by default, now with a
  configurable regex
* Defining new committags per gitweb installation

Since different repositories may use different bug tracking systems or
mailing list archives, the URL parameter may be configured
per-repository without reiterating the regexes.  To accomodate
different conventions, regexes may also be configured per-project.

The order in which gitweb applies committags may be configured
per-project as well, because one committag may affect subsequent ones.
Inclusion in this sequence determines whether a committag is enabled
or not.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---

One additional thing that occured to me is that maybe the hyperlinks
added by committags should have 'rel="nofollow"' by default?  And if
so, maybe that needs to be configurable?  On the other hand, I'm not
sure how useful it is to hide real URLs in the commit messages from
search engines... ?


 gitweb/INSTALL               |    5 +
 gitweb/gitweb.perl           |  247 +++++++++++++++++++++++++++++++++++++++---
 t/t9502-gitweb-committags.sh |  150 +++++++++++++++++++++++++
 3 files changed, 389 insertions(+), 13 deletions(-)
 create mode 100755 t/t9502-gitweb-committags.sh

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index b76a0cf..9081ed8 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -132,6 +132,11 @@ adding the following lines to your $GITWEB_CONFIG:
 	$known_snapshot_formats{'zip'}{'disabled'} = 1;
 	$known_snapshot_formats{'tgz'}{'compressor'} = ['gzip','-6'];
 
+To add a committag to the default list of commit tags, for example to
+enable hyperlinking of bug numbers to a bug tracker for all projects:
+
+	push @{$feature{'committags'}{'default'}}, 'bugzilla';
+
 
 Gitweb repositories
 -------------------
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e4cbfc3..2d72202 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -213,6 +213,97 @@ our %avatar_size = (
 	'double'  => 32
 );
 
+# In general, the site admin can enable/disable per-project
+# configuration of each committag.  Only the 'options' part of the
+# committag is configurable per-project.
+#
+# The site admin can of course add new tags to this hash or override
+# the 'sub' key if necessary.  But such changes may be fragile; this
+# is not designed as a full-blown plugin architecture.  The 'sub' must
+# return a list of strings or string refs.  The strings must contain
+# plain text and the string refs must contain HTML.  The string refs
+# will not be processed further.
+#
+# For any committag, set the 'override' key to 1 to allow individual
+# projects to override entries in the 'options' hash for that tag.
+# For example, to match only commit hashes given in lowercase in one
+# project, add this to the $GITWEB_CONFIG:
+#
+#     $committags{'sha1'}{'override'} = 1;
+#
+# And in the project's config:
+#
+#     gitweb.committags.sha1.pattern = \\b([0-9a-f]{8,40})\\b
+#
+# Some committags have additional options whose interpretation depends
+# on the implementation of the 'sub' key.  The hyperlink_committag
+# value appends the first captured group to the 'url' option.
+our %committags = (
+	# Link Git-style hashes to this gitweb
+	'sha1' => {
+		'options' => {
+			'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
+		},
+		'override' => 0,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			return \$cgi->a({-href => href(action=>"object", hash=>$match[1]),
+			                 -class => "text"},
+			                esc_html($match[0], -nbsp=>1));
+		},
+	},
+	# Link bug/features to Mantis bug tracker using Mantis-style
+	# contextual cues
+	'mantis' => {
+		'options' => {
+			'pattern' => qr/(?:BUG|FEATURE)\((\d+)\)/,
+			'url' => 'http://www.example.com/mantisbt/view.php?id=',
+		},
+		'override' => 0,
+		'sub' => \&hyperlink_committag,
+	},
+	# Link mentions of bug IDs to bugzilla
+	'bugzilla' => {
+		'options' => {
+			'pattern' => qr/bug\s+(\d+)/,
+			'url' => 'http://bugzilla.example.com/show_bug.cgi?id=',
+		},
+		'override' => 0,
+		'sub' => \&hyperlink_committag,
+	},
+	# Link URLs
+	'url' => {
+		'options' => {
+			# Avoid matching punctuation that might immediately follow
+			# a url, is not part of the url, and is allowed in urls,
+			# like a full-stop ('.').
+			'pattern' => qr!(https?|ftps?|git|ssh|ssh+git|sftp|smb|webdavs?|
+			                 nfs|irc|nntp|rsync)
+			                ://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
+			                   [-_a-zA-Z0-9\@/&=+~#<>]!x,
+		},
+		'override' => 0,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			return \$cgi->a({-href => $match[0],
+			                 -class => "text"},
+			                esc_html($match[0], -nbsp=>1));
+		},
+	},
+	# Link Message-Id to mailing list archive
+	'messageid' => {
+		'options' => {
+			'pattern' => qr!(?:message|msg)-?id:?\s+(<[^>]+>)!i,
+			'url' => 'http://mid.gmane.org/',
+		},
+		'override' => 0,
+		# Includes the "msg-id" text in the link text.
+		# Since we don't support linking multiple msg-ids in one match, we
+		# can include the "msg-id" in the link text for better context.
+		'sub' => \&hyperlink_committag,
+	},
+);
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -258,7 +349,7 @@ our %feature = (
 	# and in project config, a comma-separated list of formats or "none"
 	# to disable.  Example: gitweb.snapshot = tbz2,zip;
 	'snapshot' => {
-		'sub' => \&feature_snapshot,
+		'sub' => sub { feature_list('snapshot', @_) },
 		'override' => 0,
 		'default' => ['tgz']},
 
@@ -417,6 +508,21 @@ our %feature = (
 		'sub' => \&feature_avatar,
 		'override' => 0,
 		'default' => ['']},
+
+	# The selection and ordering of committags that are enabled.
+	# Committag transformations will be applied to commit log messages
+	# in the order listed here if listed here.
+
+	# To disable system wide have in $GITWEB_CONFIG
+	# $feature{'committags'}{'default'} = [];
+	# To have project specific config enable override in $GITWEB_CONFIG
+	# $feature{'committags'}{'override'} = 1;
+	# and in project config gitweb.committags = sha1, url, bugzilla
+	# to enable those three committags for that project
+	'committags' => {
+		'sub' => sub { feature_list('committags', @_) },
+		'override' => 0,
+		'default' => ['sha1']},
 );
 
 sub gitweb_get_feature {
@@ -463,16 +569,16 @@ sub feature_bool {
 	}
 }
 
-sub feature_snapshot {
-	my (@fmts) = @_;
+sub feature_list {
+	my ($key, @defaults) = @_;
 
-	my ($val) = git_get_project_config('snapshot');
+	my ($cfg) = git_get_project_config($key);
 
-	if ($val) {
-		@fmts = ($val eq 'none' ? () : split /\s*[,\s]\s*/, $val);
+	if ($cfg) {
+		return ($cfg eq 'none' ? () : split(/\s*[,\s]\s*/, $cfg));
 	}
 
-	return @fmts;
+	return @defaults;
 }
 
 sub feature_patches {
@@ -886,6 +992,35 @@ if ($git_avatar eq 'gravatar') {
 	$git_avatar = '';
 }
 
+# ordering of committags
+our @committags = gitweb_get_feature('committags');
+
+# whether we've loaded committags for the project yet
+our $loaded_project_committags = 0;
+
+# Load committag configs from the repository config file and and
+# incorporate them into the gitweb defaults where permitted by the
+# site administrator.
+sub gitweb_load_project_committags {
+	return if (!$git_dir || $loaded_project_committags);
+	my %project_config = ();
+	my %raw_config = git_parse_project_config('gitweb\.committag');
+	foreach my $key (keys(%raw_config)) {
+		next if ($key !~ /gitweb\.committag\.[^.]+\.[^.]/);
+		my ($gitweb_prefix, $committag_prefix, $ctname, $option) =
+			split(/\./, $key, 4);
+		$project_config{$ctname}{$option} = $raw_config{$key};
+	}
+	foreach my $ctname (keys(%committags)) {
+		next if (!$committags{$ctname}{'override'});
+		foreach my $optname (keys %{$project_config{$ctname}}) {
+			$committags{$ctname}{'options'}{$optname} =
+				$project_config{$ctname}{$optname};
+		}
+	}
+	$loaded_project_committags = 1;
+}
+
 # dispatch
 if (!defined $action) {
 	if (defined $hash) {
@@ -1458,13 +1593,99 @@ sub file_type_long {
 sub format_log_line_html {
 	my $line = shift;
 
-	$line = esc_html($line, -nbsp=>1);
-	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
-		$cgi->a({-href => href(action=>"object", hash=>$1),
-					-class => "text"}, $1);
-	}eg;
+	# Merge project configs with site default committag definitions if
+	# it hasn't been done yet
+	gitweb_load_project_committags();
 
-	return $line;
+	# In this list of log message fragments, a string ref indicates
+	# HTML, and a string indicates plain text.  The string refs are
+	# also currently not processed by subsequent committags.
+	my @message_fragments = ( $line );
+
+COMMITTAG:
+	foreach my $ctname (@committags) {
+		next COMMITTAG unless exists $committags{$ctname};
+		my $committag = $committags{$ctname};
+
+		next COMMITTAG unless exists $committag->{'sub'};
+		my $sub = $committag->{'sub'};
+
+		next COMMITTAG unless exists $committag->{'options'};
+		my $opts = $committag->{'options'};
+
+		next COMMITTAG unless exists $opts->{'pattern'};
+		my $pattern = $opts->{'pattern'};
+
+		my @new_message_fragments = ();
+
+	PART:
+		foreach my $fragment (@message_fragments) {
+			next PART if $fragment eq "";
+			if (ref($fragment)) {
+				push @new_message_fragments, $fragment;
+				next PART;
+			}
+
+			my $oldpos = 0;
+
+		MATCH:
+			while ($fragment =~ m/$pattern/gc) {
+				my ($prepos, $postpos) = ($-[0], $+[0]);
+				my $repl = $sub->($opts, $&, $1);
+				$repl = "" if (!defined $repl);
+
+				my $pre = substr($fragment, $oldpos, $prepos - $oldpos);
+				push_or_append(\@new_message_fragments, $pre);
+				push_or_append(\@new_message_fragments, $repl);
+
+				$oldpos = $postpos;
+			} # end while [regexp matches]
+
+			my $rest = substr($fragment, $oldpos);
+			push_or_append(\@new_message_fragments, $rest);
+
+		} # end foreach (@message_fragments)
+
+		@message_fragments = @new_message_fragments;
+	} # end foreach (@committags)
+
+	# Escape any remaining plain text and concatenate
+	my $html = '';
+	for my $fragment (@message_fragments) {
+		if (ref($fragment)) {
+			$html .= $$fragment;
+		} else {
+			$html .= esc_html($fragment, -nbsp=>1);
+		}
+	}
+
+	return $html;
+}
+
+# Returns a ref to an HTML snippet that links the whole match to a URL
+# formed from the 'url' option and the first captured subgroup.  This
+# is a helper function used in %committags.
+sub hyperlink_committag {
+	my ($opts, @match) = @_;
+	return \$cgi->a({-href => $opts->{'url'} . $cgi->escape($match[1]),
+	                 -class => "text"},
+	                esc_html($match[0], -nbsp=>1));
+}
+
+
+sub push_or_append (\@@) {
+	my $fragments = shift;
+
+	if (ref $_[0] || ! @$fragments || ref $fragments->[-1]) {
+		push @$fragments, @_;
+	} else {
+		my $a = pop @$fragments;
+		my $b = shift @_;
+
+		push @$fragments, $a . $b, @_;
+	}
+	# imitate push
+	return scalar @$fragments;
 }
 
 # format marker of refs pointing to given object
diff --git a/t/t9502-gitweb-committags.sh b/t/t9502-gitweb-committags.sh
new file mode 100755
index 0000000..f86cb3d
--- /dev/null
+++ b/t/t9502-gitweb-committags.sh
@@ -0,0 +1,150 @@
+#!/bin/sh
+
+test_description='gitweb committag tests.
+
+This test runs gitweb (git web interface) as CGI script from
+commandline, and checks that committags perform the expected
+transformations on log messages.'
+
+. ./gitweb-lib.sh
+
+# ----------------------------------------------------------------------
+# sha1 linking
+#
+echo sha1_test > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Summary
+
+See also commit 567890ab
+END
+test_expect_success 'sha1 link: enabled by default' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q \
+		"commit&nbsp;<a class=\"text\" href=\".*\">567890ab</a>" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 567890ab gitweb.output'
+
+# ----------------------------------------------------------------------
+# bugzilla commit tag
+#
+
+echo bugzilla_test > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Fix foo
+
+Fixes bug 1234 involving foo.
+END
+git config gitweb.committags 'sha1, bugzilla'
+test_expect_success 'bugzilla: enabled but not permitted' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -F -q \
+		"Fixes&nbsp;bug&nbsp;1234&nbsp;involving" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+
+echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
+test_expect_success 'bugzilla: enabled' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -F -q \
+		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.example.com/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+
+git config gitweb.committag.bugzilla.url 'http://bts.example.com?bug='
+test_expect_success 'bugzilla: url overridden but not permitted' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -F -q \
+		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.example.com/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+
+echo '$committags{"bugzilla"}{"override"} = 1;' >> gitweb_config.perl
+test_expect_success 'bugzilla: url overridden' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -F -q \
+		"Fixes&nbsp;<a class=\"text\" href=\"http://bts.example.com?bug=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+
+git config gitweb.committag.bugzilla.pattern 'Fixes bug (\d+)'
+test_expect_success 'bugzilla: pattern overridden' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -F -q \
+		"<a class=\"text\" href=\"http://bts.example.com?bug=1234\">Fixes&nbsp;bug&nbsp;1234</a>&nbsp;involving" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+git config --unset gitweb.committag.bugzilla.pattern
+
+test_expect_success 'bugzilla: affects log view too' '
+	gitweb_run "p=.git;a=log" &&
+	grep -F -q \
+		"<a class=\"text\" href=\"http://bts.example.com?bug=1234\">bug&nbsp;1234</a>" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+
+# ----------------------------------------------------------------------
+# url linking
+#
+echo url_test > file.txt
+git add file.txt
+url='http://user@pass:example.com/foo.html?u=v&x=y#z'
+url_esc="$(echo "$url" | sed 's/&/&amp;/g')"
+git commit -q -F - file.txt <<END
+Summary
+
+See also $url.
+END
+echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
+git config gitweb.committags 'sha1, url'
+test_expect_success 'url link: links when enabled' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q -F \
+		"See&nbsp;also&nbsp;<a class=\"text\" href=\"$url_esc\">$url_esc</a>." \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "$url" gitweb.output'
+
+# ----------------------------------------------------------------------
+# message id linking
+#
+echo msgid_test > file.txt
+git add file.txt
+url='http://mid.gmane.org/'
+msgid='<x@y.z>'
+msgid_esc="$(echo "$msgid" | sed 's/</\&lt;/g; s/>/\&gt;/g')"
+msgid_url="$url$(echo "$msgid" | sed 's/</%3C/g; s/@/%40/g; s/>/%3E/g')"
+git commit -q -F - file.txt <<END
+Summary
+
+See msg-id $msgid.
+END
+echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
+git config gitweb.committags 'sha1, messageid'
+test_expect_success 'msgid link: linked when enabled' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q -F \
+		"See&nbsp;<a class=\"text\" href=\"$msgid_url\">msg-id&nbsp;$msgid_esc</a>." \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "y.z" gitweb.output'
+
+
+test_done
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 2/6] gitweb: Add second-stage matching of bug IDs in bugzilla committag
  2009-11-18  6:22               ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Marcel M. Cary
@ 2009-11-18  6:22                 ` Marcel M. Cary
  2009-11-18  6:22                   ` [RFC PATCH 3/6] gitweb: Allow finer-grained override controls for committags Marcel M. Cary
  2009-11-18  8:20                 ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Petr Baudis
  2009-11-18  8:26                 ` Petr Baudis
  2 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

Currently it's not easy to capture an unbounded number of items
in a committag phrase to hyperlink individually.

For example, I would like to match and hyperlinking each bug ID
individually in these situations:

	[#1234, #1235]
	Resolves-bug: 1234, 1235
	bugs 1234, 1235, and 1236

Match Bugzilla bug IDs with two regexes instead of one.  The first is
a pre-filter that allows easy matching of multiple bug IDs and
contextual queues like "bugs ___ and ___" in a phrase, and the second
easily picks out the individual big IDs for hyperlinking.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---
 gitweb/gitweb.perl           |   68 +++++++++++++++++++++++++++++------------
 t/t9502-gitweb-committags.sh |   69 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 112 insertions(+), 25 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2d72202..032b1c5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -262,14 +262,31 @@ our %committags = (
 		'override' => 0,
 		'sub' => \&hyperlink_committag,
 	},
-	# Link mentions of bug IDs to bugzilla
+	# Link mentions of bugs to bugzilla, allowing for separate outer
+	# and inner regexes (see unit test for example)
 	'bugzilla' => {
 		'options' => {
-			'pattern' => qr/bug\s+(\d+)/,
+			'pattern' => qr/(?i:bugs?):?\s+
+			                [#]?\d+(?:(?:,\s*|,?\sand\s|,?\sn?or\s|\s+)
+			                          [#]?\d+\b)*/x,
+			'innerpattern' => qr/#?(\d+)/,
 			'url' => 'http://bugzilla.example.com/show_bug.cgi?id=',
 		},
 		'override' => 0,
-		'sub' => \&hyperlink_committag,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			if ($opts->{'innerpattern'}) {
+				my @message_fragments = ();
+				push_or_append_replacements(\@message_fragments,
+				                            $opts->{'innerpattern'},
+				                            $match[0], sub {
+						return hyperlink_committag($opts, @_);
+					});
+				return @message_fragments;
+			} else {
+				return hyperlink_committag(@_);
+			}
+		},
 	},
 	# Link URLs
 	'url' => {
@@ -1626,23 +1643,10 @@ COMMITTAG:
 				next PART;
 			}
 
-			my $oldpos = 0;
-
-		MATCH:
-			while ($fragment =~ m/$pattern/gc) {
-				my ($prepos, $postpos) = ($-[0], $+[0]);
-				my $repl = $sub->($opts, $&, $1);
-				$repl = "" if (!defined $repl);
-
-				my $pre = substr($fragment, $oldpos, $prepos - $oldpos);
-				push_or_append(\@new_message_fragments, $pre);
-				push_or_append(\@new_message_fragments, $repl);
-
-				$oldpos = $postpos;
-			} # end while [regexp matches]
-
-			my $rest = substr($fragment, $oldpos);
-			push_or_append(\@new_message_fragments, $rest);
+			push_or_append_replacements(\@new_message_fragments,
+			                            $pattern, $fragment, sub {
+					$sub->($opts, @_);
+				});
 
 		} # end foreach (@message_fragments)
 
@@ -1672,6 +1676,30 @@ sub hyperlink_committag {
 	                esc_html($match[0], -nbsp=>1));
 }
 
+# Find $pattern in string $fragment, and push_or_append the parts
+# between matches and the result of calling $sub with matched text to
+# $new_fragments.
+sub push_or_append_replacements {
+	my ($new_fragments, $pattern, $fragment, $sub) = @_;
+
+	my $oldpos = 0;
+
+MATCH:
+	while ($fragment =~ m/$pattern/gc) {
+		my ($prepos, $postpos) = ($-[0], $+[0]);
+
+		my @repl = $sub->($&, $1);
+
+		my $pre = substr($fragment, $oldpos, $prepos - $oldpos);
+		push_or_append($new_fragments, $pre);
+		push_or_append($new_fragments, @repl);
+
+		$oldpos = $postpos;
+	} # end while [regexp matches]
+
+	my $rest = substr($fragment, $oldpos);
+	push_or_append($new_fragments, $rest);
+}
 
 sub push_or_append (\@@) {
 	my $fragments = shift;
diff --git a/t/t9502-gitweb-committags.sh b/t/t9502-gitweb-committags.sh
index f86cb3d..718e763 100755
--- a/t/t9502-gitweb-committags.sh
+++ b/t/t9502-gitweb-committags.sh
@@ -52,7 +52,7 @@ echo '$feature{"committags"}{"override"} = 1;' >> gitweb_config.perl
 test_expect_success 'bugzilla: enabled' '
 	gitweb_run "p=.git;a=commit;h=HEAD" &&
 	grep -F -q \
-		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.example.com/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		"Fixes&nbsp;bug&nbsp;<a class=\"text\" href=\"http://bugzilla.example.com/show_bug.cgi?id=1234\">1234</a>&nbsp;involving" \
 		gitweb.output
 '
 test_debug 'cat gitweb.log'
@@ -62,7 +62,7 @@ git config gitweb.committag.bugzilla.url 'http://bts.example.com?bug='
 test_expect_success 'bugzilla: url overridden but not permitted' '
 	gitweb_run "p=.git;a=commit;h=HEAD" &&
 	grep -F -q \
-		"Fixes&nbsp;<a class=\"text\" href=\"http://bugzilla.example.com/show_bug.cgi?id=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		"Fixes&nbsp;bug&nbsp;<a class=\"text\" href=\"http://bugzilla.example.com/show_bug.cgi?id=1234\">1234</a>&nbsp;involving" \
 		gitweb.output
 '
 test_debug 'cat gitweb.log'
@@ -72,12 +72,13 @@ echo '$committags{"bugzilla"}{"override"} = 1;' >> gitweb_config.perl
 test_expect_success 'bugzilla: url overridden' '
 	gitweb_run "p=.git;a=commit;h=HEAD" &&
 	grep -F -q \
-		"Fixes&nbsp;<a class=\"text\" href=\"http://bts.example.com?bug=1234\">bug&nbsp;1234</a>&nbsp;involving" \
+		"Fixes&nbsp;bug&nbsp;<a class=\"text\" href=\"http://bts.example.com?bug=1234\">1234</a>&nbsp;involving" \
 		gitweb.output
 '
 test_debug 'cat gitweb.log'
 test_debug 'grep 1234 gitweb.output'
 
+git config gitweb.committag.bugzilla.innerpattern ''
 git config gitweb.committag.bugzilla.pattern 'Fixes bug (\d+)'
 test_expect_success 'bugzilla: pattern overridden' '
 	gitweb_run "p=.git;a=commit;h=HEAD" &&
@@ -87,17 +88,75 @@ test_expect_success 'bugzilla: pattern overridden' '
 '
 test_debug 'cat gitweb.log'
 test_debug 'grep 1234 gitweb.output'
-git config --unset gitweb.committag.bugzilla.pattern
 
+git config --unset gitweb.committag.bugzilla.innerpattern
+git config --unset gitweb.committag.bugzilla.pattern
 test_expect_success 'bugzilla: affects log view too' '
 	gitweb_run "p=.git;a=log" &&
 	grep -F -q \
-		"<a class=\"text\" href=\"http://bts.example.com?bug=1234\">bug&nbsp;1234</a>" \
+		"<a class=\"text\" href=\"http://bts.example.com?bug=1234\">1234</a>" \
 		gitweb.output
 '
 test_debug 'cat gitweb.log'
 test_debug 'grep 1234 gitweb.output'
 
+echo more_bugzilla > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+[#123,#45] This commit fixes two bugs involving bar and baz.
+END
+git config gitweb.committag.bugzilla.pattern       '^\[#\d+(, ?#\d+)\]'
+git config gitweb.committag.bugzilla.innerpattern  '#(\d+)'
+git config gitweb.committag.bugzilla.url           'http://bugs/'
+test_expect_success 'bugzilla: override everything, use fancier url format' '
+       gitweb_run "p=.git;a=commit;h=HEAD" &&
+       grep -F -q \
+               "[<a class=\"text\" href=\"http://bugs/123\">#123</a>,<a class=\"text\" href=\"http://bugs/45\">#45</a>]" \
+               gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 123 gitweb.output'
+
+echo even_more_bugzilla > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Fix memory leak in confabulator from bug 123.
+
+Based on history from bugs 223, 224, and 225,
+fix bug 323 or 324.
+
+Bug: 423,424,425,426,427,428,429,430,431,432,435
+Resolves-bugs: #523 #524
+END
+git config --unset gitweb.committag.bugzilla.pattern
+git config --unset gitweb.committag.bugzilla.innerpattern
+git config --unset gitweb.committag.bugzilla.url
+gitweb_run "p=.git;a=commit;h=HEAD"
+test_expect_success 'bugzilla: fancy defaults: match one bug' '
+	grep -q "from&nbsp;bug&nbsp;<a[^>]*>123</a>." gitweb.output
+'
+test_expect_success 'bugzilla: fancy defaults: comma-separated list' '
+	grep -q \
+		"bugs&nbsp;<a[^>]*>223</a>,&nbsp;<a[^>]*>224</a>,&nbsp;and&nbsp;<a[^>]*>225</a>," \
+		gitweb.output
+'
+test_expect_success 'bugzilla: fancy defaults: or-pair' '
+	grep -q "bug&nbsp;<a[^>]*>323</a>&nbsp;or&nbsp;<a[^>]*>324</a>." \
+		gitweb.output
+'
+test_expect_success 'bugzilla: fancy defaults: comma-separated, caps, >10' '
+	grep -q \
+		"Bug:&nbsp;<a[^>]*>423</a>,<a[^>]*>424</a>,.*,<a[^>]*>435</a>" \
+		gitweb.output
+'
+test_expect_success 'bugzilla: fancy defaults: space-separated with hash' '
+	grep -q -e \
+		"-bugs:&nbsp;<a[^>]*>#523</a>&nbsp;<a[^>]*>#524</a>" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 23 gitweb.output'
+
 # ----------------------------------------------------------------------
 # url linking
 #
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 3/6] gitweb: Allow finer-grained override controls for committags
  2009-11-18  6:22                 ` [RFC PATCH 2/6] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
@ 2009-11-18  6:22                   ` Marcel M. Cary
  2009-11-18  6:22                     ` [RFC PATCH 4/6] gitweb: Allow committag pattern matches to span multiple lines Marcel M. Cary
  0 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

Currently, a site administrator must choose between allowing all or
none of a committag's options to be overridden in the project config.
However, a site admin may wish to permit specifying a bugzilla URL
without risking a maliciously resource hungry regular expression.

Allow the site admin to specify which committag parameters may be
overridden.  Preserve the behavior of the original 0 and 1 override
specifications.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---
 gitweb/INSTALL               |    8 +++++++-
 gitweb/gitweb.perl           |   24 ++++++++++++++++++------
 t/t9502-gitweb-committags.sh |   13 +++++++++++++
 3 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 9081ed8..15c0128 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -133,9 +133,15 @@ adding the following lines to your $GITWEB_CONFIG:
 	$known_snapshot_formats{'tgz'}{'compressor'} = ['gzip','-6'];
 
 To add a committag to the default list of commit tags, for example to
-enable hyperlinking of bug numbers to a bug tracker for all projects:
+enable hyperlinking of bug numbers to a bug tracker for all projects, while
+allowing each project to choose only the base URL for its bug tracker:
 
 	push @{$feature{'committags'}{'default'}}, 'bugzilla';
+	$committags{"bugzilla"}{"override"} = ["url"];
+
+And then let each project configure its bug tracker URL:
+
+	git config gitweb.committag.bugzilla.url 'http://bts.example.com?bug='
 
 
 Gitweb repositories
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 032b1c5..8f4480e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -225,11 +225,13 @@ our %avatar_size = (
 # will not be processed further.
 #
 # For any committag, set the 'override' key to 1 to allow individual
-# projects to override entries in the 'options' hash for that tag.
-# For example, to match only commit hashes given in lowercase in one
-# project, add this to the $GITWEB_CONFIG:
+# projects to override any entry in the 'options' hash for that tag.
+# Leave 'override' as 0 to disallow all overriding of all entries.
+# Set 'override' to an array of 'option' key names to allow overriding
+# specific keys.  For example, to match only commit hashes given in
+# lowercase in one project, add this to the $GITWEB_CONFIG:
 #
-#     $committags{'sha1'}{'override'} = 1;
+#     $committags{'sha1'}{'override'} = 1;   # or ["pattern"]
 #
 # And in the project's config:
 #
@@ -237,7 +239,8 @@ our %avatar_size = (
 #
 # Some committags have additional options whose interpretation depends
 # on the implementation of the 'sub' key.  The hyperlink_committag
-# value appends the first captured group to the 'url' option.
+# value appends the first captured group to the 'url' option, for example.
+#
 our %committags = (
 	# Link Git-style hashes to this gitweb
 	'sha1' => {
@@ -1029,8 +1032,17 @@ sub gitweb_load_project_committags {
 		$project_config{$ctname}{$option} = $raw_config{$key};
 	}
 	foreach my $ctname (keys(%committags)) {
-		next if (!$committags{$ctname}{'override'});
+		my $override = $committags{$ctname}{'override'};
+		next if (!$override);
+		my $override_keys = undef;
+		if (ref($override) eq "ARRAY") {
+			$override_keys = {};
+			foreach my $optname (@$override) {
+				$override_keys->{$optname} = 1;
+			}
+		}
 		foreach my $optname (keys %{$project_config{$ctname}}) {
+			next if ($override_keys && !$override_keys->{$optname});
 			$committags{$ctname}{'options'}{$optname} =
 				$project_config{$ctname}{$optname};
 		}
diff --git a/t/t9502-gitweb-committags.sh b/t/t9502-gitweb-committags.sh
index 718e763..e13ac47 100755
--- a/t/t9502-gitweb-committags.sh
+++ b/t/t9502-gitweb-committags.sh
@@ -68,6 +68,19 @@ test_expect_success 'bugzilla: url overridden but not permitted' '
 test_debug 'cat gitweb.log'
 test_debug 'grep 1234 gitweb.output'
 
+echo '$committags{"bugzilla"}{"override"} = ["url"];' >> gitweb_config.perl
+git config gitweb.committag.bugzilla.url 'http://bts.example.com?bug='
+git config gitweb.committag.bugzilla.pattern 'slow DoS regex'
+test_expect_success 'bugzilla: url overridden but regex not permitted' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -F -q \
+		"Fixes&nbsp;bug&nbsp;<a class=\"text\" href=\"http://bts.example.com?bug=1234\">1234</a>&nbsp;involving" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep 1234 gitweb.output'
+git config --unset gitweb.committag.bugzilla.pattern
+
 echo '$committags{"bugzilla"}{"override"} = 1;' >> gitweb_config.perl
 test_expect_success 'bugzilla: url overridden' '
 	gitweb_run "p=.git;a=commit;h=HEAD" &&
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 4/6] gitweb: Allow committag pattern matches to span multiple lines
  2009-11-18  6:22                   ` [RFC PATCH 3/6] gitweb: Allow finer-grained override controls for committags Marcel M. Cary
@ 2009-11-18  6:22                     ` Marcel M. Cary
  2009-11-18  6:22                       ` [RFC PATCH 5/6] gitweb: Allow per-repository definition of new committags Marcel M. Cary
  0 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

Committags cannot currently span multiple lines.  Since some committag
patterns match multiple words.  If some of those words wrap to the
next line, the committag would miss an opportunity to match.

Eliminate the for-loop over @log and pull the signoff transformation
from that loop into a committag.

The message will still get cut into pieces as committags are applied,
but at least newlines no longer force a cut.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---
 gitweb/gitweb.perl           |   67 +++++++++++++++++++-----------------------
 t/t9502-gitweb-committags.sh |    8 +++++
 2 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 8f4480e..7f7d3a3 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -255,6 +255,21 @@ our %committags = (
 			                esc_html($match[0], -nbsp=>1));
 		},
 	},
+	# Facilitate styling of common header/footer lines, suppressing
+	# any trailing newlines
+	'signoff' => {
+		'options' => {
+			'pattern' =>
+				qr/^( *(?:signed[ \-]off[ \-]by|acked[ \-]by|cc)[ :].*)\n*$/mi,
+		},
+		'override' => 0,
+		'sub' => sub {
+			my ($opts, @match) = @_;
+			return (\$cgi->span({'class' => 'signoff'},
+			                    esc_html($match[1], -nbsp=>1)),
+			        "\n");
+		},
+	},
 	# Link bug/features to Mantis bug tracker using Mantis-style
 	# contextual cues
 	'mantis' => {
@@ -542,7 +557,7 @@ our %feature = (
 	'committags' => {
 		'sub' => sub { feature_list('committags', @_) },
 		'override' => 0,
-		'default' => ['sha1']},
+		'default' => ['signoff', 'sha1']},
 );
 
 sub gitweb_get_feature {
@@ -1619,8 +1634,8 @@ sub file_type_long {
 ## which don't belong to other sections
 
 # format line of commit message.
-sub format_log_line_html {
-	my $line = shift;
+sub format_log_html {
+	my $text = shift;
 
 	# Merge project configs with site default committag definitions if
 	# it hasn't been done yet
@@ -1629,7 +1644,7 @@ sub format_log_line_html {
 	# In this list of log message fragments, a string ref indicates
 	# HTML, and a string indicates plain text.  The string refs are
 	# also currently not processed by subsequent committags.
-	my @message_fragments = ( $line );
+	my @message_fragments = ( $text );
 
 COMMITTAG:
 	foreach my $ctname (@committags) {
@@ -1671,7 +1686,9 @@ COMMITTAG:
 		if (ref($fragment)) {
 			$html .= $$fragment;
 		} else {
-			$html .= esc_html($fragment, -nbsp=>1);
+			# Don't let esc_html turn "\n" into "\\n"
+			$html .= join("<br/>\n", map { esc_html($_, -nbsp=>1) }
+			                             split("\n", $fragment, -1));
 		}
 	}
 
@@ -3776,40 +3793,16 @@ sub git_print_log {
 		shift @$log;
 	}
 
-	# print log
-	my $signoff = 0;
-	my $empty = 0;
-	foreach my $line (@$log) {
-		if ($line =~ m/^ *(signed[ \-]off[ \-]by[ :]|acked[ \-]by[ :]|cc[ :])/i) {
-			$signoff = 1;
-			$empty = 0;
-			if (! $opts{'-remove_signoff'}) {
-				print "<span class=\"signoff\">" . esc_html($line) . "</span><br/>\n";
-				next;
-			} else {
-				# remove signoff lines
-				next;
-			}
-		} else {
-			$signoff = 0;
-		}
-
-		# print only one empty line
-		# do not print empty line after signoff
-		if ($line eq "") {
-			next if ($empty || $signoff);
-			$empty = 1;
-		} else {
-			$empty = 0;
-		}
-
-		print format_log_line_html($line) . "<br/>\n";
-	}
-
 	if ($opts{'-final_empty_line'}) {
-		# end with single empty line
-		print "<br/>\n" unless $empty;
+		# If we already have a trailing newline, this will be
+		# coalesced with it later.
+		push @$log, "";
 	}
+
+	# print log
+	my $text = join("\n", @$log) . "\n";
+	$text =~ s{\n\n+}{\n\n}g;
+	print format_log_html($text);
 }
 
 # return link target (what link points to)
diff --git a/t/t9502-gitweb-committags.sh b/t/t9502-gitweb-committags.sh
index e13ac47..0753630 100755
--- a/t/t9502-gitweb-committags.sh
+++ b/t/t9502-gitweb-committags.sh
@@ -138,6 +138,10 @@ Fix memory leak in confabulator from bug 123.
 Based on history from bugs 223, 224, and 225,
 fix bug 323 or 324.
 
+Bugs:
+1234,
+1235
+
 Bug: 423,424,425,426,427,428,429,430,431,432,435
 Resolves-bugs: #523 #524
 END
@@ -167,6 +171,10 @@ test_expect_success 'bugzilla: fancy defaults: space-separated with hash' '
 		"-bugs:&nbsp;<a[^>]*>#523</a>&nbsp;<a[^>]*>#524</a>" \
 		gitweb.output
 '
+test_expect_success 'bugzilla: fancy defaults: spanning newlines' '
+	grep -q -e "<a[^>]*>1234</a>,<br" gitweb.output &&
+	grep -q -e "<a[^>]*>1235</a><br" gitweb.output
+'
 test_debug 'cat gitweb.log'
 test_debug 'grep 23 gitweb.output'
 
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 5/6] gitweb: Allow per-repository definition of new committags
  2009-11-18  6:22                     ` [RFC PATCH 4/6] gitweb: Allow committag pattern matches to span multiple lines Marcel M. Cary
@ 2009-11-18  6:22                       ` Marcel M. Cary
  2009-11-18  6:22                         ` [RFC PATCH 6/6] gitweb: Add _defaults_ keyword for feature lists in project config Marcel M. Cary
  0 siblings, 1 reply; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

Committags are limited to the functionality configured by the site
administrator.

Provide two more general purpose committag subroutines that replace
text by feeding the capturing groups of a pattern to a sprintf format.
One additionally escapes the parameters of the capturing groups for
producing HTML snippets, the other does not.

Then, if permitted by the site administrator, allow the 'sub' key to
be overridden in an existing committag and allow a new committag to be
defined completely from within the repository configuration.

Signed-off-by: Marcel M. Cary <marcel@oak.homeunix.org>
---
 gitweb/gitweb.perl           |  135 ++++++++++++++++++++++++++++--------------
 t/t9502-gitweb-committags.sh |   50 +++++++++++++++
 2 files changed, 141 insertions(+), 44 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7f7d3a3..d413f22 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -214,8 +214,7 @@ our %avatar_size = (
 );
 
 # In general, the site admin can enable/disable per-project
-# configuration of each committag.  Only the 'options' part of the
-# committag is configurable per-project.
+# configuration of each committag.
 #
 # The site admin can of course add new tags to this hash or override
 # the 'sub' key if necessary.  But such changes may be fragile; this
@@ -241,12 +240,18 @@ our %avatar_size = (
 # on the implementation of the 'sub' key.  The hyperlink_committag
 # value appends the first captured group to the 'url' option, for example.
 #
+# The project configuration can define new committags.  Although the
+# project configuration cannot supply code defining a new 'sub' key,
+# the project configuration can choose from a list of pre-approved
+# subroutines named in the 'allowed_committag_subs' feature.  Both a
+# 'sub' key and 'pattern' key must be defined for the committag to be
+# used.  If the 'allowed_committag_subs' feature is empty, no new
+# committags can be defined in the project config.
+#
 our %committags = (
 	# Link Git-style hashes to this gitweb
 	'sha1' => {
-		'options' => {
-			'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
-		},
+		'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
 		'override' => 0,
 		'sub' => sub {
 			my ($opts, @match) = @_;
@@ -258,10 +263,8 @@ our %committags = (
 	# Facilitate styling of common header/footer lines, suppressing
 	# any trailing newlines
 	'signoff' => {
-		'options' => {
-			'pattern' =>
-				qr/^( *(?:signed[ \-]off[ \-]by|acked[ \-]by|cc)[ :].*)\n*$/mi,
-		},
+		'pattern' =>
+			qr/^( *(?:signed[ \-]off[ \-]by|acked[ \-]by|cc)[ :].*)\n*$/mi,
 		'override' => 0,
 		'sub' => sub {
 			my ($opts, @match) = @_;
@@ -273,23 +276,19 @@ our %committags = (
 	# Link bug/features to Mantis bug tracker using Mantis-style
 	# contextual cues
 	'mantis' => {
-		'options' => {
-			'pattern' => qr/(?:BUG|FEATURE)\((\d+)\)/,
-			'url' => 'http://www.example.com/mantisbt/view.php?id=',
-		},
+		'pattern' => qr/(?:BUG|FEATURE)\((\d+)\)/,
+		'url' => 'http://www.example.com/mantisbt/view.php?id=',
 		'override' => 0,
 		'sub' => \&hyperlink_committag,
 	},
 	# Link mentions of bugs to bugzilla, allowing for separate outer
 	# and inner regexes (see unit test for example)
 	'bugzilla' => {
-		'options' => {
-			'pattern' => qr/(?i:bugs?):?\s+
-			                [#]?\d+(?:(?:,\s*|,?\sand\s|,?\sn?or\s|\s+)
-			                          [#]?\d+\b)*/x,
-			'innerpattern' => qr/#?(\d+)/,
-			'url' => 'http://bugzilla.example.com/show_bug.cgi?id=',
-		},
+		'pattern' => qr/(?i:bugs?):?\s+
+		                [#]?\d+(?:(?:,\s*|,?\sand\s|,?\sn?or\s|\s+)
+		                          [#]?\d+\b)*/x,
+		'innerpattern' => qr/#?(\d+)/,
+		'url' => 'http://bugzilla.example.com/show_bug.cgi?id=',
 		'override' => 0,
 		'sub' => sub {
 			my ($opts, @match) = @_;
@@ -308,15 +307,13 @@ our %committags = (
 	},
 	# Link URLs
 	'url' => {
-		'options' => {
-			# Avoid matching punctuation that might immediately follow
-			# a url, is not part of the url, and is allowed in urls,
-			# like a full-stop ('.').
-			'pattern' => qr!(https?|ftps?|git|ssh|ssh+git|sftp|smb|webdavs?|
-			                 nfs|irc|nntp|rsync)
-			                ://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
-			                   [-_a-zA-Z0-9\@/&=+~#<>]!x,
-		},
+		# Avoid matching punctuation that might immediately follow
+		# a url, is not part of the url, and is allowed in urls,
+		# like a full-stop ('.').
+		'pattern' => qr!(https?|ftps?|git|ssh|ssh+git|sftp|smb|webdavs?|
+		                 nfs|irc|nntp|rsync)
+		                ://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
+		                   [-_a-zA-Z0-9\@/&=+~#<>]!x,
 		'override' => 0,
 		'sub' => sub {
 			my ($opts, @match) = @_;
@@ -327,10 +324,8 @@ our %committags = (
 	},
 	# Link Message-Id to mailing list archive
 	'messageid' => {
-		'options' => {
-			'pattern' => qr!(?:message|msg)-?id:?\s+(<[^>]+>)!i,
-			'url' => 'http://mid.gmane.org/',
-		},
+		'pattern' => qr!(?:message|msg)-?id:?\s+(<[^>]+>)!i,
+		'url' => 'http://mid.gmane.org/',
 		'override' => 0,
 		# Includes the "msg-id" text in the link text.
 		# Since we don't support linking multiple msg-ids in one match, we
@@ -558,6 +553,22 @@ our %feature = (
 		'sub' => sub { feature_list('committags', @_) },
 		'override' => 0,
 		'default' => ['signoff', 'sha1']},
+
+	# The list of committag callbacks that are permitted to be used
+	# from within a repository configuration.  These are interpretted
+	# as perl subrouting names.  If none are listed, no new committags
+	# can be defined in the project config, which is the default.
+
+	# To enable system wide have in $GITWEB_CONFIG
+	# $feature{'allowed_committag_subs'}{'default'} = [
+	#		'hyperlink_committag',
+	#		'markup_committag',
+	#		'transform_committag',
+	#		];
+	# It would not make sense to allow per-project overrides of this.
+	'allowed_committag_subs' => {
+		'override' => 0,
+		'default' => []},
 );
 
 sub gitweb_get_feature {
@@ -1030,6 +1041,9 @@ if ($git_avatar eq 'gravatar') {
 # ordering of committags
 our @committags = gitweb_get_feature('committags');
 
+# ordering of committags
+our @allowed_committag_subs = gitweb_get_feature('allowed_committag_subs');
+
 # whether we've loaded committags for the project yet
 our $loaded_project_committags = 0;
 
@@ -1046,9 +1060,18 @@ sub gitweb_load_project_committags {
 			split(/\./, $key, 4);
 		$project_config{$ctname}{$option} = $raw_config{$key};
 	}
-	foreach my $ctname (keys(%committags)) {
-		my $override = $committags{$ctname}{'override'};
+
+	my %allowed_subs = ();
+	foreach my $sub (@allowed_committag_subs) {
+		$allowed_subs{$sub} = 1;
+	}
+
+	foreach my $ctname (keys(%project_config)) {
+		my $override = $committags{$ctname}
+			? $committags{$ctname}{'override'}
+			: 1;
 		next if (!$override);
+
 		my $override_keys = undef;
 		if (ref($override) eq "ARRAY") {
 			$override_keys = {};
@@ -1056,12 +1079,19 @@ sub gitweb_load_project_committags {
 				$override_keys->{$optname} = 1;
 			}
 		}
+
 		foreach my $optname (keys %{$project_config{$ctname}}) {
 			next if ($override_keys && !$override_keys->{$optname});
-			$committags{$ctname}{'options'}{$optname} =
-				$project_config{$ctname}{$optname};
+			my $value = $project_config{$ctname}{$optname};
+			if ($optname eq 'sub') {
+				if (!$allowed_subs{$value}) {
+					next;
+				}
+			}
+			$committags{$ctname}{$optname} = $value;
 		}
 	}
+
 	$loaded_project_committags = 1;
 }
 
@@ -1654,11 +1684,8 @@ COMMITTAG:
 		next COMMITTAG unless exists $committag->{'sub'};
 		my $sub = $committag->{'sub'};
 
-		next COMMITTAG unless exists $committag->{'options'};
-		my $opts = $committag->{'options'};
-
-		next COMMITTAG unless exists $opts->{'pattern'};
-		my $pattern = $opts->{'pattern'};
+		next COMMITTAG unless exists $committag->{'pattern'};
+		my $pattern = $committag->{'pattern'};
 
 		my @new_message_fragments = ();
 
@@ -1672,7 +1699,8 @@ COMMITTAG:
 
 			push_or_append_replacements(\@new_message_fragments,
 			                            $pattern, $fragment, sub {
-					$sub->($opts, @_);
+					no strict "refs"; # for custome committags
+					$sub->($committag, @_);
 				});
 
 		} # end foreach (@message_fragments)
@@ -1705,6 +1733,25 @@ sub hyperlink_committag {
 	                esc_html($match[0], -nbsp=>1));
 }
 
+# Returns a ref to an HTML snippet formed from the 'replacement'
+# option and match data.  The match data is HTML-escaped, and the
+# 'replacement' option is used as a sprintf format.  This is a helper
+# function used in %committags.
+sub markup_committag {
+	my ($opts, @match) = @_;
+	return \sprintf($opts->{'replacement'},
+	                map { esc_html($_, -nbsp=>1) if defined } @match);
+}
+
+# Returns a text snippet formed from the 'replacement' option and
+# match data.  The 'replacement' option is used as a sprintf format.
+# Because the result is text, it can be re-processed by subsequent
+# committags.  This is a helper function used in %committags.
+sub transform_committag {
+	my ($opts, @match) = @_;
+	return sprintf($opts->{'replacement'}, @match);
+}
+
 # Find $pattern in string $fragment, and push_or_append the parts
 # between matches and the result of calling $sub with matched text to
 # $new_fragments.
@@ -1717,7 +1764,7 @@ MATCH:
 	while ($fragment =~ m/$pattern/gc) {
 		my ($prepos, $postpos) = ($-[0], $+[0]);
 
-		my @repl = $sub->($&, $1);
+		my @repl = $sub->($&, $1, $2);
 
 		my $pre = substr($fragment, $oldpos, $prepos - $oldpos);
 		push_or_append($new_fragments, $pre);
diff --git a/t/t9502-gitweb-committags.sh b/t/t9502-gitweb-committags.sh
index 0753630..cbe607b 100755
--- a/t/t9502-gitweb-committags.sh
+++ b/t/t9502-gitweb-committags.sh
@@ -226,5 +226,55 @@ test_expect_success 'msgid link: linked when enabled' '
 test_debug 'cat gitweb.log'
 test_debug 'grep -F "y.z" gitweb.output'
 
+# ----------------------------------------------------------------------
+# custom committags
+#
+echo custom_test > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Something for <foo&bar@bar.com>
+END
+echo '$feature{"allowed_committag_subs"}{"default"} = [
+	"hyperlink_committag",
+	"markup_committag",
+	"transform_committag",
+	];' >> gitweb_config.perl
+git config gitweb.committags 'sha1, obfuscate'
+git config gitweb.committag.obfuscate.pattern '([a-z&]+@)[a-z]+(.com)'
+git config gitweb.committag.obfuscate.sub 'transform_committag'
+git config gitweb.committag.obfuscate.replacement '%2$sXXX%3$s'
+test_expect_success 'custom committags: transform_committag' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q -F \
+		"foo&amp;bar@XXX.com" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "foo" gitweb.output'
+
+git config gitweb.committags 'sha1, linkemail'
+git config gitweb.committag.linkemail.pattern '<([a-z&]+@[a-z]+.com)>'
+git config gitweb.committag.linkemail.sub 'markup_committag'
+git config gitweb.committag.linkemail.replacement '<a href="mailto:%2$s">%1$s</a>'
+test_expect_success 'custom committags: markup_committag' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q -F \
+		"<a href=\"mailto:foo&amp;bar@bar.com\">&lt;foo&amp;bar@bar.com&gt;</a>" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "foo" gitweb.output'
+
+echo '$feature{"allowed_committag_subs"}{"default"} = [
+	];' >> gitweb_config.perl
+test_expect_success 'custom committags: ignored when disabled' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q -F \
+		"Something&nbsp;for&nbsp;&lt;foo&amp;bar@bar.com&gt;" \
+		gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'grep -F "foo" gitweb.output'
+
 
 test_done
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 6/6] gitweb: Add _defaults_ keyword for feature lists in project config
  2009-11-18  6:22                       ` [RFC PATCH 5/6] gitweb: Allow per-repository definition of new committags Marcel M. Cary
@ 2009-11-18  6:22                         ` Marcel M. Cary
  0 siblings, 0 replies; 33+ messages in thread
From: Marcel M. Cary @ 2009-11-18  6:22 UTC (permalink / raw)
  To: Jakub Narebski, git
  Cc: Petr Baudis, Giuseppe Bilotta, Francis Galiegue, Marcel M. Cary

If the site admin configures the list of committags, there's no way
for a project to get the defaults back short of enumerating them
explicitly.  Worse yet, when the distribution upgrades the default
list, perhaps to push more pre-existing functionality into committags,
the project would have to discover this and upgrade its configuration
to match the new defaults.

Add a special _defaults_ list entry which, in the project config,
expands to the build-time default list configured for that variable.
A project may use this to append or prepend to the default
configuration, even as the default configuration changes with new
releases.
---
 gitweb/INSTALL               |    5 +++++
 gitweb/gitweb.perl           |   18 +++++++++++++-----
 t/t9502-gitweb-committags.sh |   29 +++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 15c0128..83e6a5e 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -143,6 +143,11 @@ And then let each project configure its bug tracker URL:
 
 	git config gitweb.committag.bugzilla.url 'http://bts.example.com?bug='
 
+In a project config, the build-time list of committags can be accessed
+with the special _defaults_ entry.
+
+	git config gitweb.committags '_defaults_, bugzilla'
+
 
 Gitweb repositories
 -------------------
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d413f22..707e76e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -334,6 +334,13 @@ our %committags = (
 	},
 );
 
+sub make_list_feature {
+	my ($name, $hash) = @_;
+	$hash->{'build_default'} = [@{$hash->{'default'}}];
+	$hash->{'sub'} = sub { feature_list($name, @_) };
+	return @_;
+}
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -378,8 +385,7 @@ our %feature = (
 	# $feature{'snapshot'}{'override'} = 1;
 	# and in project config, a comma-separated list of formats or "none"
 	# to disable.  Example: gitweb.snapshot = tbz2,zip;
-	'snapshot' => {
-		'sub' => sub { feature_list('snapshot', @_) },
+	make_list_feature 'snapshot' => {
 		'override' => 0,
 		'default' => ['tgz']},
 
@@ -549,8 +555,7 @@ our %feature = (
 	# $feature{'committags'}{'override'} = 1;
 	# and in project config gitweb.committags = sha1, url, bugzilla
 	# to enable those three committags for that project
-	'committags' => {
-		'sub' => sub { feature_list('committags', @_) },
+	make_list_feature 'committags' => {
 		'override' => 0,
 		'default' => ['signoff', 'sha1']},
 
@@ -621,7 +626,10 @@ sub feature_list {
 	my ($cfg) = git_get_project_config($key);
 
 	if ($cfg) {
-		return ($cfg eq 'none' ? () : split(/\s*[,\s]\s*/, $cfg));
+		return () if $cfg eq 'none';
+		return map {
+				$_ eq '_defaults_' ? @{$feature{$key}{'build_default'}} : $_
+			} split(/\s*[,\s]\s*/, $cfg);
 	}
 
 	return @defaults;
diff --git a/t/t9502-gitweb-committags.sh b/t/t9502-gitweb-committags.sh
index cbe607b..7d16329 100755
--- a/t/t9502-gitweb-committags.sh
+++ b/t/t9502-gitweb-committags.sh
@@ -276,5 +276,34 @@ test_expect_success 'custom committags: ignored when disabled' '
 test_debug 'cat gitweb.log'
 test_debug 'grep -F "foo" gitweb.output'
 
+# ----------------------------------------------------------------------
+# default keyword
+#
+echo default_test > file.txt
+git add file.txt
+git commit -q -F - file.txt <<END
+Lets see what's enabled...
+
+Bug 1234
+567890ab
+See msg-id <x@y.z>
+
+Signed-off-by: A U Thor <at@example.com>
+END
+echo '
+$feature{"committags"}{"default"} = ["sha1", "messageid"];
+$feature{"committags"}{"override"} = 1;
+' >> gitweb_config.perl
+git config gitweb.committags '_defaults_, bugzilla'
+# All these committags should be in effect except messageid
+test_expect_success '_defaults_ keyword: restores build-time default' '
+	gitweb_run "p=.git;a=commit;h=HEAD" &&
+	grep -q "Bug&nbsp;<a[^>]*>1234</a>" gitweb.output &&
+	grep -q "<a[^>]*>567890ab</a>" gitweb.output &&
+	grep -q "See&nbsp;msg-id&nbsp;&lt;x@y.z&gt;" gitweb.output &&
+	grep -q "<span[^>]*>Signed-off-by:" gitweb.output
+'
+test_debug 'cat gitweb.log'
+test_debug 'for i in Bug 5678 msg-id Signed-off; do grep $i gitweb.output; done'
 
 test_done
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching
  2009-11-18  6:22               ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Marcel M. Cary
  2009-11-18  6:22                 ` [RFC PATCH 2/6] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
@ 2009-11-18  8:20                 ` Petr Baudis
  2009-11-18  8:26                 ` Petr Baudis
  2 siblings, 0 replies; 33+ messages in thread
From: Petr Baudis @ 2009-11-18  8:20 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: Jakub Narebski, git, Giuseppe Bilotta, Francis Galiegue

On Tue, Nov 17, 2009 at 10:22:25PM -0800, Marcel M. Cary wrote:
> One additional thing that occured to me is that maybe the hyperlinks
> added by committags should have 'rel="nofollow"' by default?  And if
> so, maybe that needs to be configurable?  On the other hand, I'm not
> sure how useful it is to hide real URLs in the commit messages from
> search engines... ?

I don't think rel="nofollow" is necessary.

BTW, wouldn't it be useful to do the matching in blob bodies as well?
And is it sensible to call these "committags" at all then? I already
made the mistake of calling content tags "ctags" and I regret it; I
think calling yet another thing tags after git tags and ctags is almost
unbearable.

> diff --git a/gitweb/INSTALL b/gitweb/INSTALL
> index b76a0cf..9081ed8 100644
> --- a/gitweb/INSTALL
> +++ b/gitweb/INSTALL
> @@ -132,6 +132,11 @@ adding the following lines to your $GITWEB_CONFIG:
>  	$known_snapshot_formats{'zip'}{'disabled'} = 1;
>  	$known_snapshot_formats{'tgz'}{'compressor'} = ['gzip','-6'];
>  
> +To add a committag to the default list of commit tags, for example to
> +enable hyperlinking of bug numbers to a bug tracker for all projects:
> +
> +	push @{$feature{'committags'}{'default'}}, 'bugzilla';
> +
>  
>  Gitweb repositories
>  -------------------

I think this is not useful at all, since:

	(i) The user will also *always* need to override the URL.
	(ii) More importantly, the user has no idea what on earth commit
	tags are.

Could you please prepend this paragraph with a short committags
description, e.g.:

	"Gitweb can rewrite certain snippets of text in commit messages
	to hyperlinks, e.g. URL addresses or bug tracker references - we
	call these snippets 'committags'."

And you should also add

	$committags{'buzilla'}{'options'}{'url'} = ...

to the explanation, together with a reference to the appropriate part of
gitweb.perl for more details.

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index e4cbfc3..2d72202 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -213,6 +213,97 @@ our %avatar_size = (
>  	'double'  => 32
>  );
>  
> +# In general, the site admin can enable/disable per-project
> +# configuration of each committag.  Only the 'options' part of the
> +# committag is configurable per-project.

  The exact same problem here - it is not explained what committag
actually is and where it applies.

> +# The site admin can of course add new tags to this hash or override
> +# the 'sub' key if necessary.  But such changes may be fragile; this
> +# is not designed as a full-blown plugin architecture.  The 'sub' must
> +# return a list of strings or string refs.  The strings must contain
> +# plain text and the string refs must contain HTML.  The string refs
> +# will not be processed further.
> +#
> +# For any committag, set the 'override' key to 1 to allow individual
> +# projects to override entries in the 'options' hash for that tag.
> +# For example, to match only commit hashes given in lowercase in one
> +# project, add this to the $GITWEB_CONFIG:
> +#
> +#     $committags{'sha1'}{'override'} = 1;
> +#
> +# And in the project's config:
> +#
> +#     gitweb.committags.sha1.pattern = \\b([0-9a-f]{8,40})\\b
> +#
> +# Some committags have additional options whose interpretation depends
> +# on the implementation of the 'sub' key.  The hyperlink_committag
> +# value appends the first captured group to the 'url' option.
> +our %committags = (
> +	# Link Git-style hashes to this gitweb
> +	'sha1' => {
> +		'options' => {
> +			'pattern' => qr/\b([0-9a-fA-F]{8,40})\b/,
> +		},
> +		'override' => 0,
> +		'sub' => sub {
> +			my ($opts, @match) = @_;
> +			return \$cgi->a({-href => href(action=>"object", hash=>$match[1]),
> +			                 -class => "text"},
> +			                esc_html($match[0], -nbsp=>1));
> +		},
> +	},

Ideally, a link should be made only in case the object exists, but this
is not trivial to implement without overhead of 1 exec per object, so I
guess it's fine to leave this for later (after all this feature was
already present). In that case, I think it would be useful to start
matching ids from 5 characters up - I use these quite frequently ;) -
but until then it would probably make for too much false positives.

> @@ -417,6 +508,21 @@ our %feature = (
>  		'sub' => \&feature_avatar,
>  		'override' => 0,
>  		'default' => ['']},
> +
> +	# The selection and ordering of committags that are enabled.
> +	# Committag transformations will be applied to commit log messages
> +	# in the order listed here if listed here.

You should add something like "See %committags definition above for
explanation of committags and pre-defined committag classes."

> +	# To disable system wide have in $GITWEB_CONFIG
> +	# $feature{'committags'}{'default'} = [];
> +	# To have project specific config enable override in $GITWEB_CONFIG
> +	# $feature{'committags'}{'override'} = 1;
> +	# and in project config gitweb.committags = sha1, url, bugzilla
> +	# to enable those three committags for that project
> +	'committags' => {
> +		'sub' => sub { feature_list('committags', @_) },
> +		'override' => 0,
> +		'default' => ['sha1']},
>  );

Would people consider it harmful to add 'url' to the default set?

> @@ -463,16 +569,16 @@ sub feature_bool {
>  	}
>  }
>  
> -sub feature_snapshot {
> -	my (@fmts) = @_;
> +sub feature_list {
> +	my ($key, @defaults) = @_;
>  
> -	my ($val) = git_get_project_config('snapshot');
> +	my ($cfg) = git_get_project_config($key);
>  
> -	if ($val) {
> -		@fmts = ($val eq 'none' ? () : split /\s*[,\s]\s*/, $val);
> +	if ($cfg) {
> +		return ($cfg eq 'none' ? () : split(/\s*[,\s]\s*/, $cfg));
>  	}
>  
> -	return @fmts;
> +	return @defaults;
>  }
>  
>  sub feature_patches {
> @@ -886,6 +992,35 @@ if ($git_avatar eq 'gravatar') {
>  	$git_avatar = '';
>  }
>  
> +# ordering of committags
> +our @committags = gitweb_get_feature('committags');
> +
> +# whether we've loaded committags for the project yet
> +our $loaded_project_committags = 0;
> +
> +# Load committag configs from the repository config file and and
> +# incorporate them into the gitweb defaults where permitted by the
> +# site administrator.
> +sub gitweb_load_project_committags {
> +	return if (!$git_dir || $loaded_project_committags);

When can it happen that this is called and !$git_dir? In case it could
ever happen, why not allow the configuration at least in global gitweb
file?

> +	my %project_config = ();
> +	my %raw_config = git_parse_project_config('gitweb\.committag');
> +	foreach my $key (keys(%raw_config)) {
> +		next if ($key !~ /gitweb\.committag\.[^.]+\.[^.]/);
> +		my ($gitweb_prefix, $committag_prefix, $ctname, $option) =
> +			split(/\./, $key, 4);
> +		$project_config{$ctname}{$option} = $raw_config{$key};
> +	}
> +	foreach my $ctname (keys(%committags)) {
> +		next if (!$committags{$ctname}{'override'});
> +		foreach my $optname (keys %{$project_config{$ctname}}) {
> +			$committags{$ctname}{'options'}{$optname} =
> +				$project_config{$ctname}{$optname};
> +		}
> +	}
> +	$loaded_project_committags = 1;
> +}

For the next-conditions, I'd prefer unless formulation, but I guess
that's purely a matter of taste.

> +sub push_or_append (\@@) {
> +	my $fragments = shift;
> +
> +	if (ref $_[0] || ! @$fragments || ref $fragments->[-1]) {
> +		push @$fragments, @_;
> +	} else {
> +		my $a = pop @$fragments;
> +		my $b = shift @_;
> +
> +		push @$fragments, $a . $b, @_;
> +	}
> +	# imitate push
> +	return scalar @$fragments;

This looks *quite* cryptic, a comment would be rather helpful.

-- 
				Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching
  2009-11-18  6:22               ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Marcel M. Cary
  2009-11-18  6:22                 ` [RFC PATCH 2/6] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
  2009-11-18  8:20                 ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Petr Baudis
@ 2009-11-18  8:26                 ` Petr Baudis
  2 siblings, 0 replies; 33+ messages in thread
From: Petr Baudis @ 2009-11-18  8:26 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: Jakub Narebski, git, Giuseppe Bilotta, Francis Galiegue

On Tue, Nov 17, 2009 at 10:22:25PM -0800, Marcel M. Cary wrote:
> +			# Avoid matching punctuation that might immediately follow
> +			# a url, is not part of the url, and is allowed in urls,
> +			# like a full-stop ('.').
> +			'pattern' => qr!(https?|ftps?|git|ssh|ssh+git|sftp|smb|webdavs?|
> +			                 nfs|irc|nntp|rsync)
> +			                ://[-_a-zA-Z0-9\@/&=+~#<>;%:.?]+
> +			                   [-_a-zA-Z0-9\@/&=+~#<>]!x,

You meant ssh\+git here. ;-)

				Petr "Pasky" Baudis

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] Second round of committag series
  2009-11-18  6:22             ` [RFC PATCH 0/6] Second round of committag series Marcel M. Cary
  2009-11-18  6:22               ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Marcel M. Cary
@ 2009-11-20 23:24               ` Jakub Narebski
  1 sibling, 0 replies; 33+ messages in thread
From: Jakub Narebski @ 2009-11-20 23:24 UTC (permalink / raw)
  To: Marcel M. Cary; +Cc: git, Petr Baudis, Giuseppe Bilotta, Francis Galiegue

On Wed, 18 Nov 2009, Marcel M. Cary wrote:

> Thanks for the feedback.  I've added four more patches to the end of
> the series and updated the first two.  My replies are below.
> 
> On Mon, 22 Jun 2009, Jakub Narebski wrote:
> > On Fri, 19 June 2009, Marcel M. Cary wrote:

Thanks for working on this.  I'll try to reply soon.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2009-11-20 23:24 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-08 19:07 [RFC] Configuring (future) committags support in gitweb Jakub Narebski
2008-11-08 20:02 ` Francis Galiegue
2008-11-08 22:35   ` Jakub Narebski
2008-11-08 23:27     ` Francis Galiegue
2008-11-09  0:25       ` Jakub Narebski
2009-02-17 15:32     ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Marcel M. Cary
2009-02-18  3:00       ` [PATCH RFC 1/2] gitweb: Fix warnings with override permitted but no repo override Marcel M. Cary
2009-02-18  7:41         ` Giuseppe Bilotta
2009-02-18  8:40         ` Junio C Hamano
2009-02-18 13:09           ` Jakub Narebski
2009-02-18 19:02             ` Junio C Hamano
2009-02-18  3:00       ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
2009-02-18 21:55         ` Jakub Narebski
2009-02-20  8:35           ` Junio C Hamano
2009-02-20 11:46             ` Jakub Narebski
2009-02-24 15:38           ` Addresses with full names in patch emails Marcel M. Cary
2009-02-24 15:58             ` Jakub Narebski
2009-02-24 16:33           ` [PATCH RFC 2/2] gitweb: Hyperlink multiple git hashes on the same commit message line Marcel M. Cary
2009-02-18  3:38       ` [RFC] Configuring (future) committags support in gitweb, especially bug linking Jakub Narebski
2009-02-19 17:08         ` Marcel M. Cary
2009-06-19 14:13         ` [RFC PATCH 1/2] gitweb: Hyperlink various committags in commit message with regex Marcel M. Cary
2009-06-22 11:18           ` Jakub Narebski
2009-11-18  6:22             ` [RFC PATCH 0/6] Second round of committag series Marcel M. Cary
2009-11-18  6:22               ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Marcel M. Cary
2009-11-18  6:22                 ` [RFC PATCH 2/6] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary
2009-11-18  6:22                   ` [RFC PATCH 3/6] gitweb: Allow finer-grained override controls for committags Marcel M. Cary
2009-11-18  6:22                     ` [RFC PATCH 4/6] gitweb: Allow committag pattern matches to span multiple lines Marcel M. Cary
2009-11-18  6:22                       ` [RFC PATCH 5/6] gitweb: Allow per-repository definition of new committags Marcel M. Cary
2009-11-18  6:22                         ` [RFC PATCH 6/6] gitweb: Add _defaults_ keyword for feature lists in project config Marcel M. Cary
2009-11-18  8:20                 ` [RFC PATCH 1/6] gitweb: Hyperlink committags in a commit message by regex matching Petr Baudis
2009-11-18  8:26                 ` Petr Baudis
2009-11-20 23:24               ` [RFC PATCH 0/6] Second round of committag series Jakub Narebski
2009-06-19 14:13         ` [RFC PATCH 2/2] gitweb: Add second-stage matching of bug IDs in bugzilla committag Marcel M. Cary

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).