All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] gitweb: use highlight's shebang detection
@ 2016-09-06 19:00 Ian Kelling
  2016-09-20 20:22 ` Jakub Narębski
  0 siblings, 1 reply; 23+ messages in thread
From: Ian Kelling @ 2016-09-06 19:00 UTC (permalink / raw)
  To: git; +Cc: jnareb

The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. To use highlight's shebang
detection, add highlight to the pipeline whenever highlight is enabled.

Document the shebang detection and add a test which exercises it in
t/t9500-gitweb-standalone-no-errors.sh.

Signed-off-by: Ian Kelling <ian@iankelling.org>
---

Notes:
    I wondered if adding highlight to the pipeline would make viewing a blob
    with no highlighting take longer but it did not on my computer. I found
    no noticeable impact on small files and strangely, on a 159k file, it
    took 7% less time averaged over several requests.

 Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
 gitweb/gitweb.perl                     | 10 +++++-----
 t/t9500-gitweb-standalone-no-errors.sh | 18 +++++++++++++-----
 3 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
 	Note that 'highlight' feature must be set for gitweb to actually
 	use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax <syntax>` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax <syntax>`
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..a672181 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
 	my ($fd, $highlight, $syntax) = @_;
-	return $fd unless ($highlight && defined $syntax);
+	return $fd unless ($highlight);
 
 	close $fd;
+	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
 	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
 	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
 	            '--', "-fe=$fallback_encoding")." | ".
 	          quote_command($highlight_bin).
-	          " --replace-tabs=8 --fragment --syntax $syntax |"
+	          " --replace-tabs=8 --fragment $syntax_arg |"
 		or die_error(500, "Couldn't open file or run syntax highlighter");
 	return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
 	my $highlight = gitweb_check_feature('highlight');
 	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-	$fd = run_highlighter($fd, $highlight, $syntax)
-		if $syntax;
+	$fd = run_highlighter($fd, $highlight, $syntax);
 
 	git_header_html(undef, $expires);
 	my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
 			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
+			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index e94b2f1..9e5fcfe 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -702,12 +702,20 @@ test_expect_success HIGHLIGHT \
 	 gitweb_run "p=.git;a=blob;f=file"'
 
 test_expect_success HIGHLIGHT \
-	'syntax highlighting (highlighted, shell script)' \
+	'syntax highlighting (highlighted, shell script shebang)' \
 	'git config gitweb.highlight yes &&
-	 echo "#!/usr/bin/sh" > test.sh &&
-	 git add test.sh &&
-	 git commit -m "Add test.sh" &&
-	 gitweb_run "p=.git;a=blob;f=test.sh"'
+	 echo "#!/usr/bin/sh" > test &&
+	 git add test &&
+	 git commit -m "Add test" &&
+	 gitweb_run "p=.git;a=blob;f=test"'
+
+test_expect_success HIGHLIGHT \
+	'syntax highlighting (highlighted, header file)' \
+	'git config gitweb.highlight yes &&
+	 echo "#define ANSWER 42" > test.h &&
+	 git add test.h &&
+	 git commit -m "Add test.h" &&
+	 gitweb_run "p=.git;a=blob;f=test.h"'
 
 # ----------------------------------------------------------------------
 # forks of projects
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use highlight's shebang detection
  2016-09-06 19:00 [PATCH] gitweb: use highlight's shebang detection Ian Kelling
@ 2016-09-20 20:22 ` Jakub Narębski
  2016-09-21 16:38   ` Junio C Hamano
                     ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jakub Narębski @ 2016-09-20 20:22 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 06.09.2016 o 21:00, Ian Kelling pisze:

> The highlight binary can detect language by shebang when we can't tell
> the syntax type by the name of the file. 

Was it something always present among highlight[1] binary capabilities,
or is it something present only in new enough highlight app?  Or only
in some specific fork / specific binary?  I couldn't find language
detection in highlight[1] documentation...

[1]: http://www.andre-simon.de/doku/highlight/en/highlight.php

If this feature is available only for some version, or for some
highlighters, gitweb would have to provide an option to configure
it.  It might be an additional configuration variable, it might
be a special value in the %highlight_basename or %highlight_ext.

>                                          To use highlight's shebang
> detection, add highlight to the pipeline whenever highlight is enabled.

This describes what this patch does, but the sentence feels
a bit convoluted, as it is stated.

> 
> Document the shebang detection and add a test which exercises it in
> t/t9500-gitweb-standalone-no-errors.sh.

Nice!

> 
> Signed-off-by: Ian Kelling <ian@iankelling.org>
> ---
> 
> Notes:
>     I wondered if adding highlight to the pipeline would make viewing a blob
>     with no highlighting take longer but it did not on my computer. I found
>     no noticeable impact on small files and strangely, on a 159k file, it
>     took 7% less time averaged over several requests.

Strange.  I would guess that invoking separate binary and perl would always
add to the time (especially on operation systems where forking / running
command is expensive... though those are not often used with web servers,
isn't it).

> 
>  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
>  gitweb/gitweb.perl                     | 10 +++++-----
>  t/t9500-gitweb-standalone-no-errors.sh | 18 +++++++++++++-----
>  3 files changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> index a79e350..e632089 100644
> --- a/Documentation/gitweb.conf.txt
> +++ b/Documentation/gitweb.conf.txt
> @@ -246,13 +246,20 @@ $highlight_bin::
>  	Note that 'highlight' feature must be set for gitweb to actually
>  	use syntax highlighting.
>  +
> -*NOTE*: if you want to add support for new file type (supported by
> -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> -or `%highlight_basename`, depending on whether you detect type of file
> -based on extension (for example "sh") or on its basename (for example
> -"Makefile").  The keys of these hashes are extension and basename,
> -respectively, and value for given key is name of syntax to be passed via
> -`--syntax <syntax>` to highlighter.
> +*NOTE*: for a file to be highlighted, its syntax type must be detected
> +and that syntax must be supported by "highlight".  The default syntax
> +detection is minimal, and there are many supported syntax types with no
> +detection by default.  There are three options for adding syntax
> +detection.  The first and second priority are `%highlight_basename` and
> +`%highlight_ext`, which detect based on basename (the full filename, for
> +example "Makefile") and extension (for example "sh").  The keys of these
> +hashes are the basename and extension, respectively, and the value for a
> +given key is the name of the syntax to be passed via `--syntax <syntax>`
> +to "highlight".  The last priority is the "highlight" configuration of
> +`Shebang` regular expressions to detect the language based on the first
> +line in the file, (for example, matching the line "#!/bin/bash").  See
> +the highlight documentation and the default config at
> +/etc/highlight/filetypes.conf for more details.

All right; in addition to expanding the docs, it also improves them.

>  +
>  For example if repositories you are hosting use "phtml" extension for
>  PHP files, and you want to have correct syntax-highlighting for those
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..a672181 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
>  # or return original FD if no highlighting
>  sub run_highlighter {
>  	my ($fd, $highlight, $syntax) = @_;
> -	return $fd unless ($highlight && defined $syntax);
> +	return $fd unless ($highlight);

Here we would have check if we want / can invoke "highlight".

>  
>  	close $fd;
> +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
>  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
>  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
>  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
>  	            '--', "-fe=$fallback_encoding")." | ".
>  	          quote_command($highlight_bin).
> -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> +	          " --replace-tabs=8 --fragment $syntax_arg |"
>  		or die_error(500, "Couldn't open file or run syntax highlighter");
>  	return $fd;
>  }

All right (well, except for the question asked at the beginning).

> @@ -7063,8 +7064,7 @@ sub git_blob {
>  
>  	my $highlight = gitweb_check_feature('highlight');
>  	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> -	$fd = run_highlighter($fd, $highlight, $syntax)
> -		if $syntax;

Hmmm... it looks like the old code checked if there was $syntax defined
twice: once for truthy value in caller, once for definedness in run_highlighter().

> +	$fd = run_highlighter($fd, $highlight, $syntax);

All right.

>  
>  	git_header_html(undef, $expires);
>  	my $formats_nav = '';
> @@ -7117,7 +7117,7 @@ sub git_blob {
>  			$line = untabify($line);
>  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
>  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);

Oh, well.  It looks like checking if highlighter could be run in
run_highlight() is wrong, as the caller (that is, git_blob()) needs
to know if it is using "highlight" output (which is HTML) or raw blob
contents (which needs to be HTML-escaped).

>  		}
>  	}
>  	close $fd
> diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> index e94b2f1..9e5fcfe 100755
> --- a/t/t9500-gitweb-standalone-no-errors.sh
> +++ b/t/t9500-gitweb-standalone-no-errors.sh
> @@ -702,12 +702,20 @@ test_expect_success HIGHLIGHT \
>  	 gitweb_run "p=.git;a=blob;f=file"'
>  
>  test_expect_success HIGHLIGHT \
> -	'syntax highlighting (highlighted, shell script)' \
> +	'syntax highlighting (highlighted, shell script shebang)' \

It would be nice to have in test name that it checks if highlighter
autodetection works, or at least doesn't crash gitweb.

>  	'git config gitweb.highlight yes &&
> -	 echo "#!/usr/bin/sh" > test.sh &&
> -	 git add test.sh &&
> -	 git commit -m "Add test.sh" &&
> -	 gitweb_run "p=.git;a=blob;f=test.sh"'
> +	 echo "#!/usr/bin/sh" > test &&
> +	 git add test &&
> +	 git commit -m "Add test" &&
> +	 gitweb_run "p=.git;a=blob;f=test"'
> +
> +test_expect_success HIGHLIGHT \
> +	'syntax highlighting (highlighted, header file)' \

Do we check explicit syntax knowledge (based on the extension),
or autodetect again?

> +	'git config gitweb.highlight yes &&
> +	 echo "#define ANSWER 42" > test.h &&
> +	 git add test.h &&
> +	 git commit -m "Add test.h" &&
> +	 gitweb_run "p=.git;a=blob;f=test.h"'
>  
>  # ----------------------------------------------------------------------
>  # forks of projects
> 

Thank you for your work on this patch,
-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use highlight's shebang detection
  2016-09-20 20:22 ` Jakub Narębski
@ 2016-09-21 16:38   ` Junio C Hamano
  2016-09-21 17:51     ` Jakub Narębski
  2016-09-21 22:15   ` Ian Kelling
  2016-09-21 22:18   ` Ian Kelling
  2 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2016-09-21 16:38 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Ian Kelling, git

Jakub Narębski <jnareb@gmail.com> writes:

> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>
>> The highlight binary can detect language by shebang when we can't tell
>> the syntax type by the name of the file. 
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app?  Or only
> in some specific fork / specific binary?  I couldn't find language
> detection in highlight[1] documentation...
> ...
> Thank you for your work on this patch,

Thanks for reviewing.  It seems that there will be further exchange
needed before I can pick it up?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use highlight's shebang detection
  2016-09-21 16:38   ` Junio C Hamano
@ 2016-09-21 17:51     ` Jakub Narębski
  0 siblings, 0 replies; 23+ messages in thread
From: Jakub Narębski @ 2016-09-21 17:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ian Kelling, git

W dniu 21.09.2016 o 18:38, Junio C Hamano pisze:
> Jakub Narębski <jnareb@gmail.com> writes:
>> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>>
>>> The highlight binary can detect language by shebang when we can't tell
>>> the syntax type by the name of the file. 
>>
>> Was it something always present among highlight[1] binary capabilities,
>> or is it something present only in new enough highlight app?  Or only
>> in some specific fork / specific binary?  I couldn't find language
>> detection in highlight[1] documentation...
>> ...
>> Thank you for your work on this patch,
> 
> Thanks for reviewing.  It seems that there will be further exchange
> needed before I can pick it up?

Yes, I think so.

-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use highlight's shebang detection
  2016-09-20 20:22 ` Jakub Narębski
  2016-09-21 16:38   ` Junio C Hamano
@ 2016-09-21 22:15   ` Ian Kelling
  2016-09-21 22:18   ` Ian Kelling
  2 siblings, 0 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-21 22:15 UTC (permalink / raw)
  To: Jakub Narębski, git

On Tue, Sep 20, 2016, at 01:22 PM, Jakub Narębski wrote:
> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file.
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app?  Or only
> in some specific fork / specific binary?  I couldn't find language
> detection in highlight[1] documentation...
>
> [1]: http://www.andre-simon.de/doku/highlight/en/highlight.php

Search for the word shebang, it's mentioned twice.

>
> If this feature is available only for some version, or for some
> highlighters, gitweb would have to provide an option to configure
> it.  It might be an additional configuration variable, it might
> be a special value in the %highlight_basename or %highlight_ext.

Good question. It was added upstream in 2007, and I tested that it's
functioning in the earliest distros I have easy access to: ubuntu 14.04
and debian wheezy.

>
> >                                          To use highlight's shebang
> > detection, add highlight to the pipeline whenever highlight is enabled.
>
> This describes what this patch does, but the sentence feels
> a bit convoluted, as it is stated.
>

Agreed. I've changed it in v2 of the patch, and perhaps this will make
the rest of the patch clearer too. The new paragraph is:

    The highlight binary can detect language by shebang when we can't
    tell
    the syntax type by the name of the file. In that case, pass the blob
    to "highlight --force" and the resulting html will have markup for
    highlighting if the language was detected.



> >
> > Document the shebang detection and add a test which exercises it in
> > t/t9500-gitweb-standalone-no-errors.sh.
>
> Nice!
>
> >
> > Signed-off-by: Ian Kelling <ian@iankelling.org>
> > ---
> >
> > Notes:
> >     I wondered if adding highlight to the pipeline would make viewing a blob
> >     with no highlighting take longer but it did not on my computer. I found
> >     no noticeable impact on small files and strangely, on a 159k file, it
> >     took 7% less time averaged over several requests.
>
> Strange.  I would guess that invoking separate binary and perl would
> always
> add to the time (especially on operation systems where forking / running
> command is expensive... though those are not often used with web servers,
> isn't it).

I dug into this a little more, and I think it's because when we call
highlight, we later call sanitize() instead of esc_html(). sanitize() is
faster and makes up for the extra time highlight takes. I ran a test on
my machine calling sanitize and esc_html on each line of gitweb.perl 100
times: 7.4s for sanitize, 12.4s for esc_html.

>
> >
> >  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
> >  gitweb/gitweb.perl                     | 10 +++++-----
> >  t/t9500-gitweb-standalone-no-errors.sh | 18 +++++++++++++-----
> >  3 files changed, 32 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> >  	Note that 'highlight' feature must be set for gitweb to actually
> >  	use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax <syntax>` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax <syntax>`
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
>
> All right; in addition to expanding the docs, it also improves them.

Noted in v2 commit log.

>
> >  +
> >  For example if repositories you are hosting use "phtml" extension for
> >  PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 33d701d..a672181 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
> >  # or return original FD if no highlighting
> >  sub run_highlighter {
> >  	my ($fd, $highlight, $syntax) = @_;
> > -	return $fd unless ($highlight && defined $syntax);
> > +	return $fd unless ($highlight);
>
> Here we would have check if we want / can invoke "highlight".

I think it's right as is. $highlight says the user wants highlighting,
and now we still want to invoke it when we do not know $syntax.

While I was double checking, I noticed there was an unused parameter to
guess_file_syntax(), $mimetype, which could easily make this not
obvious. I removed it in v2.

>
> >
> >  	close $fd;
> > +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
> >  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
> >  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
> >  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
> >  	            '--', "-fe=$fallback_encoding")." | ".
> >  	          quote_command($highlight_bin).
> > -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> > +	          " --replace-tabs=8 --fragment $syntax_arg |"
> >  		or die_error(500, "Couldn't open file or run syntax highlighter");
> >  	return $fd;
> >  }
>
> All right (well, except for the question asked at the beginning).
>
> > @@ -7063,8 +7064,7 @@ sub git_blob {
> >
> >  	my $highlight = gitweb_check_feature('highlight');
> >  	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> > -	$fd = run_highlighter($fd, $highlight, $syntax)
> > -		if $syntax;
>
> Hmmm... it looks like the old code checked if there was $syntax defined
> twice: once for truthy value in caller, once for definedness in
> run_highlighter().
>
> > +	$fd = run_highlighter($fd, $highlight, $syntax);
>
> All right.
>
> >
> >  	git_header_html(undef, $expires);
> >  	my $formats_nav = '';
> > @@ -7117,7 +7117,7 @@ sub git_blob {
> >  			$line = untabify($line);
> >  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
> >  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> > -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> > +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
>
> Oh, well.  It looks like checking if highlighter could be run in
> run_highlight() is wrong, as the caller (that is, git_blob()) needs
> to know if it is using "highlight" output (which is HTML) or raw blob
> contents (which needs to be HTML-escaped).

Per previous comment, run_highlight() is right, and we use the same
condition here to know if the highlight binary was used. If highlight
was run with --force and did not detect a language in the shebang, it
still outputs html but without adding the highlight markup.

>
> >  		}
> >  	}
> >  	close $fd
> > diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> > index e94b2f1..9e5fcfe 100755
> > --- a/t/t9500-gitweb-standalone-no-errors.sh
> > +++ b/t/t9500-gitweb-standalone-no-errors.sh
> > @@ -702,12 +702,20 @@ test_expect_success HIGHLIGHT \
> >  	 gitweb_run "p=.git;a=blob;f=file"'
> >
> >  test_expect_success HIGHLIGHT \
> > -	'syntax highlighting (highlighted, shell script)' \
> > +	'syntax highlighting (highlighted, shell script shebang)' \
>
> It would be nice to have in test name that it checks if highlighter
> autodetection works, or at least doesn't crash gitweb.

I've updated it to:
syntax highlighting (highlighter language autodetection)
I'm happy to use any suggestion you have.

>
> >  	'git config gitweb.highlight yes &&
> > -	 echo "#!/usr/bin/sh" > test.sh &&
> > -	 git add test.sh &&
> > -	 git commit -m "Add test.sh" &&
> > -	 gitweb_run "p=.git;a=blob;f=test.sh"'
> > +	 echo "#!/usr/bin/sh" > test &&
> > +	 git add test &&
> > +	 git commit -m "Add test" &&
> > +	 gitweb_run "p=.git;a=blob;f=test"'
> > +
> > +test_expect_success HIGHLIGHT \
> > +	'syntax highlighting (highlighted, header file)' \
>
> Do we check explicit syntax knowledge (based on the extension),
> or autodetect again?

We have explicit syntax knowledge here. My thinking was this would
modify the existing test so that it highlights a different language than
the autodetected one, but the patch is simpler if I just make the
autodetcted one be a different language. I've done that in v2.

>
> > +	'git config gitweb.highlight yes &&
> > +	 echo "#define ANSWER 42" > test.h &&
> > +	 git add test.h &&
> > +	 git commit -m "Add test.h" &&
> > +	 gitweb_run "p=.git;a=blob;f=test.h"'
> >
> >  # ----------------------------------------------------------------------
> >  # forks of projects
> >
>
> Thank you for your work on this patch,
> --
> Jakub Narębski

Thank you for reviewing it!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] gitweb: use highlight's shebang detection
  2016-09-20 20:22 ` Jakub Narębski
  2016-09-21 16:38   ` Junio C Hamano
  2016-09-21 22:15   ` Ian Kelling
@ 2016-09-21 22:18   ` Ian Kelling
  2016-09-21 22:24     ` Ian Kelling
  2016-09-22 22:50     ` [PATCH v2] " Jakub Narębski
  2 siblings, 2 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-21 22:18 UTC (permalink / raw)
  To: git; +Cc: jnareb

The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used,
and remove an unused parameter from gitweb_check_feature().

Signed-off-by: Ian Kelling <ian@iankelling.org>
---
 Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
 gitweb/gitweb.perl                     | 14 +++++++-------
 t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
 	Note that 'highlight' feature must be set for gitweb to actually
 	use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax <syntax>` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax <syntax>`
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-	my ($highlight, $mimetype, $file_name) = @_;
+	my ($highlight, $file_name) = @_;
 	return undef unless ($highlight && defined $file_name);
 	my $basename = basename($file_name, '.in');
 	return $highlight_basename{$basename}
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
 	my ($fd, $highlight, $syntax) = @_;
-	return $fd unless ($highlight && defined $syntax);
+	return $fd unless ($highlight);
 
 	close $fd;
+	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
 	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
 	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
 	            '--', "-fe=$fallback_encoding")." | ".
 	          quote_command($highlight_bin).
-	          " --replace-tabs=8 --fragment --syntax $syntax |"
+	          " --replace-tabs=8 --fragment $syntax_arg |"
 		or die_error(500, "Couldn't open file or run syntax highlighter");
 	return $fd;
 }
@@ -7062,9 +7063,8 @@ sub git_blob {
 	$have_blame &&= ($mimetype =~ m!^text/!);
 
 	my $highlight = gitweb_check_feature('highlight');
-	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-	$fd = run_highlighter($fd, $highlight, $syntax)
-		if $syntax;
+	my $syntax = guess_file_syntax($highlight, $file_name);
+	$fd = run_highlighter($fd, $highlight, $syntax);
 
 	git_header_html(undef, $expires);
 	my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
 			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
+			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index e94b2f1..576db6d 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
 	 git commit -m "Add test.sh" &&
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 
+test_expect_success HIGHLIGHT \
+	'syntax highlighting (highlighter language autodetection)' \
+	'git config gitweb.highlight yes &&
+	 echo "#!/usr/bin/ruby" > test &&
+	 git add test &&
+	 git commit -m "Add test" &&
+	 gitweb_run "p=.git;a=blob;f=test"'
+
 # ----------------------------------------------------------------------
 # forks of projects
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use highlight's shebang detection
  2016-09-21 22:18   ` Ian Kelling
@ 2016-09-21 22:24     ` Ian Kelling
  2016-09-22 22:50     ` [PATCH v2] " Jakub Narębski
  1 sibling, 0 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-21 22:24 UTC (permalink / raw)
  To: Ian Kelling, git; +Cc: jnareb

fyi: I mistakenly did not include v2 in the subject of the last message.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use highlight's shebang detection
  2016-09-21 22:18   ` Ian Kelling
  2016-09-21 22:24     ` Ian Kelling
@ 2016-09-22 22:50     ` Jakub Narębski
  2016-09-23  9:08       ` Ian Kelling
  2016-09-23  9:08       ` [PATCH v3 1/2] gitweb: remove unused function parameter Ian Kelling
  1 sibling, 2 replies; 23+ messages in thread
From: Jakub Narębski @ 2016-09-22 22:50 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 22.09.2016 o 00:18, Ian Kelling napisał:

> The highlight binary can detect language by shebang when we can't tell
> the syntax type by the name of the file. In that case, pass the blob
> to "highlight --force" and the resulting html will have markup for
> highlighting if the language was detected.

This description feels a bit convoluted. Perhaps something like this:

  The "highlight" binary can, in some cases, determine the language type
  by the means of file contents, for example the shebang in the first line
  for some scripting languages.  Make use of this autodetection for files
  which syntax is not known by gitweb.  In that case, pass the blob
  contents to "highlight --force"; the parameter is needed to make it
  always generate HTML output (which includes HTML-escaping).

Also, we might want to have the information about performance of this
solution either in the commit message, or in commit comments.

> 
> Document the feature and improve syntax highlight documentation, add
> test to ensure gitweb doesn't crash when language detection is used,

All right.

> and remove an unused parameter from gitweb_check_feature().

First, that is guess_file_syntax(), not gitweb_check_feature().
Second, this change could be made into independent patch, for example
preparatory one.

> 
> Signed-off-by: Ian Kelling <ian@iankelling.org>
> ---
>  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
>  gitweb/gitweb.perl                     | 14 +++++++-------
>  t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
>  3 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> index a79e350..e632089 100644
> --- a/Documentation/gitweb.conf.txt
> +++ b/Documentation/gitweb.conf.txt
> @@ -246,13 +246,20 @@ $highlight_bin::
>  	Note that 'highlight' feature must be set for gitweb to actually
>  	use syntax highlighting.
>  +
> -*NOTE*: if you want to add support for new file type (supported by
> -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> -or `%highlight_basename`, depending on whether you detect type of file
> -based on extension (for example "sh") or on its basename (for example
> -"Makefile").  The keys of these hashes are extension and basename,
> -respectively, and value for given key is name of syntax to be passed via
> -`--syntax <syntax>` to highlighter.
> +*NOTE*: for a file to be highlighted, its syntax type must be detected
> +and that syntax must be supported by "highlight".  The default syntax
> +detection is minimal, and there are many supported syntax types with no
> +detection by default.  There are three options for adding syntax
> +detection.  The first and second priority are `%highlight_basename` and
> +`%highlight_ext`, which detect based on basename (the full filename, for
> +example "Makefile") and extension (for example "sh").  The keys of these
> +hashes are the basename and extension, respectively, and the value for a
> +given key is the name of the syntax to be passed via `--syntax <syntax>`
> +to "highlight".  The last priority is the "highlight" configuration of
> +`Shebang` regular expressions to detect the language based on the first
> +line in the file, (for example, matching the line "#!/bin/bash").  See
> +the highlight documentation and the default config at
> +/etc/highlight/filetypes.conf for more details.
>  +

I think the rewrite is a bit more readable.

>  For example if repositories you are hosting use "phtml" extension for
>  PHP files, and you want to have correct syntax-highlighting for those
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..44094f4 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3913,7 +3913,7 @@ sub blob_contenttype {
>  # guess file syntax for syntax highlighting; return undef if no highlighting
>  # the name of syntax can (in the future) depend on syntax highlighter used
>  sub guess_file_syntax {
> -	my ($highlight, $mimetype, $file_name) = @_;
> +	my ($highlight, $file_name) = @_;

Right.

>  	return undef unless ($highlight && defined $file_name);
>  	my $basename = basename($file_name, '.in');
>  	return $highlight_basename{$basename}
> @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
>  # or return original FD if no highlighting
>  sub run_highlighter {
>  	my ($fd, $highlight, $syntax) = @_;
> -	return $fd unless ($highlight && defined $syntax);
> +	return $fd unless ($highlight);

Run highlighter if it is defined, even if gitweb doesn't know syntax, right.

>  
>  	close $fd;
> +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
>  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
>  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
>  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
>  	            '--', "-fe=$fallback_encoding")." | ".
>  	          quote_command($highlight_bin).
> -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> +	          " --replace-tabs=8 --fragment $syntax_arg |"

Use '--force' if syntax is unknown, right.

>  		or die_error(500, "Couldn't open file or run syntax highlighter");
>  	return $fd;
>  }
> @@ -7062,9 +7063,8 @@ sub git_blob {
>  	$have_blame &&= ($mimetype =~ m!^text/!);
>  
>  	my $highlight = gitweb_check_feature('highlight');
> -	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> -	$fd = run_highlighter($fd, $highlight, $syntax)
> -		if $syntax;
> +	my $syntax = guess_file_syntax($highlight, $file_name);
> +	$fd = run_highlighter($fd, $highlight, $syntax);

Remove unused parameter from callsite, *and* run highlighter even if we
don't know syntax.

>  
>  	git_header_html(undef, $expires);
>  	my $formats_nav = '';
> @@ -7117,7 +7117,7 @@ sub git_blob {
>  			$line = untabify($line);
>  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
>  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);

This is a bit of code duplication / sync from run_highlighter(), but
it is not your fault; it was there (and I don't know how to improve it).

>  		}
>  	}
>  	close $fd
> diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> index e94b2f1..576db6d 100755
> --- a/t/t9500-gitweb-standalone-no-errors.sh
> +++ b/t/t9500-gitweb-standalone-no-errors.sh

Nice.

> @@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
>  	 git commit -m "Add test.sh" &&
>  	 gitweb_run "p=.git;a=blob;f=test.sh"'
>  
> +test_expect_success HIGHLIGHT \
> +	'syntax highlighting (highlighter language autodetection)' \
> +	'git config gitweb.highlight yes &&

Modern way would be

  +	'test_config gitweb.highlight yes &&

but other tests in this file do not use it.

> +	 echo "#!/usr/bin/ruby" > test &&

Preferred style would be

  +	 echo "#!/usr/bin/ruby" >test &&

but other tests in this file do not use it.

Sidenote: why Ruby, and not sh / bash, Perl or Python?

> +	 git add test &&
> +	 git commit -m "Add test" &&
> +	 gitweb_run "p=.git;a=blob;f=test"'
> +
>  # ----------------------------------------------------------------------
>  # forks of projects
>  
> 

Thank you for your work.
-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use highlight's shebang detection
  2016-09-22 22:50     ` [PATCH v2] " Jakub Narębski
@ 2016-09-23  9:08       ` Ian Kelling
  2016-09-23  9:08       ` [PATCH v3 1/2] gitweb: remove unused function parameter Ian Kelling
  1 sibling, 0 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-23  9:08 UTC (permalink / raw)
  To: Jakub Narębski, git

On Thu, Sep 22, 2016, at 03:50 PM, Jakub Narębski wrote:
> W dniu 22.09.2016 o 00:18, Ian Kelling napisał:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file. In that case, pass the blob
> > to "highlight --force" and the resulting html will have markup for
> > highlighting if the language was detected.
>
> This description feels a bit convoluted. Perhaps something like this:
>
>   The "highlight" binary can, in some cases, determine the language type
>   by the means of file contents, for example the shebang in the first
>   line
>   for some scripting languages.  Make use of this autodetection for files
>   which syntax is not known by gitweb.  In that case, pass the blob
>   contents to "highlight --force"; the parameter is needed to make it
>   always generate HTML output (which includes HTML-escaping).

Nice. Using it in v3.

>
> Also, we might want to have the information about performance of this
> solution either in the commit message, or in commit comments.

I tested it more rigorously and added to v3 commit message.

>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used,
>
> All right.
>
> > and remove an unused parameter from gitweb_check_feature().
>
> First, that is guess_file_syntax(), not gitweb_check_feature().
> Second, this change could be made into independent patch, for example
> preparatory one.


Oops. I split it out in v3.

>
> >
> > Signed-off-by: Ian Kelling <ian@iankelling.org>
> > ---
> >  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
> >  gitweb/gitweb.perl                     | 14 +++++++-------
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
> >  3 files changed, 29 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> >  	Note that 'highlight' feature must be set for gitweb to actually
> >  	use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax <syntax>` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax <syntax>`
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
> >  +
>
> I think the rewrite is a bit more readable.
>
> >  For example if repositories you are hosting use "phtml" extension for
> >  PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 33d701d..44094f4 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3913,7 +3913,7 @@ sub blob_contenttype {
> >  # guess file syntax for syntax highlighting; return undef if no highlighting
> >  # the name of syntax can (in the future) depend on syntax highlighter used
> >  sub guess_file_syntax {
> > -	my ($highlight, $mimetype, $file_name) = @_;
> > +	my ($highlight, $file_name) = @_;
>
> Right.
>
> >  	return undef unless ($highlight && defined $file_name);
> >  	my $basename = basename($file_name, '.in');
> >  	return $highlight_basename{$basename}
> > @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
> >  # or return original FD if no highlighting
> >  sub run_highlighter {
> >  	my ($fd, $highlight, $syntax) = @_;
> > -	return $fd unless ($highlight && defined $syntax);
> > +	return $fd unless ($highlight);
>
> Run highlighter if it is defined, even if gitweb doesn't know syntax,
> right.
>
> >
> >  	close $fd;
> > +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
> >  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
> >  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
> >  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
> >  	            '--', "-fe=$fallback_encoding")." | ".
> >  	          quote_command($highlight_bin).
> > -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> > +	          " --replace-tabs=8 --fragment $syntax_arg |"
>
> Use '--force' if syntax is unknown, right.
>
> >  		or die_error(500, "Couldn't open file or run syntax highlighter");
> >  	return $fd;
> >  }
> > @@ -7062,9 +7063,8 @@ sub git_blob {
> >  	$have_blame &&= ($mimetype =~ m!^text/!);
> >
> >  	my $highlight = gitweb_check_feature('highlight');
> > -	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> > -	$fd = run_highlighter($fd, $highlight, $syntax)
> > -		if $syntax;
> > +	my $syntax = guess_file_syntax($highlight, $file_name);
> > +	$fd = run_highlighter($fd, $highlight, $syntax);
>
> Remove unused parameter from callsite, *and* run highlighter even if we
> don't know syntax.
>
> >
> >  	git_header_html(undef, $expires);
> >  	my $formats_nav = '';
> > @@ -7117,7 +7117,7 @@ sub git_blob {
> >  			$line = untabify($line);
> >  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
> >  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> > -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> > +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
>
> This is a bit of code duplication / sync from run_highlighter(), but
> it is not your fault; it was there (and I don't know how to improve it).
>
> >  		}
> >  	}
> >  	close $fd
> > diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> > index e94b2f1..576db6d 100755
> > --- a/t/t9500-gitweb-standalone-no-errors.sh
> > +++ b/t/t9500-gitweb-standalone-no-errors.sh
>
> Nice.
>
> > @@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
> >  	 git commit -m "Add test.sh" &&
> >  	 gitweb_run "p=.git;a=blob;f=test.sh"'
> >
> > +test_expect_success HIGHLIGHT \
> > +	'syntax highlighting (highlighter language autodetection)' \
> > +	'git config gitweb.highlight yes &&
>
> Modern way would be
>
>   +       'test_config gitweb.highlight yes &&
>
> but other tests in this file do not use it.
>
> > +	 echo "#!/usr/bin/ruby" > test &&
>
> Preferred style would be
>
>   +        echo "#!/usr/bin/ruby" >test &&
>
> but other tests in this file do not use it.

Agreed, but leaving it as is for consistency.

>
> Sidenote: why Ruby, and not sh / bash, Perl or Python?

Not sh / bash, just to exercise more functionality of highlight by using
a different language than the other test. ruby just was the first thing
to come to mind since I've worked with it recently, but since you made
me think of it, perl is more likely to exist in the builtin config for
longer, and it seems a bit more fitting with gitweb, so its perl in v3.

>
> > +	 git add test &&
> > +	 git commit -m "Add test" &&
> > +	 gitweb_run "p=.git;a=blob;f=test"'
> > +
> >  # ----------------------------------------------------------------------
> >  # forks of projects
> >
> >
>
> Thank you for your work.
> --
> Jakub Narębski
>

The only changes in v3 are the ones I described here.

Thank you for reviewing this.
--
Ian Kelling

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/2] gitweb: remove unused function parameter
  2016-09-22 22:50     ` [PATCH v2] " Jakub Narębski
  2016-09-23  9:08       ` Ian Kelling
@ 2016-09-23  9:08       ` Ian Kelling
  2016-09-23  9:08         ` [PATCH v3 2/2] gitweb: use highlight's shebang detection Ian Kelling
  2016-09-23 19:44         ` [PATCH v3 1/2] gitweb: remove unused function parameter Jakub Narębski
  1 sibling, 2 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-23  9:08 UTC (permalink / raw)
  To: git; +Cc: jnareb

Signed-off-by: Ian Kelling <ian@iankelling.org>
---
 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..6cb4280 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-	my ($highlight, $mimetype, $file_name) = @_;
+	my ($highlight, $file_name) = @_;
 	return undef unless ($highlight && defined $file_name);
 	my $basename = basename($file_name, '.in');
 	return $highlight_basename{$basename}
@@ -7062,7 +7062,7 @@ sub git_blob {
 	$have_blame &&= ($mimetype =~ m!^text/!);
 
 	my $highlight = gitweb_check_feature('highlight');
-	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
+	my $syntax = guess_file_syntax($highlight, $file_name);
 	$fd = run_highlighter($fd, $highlight, $syntax)
 		if $syntax;
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/2] gitweb: use highlight's shebang detection
  2016-09-23  9:08       ` [PATCH v3 1/2] gitweb: remove unused function parameter Ian Kelling
@ 2016-09-23  9:08         ` Ian Kelling
  2016-09-23 22:15           ` Jakub Narębski
  2016-09-24 22:32           ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Ian Kelling
  2016-09-23 19:44         ` [PATCH v3 1/2] gitweb: remove unused function parameter Jakub Narębski
  1 sibling, 2 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-23  9:08 UTC (permalink / raw)
  To: git; +Cc: jnareb

The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
which syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, we
also call sanitize() instead of esc_html(), which is significantly
slower. After curling blob view of unhighlighted large and small text
files of perl code and license text 100 times each on a local
Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
request time for all file types.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling <ian@iankelling.org>
---
 Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
 gitweb/gitweb.perl                     | 10 +++++-----
 t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
 	Note that 'highlight' feature must be set for gitweb to actually
 	use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax <syntax>` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax <syntax>`
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6cb4280..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
 	my ($fd, $highlight, $syntax) = @_;
-	return $fd unless ($highlight && defined $syntax);
+	return $fd unless ($highlight);
 
 	close $fd;
+	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
 	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
 	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
 	            '--', "-fe=$fallback_encoding")." | ".
 	          quote_command($highlight_bin).
-	          " --replace-tabs=8 --fragment --syntax $syntax |"
+	          " --replace-tabs=8 --fragment $syntax_arg |"
 		or die_error(500, "Couldn't open file or run syntax highlighter");
 	return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
 	my $highlight = gitweb_check_feature('highlight');
 	my $syntax = guess_file_syntax($highlight, $file_name);
-	$fd = run_highlighter($fd, $highlight, $syntax)
-		if $syntax;
+	$fd = run_highlighter($fd, $highlight, $syntax);
 
 	git_header_html(undef, $expires);
 	my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
 			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
+			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index e94b2f1..6d06ed9 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
 	 git commit -m "Add test.sh" &&
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 
+test_expect_success HIGHLIGHT \
+	'syntax highlighting (highlighter language autodetection)' \
+	'git config gitweb.highlight yes &&
+	 echo "#!/usr/bin/perl" > test &&
+	 git add test &&
+	 git commit -m "Add test" &&
+	 gitweb_run "p=.git;a=blob;f=test"'
+
 # ----------------------------------------------------------------------
 # forks of projects
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] gitweb: remove unused function parameter
  2016-09-23  9:08       ` [PATCH v3 1/2] gitweb: remove unused function parameter Ian Kelling
  2016-09-23  9:08         ` [PATCH v3 2/2] gitweb: use highlight's shebang detection Ian Kelling
@ 2016-09-23 19:44         ` Jakub Narębski
  2016-09-23 19:57           ` Junio C Hamano
  1 sibling, 1 reply; 23+ messages in thread
From: Jakub Narębski @ 2016-09-23 19:44 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 23.09.2016 o 11:08, Ian Kelling napisał:
>
> Subject: [PATCH v3 1/2] gitweb: remove unused function parameter

I think it would be better to be more descriptive, and say:

  Subject: [PATCH v3 1/2] gitweb: remove unused parameter from guess_file_syntax()

But that might be too long...

>
> Signed-off-by: Ian Kelling <ian@iankelling.org>

With, or without this change, it's nice.

Acked-by: Jakub Narębski <jnareb@gmail.com>

> ---
>  gitweb/gitweb.perl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..6cb4280 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3913,7 +3913,7 @@ sub blob_contenttype {
>  # guess file syntax for syntax highlighting; return undef if no highlighting
>  # the name of syntax can (in the future) depend on syntax highlighter used
>  sub guess_file_syntax {
> -	my ($highlight, $mimetype, $file_name) = @_;
> +	my ($highlight, $file_name) = @_;
>  	return undef unless ($highlight && defined $file_name);
>  	my $basename = basename($file_name, '.in');
>  	return $highlight_basename{$basename}
> @@ -7062,7 +7062,7 @@ sub git_blob {
>  	$have_blame &&= ($mimetype =~ m!^text/!);
>  
>  	my $highlight = gitweb_check_feature('highlight');
> -	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> +	my $syntax = guess_file_syntax($highlight, $file_name);
>  	$fd = run_highlighter($fd, $highlight, $syntax)
>  		if $syntax;
>  
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] gitweb: remove unused function parameter
  2016-09-23 19:44         ` [PATCH v3 1/2] gitweb: remove unused function parameter Jakub Narębski
@ 2016-09-23 19:57           ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2016-09-23 19:57 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Ian Kelling, git

Jakub Narębski <jnareb@gmail.com> writes:

> I think it would be better to be more descriptive, and say:
>
>   Subject: [PATCH v3 1/2] gitweb: remove unused parameter from guess_file_syntax()
> Acked-by: Jakub Narębski <jnareb@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
  2016-09-23  9:08         ` [PATCH v3 2/2] gitweb: use highlight's shebang detection Ian Kelling
@ 2016-09-23 22:15           ` Jakub Narębski
  2016-09-24 16:21             ` Jakub Narębski
  2016-09-24 22:34             ` Ian Kelling
  2016-09-24 22:32           ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Ian Kelling
  1 sibling, 2 replies; 23+ messages in thread
From: Jakub Narębski @ 2016-09-23 22:15 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 23.09.2016 o 11:08, Ian Kelling napisał:

> The "highlight" binary can, in some cases, determine the language type
> by the means of file contents, for example the shebang in the first line
> for some scripting languages.  Make use of this autodetection for files
> which syntax is not known by gitweb.  In that case, pass the blob
> contents to "highlight --force"; the parameter is needed to make it
> always generate HTML output (which includes HTML-escaping).

Right.

> 
> Although we now run highlight on files which do not end up highlighted,
> performance is virtually unaffected because when we call highlight, we
> also call sanitize() instead of esc_html(), which is significantly
> slower. 

This paragraph is a bit unclear, for example it is not obvious what
"..., which is significantly slower" refers to: sanitize() or esc_html().

I think it would be better to write:

  Although we now run highlight on files which do not end up highlighted,
  performance is virtually unaffected because when we call highlight, it
  is used for escaping HTML.  In the case that highlight is used, gitweb
  calls sanitize() instead of esc_html(), and the latter is significantly
  slower (it does more, being roughly a superset of sanitize()).

>        After curling blob view of unhighlighted large and small text
> files of perl code and license text 100 times each on a local
> Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
> request time for all file types.

Also, "curling" is not the word I would like to see. I would say:

  Simple benchmark comparing performance of 'blob' view of files without
  syntax highlighting in gitweb before and after this change indicates
  ±1% difference in request time for all file types.  Benchmark was
  performed on local instance on Debian, using Apache/2.4.23 web server
  and CGI/PSGI/FCGI/mod_perl.

      ^^^^^^^^^^^^^^^^^^^^^^--- select one

Or something like that; I'm not sure how detailed this should be.
But it is nice to have such benchmark in the commit message.

Anyway I think that adding yet another configuration toggle for selecting
whether to use "highlight" syntax autodetection or not would be just an
unnecessary complication.

Note that the performance loss might be quite higher on MS Windows, with
its higher cost of fork.  But then they probably do not configure
server-side highligher anyway.

> 
> Document the feature and improve syntax highlight documentation, add
> test to ensure gitweb doesn't crash when language detection is used.

Good.

> 
> Signed-off-by: Ian Kelling <ian@iankelling.org>
> ---
>  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
>  gitweb/gitweb.perl                     | 10 +++++-----
>  t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
>  3 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> index a79e350..e632089 100644
> --- a/Documentation/gitweb.conf.txt
> +++ b/Documentation/gitweb.conf.txt
> @@ -246,13 +246,20 @@ $highlight_bin::

We should probably say what does it mean to be "highlight"[1] compatible,
but it is outside of scope for this patch, and I think also out of scope
of this series.

>  	Note that 'highlight' feature must be set for gitweb to actually
>  	use syntax highlighting.
>  +
> -*NOTE*: if you want to add support for new file type (supported by
> -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> -or `%highlight_basename`, depending on whether you detect type of file
> -based on extension (for example "sh") or on its basename (for example
> -"Makefile").  The keys of these hashes are extension and basename,
> -respectively, and value for given key is name of syntax to be passed via
> -`--syntax <syntax>` to highlighter.
> +*NOTE*: for a file to be highlighted, its syntax type must be detected
> +and that syntax must be supported by "highlight".  The default syntax
> +detection is minimal, and there are many supported syntax types with no
> +detection by default.  There are three options for adding syntax
> +detection.  The first and second priority are `%highlight_basename` and
> +`%highlight_ext`, which detect based on basename (the full filename, for
> +example "Makefile") and extension (for example "sh").  The keys of these
> +hashes are the basename and extension, respectively, and the value for a
> +given key is the name of the syntax to be passed via `--syntax <syntax>`
> +to "highlight".  The last priority is the "highlight" configuration of
> +`Shebang` regular expressions to detect the language based on the first
> +line in the file, (for example, matching the line "#!/bin/bash").  See
> +the highlight documentation and the default config at
> +/etc/highlight/filetypes.conf for more details.

All right. I guess /etc/highlight/filetypes.conf is the standard location?

>  +
>  For example if repositories you are hosting use "phtml" extension for
>  PHP files, and you want to have correct syntax-highlighting for those
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 6cb4280..44094f4 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
>  # or return original FD if no highlighting
>  sub run_highlighter {
>  	my ($fd, $highlight, $syntax) = @_;
> -	return $fd unless ($highlight && defined $syntax);
> +	return $fd unless ($highlight);
>  
>  	close $fd;
> +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
>  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
>  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
>  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
>  	            '--', "-fe=$fallback_encoding")." | ".
>  	          quote_command($highlight_bin).
> -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> +	          " --replace-tabs=8 --fragment $syntax_arg |"
>  		or die_error(500, "Couldn't open file or run syntax highlighter");
>  	return $fd;
>  }

All right, nice and understandable.

> @@ -7063,8 +7064,7 @@ sub git_blob {
>  
>  	my $highlight = gitweb_check_feature('highlight');
>  	my $syntax = guess_file_syntax($highlight, $file_name);
> -	$fd = run_highlighter($fd, $highlight, $syntax)
> -		if $syntax;
> +	$fd = run_highlighter($fd, $highlight, $syntax);
>  
>  	git_header_html(undef, $expires);
>  	my $formats_nav = '';

Good, run unconditionally.

> @@ -7117,7 +7117,7 @@ sub git_blob {
>  			$line = untabify($line);
>  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
>  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
>  		}

Good, use highlighter if possible, not only if syntax is known
and highlighter is turned on.

Nice and easy to understand after earlier change.

>  	}
>  	close $fd
> diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> index e94b2f1..6d06ed9 100755
> --- a/t/t9500-gitweb-standalone-no-errors.sh
> +++ b/t/t9500-gitweb-standalone-no-errors.sh
> @@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
>  	 git commit -m "Add test.sh" &&
>  	 gitweb_run "p=.git;a=blob;f=test.sh"'
>  
> +test_expect_success HIGHLIGHT \
> +	'syntax highlighting (highlighter language autodetection)' \
> +	'git config gitweb.highlight yes &&
> +	 echo "#!/usr/bin/perl" > test &&
> +	 git add test &&
> +	 git commit -m "Add test" &&
> +	 gitweb_run "p=.git;a=blob;f=test"'

Nice.

> +
>  # ----------------------------------------------------------------------
>  # forks of projects
>  
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
  2016-09-23 22:15           ` Jakub Narębski
@ 2016-09-24 16:21             ` Jakub Narębski
  2016-09-24 17:52               ` Junio C Hamano
  2016-09-24 22:35               ` Ian Kelling
  2016-09-24 22:34             ` Ian Kelling
  1 sibling, 2 replies; 23+ messages in thread
From: Jakub Narębski @ 2016-09-24 16:21 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 24.09.2016 o 00:15, Jakub Narębski pisze:
> W dniu 23.09.2016 o 11:08, Ian Kelling napisał: 

>>        After curling blob view of unhighlighted large and small text
>> files of perl code and license text 100 times each on a local
>> Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
>> request time for all file types.
> 
> Also, "curling" is not the word I would like to see. I would say:
> 
>   Simple benchmark comparing performance of 'blob' view of files without
>   syntax highlighting in gitweb before and after this change indicates
>   ±1% difference in request time for all file types.  Benchmark was
>   performed on local instance on Debian, using Apache/2.4.23 web server
>   and CGI/PSGI/FCGI/mod_perl.
> 
>       ^^^^^^^^^^^^^^^^^^^^^^--- select one
> 
> Or something like that; I'm not sure how detailed this should be.
> But it is nice to have such benchmark in the commit message.

Sidenote: this way of benchmarking of gitweb falls between two ways of
doing a benchmark.

The first method is to simply run gitweb as a standalone script, passing
its parameters in CGI environment variables; just like the test suite
does it.  You would 'time' / 'times' it a few times, drop outliers, and
take average or a median.  With this method you don't even need to set
up a web server.

The second is to use a specialized program to benchmark the server-side
of a web page, for example 'ab' (ApacheBench), httperf, curl-loader
or JMeter.  The first one is usually distributed together with Apache
web server, so you probably have it installed already.  Those tools
provide timing statistics.

[...]
> Note that the performance loss might be quite higher on MS Windows, with
> its higher cost of fork.  But then they probably do not configure
> server-side highligher anyway.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
  2016-09-24 16:21             ` Jakub Narębski
@ 2016-09-24 17:52               ` Junio C Hamano
  2016-09-24 22:35               ` Ian Kelling
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2016-09-24 17:52 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Ian Kelling, git

Jakub Narębski <jnareb@gmail.com> writes:

>> Also, "curling" is not the word I would like to see. I would say:
>> 
>>   Simple benchmark comparing performance of 'blob' view of files without
>>   syntax highlighting in gitweb before and after this change indicates
>>   ±1% difference in request time for all file types.  Benchmark was
>>   performed on local instance on Debian, using Apache/2.4.23 web server
>>   and CGI/PSGI/FCGI/mod_perl.
>> 
>>       ^^^^^^^^^^^^^^^^^^^^^^--- select one

or state that all of them produced similar results ;-)

>> Or something like that; I'm not sure how detailed this should be.
>> But it is nice to have such benchmark in the commit message.
>
> Sidenote: this way of benchmarking of gitweb falls between two ways of
> doing a benchmark.

All good comments.  Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter
  2016-09-23  9:08         ` [PATCH v3 2/2] gitweb: use highlight's shebang detection Ian Kelling
  2016-09-23 22:15           ` Jakub Narębski
@ 2016-09-24 22:32           ` Ian Kelling
  2016-09-24 22:32             ` [PATCH v4 2/2] gitweb: use highlight's shebang detection Ian Kelling
  2016-09-25 17:57             ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Jakub Narębski
  1 sibling, 2 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-24 22:32 UTC (permalink / raw)
  To: git; +Cc: jnareb

Signed-off-by: Ian Kelling <ian@iankelling.org>
---

Notes:
    The only change from v3 is a more descriptive commit message

 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..6cb4280 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-	my ($highlight, $mimetype, $file_name) = @_;
+	my ($highlight, $file_name) = @_;
 	return undef unless ($highlight && defined $file_name);
 	my $basename = basename($file_name, '.in');
 	return $highlight_basename{$basename}
@@ -7062,7 +7062,7 @@ sub git_blob {
 	$have_blame &&= ($mimetype =~ m!^text/!);
 
 	my $highlight = gitweb_check_feature('highlight');
-	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
+	my $syntax = guess_file_syntax($highlight, $file_name);
 	$fd = run_highlighter($fd, $highlight, $syntax)
 		if $syntax;
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 2/2] gitweb: use highlight's shebang detection
  2016-09-24 22:32           ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Ian Kelling
@ 2016-09-24 22:32             ` Ian Kelling
  2016-09-25 18:04               ` Jakub Narębski
  2016-09-25 17:57             ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Jakub Narębski
  1 sibling, 1 reply; 23+ messages in thread
From: Ian Kelling @ 2016-09-24 22:32 UTC (permalink / raw)
  To: git; +Cc: jnareb

The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
which syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, it
is used for escaping HTML.  In the case that highlight is used, gitweb
calls sanitize() instead of esc_html(), and the latter is significantly
slower (it does more, being roughly a superset of sanitize()).  Simple
benchmark comparing performance of 'blob' view of files without syntax
highlighting in gitweb before and after this change indicates ±1%
difference in request time for all file types.  Benchmark was performed
on local instance on Debian, using Apache/2.4.23 web server and CGI.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling <ian@iankelling.org>
---

Notes:
    The only change from v3 is the commit message as suggested by Jakub
    Narębski

 Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
 gitweb/gitweb.perl                     | 10 +++++-----
 t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
 	Note that 'highlight' feature must be set for gitweb to actually
 	use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax <syntax>` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax <syntax>`
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6cb4280..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
 	my ($fd, $highlight, $syntax) = @_;
-	return $fd unless ($highlight && defined $syntax);
+	return $fd unless ($highlight);
 
 	close $fd;
+	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
 	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
 	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
 	            '--', "-fe=$fallback_encoding")." | ".
 	          quote_command($highlight_bin).
-	          " --replace-tabs=8 --fragment --syntax $syntax |"
+	          " --replace-tabs=8 --fragment $syntax_arg |"
 		or die_error(500, "Couldn't open file or run syntax highlighter");
 	return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
 	my $highlight = gitweb_check_feature('highlight');
 	my $syntax = guess_file_syntax($highlight, $file_name);
-	$fd = run_highlighter($fd, $highlight, $syntax)
-		if $syntax;
+	$fd = run_highlighter($fd, $highlight, $syntax);
 
 	git_header_html(undef, $expires);
 	my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
 			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
+			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index e94b2f1..6d06ed9 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
 	 git commit -m "Add test.sh" &&
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 
+test_expect_success HIGHLIGHT \
+	'syntax highlighting (highlighter language autodetection)' \
+	'git config gitweb.highlight yes &&
+	 echo "#!/usr/bin/perl" > test &&
+	 git add test &&
+	 git commit -m "Add test" &&
+	 gitweb_run "p=.git;a=blob;f=test"'
+
 # ----------------------------------------------------------------------
 # forks of projects
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
  2016-09-23 22:15           ` Jakub Narębski
  2016-09-24 16:21             ` Jakub Narębski
@ 2016-09-24 22:34             ` Ian Kelling
  1 sibling, 0 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-24 22:34 UTC (permalink / raw)
  To: Jakub Narębski, git

On Fri, Sep 23, 2016, at 03:15 PM, Jakub Narębski wrote:
> W dniu 23.09.2016 o 11:08, Ian Kelling napisał:
>
> > The "highlight" binary can, in some cases, determine the language type
> > by the means of file contents, for example the shebang in the first line
> > for some scripting languages.  Make use of this autodetection for files
> > which syntax is not known by gitweb.  In that case, pass the blob
> > contents to "highlight --force"; the parameter is needed to make it
> > always generate HTML output (which includes HTML-escaping).
>
> Right.
>
> >
> > Although we now run highlight on files which do not end up highlighted,
> > performance is virtually unaffected because when we call highlight, we
> > also call sanitize() instead of esc_html(), which is significantly
> > slower.
>
> This paragraph is a bit unclear, for example it is not obvious what
> "..., which is significantly slower" refers to: sanitize() or esc_html().
>
> I think it would be better to write:
>
>   Although we now run highlight on files which do not end up highlighted,
>   performance is virtually unaffected because when we call highlight, it
>   is used for escaping HTML.  In the case that highlight is used, gitweb
>   calls sanitize() instead of esc_html(), and the latter is significantly
>   slower (it does more, being roughly a superset of sanitize()).

Agree. Done in v4.

>
> >        After curling blob view of unhighlighted large and small text
> > files of perl code and license text 100 times each on a local
> > Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
> > request time for all file types.
>
> Also, "curling" is not the word I would like to see. I would say:
>
>   Simple benchmark comparing performance of 'blob' view of files without
>   syntax highlighting in gitweb before and after this change indicates
>   ±1% difference in request time for all file types.  Benchmark was
>   performed on local instance on Debian, using Apache/2.4.23 web server
>   and CGI/PSGI/FCGI/mod_perl.
>
>       ^^^^^^^^^^^^^^^^^^^^^^--- select one
>
> Or something like that; I'm not sure how detailed this should be.
> But it is nice to have such benchmark in the commit message.


Sounds  good. Used it in v4.

>
> Anyway I think that adding yet another configuration toggle for selecting
> whether to use "highlight" syntax autodetection or not would be just an
> unnecessary complication.
>
> Note that the performance loss might be quite higher on MS Windows, with
> its higher cost of fork.  But then they probably do not configure
> server-side highligher anyway.
>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used.
>
> Good.
>
> >
> > Signed-off-by: Ian Kelling <ian@iankelling.org>
> > ---
> >  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
> >  gitweb/gitweb.perl                     | 10 +++++-----
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
> >  3 files changed, 27 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
>
> We should probably say what does it mean to be "highlight"[1] compatible,
> but it is outside of scope for this patch, and I think also out of scope
> of this series.
>
> >  	Note that 'highlight' feature must be set for gitweb to actually
> >  	use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax <syntax>` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax <syntax>`
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
>
> All right. I guess /etc/highlight/filetypes.conf is the standard
> location?

I think so. I checked packages from homebrew, fedora, suse, debian to
confirm.

>
> >  +
> >  For example if repositories you are hosting use "phtml" extension for
> >  PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 6cb4280..44094f4 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
> >  # or return original FD if no highlighting
> >  sub run_highlighter {
> >  	my ($fd, $highlight, $syntax) = @_;
> > -	return $fd unless ($highlight && defined $syntax);
> > +	return $fd unless ($highlight);
> >
> >  	close $fd;
> > +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
> >  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
> >  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
> >  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
> >  	            '--', "-fe=$fallback_encoding")." | ".
> >  	          quote_command($highlight_bin).
> > -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> > +	          " --replace-tabs=8 --fragment $syntax_arg |"
> >  		or die_error(500, "Couldn't open file or run syntax highlighter");
> >  	return $fd;
> >  }
>
> All right, nice and understandable.
>
> > @@ -7063,8 +7064,7 @@ sub git_blob {
> >
> >  	my $highlight = gitweb_check_feature('highlight');
> >  	my $syntax = guess_file_syntax($highlight, $file_name);
> > -	$fd = run_highlighter($fd, $highlight, $syntax)
> > -		if $syntax;
> > +	$fd = run_highlighter($fd, $highlight, $syntax);
> >
> >  	git_header_html(undef, $expires);
> >  	my $formats_nav = '';
>
> Good, run unconditionally.
>
> > @@ -7117,7 +7117,7 @@ sub git_blob {
> >  			$line = untabify($line);
> >  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
> >  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> > -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> > +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
> >  		}
>
> Good, use highlighter if possible, not only if syntax is known
> and highlighter is turned on.
>
> Nice and easy to understand after earlier change.
>
> >  	}
> >  	close $fd
> > diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> > index e94b2f1..6d06ed9 100755
> > --- a/t/t9500-gitweb-standalone-no-errors.sh
> > +++ b/t/t9500-gitweb-standalone-no-errors.sh
> > @@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
> >  	 git commit -m "Add test.sh" &&
> >  	 gitweb_run "p=.git;a=blob;f=test.sh"'
> >
> > +test_expect_success HIGHLIGHT \
> > +	'syntax highlighting (highlighter language autodetection)' \
> > +	'git config gitweb.highlight yes &&
> > +	 echo "#!/usr/bin/perl" > test &&
> > +	 git add test &&
> > +	 git commit -m "Add test" &&
> > +	 gitweb_run "p=.git;a=blob;f=test"'
>
> Nice.
>
> > +
> >  # ----------------------------------------------------------------------
> >  # forks of projects
> >
> >
>

Thanks for great  suggestions.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
  2016-09-24 16:21             ` Jakub Narębski
  2016-09-24 17:52               ` Junio C Hamano
@ 2016-09-24 22:35               ` Ian Kelling
  1 sibling, 0 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-24 22:35 UTC (permalink / raw)
  To: Jakub Narębski, git

On Sat, Sep 24, 2016, at 09:21 AM, Jakub Narębski wrote:
> W dniu 24.09.2016 o 00:15, Jakub Narębski pisze:
> 
> Sidenote: this way of benchmarking of gitweb falls between two ways of
> doing a benchmark.
> 
> The first method is to simply run gitweb as a standalone script, passing
> its parameters in CGI environment variables; just like the test suite
> does it.  You would 'time' / 'times' it a few times, drop outliers, and
> take average or a median.  With this method you don't even need to set
> up a web server.
> 
> The second is to use a specialized program to benchmark the server-side
> of a web page, for example 'ab' (ApacheBench), httperf, curl-loader
> or JMeter.  The first one is usually distributed together with Apache
> web server, so you probably have it installed already.  Those tools
> provide timing statistics.

Good to know. Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter
  2016-09-24 22:32           ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Ian Kelling
  2016-09-24 22:32             ` [PATCH v4 2/2] gitweb: use highlight's shebang detection Ian Kelling
@ 2016-09-25 17:57             ` Jakub Narębski
  1 sibling, 0 replies; 23+ messages in thread
From: Jakub Narębski @ 2016-09-25 17:57 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 25.09.2016 o 00:32, Ian Kelling pisze:

> Subject: gitweb: remove unused guess_file_syntax() parameter
>
> Signed-off-by: Ian Kelling <ian@iankelling.org>

Acked-by: Jakub Narębski <jnareb@gmail.com>

> ---
> 
> Notes:
>     The only change from v3 is a more descriptive commit message
> 
>  gitweb/gitweb.perl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..6cb4280 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3913,7 +3913,7 @@ sub blob_contenttype {
>  # guess file syntax for syntax highlighting; return undef if no highlighting
>  # the name of syntax can (in the future) depend on syntax highlighter used
>  sub guess_file_syntax {
> -	my ($highlight, $mimetype, $file_name) = @_;
> +	my ($highlight, $file_name) = @_;
>  	return undef unless ($highlight && defined $file_name);
>  	my $basename = basename($file_name, '.in');
>  	return $highlight_basename{$basename}
> @@ -7062,7 +7062,7 @@ sub git_blob {
>  	$have_blame &&= ($mimetype =~ m!^text/!);
>  
>  	my $highlight = gitweb_check_feature('highlight');
> -	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> +	my $syntax = guess_file_syntax($highlight, $file_name);
>  	$fd = run_highlighter($fd, $highlight, $syntax)
>  		if $syntax;
>  
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection
  2016-09-24 22:32             ` [PATCH v4 2/2] gitweb: use highlight's shebang detection Ian Kelling
@ 2016-09-25 18:04               ` Jakub Narębski
  2016-09-28  7:37                 ` Ian Kelling
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Narębski @ 2016-09-25 18:04 UTC (permalink / raw)
  To: Ian Kelling, git

W dniu 25.09.2016 o 00:32, Ian Kelling pisze:
> The "highlight" binary can, in some cases, determine the language type
> by the means of file contents, for example the shebang in the first line
> for some scripting languages.  Make use of this autodetection for files
> which syntax is not known by gitweb.  In that case, pass the blob
> contents to "highlight --force"; the parameter is needed to make it
> always generate HTML output (which includes HTML-escaping).
> 
> Although we now run highlight on files which do not end up highlighted,
> performance is virtually unaffected because when we call highlight, it
> is used for escaping HTML.  In the case that highlight is used, gitweb
> calls sanitize() instead of esc_html(), and the latter is significantly
> slower (it does more, being roughly a superset of sanitize()).  Simple
> benchmark comparing performance of 'blob' view of files without syntax
> highlighting in gitweb before and after this change indicates ±1%
> difference in request time for all file types.  Benchmark was performed
> on local instance on Debian, using Apache/2.4.23 web server and CGI.
> 
> Document the feature and improve syntax highlight documentation, add
> test to ensure gitweb doesn't crash when language detection is used.
> 
> Signed-off-by: Ian Kelling <ian@iankelling.org>

For what it is worth it:

Acked-by: Jakub Narębski <jnareb@gmail.com>

(but unfortunately *not* tested by).

> ---
> 
> Notes:
>     The only change from v3 is the commit message as suggested by Jakub
>     Narębski
> 
>  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
>  gitweb/gitweb.perl                     | 10 +++++-----
>  t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
>  3 files changed, 27 insertions(+), 12 deletions(-)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection
  2016-09-25 18:04               ` Jakub Narębski
@ 2016-09-28  7:37                 ` Ian Kelling
  0 siblings, 0 replies; 23+ messages in thread
From: Ian Kelling @ 2016-09-28  7:37 UTC (permalink / raw)
  To: Jakub Narębski, git

On Sun, Sep 25, 2016, at 11:04 AM, Jakub Narębski wrote:
> 
> For what it is worth it:
> 
> Acked-by: Jakub Narębski <jnareb@gmail.com>
> 
> (but unfortunately *not* tested by).

Thank you for all your help.
--
Ian Kelling

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-09-28  7:37 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-06 19:00 [PATCH] gitweb: use highlight's shebang detection Ian Kelling
2016-09-20 20:22 ` Jakub Narębski
2016-09-21 16:38   ` Junio C Hamano
2016-09-21 17:51     ` Jakub Narębski
2016-09-21 22:15   ` Ian Kelling
2016-09-21 22:18   ` Ian Kelling
2016-09-21 22:24     ` Ian Kelling
2016-09-22 22:50     ` [PATCH v2] " Jakub Narębski
2016-09-23  9:08       ` Ian Kelling
2016-09-23  9:08       ` [PATCH v3 1/2] gitweb: remove unused function parameter Ian Kelling
2016-09-23  9:08         ` [PATCH v3 2/2] gitweb: use highlight's shebang detection Ian Kelling
2016-09-23 22:15           ` Jakub Narębski
2016-09-24 16:21             ` Jakub Narębski
2016-09-24 17:52               ` Junio C Hamano
2016-09-24 22:35               ` Ian Kelling
2016-09-24 22:34             ` Ian Kelling
2016-09-24 22:32           ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Ian Kelling
2016-09-24 22:32             ` [PATCH v4 2/2] gitweb: use highlight's shebang detection Ian Kelling
2016-09-25 18:04               ` Jakub Narębski
2016-09-28  7:37                 ` Ian Kelling
2016-09-25 17:57             ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Jakub Narębski
2016-09-23 19:44         ` [PATCH v3 1/2] gitweb: remove unused function parameter Jakub Narębski
2016-09-23 19:57           ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.