* [PATCH] gitweb: support the rel=vcs microformat
@ 2009-01-07  4:25 Joey Hess
  2009-01-07 12:30 ` Giuseppe Bilotta
  2009-01-09 23:49 ` Jakub Narebski
  0 siblings, 2 replies; 22+ messages in thread
From: Joey Hess @ 2009-01-07  4:25 UTC (permalink / raw)
  To: git

The rel=vcs microformat allows a web page to indicate the locations of
repositories related to it in a machine-parseable manner.
(See http://kitenet.net/~joey/rfc/rel-vcs/)

Make gitweb use the microformat in the header of pages it generates,
if it has been configured with project url information in any of the usual
ways.

Since getting the urls can require hitting disk, I avoided putting the
microformat on *every* page gitweb generates. Just put it on the project
summary page, the project list page, and the forks list page.
The first of these already looks up the urls, so adding the microformat was
free. There is a small overhead in including the microformat on the
latter two pages, but getting the project descriptions for those pages
already incurs a similar overhead, and the ability to get every repo url
in one place seems worthwhile.

This changes git_get_project_description() to not check wantarray, and only
return in list context -- the only way it is used AFAICS.

Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
---
 gitweb/gitweb.perl |   38 ++++++++++++++++++++++++++------------
 1 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99f71b4..3f8a228 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -789,6 +789,9 @@ $git_dir = "$projectroot/$project" if $project;
 our @snapshot_fmts = gitweb_get_feature('snapshot');
 @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
 
+# populated later with git urls for the project
+our @git_url_list;
+
 # dispatch
 if (!defined $action) {
 	if (defined $hash) {
@@ -2100,17 +2103,22 @@ sub git_show_project_tagcloud {
 }
 
 sub git_get_project_url_list {
+	# use per project git URL list in $projectroot/$path/cloneurl
+	# or make project git URL from git base URL and project name
 	my $path = shift;
 
+	my @ret;
+
 	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/cloneurl"
-		or return wantarray ?
-		@{ config_to_multi(git_get_project_config('url')) } :
-		   config_to_multi(git_get_project_config('url'));
-	my @git_project_url_list = map { chomp; $_ } <$fd>;
-	close $fd;
+	if (open my $fd, "$git_dir/cloneurl") {
+		@ret = map { chomp; $_ } <$fd>;
+		close $fd;
+	}
+	else {
+	       @ret = @{ config_to_multi(git_get_project_config('url')) };
+	}
 
-	return wantarray ? @git_project_url_list : \@git_project_url_list;
+	return @ret ? @ret : map { "$_/$project" } @git_base_url_list;
 }
 
 sub git_get_projects_list {
@@ -2953,6 +2961,10 @@ EOF
 		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
 	}
 
+	foreach my $url (@git_url_list) {
+		print qq{<link rel="vcs" type="git" href="$url" />\n};
+	}
+
 	print "</head>\n" .
 	      "<body>\n";
 
@@ -4380,6 +4392,8 @@ sub git_project_list {
 		die_error(404, "No projects found");
 	}
 
+	@git_url_list = map { git_get_project_url_list($_->{path}) } @list;
+
 	git_header_html();
 	if (-f $home_text) {
 		print "<div class=\"index_include\">\n";
@@ -4400,6 +4414,8 @@ sub git_forks {
 	if (defined $order && $order !~ m/none|project|descr|owner|age/) {
 		die_error(400, "Unknown order parameter");
 	}
+	
+	@git_url_list = map { git_get_project_url_list($_->{path}) } @list;
 
 	my @list = git_get_projects_list($project);
 	if (!@list) {
@@ -4457,6 +4473,8 @@ sub git_summary {
 		@forklist = git_get_projects_list($project);
 	}
 
+	@git_url_list = git_get_project_url_list($project);
+
 	git_header_html();
 	git_print_page_nav('summary','', $head);
 
@@ -4468,12 +4486,8 @@ sub git_summary {
 		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
 	}
 
-	# use per project git URL list in $projectroot/$project/cloneurl
-	# or make project git URL from git base URL and project name
 	my $url_tag = "URL";
-	my @url_list = git_get_project_url_list($project);
-	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
-	foreach my $git_url (@url_list) {
+	foreach my $git_url (@git_url_list) {
 		next unless $git_url;
 		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
 		$url_tag = "";
-- 
1.5.6.5

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07  4:25 [PATCH] gitweb: support the rel=vcs microformat Joey Hess
@ 2009-01-07 12:30 ` Giuseppe Bilotta
  2009-01-07 15:50   ` Joey Hess
  2009-01-09 23:56   ` Jakub Narebski
  2009-01-09 23:49 ` Jakub Narebski
  1 sibling, 2 replies; 22+ messages in thread
From: Giuseppe Bilotta @ 2009-01-07 12:30 UTC (permalink / raw)
  To: git

On Wednesday 07 January 2009 05:25, Joey Hess wrote:

> The rel=vcs microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)

Interesting idea, I like it. However, I see a problem in the proposed
implementation versus the spec. According to the spec:

"""
The "title" is optional, but recommended if there are multiple, different
repositories linked to on one page. It is a human-readable description of the
repository.
[...]
If there are multiple repositories listed, without titles, tools should assume
they are different repositories.
"""

In this patch you do NOT add titles to the rel=vcs links, which means that
everything works fine only if there is a single URL for each project. If a
project has different URLs, it's going to appear multiple times as _different_
projects to a spec-compliant reader.

A possible solution would be to make @git_url_list into a map keyed by the
project name and having the description and repo URL(s) as values.

Since there is the possibility of different projects having the same
description (e.g. the default one), the link title could be composed of
"$project - $description" rather than simply $description.

Note that both in summary and in project list view you already retrieve the
description, so there are no additional disk hits.
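
A minimal sketch of the titled-link idea, assuming a hypothetical %project_metadata map and a hypothetical escape_attr() helper (neither is existing gitweb code). Every URL of a project gets the same "$project - $description" title, so a reader of the microformat can tell that the links name one repository:

```perl
use strict;
use warnings;

# Hypothetical metadata map (not gitweb's real data structure):
# project name => { descr => ..., urls => [ ... ] }
my %project_metadata = (
    'foo.git' => {
        descr => 'Foo library',
        urls  => [ 'git://example.org/foo.git', 'http://example.org/foo.git' ],
    },
);

# Hypothetical helper: minimal escaping for HTML attribute values.
sub escape_attr {
    my $s = shift;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build one <link> per URL; all URLs of a project share one title.
sub vcs_links {
    my ($meta) = @_;
    my $out = '';
    foreach my $project (sort keys %$meta) {
        my $title = escape_attr("$project - $meta->{$project}{descr}");
        foreach my $url (@{ $meta->{$project}{urls} }) {
            my $href = escape_attr($url);
            $out .= qq{<link rel="vcs" type="git" href="$href" title="$title" />\n};
        }
    }
    return $out;
}

my $links = vcs_links(\%project_metadata);
print $links;
```

The escaping step is an addition for the sketch; descriptions and URLs come from user-editable files, so quoting them before they land in an attribute seems prudent.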

> Make gitweb use the microformat in the header of pages it generates,
> if it has been configured with project url information in any of the usual
> ways.
> 
> Since getting the urls can require hitting disk, I avoided putting the
> microformat on *every* page gitweb generates. Just put it on the project
> summary page, the project list page, and the forks list page.
> The first of these already looks up the urls, so adding the microformat was
> free. There is a small overhead in including the microformat on the
> latter two pages, but getting the project descriptions for those pages
> already incurs a similar overhead, and the ability to get every repo url
> in one place seems worthwhile.
> 
> This changes git_get_project_description() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS.

I assume you mean git_get_project_url_list()?

> 
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   38 ++++++++++++++++++++++++++------------
>  1 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..3f8a228 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -789,6 +789,9 @@ $git_dir = "$projectroot/$project" if $project;
>  our @snapshot_fmts = gitweb_get_feature('snapshot');
>  @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> +# populated later with git urls for the project
> +our @git_url_list;
> +
>  # dispatch
>  if (!defined $action) {
>       if (defined $hash) {
> @@ -2100,17 +2103,22 @@ sub git_show_project_tagcloud {
>  }
>  
>  sub git_get_project_url_list {
> +     # use per project git URL list in $projectroot/$path/cloneurl
> +     # or make project git URL from git base URL and project name
>       my $path = shift;
>  
> +     my @ret;
> +
>       $git_dir = "$projectroot/$path";
> -     open my $fd, "$git_dir/cloneurl"
> -             or return wantarray ?
> -             @{ config_to_multi(git_get_project_config('url')) } :
> -                config_to_multi(git_get_project_config('url'));
> -     my @git_project_url_list = map { chomp; $_ } <$fd>;
> -     close $fd;
> +     if (open my $fd, "$git_dir/cloneurl") {
> +             @ret = map { chomp; $_ } <$fd>;
> +             close $fd;
> +     }
> +     else {

Coding style: } else {

> +            @ret = @{ config_to_multi(git_get_project_config('url')) };
> +     }
>  
> -     return wantarray ? @git_project_url_list : \@git_project_url_list;
> +     return @ret ? @ret : map { "$_/$project" } @git_base_url_list;
>  }
>  
>  sub git_get_projects_list {
> @@ -2953,6 +2961,10 @@ EOF
>               print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>       }
>  
> +     foreach my $url (@git_url_list) {
> +             print qq{<link rel="vcs" type="git" href="$url" />\n};
> +     }
> +
>       print "</head>\n" .
>             "<body>\n";
>  
> @@ -4380,6 +4392,8 @@ sub git_project_list {
>               die_error(404, "No projects found");
>       }
>  
> +     @git_url_list = map { git_get_project_url_list($_->{path}) } @list;
> +
>       git_header_html();
>       if (-f $home_text) {
>               print "<div class=\"index_include\">\n";
> @@ -4400,6 +4414,8 @@ sub git_forks {
>       if (defined $order && $order !~ m/none|project|descr|owner|age/) {
>               die_error(400, "Unknown order parameter");
>       }
> +     
> +     @git_url_list = map { git_get_project_url_list($_->{path}) } @list;
>  
>       my @list = git_get_projects_list($project);
>       if (!@list) {
> @@ -4457,6 +4473,8 @@ sub git_summary {
>               @forklist = git_get_projects_list($project);
>       }
>  
> +     @git_url_list = git_get_project_url_list($project);
> +
>       git_header_html();
>       git_print_page_nav('summary','', $head);
>  
> @@ -4468,12 +4486,8 @@ sub git_summary {
>               print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>       }
>  
> -     # use per project git URL list in $projectroot/$project/cloneurl
> -     # or make project git URL from git base URL and project name
>       my $url_tag = "URL";
> -     my @url_list = git_get_project_url_list($project);
> -     @url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -     foreach my $git_url (@url_list) {
> +     foreach my $git_url (@git_url_list) {
>               next unless $git_url;
>               print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
>               $url_tag = "";

-- 
Giuseppe "Oblomov" Bilotta

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 12:30 ` Giuseppe Bilotta
@ 2009-01-07 15:50   ` Joey Hess
  2009-01-07 18:03     ` Giuseppe Bilotta
  2009-01-09 23:56   ` Jakub Narebski
  1 sibling, 1 reply; 22+ messages in thread
From: Joey Hess @ 2009-01-07 15:50 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git

Giuseppe Bilotta wrote:
> In this patch you do NOT add titles to the rel=vcs links, which means that
> everything works fine only if there is a single URL for each project. If a
> project has different URLs, it's going to appear multiple times as _different_
> projects to a spec-compliant reader.
> 
> A possible solution would be to make @git_url_list into a map keyed by the
> project name and having the description and repo URL(s) as values.

Yes. I considered doing that, but didn't immediately see a way to get the
project description w/o additional overhead (of looking it up a second
time).

> > This changes git_get_project_description() to not check wantarray, and only
> > return in list context -- the only way it is used AFAICS.
> 
> I assume you mean git_get_project_url_list()?

In fact yes.
 

Thanks for the feedback. There are some changes happening to the
microformat that should make gitweb's job slightly easier, I'll respin
the patch soon.

-- 
see shy jo


* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 15:50   ` Joey Hess
@ 2009-01-07 18:03     ` Giuseppe Bilotta
  2009-01-07 18:41       ` Joey Hess
  2009-01-07 18:45       ` Joey Hess
  0 siblings, 2 replies; 22+ messages in thread
From: Giuseppe Bilotta @ 2009-01-07 18:03 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

On Wed, Jan 7, 2009 at 4:50 PM, Joey Hess <joey@kitenet.net> wrote:
> Giuseppe Bilotta wrote:
>> In this patch you do NOT add titles to the rel=vcs links, which means that
>> everything works fine only if there is a single URL for each project. If a
>> project has different URLs, it's going to appear multiple times as _different_
>> projects to a spec-compliant reader.
>>
>> A possible solution would be to make @git_url_list into a map keyed by the
>> project name and having the description and repo URL(s) as values.
>
> Yes. I considered doing that, but didn't immediately see a way to get the
> project description w/o additional overhead (of looking it up a second
> time).

The solution I have in mind would be something like this: in summary
or projects list view (which are the views in which we put the links,
and also the views in which we look up the repo URL and the
description anyway), you fill up the former @git_url_list (now
%project_metadata) looking up the repo description and URLs. You then
use this information both in the link tag and in the appropriate
places for the visible part of the webpage: you don't have a
significant overhead, because you're just moving the project
description retrieval early on.

You probably want to refactor the code by making a
git_get_project_metadata() sub that extends the current URL retrieval
by retrieving description and URLs. The routine can then be used
either for one or for all the projects, as needed.
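
As a rough sketch of such a routine, assuming a hypothetical get_project_metadata() that takes a repository directory and reads its description and cloneurl files in one pass (the real gitweb subs also fall back to the repository config, which is left out here):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: collect description and clone URLs for one
# repository directory. Missing files simply yield empty values.
sub get_project_metadata {
    my ($git_dir) = @_;
    my %meta = (descr => '', urls => []);

    if (open my $fd, '<', "$git_dir/description") {
        my $descr = <$fd>;
        close $fd;
        if (defined $descr) {
            chomp $descr;
            $meta{descr} = $descr;
        }
    }
    if (open my $fd, '<', "$git_dir/cloneurl") {
        chomp(my @urls = <$fd>);
        close $fd;
        $meta{urls} = [ grep { length } @urls ];
    }
    return \%meta;
}

# Demo against a throwaway directory standing in for $projectroot/$path.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/description" or die "description: $!";
print {$out} "Demo repository\n";
close $out;
open $out, '>', "$dir/cloneurl" or die "cloneurl: $!";
print {$out} "git://example.org/demo.git\nhttp://example.org/demo.git\n";
close $out;

my $meta = get_project_metadata($dir);
print "$meta->{descr}: @{ $meta->{urls} }\n";
```

Calling it once per project and storing the results in a hash keyed by project path would serve both the link tags and the visible page.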

> Thanks for the feedback. There are some changes happening to the
> microformat that should make gitweb's job slightly easier, I'll respin
> the patch soon.

Let me know about this too, I very much like the idea of this microformat.

-- 
Giuseppe "Oblomov" Bilotta

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 18:03     ` Giuseppe Bilotta
@ 2009-01-07 18:41       ` Joey Hess
  2009-01-10  0:01         ` Jakub Narebski
  2009-01-07 18:45       ` Joey Hess
  1 sibling, 1 reply; 22+ messages in thread
From: Joey Hess @ 2009-01-07 18:41 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git

Giuseppe Bilotta wrote:
> > Thanks for the feedback. There are some changes happening to the
> > microformat that should make gitweb's job slightly easier, I'll respin
> > the patch soon.
> 
> Let me know about this too, I very much like the idea of this microformat.

FYI, I've updated the microformat's page with the changes. The
significant one for gitweb is that it can now be applied to <a> links.
So on the project page, the display of the git URL could be converted to
a link using the microformat, and there's no need to get the info
earlier to put it in the header. Unfortunately, the same can't be done to
the project list page, unless it's changed to have "git" links as seen
on vger.kernel.org's gitweb.

-- 
see shy jo


* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 18:03     ` Giuseppe Bilotta
  2009-01-07 18:41       ` Joey Hess
@ 2009-01-07 18:45       ` Joey Hess
  2009-01-07 19:02         ` Joey Hess
  1 sibling, 1 reply; 22+ messages in thread
From: Joey Hess @ 2009-01-07 18:45 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git

Giuseppe Bilotta wrote:
> The solution I have in mind would be something like this: in summary
> or projects list view (which are the views in which we put the links,
> and also the views in which we look up the repo URL and the
> description anyway), you fill up the former @git_url_list (now
> %project_metadata) looking up the repo description and URLs. You then
> use this information both in the link tag and in the appropriate
> places for the visible part of the webpage: you don't have a
> significant overhead, because you're just moving the project
> description retrieval early on.
> 
> You probably want to refactor the code by making a
> git_get_project_metadata() sub that extends the current URL retrieval
> by retrieving description and URLs. The routine can then be used
> either for one or for all the projects, as needed.

Another approach would be to just memoize git_get_project_description
and git_get_project_url_list.
 
-- 
see shy jo


* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 18:45       ` Joey Hess
@ 2009-01-07 19:02         ` Joey Hess
  2009-01-07 23:24           ` [PATCH] gitweb: support the rel=vcs-* microformat Joey Hess
  2009-01-10  0:03           ` [PATCH] gitweb: support the rel=vcs microformat Jakub Narebski
  0 siblings, 2 replies; 22+ messages in thread
From: Joey Hess @ 2009-01-07 19:02 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git

Joey Hess wrote:
> Another approach would be to just memoize git_get_project_description
> and git_get_project_url_list.

Especially since git_get_project_description is already called more than
once for some pages.
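
One way to illustrate the memoization idea is Perl's core Memoize module. This is only a sketch: project_description() below is a stand-in that counts its calls rather than reading $git_dir/description, and a hand-rolled hash cache would work just as well.

```perl
use strict;
use warnings;
use Memoize;

my $reads = 0;    # counts how often the "expensive" body actually runs

# Stand-in for a sub that would read a description file from disk.
sub project_description {
    my ($path) = @_;
    $reads++;
    return "description of $path";
}

# Replace the sub with a caching wrapper keyed on its arguments.
memoize('project_description');

my $first  = project_description('foo.git');
my $second = project_description('foo.git');    # served from the cache
print "$first (body ran $reads time(s))\n";
```

Memoize keeps scalar-context and list-context results in separate caches, so a sub that is only ever called in one context gets the full benefit.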

-- 
see shy jo


* [PATCH] gitweb: support the rel=vcs-* microformat
  2009-01-07 19:02         ` Joey Hess
@ 2009-01-07 23:24           ` Joey Hess
  2009-01-08  7:56             ` Giuseppe Bilotta
  2009-01-10  0:52             ` Jakub Narebski
  2009-01-10  0:03           ` [PATCH] gitweb: support the rel=vcs microformat Jakub Narebski
  1 sibling, 2 replies; 22+ messages in thread
From: Joey Hess @ 2009-01-07 23:24 UTC (permalink / raw)
  To: git

The rel=vcs-* microformat allows a web page to indicate the locations of
repositories related to it in a machine-parseable manner.
(See http://kitenet.net/~joey/rfc/rel-vcs/)

Make gitweb use the microformat if it has been configured with project url
information in any of the usual ways. On the project summary page, the
repository URL display is simply marked up using the microformat. On the
project list page and forks list page, the microformat is embedded in the
header, since the URLs do not appear on the page.

The microformat could be included on other pages too, but I've skipped
doing so for now, since it would mean reading another file for every page
displayed.

There is a small overhead in including the microformat on project list
and forks list pages, but getting the project descriptions for those pages
already incurs a similar overhead, and the ability to get every repo url
in one place seems worthwhile.

This changes git_get_project_url_list() to not check wantarray, and only
return in list context -- the only way it is used AFAICS. It memoizes
both that function and git_get_project_description(), to avoid redundant
file reads.

Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
---
 gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 62 insertions(+), 16 deletions(-)

This incorporates Giuseppe Bilotta's feedback, and uses new features
of the microformat. You can see this version running at
http://git.ikiwiki.info/

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99f71b4..c238717 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
+{
+my %project_descriptions; # cache
+
 sub git_get_project_description {
 	my $path = shift;
 
+	return $project_descriptions{$path} if exists $project_descriptions{$path};
+
 	$git_dir = "$projectroot/$path";
 	open my $fd, "$git_dir/description"
 		or return git_get_project_config('description');
@@ -2031,7 +2036,9 @@ sub git_get_project_description {
 	if (defined $descr) {
 		chomp $descr;
 	}
-	return $descr;
+	return $project_descriptions{$path}=$descr;
+}
+
 }
 
 sub git_get_project_ctags {
@@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
 	}
 }
 
+{
+my %project_url_lists; # cache
+
 sub git_get_project_url_list {
+	# use per project git URL list in $projectroot/$path/cloneurl
+	# or make project git URL from git base URL and project name
 	my $path = shift;
 
+	return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
+
+	my @ret;
 	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/cloneurl"
-		or return wantarray ?
-		@{ config_to_multi(git_get_project_config('url')) } :
-		   config_to_multi(git_get_project_config('url'));
-	my @git_project_url_list = map { chomp; $_ } <$fd>;
-	close $fd;
+	if (open my $fd, "$git_dir/cloneurl") {
+		@ret = map { chomp; $_ } <$fd>;
+		close $fd;
+	} else {
+	       @ret = @{ config_to_multi(git_get_project_config('url')) };
+	}
+	@ret=map { "$_/$project" } @git_base_url_list if ! @ret;
+
+	$project_url_lists{$path}=\@ret;
+	return @ret;
+}
 
-	return wantarray ? @git_project_url_list : \@git_project_url_list;
 }
 
 sub git_get_projects_list {
@@ -2856,6 +2875,7 @@ sub blob_contenttype {
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
+	my $extraheader = shift;
 
 	my $title = "$site_name";
 	if (defined $project) {
@@ -2953,6 +2973,8 @@ EOF
 		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
 	}
 
+	print $extraheader if defined $extraheader;
+
 	print "</head>\n" .
 	      "<body>\n";
 
@@ -4365,6 +4387,26 @@ sub git_search_grep_body {
 	print "</table>\n";
 }
 
+sub git_link_title {
+	my $project=shift;
+	
+	my $description=git_get_project_description($project);
+	return $project.(length $description ? " - $description" : "");
+}
+
+# generates header with links to the specified projects
+sub git_links_header {
+	my $ret='';
+	foreach my $project (@_) {
+		# rel=vcs-* microformat
+		my $title=git_link_title($project);
+		foreach my $url (git_get_project_url_list($project)) {
+			$ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}
+		}
+	}
+	return $ret;
+}
+
 ## ======================================================================
 ## ======================================================================
 ## actions
@@ -4380,7 +4422,9 @@ sub git_project_list {
 		die_error(404, "No projects found");
 	}
 
-	git_header_html();
+	my $extraheader=git_links_header(map { $_->{path} } @list);
+
+	git_header_html(undef, undef, $extraheader);
 	if (-f $home_text) {
 		print "<div class=\"index_include\">\n";
 		insert_file($home_text);
@@ -4405,8 +4449,10 @@ sub git_forks {
 	if (!@list) {
 		die_error(404, "No forks found");
 	}
+	
+	my $extraheader=git_links_header(map { $_->{path} } @list);
 
-	git_header_html();
+	git_header_html(undef, undef, $extraheader);
 	git_print_page_nav('','');
 	git_print_header_div('summary', "$project forks");
 	git_project_list_body(\@list, $order);
@@ -4468,14 +4514,14 @@ sub git_summary {
 		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
 	}
 
-	# use per project git URL list in $projectroot/$project/cloneurl
-	# or make project git URL from git base URL and project name
 	my $url_tag = "URL";
-	my @url_list = git_get_project_url_list($project);
-	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
-	foreach my $git_url (@url_list) {
+	my $title=git_link_title($project);
+	foreach my $git_url (git_get_project_url_list($project)) {
 		next unless $git_url;
-		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
+		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
+		      # rel=vcs-* microformat
+		      "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
+		      "</td></tr>\n";
 		$url_tag = "";
 	}
 
-- 
1.5.6.5



-- 
see shy jo

* Re: [PATCH] gitweb: support the rel=vcs-* microformat
  2009-01-07 23:24           ` [PATCH] gitweb: support the rel=vcs-* microformat Joey Hess
@ 2009-01-08  7:56             ` Giuseppe Bilotta
  2009-01-08 19:54               ` gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat) Joey Hess
  2009-01-10  1:04               ` [PATCH] gitweb: support the rel=vcs-* microformat Jakub Narebski
  2009-01-10  0:52             ` Jakub Narebski
  1 sibling, 2 replies; 22+ messages in thread
From: Giuseppe Bilotta @ 2009-01-08  7:56 UTC (permalink / raw)
  To: git

Hello Joey,

On Thursday 08 January 2009 00:24, Joey Hess wrote:

> The rel=vcs-* microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)

Have you considered submitting the microformat to microformats.org?
That would make the microformat more official and would be a good
first step to have wider coverage of it, and additional reviews.

> Make gitweb use the microformat if it has been configured with project url
> information in any of the usual ways. On the project summary page, the
> repository URL display is simply marked up using the microformat. On the
> project list page and forks list page, the microformat is embedded in the
> header, since the URLs do not appear on the page.
> 
> The microformat could be included on other pages too, but I've skipped
> doing so for now, since it would mean reading another file for every page
> displayed.
> 
> There is a small overhead in including the microformat on project list
> and forks list pages, but getting the project descriptions for those pages
> already incurs a similar overhead, and the ability to get every repo url
> in one place seems worthwhile.

I agree with this, although people with very large project lists may
differ ... do we have timings on these?
 
> This changes git_get_project_url_list() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS. It memoizes
> both that function and git_get_project_description(), to avoid redundant
> file reads.

You may want to consider splitting the patch into three: memoizing
of git_get_project_description(), reworking of
git_get_project_url_list(), and the actual rel=vcs-* insertions.

> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 62 insertions(+), 16 deletions(-)
> 
> This incorporates Giuseppe Bilotta's feedback, and uses new features
> of the microformat. You can see this version running at
> http://git.ikiwiki.info/

Oh, and do consider cc'ing jnareb and paski when submitting patches
for gitweb, as they are the (unofficial?) maintainers. I usually cc
gitster (Junio C Hamano) too.

[ Also cc'ing me for this round would have been a nice idea too,
since we had the review going on ;-) ]

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..c238717 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
>  ## ......................................................................
>  ## git utility functions, directly accessing git repository
>  
> +{
> +my %project_descriptions; # cache
> +

Out of curiosity, why the grouping? I would have had

our %project_descriptions;

up above with all the global variables.
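
For comparison, here is a small sketch of the two scoping styles, using made-up sub names (cached_length, cached_upper) rather than gitweb code. A `my` hash inside a bare block is visible only to the subs defined in that block, while an `our` hash is a package variable that any code can read or modify:

```perl
use strict;
use warnings;

# Block-scoped cache: only cached_length() can see %cache.
{
    my %cache;

    sub cached_length {
        my ($key) = @_;
        $cache{$key} = length $key unless exists $cache{$key};
        return $cache{$key};
    }
}

# Package-variable cache: reachable from anywhere in the program.
our %global_cache;

sub cached_upper {
    my ($key) = @_;
    $global_cache{$key} = uc $key unless exists $global_cache{$key};
    return $global_cache{$key};
}

cached_length('gitweb');
cached_upper('gitweb');

# Other code can inspect (or clobber) the package variable...
print "global cache keys: @{[ sort keys %global_cache ]}\n";
# ...but has no name by which to reach the lexical %cache.
```

Both work as caches; the block form just keeps the hash out of reach of unrelated code.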

>  sub git_get_project_description {
>       my $path = shift;
>  
> +     return $project_descriptions{$path} if exists $project_descriptions{$path};
> +

This line is bordering on 80 characters, so you may want to
consider moving 'my $descr' here, with something such as

my $descr = $project_descriptions{$path};
return $descr if defined $descr;

Also, I'm no perl guru so I'm not sure about exists vs defined here.
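
On the exists-versus-defined question, a short sketch with made-up project names: `exists` asks whether a hash key is present at all, while `defined` asks whether its value is something other than undef. For a cache the difference matters when a lookup legitimately produced undef.

```perl
use strict;
use warnings;

my %project_descriptions = (
    'with-descr.git' => 'A described project',
    'no-descr.git'   => undef,    # looked up before, nothing found
);

foreach my $path ('with-descr.git', 'no-descr.git', 'never-seen.git') {
    my $cached  = exists  $project_descriptions{$path} ? 'yes' : 'no';
    my $has_val = defined $project_descriptions{$path} ? 'yes' : 'no';
    print "$path: exists=$cached defined=$has_val\n";
}
```

Note that `exists` applies only to hash or array elements (and subroutines), so a test on a plain scalar copy of the cached value has to use `defined`, at the cost of re-reading projects whose cached description is undef.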

>       $git_dir = "$projectroot/$path";
>       open my $fd, "$git_dir/description"
>               or return git_get_project_config('description');
> @@ -2031,7 +2036,9 @@ sub git_get_project_description {
>       if (defined $descr) {
>               chomp $descr;
>       }
> -     return $descr;
> +     return $project_descriptions{$path}=$descr;
> +}
> +
>  }

[This is where I would end the first patch]

>  
>  sub git_get_project_ctags {
> @@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
>       }
>  }
>  
> +{
> +my %project_url_lists; # cache
> +

Ditto for this: why not our %project_url_lists; without scoping?

>  sub git_get_project_url_list {
> +     # use per project git URL list in $projectroot/$path/cloneurl
> +     # or make project git URL from git base URL and project name
>       my $path = shift;
>  
> +     return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
> +
> +     my @ret;
>       $git_dir = "$projectroot/$path";
> -     open my $fd, "$git_dir/cloneurl"
> -             or return wantarray ?
> -             @{ config_to_multi(git_get_project_config('url')) } :
> -                config_to_multi(git_get_project_config('url'));
> -     my @git_project_url_list = map { chomp; $_ } <$fd>;
> -     close $fd;
> +     if (open my $fd, "$git_dir/cloneurl") {
> +             @ret = map { chomp; $_ } <$fd>;
> +             close $fd;
> +     } else {
> +            @ret = @{ config_to_multi(git_get_project_config('url')) };
> +     }
> +     @ret=map { "$_/$project" } @git_base_url_list if ! @ret;
> +
> +     $project_url_lists{$path}=\@ret;
> +     return @ret;
> +}
>  
> -     return wantarray ? @git_project_url_list : \@git_project_url_list;
>  }

[This is where I would end the second patch]

>  
>  sub git_get_projects_list {
> @@ -2856,6 +2875,7 @@ sub blob_contenttype {
>  sub git_header_html {
>       my $status = shift || "200 OK";
>       my $expires = shift;
> +     my $extraheader = shift;
>  
>       my $title = "$site_name";
>       if (defined $project) {
> @@ -2953,6 +2973,8 @@ EOF
>               print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>       }
>  
> +     print $extraheader if defined $extraheader;
> +
>       print "</head>\n" .
>             "<body>\n";
>  
> @@ -4365,6 +4387,26 @@ sub git_search_grep_body {
>       print "</table>\n";
>  }
>  
> +sub git_link_title {
> +     my $project=shift;
> +     
> +     my $description=git_get_project_description($project);
> +     return $project.(length $description ? " - $description" : "");
> +}

Nice.

> +
> +# generates header with links to the specified projects
> +sub git_links_header {
> +     my $ret='';
> +     foreach my $project (@_) {
> +             # rel=vcs-* microformat
> +             my $title=git_link_title($project);
> +             foreach my $url (git_get_project_url_list($project)) {
> +                     $ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}
> +             }
> +     }
> +     return $ret;
> +}
> +
>  ## ======================================================================
>  ## ======================================================================
>  ## actions
> @@ -4380,7 +4422,9 @@ sub git_project_list {
>               die_error(404, "No projects found");
>       }
>  
> -     git_header_html();
> +     my $extraheader=git_links_header(map { $_->{path} } @list);
> +
> +     git_header_html(undef, undef, $extraheader);
>       if (-f $home_text) {
>               print "<div class=\"index_include\">\n";
>               insert_file($home_text);
> @@ -4405,8 +4449,10 @@ sub git_forks {
>       if (!@list) {
>               die_error(404, "No forks found");
>       }
> +     
> +     my $extraheader=git_links_header(map { $_->{path} } @list);
>  
> -     git_header_html();
> +     git_header_html(undef, undef, $extraheader);

This makes me wonder whether git_header_html should be turned into
-param => value style, but I'm not really sure it's worth it.
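
A minimal sketch of what a -param => value calling style could look like.
The sub name header_html and the parameter names -status, -expires and
-extra are assumptions made for illustration, and the sub returns a
string instead of printing; this is not gitweb's actual interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for git_header_html(); the parameter names
# -status, -expires and -extra are assumptions, not gitweb's API.
sub header_html {
	# defaults first, caller-supplied pairs override them
	my %opts = (-status => "200 OK", @_);

	my $out = "Status: " . $opts{-status} . "\n";
	$out .= "Expires: " . $opts{-expires} . "\n" if defined $opts{-expires};
	$out .= "\n<head>\n";
	$out .= $opts{-extra} if defined $opts{-extra};
	$out .= "</head>\n";
	return $out;
}

# The caller names only what it needs; no undef placeholders.
print header_html(
	-extra => qq{<link rel="vcs-git" href="git://example.org/foo.git" />\n},
);
```

With this style a call such as git_header_html(undef, undef, $extraheader)
would not need the two undef placeholders, at the cost of touching every
existing caller.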

>       git_print_page_nav('','');
>       git_print_header_div('summary', "$project forks");
>       git_project_list_body(\@list, $order);
> @@ -4468,14 +4514,14 @@ sub git_summary {
>               print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>       }
>  
> -     # use per project git URL list in $projectroot/$project/cloneurl
> -     # or make project git URL from git base URL and project name
>       my $url_tag = "URL";
> -     my @url_list = git_get_project_url_list($project);
> -     @url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -     foreach my $git_url (@url_list) {
> +     my $title=git_link_title($project);
> +     foreach my $git_url (git_get_project_url_list($project)) {
>               next unless $git_url;
> -             print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
> +             print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
> +                   # rel=vcs-* microformat
> +                   "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
> +                   "</td></tr>\n";
>               $url_tag = "";
>       }

Good. Of course the comment removal (really a move into
git_get_project_url_list, where it belongs) would go in the appropriate
patch if you split them 8-)

-- 
Giuseppe "Oblomov" Bilotta

* gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-08  7:56             ` Giuseppe Bilotta
@ 2009-01-08 19:54               ` Joey Hess
  2009-01-08 23:53                 ` J.H.
  2009-01-10  1:11                 ` Jakub Narebski
  2009-01-10  1:04               ` [PATCH] gitweb: support the rel=vcs-* microformat Jakub Narebski
  1 sibling, 2 replies; 22+ messages in thread
From: Joey Hess @ 2009-01-08 19:54 UTC (permalink / raw)
  To: git

Giuseppe Bilotta wrote:
> > There is a small overhead in including the microformat on project list
> > and forks list pages, but getting the project descriptions for those pages
> > already incurs a similar overhead, and the ability to get every repo url
> > in one place seems worthwhile.
> 
> I agree with this, although people with very large project lists may
> differ ... do we have timings on these?

AFAICS, when displaying the project list, gitweb reads each project's
description file, falling back to reading its config file if there is no
description file.

If performance was a problem here, the thing to do would be to add
project descriptions to the $project_list file, and use those in
preference to the description files. If a large site has done that,
they've not sent in the patch. :-)

With my patch, it will read each cloneurl file too. The best way to
optimise that for large sites seems to be to add an option that would
ignore the cloneurl files and config file and always use
@git_base_url_list.

I checked the only large site I have access to (git.debian.org) and they
use a $project_list file, but I see no other performance tuning. That's
a 2 GHz machine; it takes gitweb 28 (!) seconds to generate the nearly 1
MB index web page for 1671 repositories:

/srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total

Notice that most of the time is spent by child processes. For each
repository, gitweb runs git-for-each-ref to determine the time of the
last commit.

If that is removed (say if there were a way to get the info w/o
forking), performance improves nicely:

./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total

Making it not read description files for each project, as I suggest above,
is the next best optimisation:

./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total

So, I think it makes sense to optimise gitweb and offer knobs for performance
tuning at the expense of the flexibility of description and cloneurl files.
But, git-for-each-ref is swamping everything else.

-- 
see shy jo

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-08 19:54               ` gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat) Joey Hess
@ 2009-01-08 23:53                 ` J.H.
  2009-01-09  0:16                   ` Miklos Vajna
                                     ` (2 more replies)
  2009-01-10  1:11                 ` Jakub Narebski
  1 sibling, 3 replies; 22+ messages in thread
From: J.H. @ 2009-01-08 23:53 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess wrote:
> Giuseppe Bilotta wrote:
>   
>>> There is a small overhead in including the microformat on project list
>>> and forks list pages, but getting the project descriptions for those pages
>>> already incurs a similar overhead, and the ability to get every repo url
>>> in one place seems worthwhile.
>>>       
>> I agree with this, although people with very large project lists may
>> differ ... do we have timings on these?
>>     
>
> AFAICS, when displaying the project list, gitweb reads each project's
> description file, falling back to reading its config file if there is no
> description file.
>
> If performance was a problem here, the thing to do would be to add
> project descriptions to the $project_list file, and use those in
> preference to the description files. If a large site has done that,
> they've not sent in the patch. :-)
>   

No, because all the large sites have pain points and issues elsewhere in
the app.  Most of the large sites (I can at least speak for kernel.org)
have built full caching layers into gitweb itself to deal with the
problem.  This means that we don't have to worry about nickel-and-dime
performance improvements that are specific to one section, but can do a
very broad sweep and get dramatically better performance across all of
gitweb.  Those patches have all made it back out onto the mailing list,
but for a number of different reasons none have been accepted into the
mainline branch.

> With my patch, it will read each cloneurl file too. The best way to
> optimise that for large sites seems to be to add an option that would
> ignore the cloneurl files and config file and always use
> @git_base_url_list.
>
> I checked the only large site I have access to (git.debian.org) and they
> use a $project_list file, but I see no other performance tuning. That's
> a 2 GHz machine; it takes gitweb 28 (!) seconds to generate the nearly 1
> MB index web page for 1671 repositories:
>   

Look at either Lea's or my caching engines, it will help dramatically on 
something of that size.

> /srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total
>
> Notice that most of the time is spent by child processes. For each
> repository, gitweb runs git-for-each-ref to determine the time of the
> last commit.
>
> If that is removed (say if there were a way to get the info w/o
> forking), performance improves nicely:
>
> ./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total
>
> Making it not read description files for each project, as I suggest above,
> is the next best optimisation:
>
> ./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total
>
> So, I think it makes sense to optimise gitweb and offer knobs for performance
> tuning at the expense of the flexibility of description and cloneurl files.
> But, git-for-each-ref is swamping everything else

The problem is that the knobs are going to be very fine-grained; you
really are better off looking at one of the caching engines that's
available now.  Performance options are hard, because it's difficult to
convey the complex tradeoffs to anyone, so keeping knobs like that to a
minimum is really a necessity.

- John 'Warthog9' Hawley

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-08 23:53                 ` J.H.
@ 2009-01-09  0:16                   ` Miklos Vajna
  2009-01-09  0:19                   ` Johannes Schindelin
  2009-01-10  1:44                   ` Jakub Narebski
  2 siblings, 0 replies; 22+ messages in thread
From: Miklos Vajna @ 2009-01-09  0:16 UTC (permalink / raw)
  To: J.H., git; +Cc: Joey Hess

On Thu, Jan 08, 2009 at 03:53:16PM -0800, "J.H." <warthog19@eaglescrag.net> wrote:
> Look at either Lea's or my caching engines, it will help dramatically on 
> something of that size.

repo.or.cz uses a single patch for caching the project list only:

http://repo.or.cz/w/git/repo.git?a=commit;h=152fb0b22d36c6981ac3c4403b69ad91b27a1bc6

you are probably better off with such a small patch instead of using a
gitweb fork.

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-08 23:53                 ` J.H.
  2009-01-09  0:16                   ` Miklos Vajna
@ 2009-01-09  0:19                   ` Johannes Schindelin
  2009-01-09  0:26                     ` J.H.
  2009-01-10  1:44                   ` Jakub Narebski
  2 siblings, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2009-01-09  0:19 UTC (permalink / raw)
  To: J.H.; +Cc: Joey Hess, git

Hi,

On Thu, 8 Jan 2009, J.H. wrote:

> Look at either Lea's or my caching engines, it will help dramatically on 
> something of that size.

Speaking of which, do you have any performance comparisons between the 
two?

Ciao,
Dscho

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-09  0:19                   ` Johannes Schindelin
@ 2009-01-09  0:26                     ` J.H.
  0 siblings, 0 replies; 22+ messages in thread
From: J.H. @ 2009-01-09  0:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Joey Hess, git

Johannes Schindelin wrote:
> Hi,
>
> On Thu, 8 Jan 2009, J.H. wrote:
>
>   
>> Look at either Lea's or my caching engines, it will help dramatically on 
>> something of that size.
>>     
>
> Speaking of which, do you have any performance comparisons between the 
> two?
>   

Lea's got some; I can see if I can dig up my copy (or, if she's paying
attention, maybe she can publish them).  Either one is orders of
magnitude faster than the normal code.  Beyond that, which one is faster
goes back and forth, mainly because of the different approaches we each
took to the caching.  Generally speaking I would push people more
towards Lea's work than mine; if nothing else, hers is more in line with
current gitweb, though I have had some thoughts about undoing my file
breakout and getting my code base back up to speed.

- John 'Warthog9' Hawley

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07  4:25 [PATCH] gitweb: support the rel=vcs microformat Joey Hess
  2009-01-07 12:30 ` Giuseppe Bilotta
@ 2009-01-09 23:49 ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-09 23:49 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> The rel=vcs microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)

Let me put here an example from the above-mentioned page:

  <head>
  <link rel="vcs-git" href="git://example.org/foo.git" 
        title="foo git repository" />
  </head>

  <a rel="vcs-git" href="git://example.org/foo.git" 
     title="git repository">git://example.org/foo.git</a>
  <a rel="vcs-git" href="git://example.org/foo.git">git repository</a>

There is one problem that the above microformat does not solve, though
it is a problem only for git hosting sites like repo.or.cz or GitHub:
it does not allow distinguishing between a fetch (read) link and a
push (write, publish) link.  This is not a problem for standard
(unmodified) gitweb, as it shows only read-only git repository links.

We also have to decide what to put in the 'title' attribute; I think
the simplest would be to put "$project git repository" or something
(for example "git/git.git git repository").

One thing I worry about is that those links (or at least some of those
links) are not meant for the browser to open; there is also the SCP-like
syntax for the SSH protocol, of the form 'user@host:/path/to/repo.git/',
which does not follow URL rules.
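
To illustrate the distinction, here is a rough sketch that sorts clone
addresses into "scheme://" style URLs and SCP-like addresses. The sub
name clone_address_kind is hypothetical, and the SCP-like pattern
([user@]host:path) is a simplifying assumption, not git's own parsing
rules.

```perl
use strict;
use warnings;

# Rough classification of a clone address. The scp-like check is a
# simplified assumption ([user@]host:path), not git's actual parser.
sub clone_address_kind {
	my $addr = shift;

	# generic "scheme://..." shape
	return 'url' if $addr =~ m{^[A-Za-z][A-Za-z0-9+.-]*://};
	# optional "user@", then a host without '/' or ':', then ":path"
	return 'scp-like' if $addr =~ m{^(?:[^@/:]+@)?[^/:]+:.};
	return 'other';
}

for my $addr ('git://example.org/foo.git',
              'user@host:/path/to/repo.git/',
              '/srv/git/foo.git') {
	printf "%-30s %s\n", $addr, clone_address_kind($addr);
}
```

A page generator could use such a check to decide which addresses are
safe to place in an href attribute and which should stay plain text.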

> 
> Make gitweb use the microformat in the header of pages it generates,
> if it has been configured with project url information in any of the usual
> ways.

There are two somewhat separate issues here: marking existing and future
URLs (current project fetch URLs, which IIRC are not hyperlinked now;
planned/future 'git' links in the project list page; perhaps also links in
OPML and RSS/Atom feeds) with 'rel="vcs-git"', and adding <link .../>
elements to the page header.

> 
> Since getting the urls can require hitting disk, I avoided putting the
> microformat on *every* page gitweb generates. Just put it on the project
> summary page, the project list page, and the forks list page.
>
> The first of these already looks up the urls, so adding the microformat was
> free. 

I assume that this patch is only about adding <link ... /> elements to
head?  I think in the case of 'summary' view for a project it is an
excellent idea (similar to having 'prev' and 'next' link elements in
chaptered on-line book in HTML), and would allow for automation using
gitweb as a kind of service announcement.

> There is a small overhead in including the microformat on the latter
> two pages [projects list and list of forks], but getting the project
> descriptions for those pages already incurs a similar overhead, and
> the ability to get every repo url in one place seems worthwhile.

There is also OPML, which might be worth checking.

By the way, for 'projects_list' action and 'forks' actions we have to
decide whether to show _all_ links for each project (there can be more
than one), or whether we show only some main git link (like in the
case of proposed 'git' link).  And whether we trust @git_base_url_list
or do we take it as default and examine per-repository configuration
(more costly).

More importantly, the 'project_list' page is already overly large
when hosting a very large number of repositories (there were some
patches adding pagination for 'project_list', and perhaps they will
be resent).  Adding <link .../> elements would only add to its size,
and if the list is divided into pages we would also have to take that
into account.

> 
> This changes git_get_project_description() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS.

Errr... what? Why do you change git_get_project_description()
subroutine? I don't think it would be a good source for the 'title'
attribute; perhaps for a 'desc' attribute, and only after sanitizing
"Unnamed repository; edit this file to name it for gitweb."

Errata: ah, it is git_get_project_url_list() subroutine...

> 
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   38 ++++++++++++++++++++++++++------------
>  1 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..3f8a228 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -789,6 +789,9 @@ $git_dir = "$projectroot/$project" if $project;
>  our @snapshot_fmts = gitweb_get_feature('snapshot');
>  @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> +# populated later with git urls for the project
> +our @git_url_list;
> +

I'm not sure why this has to be global, but I assume that you want to
avoid recalculating it in git_header_html.

>  # dispatch
>  if (!defined $action) {
>  	if (defined $hash) {
> @@ -2100,17 +2103,22 @@ sub git_show_project_tagcloud {
>  }
>  
>  sub git_get_project_url_list {
> +	# use per project git URL list in $projectroot/$path/cloneurl
> +	# or make project git URL from git base URL and project name

I'd rather use separate subroutine for the second, I think.

>  	my $path = shift;
>  
> +	my @ret;
> +
>  	$git_dir = "$projectroot/$path";
> -	open my $fd, "$git_dir/cloneurl"
> -		or return wantarray ?
> -		@{ config_to_multi(git_get_project_config('url')) } :
> -		   config_to_multi(git_get_project_config('url'));
> -	my @git_project_url_list = map { chomp; $_ } <$fd>;
> -	close $fd;
> +	if (open my $fd, "$git_dir/cloneurl") {
> +		@ret = map { chomp; $_ } <$fd>;
> +		close $fd;
> +	}
> +	else {

Style: "} else {"

> +	       @ret = @{ config_to_multi(git_get_project_config('url')) };
> +	}
>  
> -	return wantarray ? @git_project_url_list : \@git_project_url_list;
> +	return @ret ? @ret : map { "$_/$project" } @git_base_url_list;
>  }

Hmmm... currently gitweb does it at caller:

	my @url_list = git_get_project_url_list($project);
	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;

Why do you want to put this in git_get_project_url_list()? Please
explain (here and in the commit message too; the commit message has to
mention that you change semantics a bit, and explain why you did so).

>  
>  sub git_get_projects_list {
> @@ -2953,6 +2961,10 @@ EOF

Sidenote: this should be

  @@ -2953,6 +2961,10 @@ sub git_header_html {

but I'm not sure if it would be possible to automate...

>  		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>  	}
>  
> +	foreach my $url (@git_url_list) {
> +		print qq{<link rel="vcs" type="git" href="$url" />\n};
> +	}
> +

Errr... in mentioned http://kitenet.net/~joey/rel-vcs/ it is

  <link rel="vcs-git" href="$url" title="$project git repository" />

and not

  <link rel="vcs" type="git" href="$url" />

Besides, the 'type' attribute for A and LINK elements is about the advisory
content type of the document pointed to by the link:

 type = content-type [CI]
    This attribute gives an advisory hint as to the content type of
    the content available at the link target address. It allows user
    agents to opt to use a fallback mechanism rather than fetch the
    content if they are advised that they will get content in a
    content type they do not support.  Authors who use this attribute
    take responsibility to manage the risk that it may become
    inconsistent with the content available at the link target
    address.  
    For the current list of registered content types, please consult
    [MIMETYPES].

>  	print "</head>\n" .
>  	      "<body>\n";
>  
> @@ -4380,6 +4392,8 @@ sub git_project_list {
>  		die_error(404, "No projects found");
>  	}
>  
> +	@git_url_list = map { git_get_project_url_list($_->{path}) } @list;
> +
>  	git_header_html();
>  	if (-f $home_text) {
>  		print "<div class=\"index_include\">\n";
> @@ -4400,6 +4414,8 @@ sub git_forks {
>  	if (defined $order && $order !~ m/none|project|descr|owner|age/) {
>  		die_error(400, "Unknown order parameter");
>  	}
> +	
> +	@git_url_list = map { git_get_project_url_list($_->{path}) } @list;
>  
>  	my @list = git_get_projects_list($project);
>  	if (!@list) {

Those two are pretty straightforward, but please note that
'project_list' view (action) might be _already_ too large...

> @@ -4457,6 +4473,8 @@ sub git_summary {
>  		@forklist = git_get_projects_list($project);
>  	}
>  
> +	@git_url_list = git_get_project_url_list($project);
> +
>  	git_header_html();
>  	git_print_page_nav('summary','', $head);
>  
> @@ -4468,12 +4486,8 @@ sub git_summary {
>  		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>  	}
>  
> -	# use per project git URL list in $projectroot/$project/cloneurl
> -	# or make project git URL from git base URL and project name
>  	my $url_tag = "URL";
> -	my @url_list = git_get_project_url_list($project);
> -	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -	foreach my $git_url (@url_list) {
> +	foreach my $git_url (@git_url_list) {
>  		next unless $git_url;
>  		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
>  		$url_tag = "";
> -- 
> 1.5.6.5

This is also pretty straightforward: it moves calculation earlier for
results to be shared with git_header_html (and uses global variable).

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 12:30 ` Giuseppe Bilotta
  2009-01-07 15:50   ` Joey Hess
@ 2009-01-09 23:56   ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-09 23:56 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git

Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:

> On Wednesday 07 January 2009 05:25, Joey Hess wrote:
> 
> > The rel=vcs microformat allows a web page to indicate the locations of
> > repositories related to it in a machine-parseable manner.
> > (See http://kitenet.net/~joey/rfc/rel-vcs/)
> 
> Interesting idea, I like it. However, I see a problem in the proposed
> implementation versus the spec. According to the spec:
> 
> """
> The "title" is optional, but recommended if there are multiple, different
> repositories linked to on one page. It is a human-readable description of the
> repository.
> [...]
> If there are multiple repositories listed, without titles, tools
> should assume they are different repositories.
> """

Good catch.

> 
> In this patch you do NOT add titles to the rel=vcs links, which means that
> everything works fine only if there is a single URL for each project. If a
> project has different URLs, it's going to appear multiple times as _different_
> projects to a spec-compliant reader.
> 
> A possible solution would be to make @git_url_list into a map keyed by the
> project name and having the description and repo URL(s) as values.
> 
> Since there is the possibility of different projects having the same
> description (e.g. the default one), the link title could be composed of
> "$project - $description" rather than simply $description.
> 
> Note that both in summary and in project list view you already retrieve the
> description, so there are no additional disk hits.

Wouldn't "$project git repository" (i.e. do not use description at
all) be a simpler, faster and also _better_ solution?
 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 18:41       ` Joey Hess
@ 2009-01-10  0:01         ` Jakub Narebski
  0 siblings, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-10  0:01 UTC (permalink / raw)
  To: Joey Hess; +Cc: Giuseppe Bilotta, git

Joey Hess <joey@kitenet.net> writes:
> Giuseppe Bilotta wrote:
> > Joey Hess <joey@kitenet.net> writes:

> > > Thanks for the feedback. There are some changes happening to the
> > > microformat that should make gitweb's job slightly easier, I'll respin
> > > the patch soon.
> > 
> > Let me know about this too, I very much like the idea of this microformat.
> 
> FYI, I've updated the microformat's page with the changes. The
> significant one for gitweb is that it can now be applied to <a> links.
> So on the project page, the display of the git URL could be converted to
> a link using the microformat, and there's no need to get the info
> earlier to put it in the header. Unfortunatly, the same can't be done to
> the project list page, unless it's changed to have "git" links as seen
> on vger.kernel.org's gitweb.

I'm not sure if making repository URLs hyperlinks is a good
idea.  You cannot (should not) click on those in an ordinary web browser;
they are to be used by git (that is also an additional reason why I am
not so sure about the 'git' link on projects_list page idea).

Besides, LINK elements in the page HEAD are meant mainly for machines; I
think it might be more important to add them there for machines, even
if the URLs also appear as A elements (links) or just plain text
somewhere else.  For example we have LINK elements with alternate versions,
among others OPML for projectless pages, and RSS/Atom for project
pages, even though those links are also in the page body.

So I'd rather have them LINKs...
-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs microformat
  2009-01-07 19:02         ` Joey Hess
  2009-01-07 23:24           ` [PATCH] gitweb: support the rel=vcs-* microformat Joey Hess
@ 2009-01-10  0:03           ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-10  0:03 UTC (permalink / raw)
  To: Joey Hess; +Cc: Giuseppe Bilotta, git

Joey Hess <joey@kitenet.net> writes:
> Joey Hess wrote:

> > Another approach would be to just memoize git_get_project_description
> > and git_get_project_url_list.
> 
> Especially since git_get_project_description is already called more than
> once for some pages.

Hmmm... this is an idea worth checking.
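
A minimal sketch of the memoization idea quoted above, using a cache hash
that only the caching sub can see. The names read_description_from_disk
and cached_description are hypothetical, and the "disk read" is simulated
with a counter so that the effect is visible.

```perl
use strict;
use warnings;

my $reads = 0;    # counts simulated disk hits, for demonstration only

# Hypothetical stand-in for a sub that reads $git_dir/description.
sub read_description_from_disk {
	my $path = shift;
	$reads++;
	return "description of $path";
}

{
	my %cache;    # visible only to the sub below

	sub cached_description {
		my $path = shift;
		$cache{$path} = read_description_from_disk($path)
			unless exists $cache{$path};
		return $cache{$path};
	}
}

cached_description('foo.git') for 1 .. 3;
cached_description('bar.git');
print "disk reads: $reads\n";    # two distinct paths, so two reads
```

In a persistent interpreter the cache would outlive a single request, so
a real implementation would need some way to invalidate or reset it.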

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs-* microformat
  2009-01-07 23:24           ` [PATCH] gitweb: support the rel=vcs-* microformat Joey Hess
  2009-01-08  7:56             ` Giuseppe Bilotta
@ 2009-01-10  0:52             ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-10  0:52 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> The rel=vcs-* microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)
> 
> Make gitweb use the microformat if it has been configured with project url
> information in any of the usual ways. On the project summary page, the
> repository URL display is simply marked up using the microformat. On the
> project list page and forks list page, the microformat is embedded in the
> header, since the URLs do not appear on the page.

I think having LINK elements also for the 'summary' page would be a good
idea. This microformat is, I think, mainly for machines, and I guess
machines can more easily read a few LINK elements in the fairly small
HEAD of a page than scan all of the many link (A) elements on the page
for those matching the vcs-* microformat.

Besides, I am not sure if, for example, hyperlinking an SCP-style
repository URL makes sense at all; I am also not sure if hyperlinking
links which you cannot click on makes good sense (unless you use SPAN
or ABBR instead of A to mark repo links...)

> 
> The microformat could be included on other pages too, but I've skipped
> doing so for now, since it would mean reading another file for every page
> displayed.

Also it is not necessary: if some tool wants to get repo links for a
given project, it can get the 'summary' page; if some tool wants to get
a list of all repos, it can access one of the projects list actions.

> 
> There is a small overhead in including the microformat on project list
> and forks list pages, but getting the project descriptions for those pages
> already incurs a similar overhead, and the ability to get every repo url
> in one place seems worthwhile.

By the way, do you have any benchmarks for that?

> 
> This changes git_get_project_url_list() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS. It memoizes
> both that function and git_get_project_description(), to avoid redundant
> file reads.

I would also add that, from what I understand, you have made the
git_get_project_url_list() subroutine self-sufficient: it now
considers both per-repository configuration (gitweb.url in config,
cloneurl file in $GIT_DIR) and global gitweb configuration
(@git_base_url_list variable).

Simplifying the code so that it always returns a list and does not check
the calling context is a side issue, orthogonal to the issue mentioned above.
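
The fallback order described above can be sketched as follows. This is an
illustration under assumptions, not the patch itself: the sub name
project_url_list is hypothetical, and the three sources are passed in as
plain data instead of being read from the cloneurl file, the repository
config and @git_base_url_list.

```perl
use strict;
use warnings;

# Stand-in for the lookup order: cloneurl file if it could be opened,
# otherwise config URLs; base URLs plus project path as a last resort.
sub project_url_list {
	my ($project, $cloneurl, $config_urls, $base_urls) = @_;

	# $cloneurl is undef when the file could not be opened
	my @ret = defined $cloneurl ? @$cloneurl : @$config_urls;
	@ret = map { "$_/$project" } @$base_urls unless @ret;
	return @ret;
}

my @base = ('git://example.org', 'http://example.org/git');

# no cloneurl file, no config entries: built from the base URLs
print "$_\n" for project_url_list('foo.git', undef, [], \@base);

# cloneurl file present: its contents win
print "$_\n"
	for project_url_list('foo.git', ['git://mirror.example.net/foo.git'], [], \@base);
```

The sketch takes the project path as an argument, so the result does not
depend on any global variable being set for the page being generated.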

> 
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 62 insertions(+), 16 deletions(-)
> 
> This incorporates Giuseppe Bilotta's feedback, and uses new features
> of the microformat. You can see this version running at
> http://git.ikiwiki.info/
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..c238717 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
>  ## ......................................................................
>  ## git utility functions, directly accessing git repository
>  
> +{
> +my %project_descriptions; # cache
> +

Won't we get warnings (and perhaps errors) from mod_perl? Shouldn't
this be "our %project_descriptions;"?

>  sub git_get_project_description {
>  	my $path = shift;
>  
> +	return $project_descriptions{$path} if exists $project_descriptions{$path};
> +
>  	$git_dir = "$projectroot/$path";
>  	open my $fd, "$git_dir/description"
>  		or return git_get_project_config('description');
> @@ -2031,7 +2036,9 @@ sub git_get_project_description {
>  	if (defined $descr) {
>  		chomp $descr;
>  	}
> -	return $descr;
> +	return $project_descriptions{$path}=$descr;
> +}
> +
>  }

If we use 'title="$project git repository"' for 'rel="vcs-git"' links,
is it still worth the extra complication to avoid double calculation of
the project description in the case of the 'summary' view for a project?
Because IIRC for the 'projects_list' view it is already cached in the
@projects list as the 'descr' key...

>  
>  sub git_get_project_ctags {
> @@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
>  	}
>  }
>  
> +{
> +my %project_url_lists; # cache
> +

Same question: would it work correctly for mod_perl?

>  sub git_get_project_url_list {
> +	# use per project git URL list in $projectroot/$path/cloneurl
> +	# or make project git URL from git base URL and project name
>  	my $path = shift;
>  
> +	return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
> +
> +	my @ret;
>  	$git_dir = "$projectroot/$path";
> -	open my $fd, "$git_dir/cloneurl"
> -		or return wantarray ?
> -		@{ config_to_multi(git_get_project_config('url')) } :
> -		   config_to_multi(git_get_project_config('url'));
> -	my @git_project_url_list = map { chomp; $_ } <$fd>;
> -	close $fd;
> +	if (open my $fd, "$git_dir/cloneurl") {
> +		@ret = map { chomp; $_ } <$fd>;
> +		close $fd;
> +	} else {
> +	       @ret = @{ config_to_multi(git_get_project_config('url')) };
> +	}
> +	@ret=map { "$_/$project" } @git_base_url_list if ! @ret;

Style: 

+	@ret = map { "$_/$project" } @git_base_url_list if !@ret;

or even

+	@ret = map { "$_/$project" } @git_base_url_list unless @ret;

> +
> +	$project_url_lists{$path}=\@ret;
> +	return @ret;
> +}
>  
> -	return wantarray ? @git_project_url_list : \@git_project_url_list;
>  }

Again: is it worth caching? It is only needed for 'summary'; for
'projects_list' it might be better to extend the @projects list instead.

>  
>  sub git_get_projects_list {
> @@ -2856,6 +2875,7 @@ sub blob_contenttype {
>  sub git_header_html {
>  	my $status = shift || "200 OK";
>  	my $expires = shift;
> +	my $extraheader = shift;
>  
>  	my $title = "$site_name";
>  	if (defined $project) {
> @@ -2953,6 +2973,8 @@ EOF
>  		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>  	}
>  
> +	print $extraheader if defined $extraheader;
> +
>  	print "</head>\n" .
>  	      "<body>\n";
>  

Good solution, but wouldn't it be better to put this into a separate
commit that simply extends git_header_html to allow adding extra data
(no need to name it $extraheader I think, $extra would be enough) to
the HTML header (HEAD element contents)?

> @@ -4365,6 +4387,26 @@ sub git_search_grep_body {
>  	print "</table>\n";
>  }
>  
> +sub git_link_title {
> +	my $project=shift;
> +	
> +	my $description=git_get_project_description($project);
> +	return $project.(length $description ? " - $description" : "");
> +}

Style (whitespace around '='). Also, IMHO "$project git repository" is
better than "$project - $description", not least because of the
  "Unnamed repository; edit this file to name it for gitweb."
default template.

> +
> +# generates header with links to the specified projects
> +sub git_links_header {

Good abstraction, but I'm not so sure about subroutine name.

> +	my $ret='';
> +	foreach my $project (@_) {

Style: I'd rather use named variables, like "my @projects = @_";
also, everywhere else we usually put spaces around '='.

> +		# rel=vcs-* microformat
> +		my $title=git_link_title($project);

Good abstraction.

> +		foreach my $url git_get_project_url_list($project) {
> +			$ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}

To be HTML compatible, it is better to use

> +			$ret.=qq{<link rel="vcs-git" href="$url" title="$title" />\n}

(note the space before "/>").
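
A minimal standalone sketch of generating such elements follows. It
also escapes the interpolated values, since a project description may
contain quotes. esc_attr and vcs_links are hypothetical names used only
for this illustration; gitweb has its own escaping helpers, which are
not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical minimal escaper for HTML attribute values.
sub esc_attr {
	my $str = shift;
	$str =~ s/&/&amp;/g;
	$str =~ s/</&lt;/g;
	$str =~ s/>/&gt;/g;
	$str =~ s/"/&quot;/g;
	return $str;
}

# Builds one <link> element per URL, with the space before "/>".
sub vcs_links {
	my ($title, @urls) = @_;
	my $ret = '';
	foreach my $url (@urls) {
		$ret .= sprintf qq{<link rel="vcs-git" href="%s" title="%s" />\n},
			esc_attr($url), esc_attr($title);
	}
	return $ret;
}

print vcs_links('foo "stable" git repository', 'git://example.org/foo.git');
```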

> +		}
> +	}
> +	return $ret;
> +}
> +
>  ## ======================================================================
>  ## ======================================================================
>  ## actions
> @@ -4380,7 +4422,9 @@ sub git_project_list {
>  		die_error(404, "No projects found");
>  	}
>  
> -	git_header_html();
> +	my $extraheader=git_links_header(map { $_->{path} } @list);
> +
> +	git_header_html(undef, undef, $extraheader);
>  	if (-f $home_text) {
>  		print "<div class=\"index_include\">\n";
>  		insert_file($home_text);
> @@ -4405,8 +4449,10 @@ sub git_forks {
>  	if (!@list) {
>  		die_error(404, "No forks found");
>  	}
> +	
> +	my $extraheader=git_links_header(map { $_->{path} } @list);
>  
> -	git_header_html();
> +	git_header_html(undef, undef, $extraheader);
>  	git_print_page_nav('','');
>  	git_print_header_div('summary', "$project forks");
>  	git_project_list_body(\@list, $order);
> @@ -4468,14 +4514,14 @@ sub git_summary {
>  		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>  	}
>  
> -	# use per project git URL list in $projectroot/$project/cloneurl
> -	# or make project git URL from git base URL and project name
>  	my $url_tag = "URL";
> -	my @url_list = git_get_project_url_list($project);
> -	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -	foreach my $git_url (@url_list) {
> +	my $title=git_link_title($project);
> +	foreach my $git_url (git_get_project_url_list($project)) {
>  		next unless $git_url;
> -		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
> +		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
> +		      # rel=vcs-* microformat
> +		      "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
> +		      "</td></tr>\n";
>  		$url_tag = "";
>  	}

Non clickable hyperlink... hmmm...

>  
> -- 
> 1.5.6.5
> 
> 
> 
> -- 
> see shy jo

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCH] gitweb: support the rel=vcs-* microformat
  2009-01-08  7:56             ` Giuseppe Bilotta
  2009-01-08 19:54               ` gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat) Joey Hess
@ 2009-01-10  1:04               ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-10  1:04 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Joey Hess

Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes: 
> On Thursday 08 January 2009 00:24, Joey Hess wrote:
> 
> > The rel=vcs-* microformat allows a web page to indicate the locations of
> > repositories related to it in a machine-parseable manner.
> > (See http://kitenet.net/~joey/rfc/rel-vcs/)
> 
> Have you considered submitting the microformat to microformats.org?
> That would make the microformat more official and would be a good
> first step to have wider coverage of it, and additional reviews.

Good thinking.  BTW, microformats.org is IIRC a wiki (or at least part
of it is), so it should be easy to do...

> 
> > Make gitweb use the microformat if it has been configured with project url
> > information in any of the usual ways. On the project summary page, the
> > repository URL display is simply marked up using the microformat. On the
> > project list page and forks list page, the microformat is embedded in the
> > header, since the URLs do not appear on the page.
> > 
> > The microformat could be included on other pages too, but I've skipped
> > doing so for now, since it would mean reading another file for every page
> > displayed.
> > 
> > There is a small overhead in including the microformat on project list
> > and forks list pages, but getting the project descriptions for those pages
> > already incurs a similar overhead, and the ability to get every repo url
> > in one place seems worthwhile.
> 
> I agree with this, although people with very large project lists may
> differ ... do we have timings on these?

I think that while adding this microformat to the 'summary' page is a
non-issue, we might want to be able to configure it out so that it is
not used for the projects_list page (which might be very large).

And what about OPML, RSS and Atom formats?

>  
> > This changes git_get_project_url_list() to not check wantarray, and only
> > return in list context -- the only way it is used AFAICS. It memoizes
> > both that function and git_get_project_description(), to avoid redundant
> > file reads.
> 
> You may want to consider splitting the patch into three: memoizing
> of git_get_project_description(), reworking of
> git_get_project_url_list(), and the actual rel=vcs-* insertions.

Very good idea.  Small, single feature patches are nice.

[...]
> >  sub git_get_project_description {
> >       my $path = shift;
> >  
> > +     return $project_descriptions{$path} if exists $project_descriptions{$path};
> > +
> 
> This line is bordering on the 80 characters, so you may want to
> consider moving 'my $descr' here, with something such as
> 
> my $descr = $project_descriptions{$path};
> return $descr if exists $descr;
> 
> Also, I'm no perl guru so I'm not sure about exists vs defined here.

You might have an undefined value under an existing key, but I guess we
can assume that the two are equivalent here.  While 'exists' better
matches what you check (does the key exist in the hash), further on you
rely on the fact that $descr is not undefined.
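
A short standalone sketch of the difference, using a made-up cache with
made-up keys. Note that 'exists' only accepts a hash or array element
(or a subroutine), so "exists $descr" on a plain scalar would not
compile; the test has to be written against the hash element itself.

```perl
use strict;
use warnings;

# Hypothetical cache contents, for illustration only.
my %project_descriptions = (
	'with-descr.git' => 'some description',
	'no-descr.git'   => undef,   # key cached, but no description found
);

# Returns (exists, defined) as 0/1 flags for the given key.
sub cache_state {
	my $path = shift;
	return (exists  $project_descriptions{$path} ? 1 : 0,
	        defined $project_descriptions{$path} ? 1 : 0);
}

for my $path ('with-descr.git', 'no-descr.git', 'never-seen.git') {
	my ($exists, $defined) = cache_state($path);
	print "$path: exists=$exists defined=$defined\n";
}
# prints:
#   with-descr.git: exists=1 defined=1
#   no-descr.git: exists=1 defined=0
#   never-seen.git: exists=0 defined=0
```

With 'defined' as the cache test, a project without any description
would be looked up again on every call; with 'exists' it is looked up
once.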

[...]
> >  ## ======================================================================
> >  ## ======================================================================
> >  ## actions
> > @@ -4380,7 +4422,9 @@ sub git_project_list {
> >               die_error(404, "No projects found");
> >       }
> >  
> > -     git_header_html();
> > +     my $extraheader=git_links_header(map { $_->{path} } @list);
> > +
> > +     git_header_html(undef, undef, $extraheader);
> >       if (-f $home_text) {
> >               print "<div class=\"index_include\">\n";
> >               insert_file($home_text);
> > @@ -4405,8 +4449,10 @@ sub git_forks {
> >       if (!@list) {
> >               die_error(404, "No forks found");
> >       }
> > +     
> > +     my $extraheader=git_links_header(map { $_->{path} } @list);
> >  
> > -     git_header_html();
> > +     git_header_html(undef, undef, $extraheader);
> 
> This makes me wonder if it would be worth it to turn git_header_html
> into -param => value style, but I'm not really sure it's worth it.

It is git_header_html(STATUS, EXPIRES, EXTRA)

Hmmm... now that I have checked: in gitweb we use either
git_header_html() (which is most common), or git_header_html(STATUS)
in die_error, or in a few cases git_header_html(undef, $expires); and
now git_header_html(undef, undef, $extra), so named parameters might
be a good idea... I don't have an opinion here...
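
For comparison, a minimal sketch of what the "-param => value" calling
convention could look like. header_html is a hypothetical helper, not
gitweb's git_header_html(); it only returns a few header lines so that
the convention can be shown in isolation, and the option names are
assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating named parameters with defaults.
sub header_html {
	my %opts = (
		-status  => '200 OK',   # default used when the caller passes none
		-expires => undef,
		-extra   => undef,
		@_,                     # caller-supplied pairs override the defaults
	);

	my $out = "Status: $opts{-status}\n";
	$out .= "Expires: $opts{-expires}\n" if defined $opts{-expires};
	$out .= $opts{-extra}                if defined $opts{-extra};
	return $out;
}

print header_html();
print header_html(-extra => qq(<link rel="vcs-git" href="git://example.org/foo.git" />\n));
print header_html(-status => '404 Not Found');
```

Callers then name only the arguments they need, instead of padding the
argument list with undef.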

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-08 19:54               ` gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat) Joey Hess
  2009-01-08 23:53                 ` J.H.
@ 2009-01-10  1:11                 ` Jakub Narebski
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-10  1:11 UTC (permalink / raw)
  To: Joey Hess; +Cc: git, Giuseppe Bilotta

Joey Hess <joey@kitenet.net> writes:
> Giuseppe Bilotta wrote:

> > > There is a small overhead in including the microformat on project list
> > > and forks list pages, but getting the project descriptions for those pages
> > > already incurs a similar overhead, and the ability to get every repo url
> > > in one place seems worthwhile.
> > 
> > I agree with this, although people with very large project lists may
> > differ ... do we have timings on these?
> 
> AFAICS, when displaying the project list, gitweb reads each project's
> description file, falling back to reading its config file if there is no
> description file.
> 
> If performance was a problem here, the thing to do would be to add
> project descriptions to the $project_list file, and use those in
> preference to the description files. If a large site has done that,
> they've not sent in the patch. :-)

There was such a patch sent by me, but IIRC it fell through, partly
because it was sent IIRC during feature freeze.  I have "gitweb: Extend
project_index file format by project description" in my StGit stack.

> 
> With my patch, it will read each cloneurl file too. The best way to
> optimise that for large sites seems to be to add an option that would
> ignore the cloneurl files and config file and always use
> @git_base_url_list.

Good idea.

> 
> I checked the only large site I have access to (git.debian.org) and they
> use a $project_list file, but I see no other performance tuning. That's
> a 2 ghz machine; it takes gitweb 28 (!) seconds to generate the nearly 1
> MB index web page for 1671 repositories:
> 
> /srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total
>
> 
> Notice that most of the time is spent by child processes. For each
> repository, gitweb runs git-for-each-ref to determine the time of the
> last commit.
> 
> If that is removed (say if there were a way to get the info w/o
> forking), performance improves nicely:
> 
> ./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total
> 
> Making it not read description files for each project, as I suggest above,
> is the next best optimisation:
> 
> ./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total
> 
> So, I think it makes sense to optimise gitweb and offer knobs for performance
> tuning at the expense of the flexibility of description and cloneurl files.
> But, git-for-each-ref is swamping everything else.

One solution would be to limit the number of projects displayed on the
page, for example to 100 projects, although that would mainly reduce
the problem of dealing with a large page on the client side, and less
so the server load, unless we _do not_ sort projects by age.
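
A rough sketch of why the sort order matters. Sorting by age needs the
last-change time of every project before the page can be cut, while
sorting by name lets the expensive lookup run only for the projects on
the page. last_change_age and project_page are hypothetical names, and
the lookup is a dummy standing in for the per-repository
git-for-each-ref call mentioned above.

```perl
use strict;
use warnings;

my $age_lookups = 0;

# Hypothetical stand-in for the per-repository git-for-each-ref call.
sub last_change_age {
	my $path = shift;
	$age_lookups++;
	return length $path;   # dummy value; a real lookup would fork git
}

# Returns one page of projects, sorted by name, looking up the age only
# for the projects that are actually shown.
sub project_page {
	my ($projects, $page, $per_page) = @_;
	my @sorted = sort { $a cmp $b } @$projects;
	my $first  = $page * $per_page;
	return () if $first > $#sorted;
	my $last = $first + $per_page - 1;
	$last = $#sorted if $last > $#sorted;

	my @shown;
	for my $path (@sorted[$first .. $last]) {
		push @shown, { path => $path, age => last_change_age($path) };
	}
	return @shown;
}

my @projects = map { sprintf 'project%04d.git', $_ } 1 .. 1671;
my @page     = project_page(\@projects, 0, 100);
printf "%d projects shown, %d age lookups\n", scalar @page, $age_lookups;
# prints "100 projects shown, 100 age lookups"
```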

Another solution would be to use caching: repo.or.cz uses one approach
(caching only the projects_list action), kernel.org another (gitweb
caching from the GSoC 2008 project).

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
  2009-01-08 23:53                 ` J.H.
  2009-01-09  0:16                   ` Miklos Vajna
  2009-01-09  0:19                   ` Johannes Schindelin
@ 2009-01-10  1:44                   ` Jakub Narebski
  2 siblings, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2009-01-10  1:44 UTC (permalink / raw)
  To: J.H.; +Cc: Joey Hess, git, Giuseppe Bilotta

"J.H." <warthog19@eaglescrag.net> writes:
> Joey Hess wrote:
>> Giuseppe Bilotta wrote:
>>
>>>> There is a small overhead in including the microformat on project list
>>>> and forks list pages, but getting the project descriptions for those pages
>>>> already incurs a similar overhead, and the ability to get every repo url
>>>> in one place seems worthwhile.
>>>>
>>> I agree with this, although people with very large project lists may
>>> differ ... do we have timings on these?
>>>
>>
>> AFAICS, when displaying the project list, gitweb reads each project's
>> description file, falling back to reading its config file if there is no
>> description file.
>>
>> If performance was a problem here, the thing to do would be to add
>> project descriptions to the $project_list file, and use those in
>> preference to the description files. If a large site has done that,
>> they've not sent in the patch. :-)
> 
> No because all the large sites have pain points and issues elsewhere
> in the app.  Most of the large sites (which I can at least speak for
> Kernel.org) went and have built in full caching layers into gitweb
> itself to deal with the problem.  This means that we don't have to
> worry about nickle and dime performance improvements that are specific
> to one section, but can do a very broad sweep and get dramatically
> better performance across all of gitweb.  Those patches have all made
> it back out onto the mailing list, but for a number of different
> reasons none have been accepted into the mainline branch.

An additional issue is that when you add or delete a repository
(project), you have to correct or regenerate the projects_index file.
While that is, I think, quite easy for git hosting sites such as
repo.or.cz, it is harder for sites which offer gitweb just like they
offer WWW homepages: as a service, with repositories created (and
descriptions updated) outside of gitweb's control.

-- 
Jakub Narebski
Poland
ShadeHawk on #git


end of thread, other threads:[~2009-01-10  1:46 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2009-01-07  4:25 [PATCH] gitweb: support the rel=vcs microformat Joey Hess
2009-01-07 12:30 ` Giuseppe Bilotta
2009-01-07 15:50   ` Joey Hess
2009-01-07 18:03     ` Giuseppe Bilotta
2009-01-07 18:41       ` Joey Hess
2009-01-10  0:01         ` Jakub Narebski
2009-01-07 18:45       ` Joey Hess
2009-01-07 19:02         ` Joey Hess
2009-01-07 23:24           ` [PATCH] gitweb: support the rel=vcs-* microformat Joey Hess
2009-01-08  7:56             ` Giuseppe Bilotta
2009-01-08 19:54               ` gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat) Joey Hess
2009-01-08 23:53                 ` J.H.
2009-01-09  0:16                   ` Miklos Vajna
2009-01-09  0:19                   ` Johannes Schindelin
2009-01-09  0:26                     ` J.H.
2009-01-10  1:44                   ` Jakub Narebski
2009-01-10  1:11                 ` Jakub Narebski
2009-01-10  1:04               ` [PATCH] gitweb: support the rel=vcs-* microformat Jakub Narebski
2009-01-10  0:52             ` Jakub Narebski
2009-01-10  0:03           ` [PATCH] gitweb: support the rel=vcs microformat Jakub Narebski
2009-01-09 23:56   ` Jakub Narebski
2009-01-09 23:49 ` Jakub Narebski
