Re: [PATCHv5 12/12] gitweb: gather more remote data

From: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCHv5 12/12] gitweb: gather more remote data
Date: Sat, 23 Oct 2010 18:17:51 +0200	[thread overview]
Message-ID: <AANLkTi=7gUAbdRKLCifPidZXdLa8fCENT0=iy5bKAeiA@mail.gmail.com> (raw)
In-Reply-To: <201009271747.39201.jnareb@gmail.com>

I'm awfully sorry for the delay in the reply to this email. I got
gobbled up by some real-world stuff just while finishing the review of
the last patch.

2010/9/27 Jakub Narebski <jnareb@gmail.com>:
> On Fri, 24 Sep 2010, Giuseppe Bilotta wrote:
>
> Isn't the most important and visible information the fact that both
> standalone 'remotes' view and 'remotes' section in the 'summary' page
> is now grouped by remotes?  This should be explicitely mentioned in the
> commit message.  It is not very visible IMVHO in what is written below.

You're right. I've rewritten the message by putting the grouping
layout in the subject and rephrasing the body of the commit to better
describe what happens both in remotes view and in limited summary
view.

>>
>> +# Return an array with: a hash of remote names mapping to the corresponding
>> +# remote heads, the value of the $limit parameter, and a boolean indicating
>> +# whether we managed to get all the remotes or not.
>
> Perhaps it would be better to provide an example, rather than trying to
> describe the output.

I'm really not sure about how it would help, but I'll try.

>> +# If $limit is specified and the number of heads found is higher, the head
>> +# list is empy. Additional filtering on the number of heads can be done when
>> +# displaying the remotes.
>> +sub git_get_remotes_data {
>> +     my ($limit, $wanted) = @_;
>
> I'm not sure if it wouldn't be better to not worry git_get_remotes_data
> about $limit, but do limiting on output.  The amount of work gitweb
> needs to do in both situation is, I guess, nearly the same.

I think that the real issue is that situations in which gathering all
data and then discarding much of it would be very rare, being that it
would require a very large number of remotes (not remote heads, just
remotes). But considering that even I, a rather casual user, have at
least one project where the number of remotes is in the 20s, I
wouldn't be surprised if there were cases of humongous remote lists.

>> +     my %remotes;
>> +     open my $fd, '-|' , git_cmd(), 'remote', '-v';
>> +     return unless $fd;
>> +     my $more = 1;
>> +     while (my $remote = <$fd> and $more) {
>
> Why not use 'last' instead of using $more variable (though I have not
> checked if it would really make code simpler and more readable).

Because we need to know whether we bail out early or not from the
loop. It might be appropriate to document this in the code.

>> +             chomp $remote;
>> +             $remote =~ s!\t(.*?)\s+\((\w+)\)$!!;
>> +             next if $wanted and not $remote eq $wanted;
>> +             my $url = $1;
>> +             my $key = $2;
>
> Minor nitpick: why not
>
>  +             my ($url, $key) = ($1, $2);

No reason. Can do.

>> +     if ($more) {
>> +             my @heads = map { "remotes/$_" } keys %remotes;
>> +             my @remoteheads = git_get_heads_list(undef, @heads);
>> +             foreach (keys %remotes) {
>> +                     my $remote = $_;
>> +                     $remotes{$remote}{'heads'} = [ grep {
>> +                             $_->{'name'} =~ s!^$remote/!!
>> +                     } @remoteheads ];
>> +             }
>> +     }
>> +     return (\%remotes, $limit, $more);
>
> Hmmmm... as it can be seen making this function do more work results
> in this ugly API.  If git_get_remotes_data didn't worry about limiting,
> we could return simply %remotes hash (or \%remotes hash reference).

If I read your comments correctly, your idea would be to have two
separate functions, one that gets the list of remotes (with their
respective URLs), and a separate one (called when intended) to also
gather the list of heads. The only case in which the latter would not
be called would be when in summary view more than the given number of
maximum remotes were returned from the previous function. Correct?

>> +# Display a single remote block
>> +sub git_remote_body {
>> +     my ($remote, $rdata, $limit, $head, $single) = @_;
>> +     my %rdata = %{$rdata};
>
> Why do you need this, instead of simply using $rdata->{'heads'} etc.?

Because I keep getting messed up with the refs syntax in Perl. Thanks
for the hint.

>> +     my $heads = $rdata{'heads'};
>
> Why not
>
>  +     my @heads = @{$rdata{'heads'}};
>
> Or
>
>  +     my @heads = @{$rdata->{'heads'}};
>
> without this strange '%rdata = %{$rdata};'

Well, the main use of heads was to be passed to git_heads_body so I
thought I would just leave it as a ref.

>> +     if (defined $fetch) {
>> +             if ($fetch eq $push) {
>> +                     $urls .= git_repo_url("URL", $fetch);
>> +             } else {
>> +                     $urls .= git_repo_url("Fetch URL", $fetch);
>> +                     $urls .= git_repo_url("Push URL", $push) if defined $push;
>> +             }
>> +     } elsif (defined $push) {
>> +             $urls .= git_repo_url("Push URL", $push);
>> +     } else {
>> +             $urls .= git_repo_url("", "No remote URL");
>> +     }
>
> Perhaps reverse order of conditions would result in less nested
> conditional... but perhaps not.

I tried. The problem is that we just have that many cases (fetch and
no push, push and no fetch, equal fetch and push, different fetch and
push). I could not think of a way to handle them all in a more
streamlined way.

>> +
>> +     $urls .= "</table>\n";
>> +
>> +     my ($maxheads, $dots);
>> +     if (defined $limit) {
>> +             $maxheads = $limit - 1;
>
> If git_get_remotes_data didn't do limiting, you could use
>
>  +             $maxheads = scalar @heads;
>
>> +             if ($#{$heads} > $maxheads) {
>> +                     $dots = $cgi->a({-href => href(action=>"remotes", hash=>$remote)}, "...");
>> +             }
>> +     }
>
> We can always check if we got more remotes than limit.
>

Actually, this is in reverse. get_remotes_data doesn't do limiting on
the number of heads, which is why $maxheads is $limit - 1 and the
number of heads is compared against it. git_remotes_data does limiting
on the number of _remotes_, and if the limit is hit, git_remote_body
would just not be called.

>> +
>> +     my $content = sub {
>> +             print $urls;
>> +             git_heads_body($heads, $head, 0, $maxheads, $dots);
>> +     };
>> +
>> +     if (defined $single and $single) {
>
> 'defined $single and $single' is the same as '$single', because
> undefined value is false-ish.

I was over-protecting against warnings about the use of undefined
symbols. Cleaned up.

>> +             $content->();
>> +     } else {
>> +             git_group("remotes", $remote, "remotes", $remote, $remote, $content);
>> +     }
>
> Hmmm... should git_remote_body be responsible for wrapping in subsection?
> Wouldn't it be better to have caller use
>
>  git_group("remotes", $remote, "remotes", $remote, $remote,
>            sub { git_remote_body($remote, $rdata, $limit, $head) });

Right.

> Sidenote: strange (but perhaps unavoidable) repetition we have here...

Even with the new syntax discussed for the previous patches this is
unavoidable, it seems.

>> +}
>> +
>> +# Display remote heads grouped by remote, unless there are too many
>> +# remotes ($have_all is false), in which case we only display the remote
>> +# names
>
> Wouldn't it be better to put displaying only remote names in a separate
> subroutine, which would make git_remotes_body extremely simple?

Sure.

>> +                     print "<td>" .
>> +                           $cgi->a({-href=> href(action=>'remotes', hash=>$remote),
>> +                                    -class=> "list name"},esc_html($remote)) . "</td>";
>> +                     print "<td class=\"link\">" .
>> +                           (defined $fetch ? $cgi->a({-href=> $fetch}, "fetch") : "fetch") .
>> +                           " | " .
>> +                           (defined $push ? $cgi->a({-href=> $push}, "push") : "push") .
>> +                           "</td>";
>
> I see that you don't worry here if $fetch == $push, only if they
> are defined.  But I think it might be all right... though the exact
> output might need some discussion.

I'm open to suggestions.

>>  sub git_search_grep_body {
>>       my ($commitlist, $from, $to, $extra) = @_;
>>       $from = 0 unless defined $from;
>> @@ -5164,7 +5302,7 @@ sub git_summary {
>>       # there are more ...
>>       my @taglist  = git_get_tags_list(16);
>>       my @headlist = git_get_heads_list(16, 'heads');
>> -     my @remotelist = $remote_heads ? git_get_heads_list(16, 'remotes') : ();
>> +     my @remotelist = $remote_heads ? git_get_remotes_data(16) : ();
>
> That's not @remotelist any longer, isn't it?

Right. I renamed it to @remotedata across the whole file.

>>       my @forklist;
>>       my $check_forks = gitweb_check_feature('forks');
>>
>> @@ -5244,9 +5382,7 @@ sub git_summary {
>>
>>       if (@remotelist) {
>>               git_print_header_div('remotes');
>> -             git_heads_body(\@remotelist, $head, 0, 15,
>> -                            $#remotelist <= 15 ? undef :
>> -                            $cgi->a({-href => href(action=>"remotes")}, "..."));
>> +             git_remotes_body(@remotelist, $head);
>
> Hmmm... (current API, with @remotelist (!) containing list of arguments
> to a subroutine).

It does make the call much cleaner though (think remotedata: it has
the data, and the information on whether the data is complete or not.
doesn't it make sense?)

>>       }
>>
>>       if (@forklist) {
>> @@ -5570,26 +5706,25 @@ sub git_remotes {
>>       my $head = git_get_head_hash($project);
>>       my $remote = $input_params{'hash'};
>>
>> +     my @remotelist = git_get_remotes_data(undef, $remote);
>> +     die_error(500, "Unable to get remote information") unless @remotelist;
>
> What if there are no remotes, and no remote-tracking branches?

Presently, we get 404 - No remotes found if there are no remotes. If
there are remotes without heads, the section will only contain the
URL.

The funny thing is that if there are no remotes there will be an empty
remotes section in summary view, so we probably to check for that too.

>> +
>> +     if (keys(%{$remotelist[0]}) == 0) {
>> +             die_error(404, defined $remote ?
>> +                     "Remote $remote not found" :
>> +                     "No remotes found");
>> +     }
>
> Ah, I see that it is handled here.  Sidenote: with proposed changed
> API we can distinguish error case from no-remotes case by returning
> undefined value versus returning empty hash / empty hash reference.

The question is: do we really want the different API? It does make a
few things cleaner, but it makes other things messier (I'm thinking
here about the calls to git_*_body; compare what we have for remotes
with what we have for tags or heads list in summary view ...)

-- 
Giuseppe "Oblomov" Bilotta