All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephan Beyer <s-beyer@gmx.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH 14/16] bisect: use a modified breadth-first search to find relevant weights
Date: Fri, 26 Feb 2016 21:55:16 +0100	[thread overview]
Message-ID: <56D0BBB4.9020305@gmx.net> (raw)
In-Reply-To: <xmqqr3g0kvzx.fsf@gitster.mtv.corp.google.com>

Hi,

On 02/26/2016 04:09 AM, Junio C Hamano wrote:
> Interesting.  So you walk from the bottom commits, incrementing the
> weight while walking on a part of the graph that is single strand of
> pearls, or doing the "count the reachable ones the hard way" when
> you hit a merge [*1*], but either way, once you know the commit you
> are looking at is above the mid-way, i.e. reaches more than half of
> the graph, then stop walking because its children will never be
> better than that commit?

Exactly. Maybe that's an even better description for the commit message l-)

> I haven't studied the code carefully, but your mention of BFS makes
> me wonder if you can "narrow" down the search space even more.

I had an idea that aimed at accomplishing this by finding the
highest-distance merge commit using binary search. (The BFS collected
all merge commits, in that order, then you could do binary search.)
So you have O(log(#mergecommits)) "do it the hard way" weight computations.
However, in the worst case (or even in every case, I did not thoroughly
think about the cases that can occur), it could happen that it also had
to do these "do it the hard way" computations for all merge commits with
smaller weight... That's why I dropped this idea in favor of the more
simple approach that I sent to the list.

> In an earlier step of the series, you introduced a work queue to
> replace the recursion.  The recursion forces the algorithm to work
> along the topology of the history, but you have more latitude in
> selecting the commit to dig further with a work queue.
> 
> I wonder if you can sort the elements on the work queue based on
> their distance (i.e. by using a priority queue).  You know the total
> upfront, so you may find a commit whose weight is exactly half of it
> before traversing all the bottom half of the history and it may turn
> out to be even more efficient.  After all, you are only looking for
> just one such commit, not trying to find all the commits with the
> best weight.

For the compute_weight() function (the "doing it the hard way"
function), this would not make sense since all parent commits (of a
merge commit we call compute_weight() on) will have smaller distance.

However, your idea can help:
I just noticed that it's not important that I use a *B*FS because of the
acyclity. So I could also use a DFS that is "more greedy" going towards
high numbers.
Note that the traversal is always on trees because the merge commits are
cut out. So whenever a branching (a commit with more than one child)
occurs, the DFS should follow the branch of maximum length to its leaf.
(These maximum lengths can be saved during build_reversed_dag().)
Then, if there is a halfway commit, we stop. If not, we can just go on
with the remaining DFS...
I think I only rethought and rephrased your idea. :) Sounds good to me.
I'm going to add commits for that.

> [Footnote]
> 
> *1* A merge between a commit that reaches N commits and another that
> reaches M commits may have weight smaller than the sum of N and M,
> and that is why I left the original "truly stupid algorithm" Linus
> wrote as-is when I did the obvious optimization for the linear parts
> of the history.

Yes. In fact, the "truly stupid algorithm" is not truly stupid. I'm not
quite sure, but I think it's still the best algorithm known so far for
that problem. (But maybe the problem is just not very interesting in
science.)

> *2* Whenever I revisited the bisection in my head, I thought about
> ways to improve that "truly stupid" counting, but never thought
> about an approach to simply reduce the number of times the "truly
> stupid" counting has to be done.

Well, I also improved that "truly stupid" counting a little in a former
commit (patch 7). But it's just the implementation that I improved
(time/memory trade-off), not the algorithm. :)

Cheers,
  Stephan

  reply	other threads:[~2016-02-26 20:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-26  2:04 [PATCH 00/16] git bisect improvements Stephan Beyer
2016-02-26  2:04 ` [PATCH 01/16] bisect: write about `bisect next` in documentation Stephan Beyer
2016-02-26  8:02   ` Jacob Keller
2016-02-26 18:47   ` Junio C Hamano
2016-02-27 13:45     ` Stephan Beyer
2016-02-27 18:03       ` Junio C Hamano
2016-02-27 19:38         ` Stephan Beyer
2016-02-28 18:28           ` Junio C Hamano
2016-02-26  2:04 ` [PATCH 02/16] bisect: add test for the bisect algorithm Stephan Beyer
2016-02-26  6:53   ` Christian Couder
2016-02-26 21:38     ` Stephan Beyer
2016-02-27 11:40       ` Christian Couder
2016-02-27 12:42         ` Matthieu Moy
2016-02-26  2:04 ` [PATCH 03/16] bisect: make bisect compile if DEBUG_BISECT is set Stephan Beyer
2016-02-26  2:04 ` [PATCH 04/16] bisect: make algorithm behavior independent of DEBUG_BISECT Stephan Beyer
2016-02-26  2:04 ` [PATCH 05/16] bisect: get rid of recursion in count_distance() Stephan Beyer
2016-02-26  2:04 ` [PATCH 06/16] bisect: use struct node_data array instead of int array Stephan Beyer
2016-02-26  2:04 ` [PATCH 07/16] bisect: replace clear_distance() by unique markers Stephan Beyer
2016-02-26  2:04 ` [PATCH 08/16] bisect: use commit instead of commit list as arguments when appropriate Stephan Beyer
2016-02-26  3:10   ` Junio C Hamano
2016-02-26  2:04 ` [PATCH 09/16] bisect: extract get_distance() function from code duplication Stephan Beyer
2016-02-26  2:04 ` [PATCH 10/16] bisect: introduce distance_direction() Stephan Beyer
2016-02-26  2:04 ` [PATCH 11/16] bisect: make total number of commits global Stephan Beyer
2016-02-26  2:04 ` [PATCH 12/16] bisect: rename count_distance() to compute_weight() Stephan Beyer
2016-02-26  2:04 ` [PATCH 13/16] bisect: prepare for different algorithms based on find_all Stephan Beyer
2016-02-26  2:04 ` [PATCH 14/16] bisect: use a modified breadth-first search to find relevant weights Stephan Beyer
2016-02-26  3:09   ` Junio C Hamano
2016-02-26 20:55     ` Stephan Beyer [this message]
2016-02-26  2:04 ` [PATCH 15/16] bisect: compute best bisection in compute_relevant_weights() Stephan Beyer
2016-02-26  2:04 ` [PATCH 16/16] bisect: get back halfway shortcut Stephan Beyer
2016-03-20 18:50 ` [PATCH 00/16] git bisect improvements Pranit Bauva
2016-03-21 22:22   ` Stephan Beyer
2016-03-22  7:35     ` Christian Couder
2016-03-22 11:35     ` Pranit Bauva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56D0BBB4.9020305@gmx.net \
    --to=s-beyer@gmx.net \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.