All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Stephan Beyer <s-beyer@gmx.net>
Cc: git@vger.kernel.org, Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH v2 18/21] bisect: prepare for different algorithms based on find_all
Date: Fri, 15 Apr 2016 15:36:06 -0700	[thread overview]
Message-ID: <xmqqy48e7b55.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <1460294354-7031-19-git-send-email-s-beyer@gmx.net> (Stephan Beyer's message of "Sun, 10 Apr 2016 15:19:11 +0200")

Stephan Beyer <s-beyer@gmx.net> writes:

> This is a preparation commit with copy-and-paste involved.
> The function do_find_bisection() is changed and copied to
> two almost similar functions compute_all_weights() and
> compute_relevant_weights().
>
> The function compute_relevant_weights() stops when a
> "halfway" commit is found.
>
> To keep the code clean, the halfway commit is not returned
> and has to be found by best_bisection() afterwards.
> This results in a singular additional O(#commits)-time
> overhead but this will be outweighed by the following
> changes to compute_relevant_weights().
>
> It is necessary to keep compute_all_weights() for the
> "git rev-list --bisect-all" command. All other bisect-related
> commands will use compute_relevant_weights().
>
> Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
> ---
>  bisect.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 97 insertions(+), 19 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index a254f28..c6bad43 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -179,6 +179,7 @@ static struct commit_list *best_bisection(struct commit_list *list)
>  		}
>  	}
>  
> +	best->next = NULL;
>  	return best;
>  }

At this point of this patch it is unclear how this change is
relevant.  I'll read on without complaining but with puzzlement.

> @@ -245,9 +246,8 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list)
>   * unknown.  After running compute_weight() first, they will get zero
>   * or positive distance.
>   */
> -static struct commit_list *do_find_bisection(struct commit_list *list,
> -					     struct node_data *weights,
> -					     int find_all)
> +static void compute_all_weights(struct commit_list *list,
> +				struct node_data *weights)
>  {
>  	int n, counted;
>  	struct commit_list *p;
> @@ -301,10 +301,88 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
>  		if (!(p->item->object.flags & UNINTERESTING)
>  		 && (node_data(p->item)->weight == -2)) {
>  			compute_weight(p->item);
> +			counted++;
> +		}
> +	}
> +
> +	show_list("bisection 2 compute_weight", counted, list);
> +
> +	while (counted < total) {
> +		for (p = list; p; p = p->next) {
> +			struct commit_list *q;
> +			unsigned flags = p->item->object.flags;
> +
> +			if (0 <= node_data(p->item)->weight)
> +				continue;
> +			for (q = p->item->parents; q; q = q->next) {
> +				if (q->item->object.flags & UNINTERESTING)
> +					continue;
> +				if (0 <= node_data(q->item)->weight)
> +					break;
> +			}
> +			if (!q)
> +				continue;
> +
> +			/*
> +			 * weight for p is unknown but q is known.
> +			 * add one for p itself if p is to be counted,
> +			 * otherwise inherit it from q directly.
> +			 */

This is not a new problem, but "q" in the comment should be
"q->item"; my bad.

> +			node_data(p->item)->weight = node_data(q->item)->weight;
> +			if (!(flags & TREESAME)) {
> +				node_data(p->item)->weight++;
> +				counted++;
> +				show_list("bisection 2 count one",
> +					  counted, list);
> +			}

This is not a new problem, and I do admit that I wrote the original
at 1daa09d9 (make the previous optimization work also on
path-limited rev-list --bisect, 2007-03-23), but this makes me
wonder why counted++ is not done for p that is treesame.

 ... goes and looks ...

Ahh, the original while loop was counting upto "nr", which is
different from total.  When the traversal is pathspec limited, I
counted in the original (and probably in 'master' branch before your
series) the number of tree-changing commits in 'nr' which can be
smaller than 'total'.  That is why skipping counted++ on a treesame
commit was the correct thing to do.

Is it possible that you did not test --bisect-all with pathspec?  I
have a suspicion that the above loop would not terminate because of
this change.  If that is the case, either "make total global and get
rid of nr" needs to be fixed, or (probably better) move counted++
out of this if statement.  At this point, counted indicates "how
many commits out of 'total' we have computed weight for?", so the
latter would make more sense to me, even though either is OK.

The "priming the well" step for the "no interesting parent--set
weight to either 0 or 1" also needs to adjust counted++, I suspect.

> +		}
> +	}
> +	show_list("bisection 2 counted all", counted, list);
> +}
> +
> +/* At the moment this is basically the same as compute_all_weights()
> + * but with a halfway shortcut */

/*
 * We write our multi-line comments like this.
 * The first and last lines have only asterisk
 * and slash.
 */

> +static void compute_relevant_weights(struct commit_list *list,
> +				     struct node_data *weights)
> +{
> +	int n, counted;
> +	struct commit_list *p;
> +
> +	counted = 0;
> +
> +	for (n = 0, p = list; p; p = p->next) {
> +		struct commit *commit = p->item;
> +		unsigned flags = commit->object.flags;
> +
> +		commit->util = &weights[n++];
> +		switch (count_interesting_parents(commit)) {
> +		case 0:
> +			if (!(flags & TREESAME)) {
> +				node_data(commit)->weight = 1;
> +				counted++;
> +				show_list("bisection 2 count one",
> +					  counted, list);
> +			}
> +			break;
> +		case 1:
> +			node_data(commit)->weight = -1;
> +			break;
> +		default:
> +			node_data(commit)->weight = -2;
> +			break;
> +		}
> +	}
> +
> +	show_list("bisection 2 initialize", counted, list);
> +
> +	for (p = list; p; p = p->next) {
> +		if (!(p->item->object.flags & UNINTERESTING)
> +		 && (node_data(p->item)->weight == -2)) {
> +			compute_weight(p->item);
>  
>  			/* Does it happen to be at exactly half-way? */
> -			if (!find_all && halfway(p->item))
> -				return p;
> +			if (halfway(p->item))
> +				return;

It is somewhat curious why this, after finding the desired commit as
p->item, does not return it.  If it is already known that we have
half-way one, can't we return it immediately (and when we fall
through without finding exactly the half-way one, we signal the
caller that it needs best_bisection() among the list by returning a
NULL or something?

But probably that is optimizing it prematurely.  I dunno.

>  			counted++;
>  		}
>  	}
> @@ -341,17 +419,11 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
>  			}
>  
>  			/* Does it happen to be at exactly half-way? */
> -			if (!find_all && halfway(p->item))
> -				return p;
> +			if (halfway(p->item))
> +				return;
>  		}
>  	}
> -
>  	show_list("bisection 2 counted all", counted, list);
> -
> -	if (!find_all)
> -		return best_bisection(list);
> -	else
> -		return best_bisection_sorted(list);
>  }
>  
>  struct commit_list *find_bisection(struct commit_list *list,
> @@ -365,6 +437,9 @@ struct commit_list *find_bisection(struct commit_list *list,
>  	total = 0;
>  	marker = 0;
>  
> +	if (!list)
> +		return NULL;
> +
>  	show_list("bisection 2 entry", 0, list);
>  
>  	/*
> @@ -391,13 +466,16 @@ struct commit_list *find_bisection(struct commit_list *list,
>  	*all = total;
>  	weights = (struct node_data *)xcalloc(on_list, sizeof(*weights));
>  
> -	/* Do the real work of finding bisection commit. */
> -	best = do_find_bisection(list, weights, find_all);
> -	if (best) {
> -		if (!find_all)
> -			best->next = NULL;
> -		*reaches = node_data(best->item)->weight;
> +	if (find_all) {
> +		compute_all_weights(list, weights);
> +		best = best_bisection_sorted(list);
> +	} else {
> +		compute_relevant_weights(list, weights);
> +		best = best_bisection(list);
>  	}
> +	assert(best);
> +	*reaches = node_data(best->item)->weight;
> +
>  	free(weights);
>  
>  	return best;

  reply	other threads:[~2016-04-15 22:36 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-10 13:18 [PATCH v2 00/21] git bisect improvements Stephan Beyer
2016-04-10 13:18 ` [PATCH v2 01/21] bisect: write about `bisect next` in documentation Stephan Beyer
2016-04-10 13:18 ` [PATCH v2 02/21] bisect: allow 'bisect run' if no good commit is known Stephan Beyer
2016-04-10 13:18 ` [PATCH v2 03/21] t/test-lib-functions.sh: generalize test_cmp_rev Stephan Beyer
2016-04-11  0:07   ` Eric Sunshine
2016-04-15 20:00   ` Junio C Hamano
2016-04-24 19:51     ` Stephan Beyer
2016-04-25 18:08       ` Junio C Hamano
2016-04-10 13:18 ` [PATCH v2 04/21] t: use test_cmp_rev() where appropriate Stephan Beyer
2016-04-11  0:07   ` Eric Sunshine
2016-04-15 20:48   ` Junio C Hamano
2016-04-10 13:18 ` [PATCH v2 05/21] t6030: generalize test to not rely on current implementation Stephan Beyer
2016-04-10 13:47   ` Torsten Bögershausen
2016-04-10 19:16     ` Junio C Hamano
2016-04-10 19:37       ` Stephan Beyer
2016-04-11  0:23     ` Eric Sunshine
2016-04-15 21:07   ` Junio C Hamano
2016-04-10 13:18 ` [PATCH v2 06/21] bisect: add test for the bisect algorithm Stephan Beyer
2016-04-15 21:13   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 07/21] bisect: plug the biggest memory leak Stephan Beyer
2016-04-15 21:18   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 08/21] bisect: make bisect compile if DEBUG_BISECT is set Stephan Beyer
2016-04-15 21:22   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 09/21] bisect: make algorithm behavior independent of DEBUG_BISECT Stephan Beyer
2016-04-15 21:25   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 10/21] bisect: get rid of recursion in count_distance() Stephan Beyer
2016-04-15 21:31   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 11/21] bisect: use struct node_data array instead of int array Stephan Beyer
2016-04-12 23:02   ` Christian Couder
2016-04-15 21:47   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 12/21] bisect: replace clear_distance() by unique markers Stephan Beyer
2016-04-12 23:20   ` Christian Couder
2016-04-15 22:07   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 13/21] bisect: use commit instead of commit list as arguments when appropriate Stephan Beyer
2016-04-15 22:08   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 14/21] bisect: extract get_distance() function from code duplication Stephan Beyer
2016-04-15 22:08   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 15/21] bisect: introduce distance_direction() Stephan Beyer
2016-04-15 22:10   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 16/21] bisect: make total number of commits global Stephan Beyer
2016-04-13 13:23   ` Christian Couder
2016-04-15 22:11   ` Junio C Hamano
2016-04-16  0:44   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 17/21] bisect: rename count_distance() to compute_weight() Stephan Beyer
2016-04-13 13:32   ` Christian Couder
2016-04-15 22:12   ` Junio C Hamano
2016-04-10 13:19 ` [PATCH v2 18/21] bisect: prepare for different algorithms based on find_all Stephan Beyer
2016-04-15 22:36   ` Junio C Hamano [this message]
2016-04-10 13:19 ` [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights Stephan Beyer
2016-04-13 14:11   ` Christian Couder
2016-04-15 22:47   ` Junio C Hamano
2016-04-15 22:49   ` Junio C Hamano
2016-04-26 18:27   ` Junio C Hamano
2016-04-10 13:24 ` [PATCH v2 20/21] bisect: compute best bisection in compute_relevant_weights() Stephan Beyer
2016-04-10 13:24   ` [PATCH v2 21/21] bisect: get back halfway shortcut Stephan Beyer
2016-04-15 22:53     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy48e7b55.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=s-beyer@gmx.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.