All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 02/10] unpack-trees: move all skip-worktree check back to unpack_trees()
Date: Mon, 15 Nov 2010 10:01:35 -0600	[thread overview]
Message-ID: <20101115160135.GA16385@burratino> (raw)
In-Reply-To: <1289817410-32470-3-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy wrote:

> Earlier, the will_have_skip_worktree() checks are done in various places:
> 
> 1. in verify_* functions to prevent absent/uptodate checks outside
>    worktree
> 2. in merged_entry for new index entries
> 3. in apply_sparse_checkout() for all entries
> 
> In case all entries are added new ($GIT_DIR/index is missing) all the
> above checks will be done for all entries, which in the worst case can
> become cache_nr*el->nr*3 fnmatch() calls. Quite expensive.

Does this mean something like:

	Handling of sparse checkouts is slow.

	[timings]

	To fix this, we will do such-and-such.  As a first step,
	we'll do such-and-such which does not change semantics
	and at least does not make it any slower.

?

In other words, any commit message should make clear

 1. The purpose.
 2. The baseline of (sane or insane) behavior that is affected.
 3. The intended resulting functionality.

How the patch works and why it succeeds are just optional extras
(nice, certainly, but in 90% of cases it is obvious from the code once
you know (1), (2), and (3) anyway).

> --- a/cache.h
> +++ b/cache.h
> @@ -182,6 +182,7 @@ struct cache_entry {
>  #define CE_WT_REMOVE (0x400000) /* remove in work directory */
>  
>  #define CE_UNPACKED  (0x1000000)
> +#define CE_NEW_SKIP_WORKTREE (0x2000000)

Semantics?

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -258,7 +258,7 @@ static int apply_sparse_checkout(struct cache_entry *ce, struct unpack_trees_opt
>  {
>  	int was_skip_worktree = ce_skip_worktree(ce);
>  
> -	if (!ce_stage(ce) && will_have_skip_worktree(ce, o))
> +	if (ce->ce_flags & CE_NEW_SKIP_WORKTREE)
>  		ce->ce_flags |= CE_SKIP_WORKTREE;

So CE_NEW_SKIP_WORKTREE roughly means "stage-0 entry outside the sparse checkout area"?

>  	else
>  		ce->ce_flags &= ~CE_SKIP_WORKTREE;

In particular, I guess the NEW part refers to the sparse checkout
area, not the entry, since existing index entries with SKIP_WORKTREE
bits need to keep that bit.

> @@ -834,6 +834,49 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>  	return mask;
>  }
>  
> +static void set_new_skip_worktree_1(struct unpack_trees_options *o)
> +{

Will comment on name once we get to the call site.

> +	int i;
> +
> +	for (i = 0;i < o->src_index->cache_nr;i++) {
> +		struct cache_entry *ce = o->src_index->cache[i];
> +		ce->ce_flags &= ~CE_ADDED;
> +		if (!ce_stage(ce) && will_have_skip_worktree(ce, o))
> +			ce->ce_flags |= CE_NEW_SKIP_WORKTREE;
> +		else
> +			ce->ce_flags &= ~CE_NEW_SKIP_WORKTREE;
> +	}
> +}

Populating the CE_NEW_SKIP_WORKTREE flags based on the new
worktree area.

> +
> +static int verify_absent(struct cache_entry *, enum unpack_trees_error_types, struct unpack_trees_options *);
> +static int set_new_skip_worktree_2(struct unpack_trees_options *o)
> +{
> +	int i;
> +
> +	/*
> +	 * CE_ADDED marks new index entries. These have not been processed
> +	 * by set_new_skip_worktree_1() so we do it here.
> +	 */

Probably this comment belongs at the call site instead, to avoid some
chasing.

> +	for (i = 0;i < o->result.cache_nr;i++) {
> +		struct cache_entry *ce = o->result.cache[i];
> +
> +		if (!(ce->ce_flags & CE_ADDED))
> +			continue;
> +
> +		ce->ce_flags &= ~CE_ADDED;
> +		if (!ce_stage(ce) && will_have_skip_worktree(ce, o))
> +			ce->ce_flags |= CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE;
> +		else
> +			ce->ce_flags &= ~(CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE);

CE_ADDED is private to the add_file_to_index() code path shared
between add and rerere builtins.  rerere is libified and used e.g. by
cherry-pick foo..bar.  Can it get us in trouble or do we have clear
the flags before using them here?  I think it would be worth a note in
api-in-core-index.txt to warn future refactorers.

> +
> +		/* Left-over checks from merged_entry when old == NULL */

Huh?  (That is completely cryptic outside the context of the patch.)

> +		if (verify_absent(ce, ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o))
> +			return -1;
> +	}
[...]
> @@ -862,6 +905,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  			o->el = &el;
>  	}
>  
> +	if (!o->skip_sparse_checkout)
> +		set_new_skip_worktree_1(o);
> +

Why is this not folded into the above if ()?

This populates the NEW_SKIP_WORKTREE (= future SKIP_WORKTREE?) bit
in the index that represents the preimage for a reset or merge.

Perhaps call it:

		set_new_skip_worktree(o->src_index, 0);

and mark that function __attribute__((always_inline)) if the
optimizer doesn't want to cooperate?

Or:

		set_src_skip_worktree(o);

>  	memset(&o->result, 0, sizeof(o->result));
>  	o->result.initialized = 1;
>  	o->result.timestamp.sec = o->src_index->timestamp.sec;
> @@ -922,6 +968,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  
>  	if (!o->skip_sparse_checkout) {
>  		int empty_worktree = 1;
> +
> +		if (set_new_skip_worktree_2(o))
> +			goto return_failed;
> +

Trivial part of the merge is over.  So now we can check the new
index entries against the sparse worktree patterns.  They were added in
that trivial part using add_entry() and will have the CE_ADDED flag.

So this might be:

		set_new_skip_worktree(&o->result, CE_ADDED);

or

		set_result_skip_worktree(o);

The CE_ADDED flag was set in merged_entry, called by the merge_fn_t
implementations.  If there were an api-traverse-trees.txt explaining
the API, it would be worth a note there; for now it should suffice
to add a note to future merge_fn_t implementors in the commit
message ("each unpack-trees merge function arranges for
CE_SKIP_WORKTREE and CE_SKIP_NEW_WORKTREE to be propagated and for the
CE_ADDED flag to be set on entries of new paths").

>  		for (i = 0;i < o->result.cache_nr;i++) {
>  			struct cache_entry *ce = o->result.cache[i];
>  
> @@ -1017,7 +1067,7 @@ static int verify_uptodate_1(struct cache_entry *ce,
>  static int verify_uptodate(struct cache_entry *ce,
>  			   struct unpack_trees_options *o)
>  {
> -	if (!o->skip_sparse_checkout && will_have_skip_worktree(ce, o))
> +	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
[...]
> @@ -1204,7 +1254,7 @@ static int verify_absent(struct cache_entry *ce,
>  			 enum unpack_trees_error_types error_type,
>  			 struct unpack_trees_options *o)
>  {
> -	if (!o->skip_sparse_checkout && will_have_skip_worktree(ce, o))
> +	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))

The point, I hope.  Currently we alternate between finding entries to
remove and deciding whether if lie in the worktree.  After this patch,
whether they lie in the worktree is precomputed.

> @@ -1226,10 +1276,15 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
>  	int update = CE_UPDATE;
>  
>  	if (!old) {
> +		/*
> +		 * Set CE_NEW_SKIP_WORKTREE on "merge" to make
> +		 * verify_absent() no-op. The true check will be done
> +		 * later on in unpack_trees().
> +		 */
> +		merge->ce_flags |= CE_NEW_SKIP_WORKTREE;

Mm?  Since verify_absent() is a no-op, why call it?  This looks like a
placeholder for code that calls verify_absent later, when we know if
it lies in the worktree.

>  		if (verify_absent(merge, ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o))
>  			return -1;
> -		if (!o->skip_sparse_checkout && will_have_skip_worktree(merge, o))
> -			update |= CE_SKIP_WORKTREE;
> +		update |= CE_ADDED;

More of the main act.  Currently we alternate between finding entries
to add and deciding if they are in the worktree.  After the patch,
they are piled up for addition first, then checked against the
worktree in one batch.

>  		invalidate_ce_path(merge, o);
>  	} else if (!(old->ce_flags & CE_CONFLICTED)) {
>  		/*
> @@ -1245,8 +1300,8 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
>  		} else {
>  			if (verify_uptodate(old, o))
>  				return -1;
> -			if (ce_skip_worktree(old))
> -				update |= CE_SKIP_WORKTREE;
> +			/* Migrate old bits over */
> +			update |= old->ce_flags & (CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE);

Thanks, this looks promising.
Jonathan

  parent reply	other threads:[~2010-11-15 16:02 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-15 10:36 [PATCH 00/10] Sparse checkout fixes and improvements Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 01/10] add: do not rely on dtype being NULL behavior Nguyễn Thái Ngọc Duy
2010-11-15 12:14   ` Jonathan Nieder
2010-11-16  2:18     ` Nguyen Thai Ngoc Duy
2010-11-16  2:42       ` Jonathan Nieder
2010-11-16 18:58       ` Junio C Hamano
2010-11-17  6:38         ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 02/10] unpack-trees: move all skip-worktree check back to unpack_trees() Nguyễn Thái Ngọc Duy
2010-11-15 12:34   ` Thiago Farina
2010-11-16  2:19     ` Nguyen Thai Ngoc Duy
2010-11-15 16:01   ` Jonathan Nieder [this message]
2010-11-16  2:39     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 03/10] unpack-trees: add function to update ce_flags based on sparse patterns Nguyễn Thái Ngọc Duy
2010-11-15 18:30   ` Jonathan Nieder
2010-11-15 20:19   ` Jonathan Nieder
2010-11-15 10:36 ` [PATCH 04/10] unpack-trees: fix sparse checkout's "unable to match directories" fault Nguyễn Thái Ngọc Duy
2010-11-15 19:10   ` Jonathan Nieder
2010-11-16  2:43     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 05/10] unpack-trees: optimize full checkout case Nguyễn Thái Ngọc Duy
2010-11-15 20:41   ` Jonathan Nieder
2010-11-15 10:36 ` [PATCH 06/10] templates: add info/sparse-checkout Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 07/10] checkout: add -S to update sparse checkout Nguyễn Thái Ngọc Duy
2010-11-15 21:16   ` Jonathan Nieder
2010-11-15 21:52     ` Miles Bader
2010-11-17 15:02       ` Nguyen Thai Ngoc Duy
2010-11-16  3:08     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 08/10] checkout: add --full to fully populate working directory Nguyễn Thái Ngọc Duy
2010-11-15 21:23   ` Jonathan Nieder
2010-11-16  2:50     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 09/10] git-checkout.txt: mention of sparse checkout Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 10/10] clean: support cleaning sparse checkout with -S Nguyễn Thái Ngọc Duy
2010-11-15 21:30   ` Jonathan Nieder
2010-11-16  2:53     ` Nguyen Thai Ngoc Duy
2010-11-16  3:07       ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101115160135.GA16385@burratino \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.