All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 02/10] unpack-trees: move all skip-worktree check back to unpack_trees()
Date: Tue, 16 Nov 2010 09:39:47 +0700	[thread overview]
Message-ID: <AANLkTinpPyjd_Pr=XR9SkmQj1mPodQvG0OCeaxU8NjEp@mail.gmail.com> (raw)
In-Reply-To: <20101115160135.GA16385@burratino>

2010/11/15 Jonathan Nieder <jrnieder@gmail.com>:
> Nguyễn Thái Ngọc Duy wrote:
>
>> Earlier, the will_have_skip_worktree() checks are done in various places:
>>
>> 1. in verify_* functions to prevent absent/uptodate checks outside
>>    worktree
>> 2. in merged_entry for new index entries
>> 3. in apply_sparse_checkout() for all entries
>>
>> In case all entries are added new ($GIT_DIR/index is missing) all the
>> above checks will be done for all entries, which in the worst case can
>> become cache_nr*el->nr*3 fnmatch() calls. Quite expensive.
>
> Does this mean something like:
>
>        Handling of sparse checkouts is slow.
>
>        [timings]
>
>        To fix this, we will do such-and-such.  As a first step,
>        we'll do such-and-such which does not change semantics
>        and at least does not make it any slower.
>
> ?
>
> In other words, any commit message should make clear
>
>  1. The purpose.
>  2. The baseline of (sane or insane) behavior that is affected.
>  3. The intended resulting functionality.
>
> How the patch works and why it succeeds are just optional extras
> (nice, certainly, but in 90% of cases it is obvious from the code once
> you know (1), (2), and (3) anyway).

Addressing the slowness is a nice side effect. My main concern is to
collect all skip-worktree checks in a central place so that I can do
tree-like traversing.

>> --- a/cache.h
>> +++ b/cache.h
>> @@ -182,6 +182,7 @@ struct cache_entry {
>>  #define CE_WT_REMOVE (0x400000) /* remove in work directory */
>>
>>  #define CE_UNPACKED  (0x1000000)
>> +#define CE_NEW_SKIP_WORKTREE (0x2000000)
>
> Semantics?

Yep, missed this in the commit message. While unpack_trees() is
running, this bit will be set in entries that are outside worktree
area, according to the (possibly updated) sparse-checkout file.
Compare this bit with CE_SKIP_WORKTREE, we would know this entry
should be removed/added due to sparse-checkout updates.

>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -258,7 +258,7 @@ static int apply_sparse_checkout(struct cache_entry *ce, struct unpack_trees_opt
>>  {
>>       int was_skip_worktree = ce_skip_worktree(ce);
>>
>> -     if (!ce_stage(ce) && will_have_skip_worktree(ce, o))
>> +     if (ce->ce_flags & CE_NEW_SKIP_WORKTREE)
>>               ce->ce_flags |= CE_SKIP_WORKTREE;
>
> So CE_NEW_SKIP_WORKTREE roughly means "stage-0 entry outside the sparse checkout area"?

Yes. Conflicted entries must always stay in worktree regardless
sparse-checkout patterns.

>
>>       else
>>               ce->ce_flags &= ~CE_SKIP_WORKTREE;
>
> In particular, I guess the NEW part refers to the sparse checkout
> area, not the entry, since existing index entries with SKIP_WORKTREE
> bits need to keep that bit.

The NEW part still refers to entry. It's basically
ce_flags[CE_SKIP_WORKTREE] = ce_flags[CE_NEW_SKIP_WORKTREE]. The
former bit is kept in file, the later is in memory only.

>> +
>> +static int verify_absent(struct cache_entry *, enum unpack_trees_error_types, struct unpack_trees_options *);
>> +static int set_new_skip_worktree_2(struct unpack_trees_options *o)
>> +{
>> +     int i;
>> +
>> +     /*
>> +      * CE_ADDED marks new index entries. These have not been processed
>> +      * by set_new_skip_worktree_1() so we do it here.
>> +      */
>
> Probably this comment belongs at the call site instead, to avoid some
> chasing.

There is a comment in the callsite, though it does not directly say
"set_new_skip_worktree_2". Will fix.

>> +     for (i = 0;i < o->result.cache_nr;i++) {
>> +             struct cache_entry *ce = o->result.cache[i];
>> +
>> +             if (!(ce->ce_flags & CE_ADDED))
>> +                     continue;
>> +
>> +             ce->ce_flags &= ~CE_ADDED;
>> +             if (!ce_stage(ce) && will_have_skip_worktree(ce, o))
>> +                     ce->ce_flags |= CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE;
>> +             else
>> +                     ce->ce_flags &= ~(CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE);
>
> CE_ADDED is private to the add_file_to_index() code path shared
> between add and rerere builtins.  rerere is libified and used e.g. by
> cherry-pick foo..bar.  Can it get us in trouble or do we have clear
> the flags before using them here?  I think it would be worth a note in
> api-in-core-index.txt to warn future refactorers.

If it's too much trouble, I can use a new bit. Though the number of
available bits is decreasing.

>> +
>> +             /* Left-over checks from merged_entry when old == NULL */
>
> Huh?  (That is completely cryptic outside the context of the patch.)

Will elaborate.

>> @@ -862,6 +905,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>                       o->el = &el;
>>       }
>>
>> +     if (!o->skip_sparse_checkout)
>> +             set_new_skip_worktree_1(o);
>> +
>
> Why is this not folded into the above if ()?
>
> This populates the NEW_SKIP_WORKTREE (= future SKIP_WORKTREE?) bit
> in the index that represents the preimage for a reset or merge.
>
> Perhaps call it:
>
>                set_new_skip_worktree(o->src_index, 0);
>
> and mark that function __attribute__((always_inline)) if the
> optimizer doesn't want to cooperate?

It's setting future skip-worktree on existing entries index, contrast
with the _2() function, which is setting future skip-worktree on _new_
entries in the index. How do you name those functions? Wait..

>>       memset(&o->result, 0, sizeof(o->result));
>>       o->result.initialized = 1;
>>       o->result.timestamp.sec = o->src_index->timestamp.sec;
>> @@ -922,6 +968,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>
>>       if (!o->skip_sparse_checkout) {
>>               int empty_worktree = 1;
>> +
>> +             if (set_new_skip_worktree_2(o))
>> +                     goto return_failed;
>> +
>
> Trivial part of the merge is over.  So now we can check the new
> index entries against the sparse worktree patterns.  They were added in
> that trivial part using add_entry() and will have the CE_ADDED flag.
>
> So this might be:
>
>                set_new_skip_worktree(&o->result, CE_ADDED);
>
> or
>
>                set_result_skip_worktree(o);
>

OK I like the former over the latter.

> The CE_ADDED flag was set in merged_entry, called by the merge_fn_t
> implementations.  If there were an api-traverse-trees.txt explaining
> the API, it would be worth a note there; for now it should suffice
> to add a note to future merge_fn_t implementors in the commit
> message ("each unpack-trees merge function arranges for
> CE_SKIP_WORKTREE and CE_SKIP_NEW_WORKTREE to be propagated and for the
> CE_ADDED flag to be set on entries of new paths").

OK

>> @@ -1017,7 +1067,7 @@ static int verify_uptodate_1(struct cache_entry *ce,
>>  static int verify_uptodate(struct cache_entry *ce,
>>                          struct unpack_trees_options *o)
>>  {
>> -     if (!o->skip_sparse_checkout && will_have_skip_worktree(ce, o))
>> +     if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
> [...]
>> @@ -1204,7 +1254,7 @@ static int verify_absent(struct cache_entry *ce,
>>                        enum unpack_trees_error_types error_type,
>>                        struct unpack_trees_options *o)
>>  {
>> -     if (!o->skip_sparse_checkout && will_have_skip_worktree(ce, o))
>> +     if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
>
> The point, I hope.  Currently we alternate between finding entries to
> remove and deciding whether if lie in the worktree.  After this patch,
> whether they lie in the worktree is precomputed.

Yep.

>> @@ -1226,10 +1276,15 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
>>       int update = CE_UPDATE;
>>
>>       if (!old) {
>> +             /*
>> +              * Set CE_NEW_SKIP_WORKTREE on "merge" to make
>> +              * verify_absent() no-op. The true check will be done
>> +              * later on in unpack_trees().
>> +              */
>> +             merge->ce_flags |= CE_NEW_SKIP_WORKTREE;
>
> Mm?  Since verify_absent() is a no-op, why call it?  This looks like a
> placeholder for code that calls verify_absent later, when we know if
> it lies in the worktree.

It is no-op only when sparse checkout is enabled. In such cases we
don't know (yet) whether we will add those files in worktree. In
normal cases, everything must be added in worktree, so verify_absent()
is real.

>> @@ -1245,8 +1300,8 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
>>               } else {
>>                       if (verify_uptodate(old, o))
>>                               return -1;
>> -                     if (ce_skip_worktree(old))
>> -                             update |= CE_SKIP_WORKTREE;
>> +                     /* Migrate old bits over */
>> +                     update |= old->ce_flags & (CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE);
>
> Thanks, this looks promising.

Thanks for looking at it. These stuff can be error-prone.
-- 
Duy

  reply	other threads:[~2010-11-16  2:41 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-15 10:36 [PATCH 00/10] Sparse checkout fixes and improvements Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 01/10] add: do not rely on dtype being NULL behavior Nguyễn Thái Ngọc Duy
2010-11-15 12:14   ` Jonathan Nieder
2010-11-16  2:18     ` Nguyen Thai Ngoc Duy
2010-11-16  2:42       ` Jonathan Nieder
2010-11-16 18:58       ` Junio C Hamano
2010-11-17  6:38         ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 02/10] unpack-trees: move all skip-worktree check back to unpack_trees() Nguyễn Thái Ngọc Duy
2010-11-15 12:34   ` Thiago Farina
2010-11-16  2:19     ` Nguyen Thai Ngoc Duy
2010-11-15 16:01   ` Jonathan Nieder
2010-11-16  2:39     ` Nguyen Thai Ngoc Duy [this message]
2010-11-15 10:36 ` [PATCH 03/10] unpack-trees: add function to update ce_flags based on sparse patterns Nguyễn Thái Ngọc Duy
2010-11-15 18:30   ` Jonathan Nieder
2010-11-15 20:19   ` Jonathan Nieder
2010-11-15 10:36 ` [PATCH 04/10] unpack-trees: fix sparse checkout's "unable to match directories" fault Nguyễn Thái Ngọc Duy
2010-11-15 19:10   ` Jonathan Nieder
2010-11-16  2:43     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 05/10] unpack-trees: optimize full checkout case Nguyễn Thái Ngọc Duy
2010-11-15 20:41   ` Jonathan Nieder
2010-11-15 10:36 ` [PATCH 06/10] templates: add info/sparse-checkout Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 07/10] checkout: add -S to update sparse checkout Nguyễn Thái Ngọc Duy
2010-11-15 21:16   ` Jonathan Nieder
2010-11-15 21:52     ` Miles Bader
2010-11-17 15:02       ` Nguyen Thai Ngoc Duy
2010-11-16  3:08     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 08/10] checkout: add --full to fully populate working directory Nguyễn Thái Ngọc Duy
2010-11-15 21:23   ` Jonathan Nieder
2010-11-16  2:50     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 09/10] git-checkout.txt: mention of sparse checkout Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 10/10] clean: support cleaning sparse checkout with -S Nguyễn Thái Ngọc Duy
2010-11-15 21:30   ` Jonathan Nieder
2010-11-16  2:53     ` Nguyen Thai Ngoc Duy
2010-11-16  3:07       ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='AANLkTinpPyjd_Pr=XR9SkmQj1mPodQvG0OCeaxU8NjEp@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.