git.vger.kernel.org archive mirror
* [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
@ 2008-08-15 14:24 Nguyễn Thái Ngọc Duy
  2008-08-16 10:31 ` Junio C Hamano
  2008-08-19 21:10 ` James Pickens
  0 siblings, 2 replies; 10+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2008-08-15 14:24 UTC (permalink / raw)
  To: git

The implementation with insights from Junio turns out smaller and better
(and I was thinking about applying it for huge maildir).

So there are a few changes since the last round. This time I follow the
"assume unchanged" code path and relax the rules a bit. There are issues,
which I will mention later.

From the user's POV, we can now check out a single file or a subdirectory
(checking out a subdirectory non-recursively is possible too). You may start
with a narrow clone like:

git clone --path="Documentation/*" git.git

or start from a full checkout and narrow it (very much like the last round):

git checkout --path="Documentation/*"

However, the narrow spec now uses wildcards rather than prefixes, so you can
check out a single file, just the header files, etc.
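To make the prefix-versus-wildcard difference concrete, here is a minimal C sketch. It assumes the matcher behaves like POSIX fnmatch() called without FNM_PATHNAME (consistent with the later report in this thread that '*' matches slashes), and it splits the spec on colons as described further down in this message. prefix_match() and narrow_spec_match() are hypothetical names, not functions from the series.

```c
#include <fnmatch.h>
#include <string.h>

/* Previous round (sketch): a path is in the narrow area if it starts with the prefix. */
static int prefix_match(const char *prefix, const char *path)
{
	return strncmp(path, prefix, strlen(prefix)) == 0;
}

/*
 * This round (sketch): the spec is a colon-separated list of wildcards.
 * With flags == 0, fnmatch() treats '/' as an ordinary character, so '*'
 * also matches across directory boundaries.
 */
static int narrow_spec_match(const char *spec, const char *path)
{
	char pattern[256];

	while (*spec) {
		const char *end = strchr(spec, ':');
		size_t len = end ? (size_t)(end - spec) : strlen(spec);

		/* Skip empty or overlong patterns instead of truncating them. */
		if (len > 0 && len < sizeof(pattern)) {
			memcpy(pattern, spec, len);
			pattern[len] = '\0';
			if (fnmatch(pattern, path, 0) == 0)
				return 1;
		}
		if (!end)
			break;
		spec = end + 1;
	}
	return 0;
}
```

Under this behavior "Documentation/*" already covers nested directories, which is why a non-recursive subdirectory checkout is awkward with the current syntax.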

git checkout --add-path|--remove-path has been added to update narrow areas.
But you don't have to use them: narrow areas will be widened as needed when
you do something outside of them (e.g. "git checkout foo" or "git add foo"...).

Another difference from the last round is that "narrow rules" will not be
preserved when switching branches. When you switch branches without any
option, you will get a full checkout. You may want to use
--path|--add-path|--remove-path when switching branches to get a narrow
checkout again.

Now back to the technical POV. I did not reuse the CE_VALID (assume unchanged)
bit because it is already used for core.ignorestat. So I took another bit,
which seems to be the last on-disk bit in ce_flags.

I call this mode "easy mode" because most of the constraints have been
eliminated. Now your "narrow rules" mean "I don't like those files, remove
them if they are not needed". If some operation needs those files in the
workdir again, they will be checked out. Such operations include:

 - "git checkout foo" or "git apply"
 - git add foo (even if foo is marked no-checkout)
 - conflict files after merge
 - new files after merge
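The two merge-related cases above can be pictured as a single pass over the index after a merge. This is a hypothetical sketch: struct entry, its is_new field, widen_after_merge() and the in-core CE_NO_CHECKOUT value are made up for illustration; only the CE_STAGEMASK value matches cache.h.

```c
#include <stddef.h>

#define CE_STAGEMASK   0x3000   /* merge stage bits, as in cache.h */
#define CE_NO_CHECKOUT 0x10000  /* hypothetical in-core value */

/* Hypothetical, simplified index entry. */
struct entry {
	const char *name;
	unsigned int flags;
	int is_new;  /* nonzero if the merge introduced this path */
};

/*
 * "Easy mode": narrow rules never block an operation. After a merge,
 * conflicted entries (stage != 0) and newly added entries must be present
 * in the worktree, so their no-checkout bit is cleared. Returns how many
 * entries were widened back into the narrow area.
 */
static size_t widen_after_merge(struct entry *entries, size_t nr)
{
	size_t widened = 0;

	for (size_t i = 0; i < nr; i++) {
		struct entry *ce = &entries[i];
		int conflicted = (ce->flags & CE_STAGEMASK) != 0;

		if ((conflicted || ce->is_new) && (ce->flags & CE_NO_CHECKOUT)) {
			ce->flags &= ~CE_NO_CHECKOUT;
			widened++;
		}
	}
	return widened;
}
```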

"Strict mode" may be added later, but then it must clearly define which
operations are allowed to check out files. There's a problem with strict mode
if it wants to limit checking out new files after a merge. Because we don't
save "narrow rules" anymore (we apply the rules immediately at the
checkout/clone stage, then update narrow areas over time), we will not know
how to deal with new files. Adding [--path|--add-path|--remove-path] to git
merge commands, and applying "narrow rules" again, looks too cumbersome to me.
Comments?

Last bit. "Narrow rules" for --path|--add-path|--remove-path are currently
wildcards separated by colons. While this does the job, it does not make it
easy to check out a subdirectory non-recursively. I was thinking about '*' as
"match everything except slashes" and '**' as "match everything, including
slashes". Any ideas?
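A hand-written matcher for the proposed '*'/'**' distinction could look like the sketch below. glob_match() is a hypothetical name; it deliberately supports only '*', '**' and literal characters (no '?', brackets or escapes) rather than reimplementing all of fnmatch().

```c
/*
 * Hypothetical matcher for the proposed syntax:
 *   '*'  matches any run of characters except '/'
 *   '**' matches any run of characters, including '/'
 * Every other pattern character must match literally.
 */
static int glob_match(const char *pat, const char *path)
{
	if (*pat == '\0')
		return *path == '\0';

	if (pat[0] == '*' && pat[1] == '*') {
		/* Try every possible length for the '**' span. */
		for (const char *p = path; ; p++) {
			if (glob_match(pat + 2, p))
				return 1;
			if (*p == '\0')
				return 0;
		}
	}

	if (*pat == '*') {
		/* Same, but a single '*' may not swallow a slash. */
		for (const char *p = path; ; p++) {
			if (glob_match(pat + 1, p))
				return 1;
			if (*p == '\0' || *p == '/')
				return 0;
		}
	}

	/* Literal character; *pat is non-NUL here, so this also rejects end of path. */
	if (*pat == *path)
		return glob_match(pat + 1, path + 1);
	return 0;
}
```

The backtracking can get expensive for patterns with many stars, which is usually acceptable for short path specs.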

Oh, and "git grep" may not work correctly (or "as expected") with narrow
checkout. I haven't checked it yet.

Nguyễn Thái Ngọc Duy (9):
  Introduce CE_NO_CHECKOUT bit
  update-index: add --checkout/--no-checkout options to update
    CE_NO_CHECKOUT bit
  ls-files: add --checkout option to show checked out files
  Prevent diff machinery from examining worktree outside narrow
    checkout
  Clear CE_NO_CHECKOUT on checked out entries.
  Add support for narrow checkout in unpack_trees()
  ls-files: add --narrow-match=spec option to test narrow matching
  clone: support narrow checkout with --path option
  checkout: add new options to support narrow checkout

 builtin-checkout.c     |   41 ++++++++++++++++++
 builtin-clone.c        |   13 ++++++
 builtin-ls-files.c     |   23 +++++++++--
 builtin-update-index.c |   40 +++++++++++-------
 cache.h                |   11 +++++
 diff-lib.c             |    5 +-
 diff.c                 |    4 +-
 entry.c                |    1 +
 read-cache.c           |    6 +-
 unpack-trees.c         |  106 ++++++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.h         |    4 ++
 11 files changed, 229 insertions(+), 25 deletions(-)

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-15 14:24 [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode" Nguyễn Thái Ngọc Duy
@ 2008-08-16 10:31 ` Junio C Hamano
  2008-08-17  5:12   ` Nguyen Thai Ngoc Duy
  2008-08-19 21:10 ` James Pickens
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2008-08-16 10:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> The implementation with insights from Junio turns out smaller and better
> (and I was thinking about applying it for huge maildir).

I think I can agree with the general direction this is taking, except that
we would need to think about the transition plans.  Note that I haven't
really read through the full series yet.

> Another difference from the last round is "narrow rules" will not be preserved
> when switching branches. When you switch branch with no option, you will get
> full checkout. You may want to use --path|--add-path|--remove-path when
> switching branches to have narrow checkout again.

You could save the "narrow rules" in the extension section of the index.
If the final form of this series needs to use a separate CE_NO_CHECKOUT
bit (which would make the resulting index incompatible with the current
git), the narrow rules section can be marked as "your git must understand
this" class of extension to make sure that people do not mistakenly access
an index written by this new version of git with the current or older git.

> Now back to technical POV. I did not reuse CE_VALID (assume unchanged) bit
> because it has been used for core.ignorestat.

I am not sure what's the relation between these two.

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-16 10:31 ` Junio C Hamano
@ 2008-08-17  5:12   ` Nguyen Thai Ngoc Duy
  2008-08-17  5:50     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2008-08-17  5:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 8/16/08, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>  > Another difference from the last round is "narrow rules" will not be preserved
>  > when switching branches. When you switch branch with no option, you will get
>  > full checkout. You may want to use --path|--add-path|--remove-path when
>  > switching branches to have narrow checkout again.
>
>
> You could save the "narrow rules" in the extension section of the index.
>  If the final form of this series needs to use a separate CE_NO_CHECKOUT
>  bit (which would make the resulting index incompatible with the current
>  git), the narrow rules section can be marked as "your git must understand
>  this" class of extension to make sure that people do not mistakenly access
>  an index written by this new version of git with the current or older git.

The problem is that "narrow rules" may change over time in ways that git
may handle wrongly. Assume that you have a directory with two files,
a and b. You first do a narrow checkout of a (which would save the rule
"checkout a"). Then you do "git checkout b". When you update HEAD,
what should happen?
 - consider only a and b in the narrow area (new files not counted)
 - consider the whole directory as the narrow area (new files counted)

This does not matter until we implement a strict mode that only checks out
new files inside the narrow area (the usage is similar to submodules).

>  > Now back to technical POV. I did not reuse CE_VALID (assume unchanged) bit
>  > because it has been used for core.ignorestat.
>
>
> I am not sure what's the relation between these two.

Because the usage is different? When you run "git update-index foo" with
core.ignorestat=1, it marks foo CE_VALID. If the same bit were used for
narrow checkout, the file would be considered absent from the workdir.
I'd expect foo to still be in my narrow area after "git update-index foo".
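The conflict can be shown with two flag bits. In this sketch CE_VALID is the value from cache.h; CE_NO_CHECKOUT is assumed to be 0x4000 (the bit Junio later calls the last one available), and both helpers are hypothetical simplifications of what update-index and the narrow-checkout code do.

```c
#define CE_VALID       0x8000  /* assume unchanged, from cache.h */
#define CE_NO_CHECKOUT 0x4000  /* assumed value of a separate narrow bit */

/* Simplified: with core.ignorestat=1, "git update-index foo" sets the
   assume-unchanged bit on foo's entry. */
static unsigned int update_index_ignorestat(unsigned int flags)
{
	return flags | CE_VALID;
}

/* An entry is outside the narrow area when the chosen "not checked out"
   bit is set; narrow_bit selects which bit the implementation uses. */
static int in_narrow_area(unsigned int flags, unsigned int narrow_bit)
{
	return !(flags & narrow_bit);
}
```

If narrow checkout shared CE_VALID, foo would silently drop out of the narrow area after update-index; with a separate bit it stays.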
-- 
Duy

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-17  5:12   ` Nguyen Thai Ngoc Duy
@ 2008-08-17  5:50     ` Junio C Hamano
  2008-08-17  6:10       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2008-08-17  5:50 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

"Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes:

> On 8/16/08, Junio C Hamano <gitster@pobox.com> wrote:
>> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>>  > Another difference from the last round is "narrow rules" will not be preserved
>>  > when switching branches. When you switch branch with no option, you will get
>>  > full checkout. You may want to use --path|--add-path|--remove-path when
>>  > switching branches to have narrow checkout again.
>>
>>
>> You could save the "narrow rules" in the extension section of the index.
>>  If the final form of this series needs to use a separate CE_NO_CHECKOUT
>>  bit (which would make the resulting index incompatible with the current
>>  git), the narrow rules section can be marked as "your git must understand
>>  this" class of extension to make sure that people do not mistakenly access
>>  an index written by this new version of git with the current or older git.
>
> The problem is "narrow rules" may change over time in a way that git
> may handle it wrong. Assume that you have a directory with two files:
> a and b. You first narrow checkout a (which would save the rule
> "checkout a"). Then you do "git checkout b". When you update HEAD,
> what should happen?

I'd expect that this sequence:

	git checkout --set-no-checkout arch include
	git checkout arch/x86 include/asm-x86

to set up narrowing rules to (1) exclude everything in the arch/ and include/
areas by default, but (2) allow checking out everything in arch/x86/ and
include/asm-x86/ that currently exists _and_ that will exist in the commits
we switch to.

On the other hand, if I did this after the above two-command sequence:

	git checkout include/Kbuild

then I'd expect only that file to be added to the checkout set.  I think
you can record a list of pathspecs (both positive and negative) to implement
those semantics, no?
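One way to read this suggestion is an ordered list of positive and negative directory-or-file rules in which the last matching rule wins. That ordering policy, struct narrow_rule, rule_covers() and should_checkout() are assumptions made for this sketch, not something the thread specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct narrow_rule {
	const char *path;  /* directory or file, no trailing slash */
	bool checkout;     /* true: positive rule, false: negative rule */
};

/* A rule covers a path if it names it exactly or names a leading directory. */
static bool rule_covers(const char *rule, const char *path)
{
	size_t len = strlen(rule);

	return strncmp(path, rule, len) == 0 &&
	       (path[len] == '\0' || path[len] == '/');
}

/* Last matching rule wins; paths no rule covers are checked out. */
static bool should_checkout(const struct narrow_rule *rules, size_t nr,
			    const char *path)
{
	bool result = true;

	for (size_t i = 0; i < nr; i++)
		if (rule_covers(rules[i].path, path))
			result = rules[i].checkout;
	return result;
}
```

Because rules name directories rather than enumerate files, a file that appears under arch/x86/ in a later commit is checked out automatically, while "include/Kbuild" adds just that one file.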

>>  > Now back to technical POV. I did not reuse CE_VALID (assume unchanged) bit
>>  > because it has been used for core.ignorestat.
>>
>> I am not sure what's the relation between these two.
>
> Because the usage is different? When you "git update-index foo" with
> core.ignorestat=1, it will mark it CE_VALID. And if the same bit is
> used for narrow checkout, the file is considered not existing in
> workdir. I'd expect foo is still in my narrow area after "git
> update-index foo".

Ok.  We would need to use an extra bit for this.

The bit 0x4000 is the last one available, so we would want to use it as
"this index entry uses more bits than the traditional format" bit, and
define a backward incompatible on-disk index entry format to actually
record CE_NO_CHECKOUT and other flags we will invent in the future.
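For reference, this is how the 16-bit on-disk flags word splits up with the masks from cache.h (CE_NAMEMASK, CE_STAGEMASK, CE_STAGESHIFT and CE_VALID), with 0x4000 named CE_EXTENDED as in the patch later in this thread. decode_flags() and struct decoded_flags are hypothetical helpers.

```c
#include <stdint.h>

#define CE_NAMEMASK   0x0fff
#define CE_STAGEMASK  0x3000
#define CE_EXTENDED   0x4000
#define CE_VALID      0x8000
#define CE_STAGESHIFT 12

struct decoded_flags {
	unsigned int name_len;  /* low 12 bits: length of the path name */
	unsigned int stage;     /* merge stage, 0..3 */
	int extended;           /* proposed "more flag bits follow" marker */
	int assume_valid;       /* CE_VALID (assume unchanged) */
};

/* flags is the host-order value, i.e. after ntohs() on the on-disk field. */
static struct decoded_flags decode_flags(uint16_t flags)
{
	struct decoded_flags d;

	d.name_len = flags & CE_NAMEMASK;
	d.stage = (flags & CE_STAGEMASK) >> CE_STAGESHIFT;
	d.extended = (flags & CE_EXTENDED) != 0;
	d.assume_valid = (flags & CE_VALID) != 0;
	return d;
}
```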

Perhaps ondisk_cache_entry structure will have an extra "unsigned int
flags2" after "flags" when that bit is on, and we can have 31 more bits in
flags2, with the highest bit of flags2 signalling the presence of flags3
word in the future, or something like that.

By the way, "uint" and "ushort" in struct ondisk_cache_entry are 4-byte
and 2-byte network byte order integers; should we write them as uint32_t
and uint16_t instead in the longer run?
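The flags2/flags3 chaining, written with the fixed-width uint16_t/uint32_t types and network byte order, might be parsed as below. The layout (a 32-bit word after the 16-bit flags whenever CE_EXTENDED is set, with its top bit meaning "another word follows"), FLAGS_MORE and parse_flag_words() are hypothetical, taken from the proposal above rather than from any implemented format. memcpy() is used so the buffer need not be aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohs(), ntohl() */

#define CE_EXTENDED 0x4000u      /* 16-bit flags: an extra word follows */
#define FLAGS_MORE  0x80000000u  /* hypothetical: another 32-bit word follows */

/*
 * buf points at the on-disk 16-bit flags field of one entry. On success,
 * stores the host-order flags, up to max_extra extra words (top bit
 * stripped, leaving 31 usable bits each) and their count, and returns the
 * number of bytes consumed. Returns 0 on a truncated buffer or if more than
 * max_extra extra words are chained.
 */
static size_t parse_flag_words(const unsigned char *buf, size_t len,
			       uint16_t *flags, uint32_t *extra,
			       size_t max_extra, size_t *nr_extra)
{
	uint16_t raw16;
	uint32_t raw32;
	size_t off = sizeof(raw16);

	*nr_extra = 0;
	if (len < sizeof(raw16))
		return 0;
	memcpy(&raw16, buf, sizeof(raw16));
	*flags = ntohs(raw16);
	if (!(*flags & CE_EXTENDED))
		return off;

	do {
		if (*nr_extra == max_extra || len - off < sizeof(raw32))
			return 0;
		memcpy(&raw32, buf + off, sizeof(raw32));
		off += sizeof(raw32);
		raw32 = ntohl(raw32);
		extra[(*nr_extra)++] = raw32 & ~FLAGS_MORE;
	} while (raw32 & FLAGS_MORE);

	return off;
}
```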

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-17  5:50     ` Junio C Hamano
@ 2008-08-17  6:10       ` Junio C Hamano
  2008-08-17  9:49         ` [RFC PATCH 0/9] Narrow/Sparse checkout round 3: Eric Raible
  2008-08-17 13:36         ` [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode" Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2008-08-17  6:10 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes:
> ...
>> The problem is "narrow rules" may change over time in a way that git
>> may handle it wrong. Assume that you have a directory with two files:
>> a and b. You first narrow checkout a (which would save the rule
>> "checkout a"). Then you do "git checkout b". When you update HEAD,
>> what should happen?
>
> I'd expect that this sequence:
> ...
> you can record list of pathspecs (with positive and negative) to implement
> that semantics, no?

By the way, I was just mentioning the index extension area as a means to
store the rules if _you wanted to_.  I do not insist that you actually store
the rules, and in fact, I do not know if it is even a good idea to do so.

> Ok.  We would need to use an extra bit for this.
>
> The bit 0x4000 is the last one available, so we would want to use it as
> "this index entry uses more bits than the traditional format" bit, and
> define a backward incompatible on-disk index entry format to actually
> record CE_NO_CHECKOUT and other flags we will invent in the future.
>
> Perhaps ondisk_cache_entry structure will have an extra "unsigned int
> flags2" after "flags" when that bit is on, and we can have 31 more bits in
> flags2, with the highest bit of flags2 signalling the presence of flags3
> word in the future, or something like that.

It might make sense to do this first as future-proofing, if we really want
to go this route.  We can ensure that an index that does use the new flag
bits won't be misinterpreted by older git.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Sat, 16 Aug 2008 23:02:08 -0700
Subject: [PATCH] index: future proof for "extended" index entries

We do not have any more bits in the on-disk index flags word, but we would
need to have more in the future.  Use the last remaining bit as a signal
to tell us that the index entry we are looking at is an extended one.

Since we do not understand the extended format yet, we will just error out
when we see it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h      |    1 +
 read-cache.c |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index 2475de9..7b5cc83 100644
--- a/cache.h
+++ b/cache.h
@@ -126,6 +126,7 @@ struct cache_entry {
 
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
+#define CE_EXTENDED  (0x4000)
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
diff --git a/read-cache.c b/read-cache.c
index 2c03ec3..f0ba224 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1118,6 +1118,10 @@ static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_en
 	ce->ce_size  = ntohl(ondisk->size);
 	/* On-disk flags are just 16 bits */
 	ce->ce_flags = ntohs(ondisk->flags);
+
+	/* For future extension: we do not understand this entry yet */
+	if (ce->ce_flags & CE_EXTENDED)
+		die("Unknown index entry format");
 	hashcpy(ce->sha1, ondisk->sha1);
 
 	len = ce->ce_flags & CE_NAMEMASK;
-- 
1.6.0.rc3.18.g20157

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3:
  2008-08-17  6:10       ` Junio C Hamano
@ 2008-08-17  9:49         ` Eric Raible
  2008-08-19  9:06           ` Junio C Hamano
  2008-08-17 13:36         ` [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode" Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Raible @ 2008-08-17  9:49 UTC (permalink / raw)
  To: git

s/but we would need to have/but we may need/
in the commit message?

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-17  6:10       ` Junio C Hamano
  2008-08-17  9:49         ` [RFC PATCH 0/9] Narrow/Sparse checkout round 3: Eric Raible
@ 2008-08-17 13:36         ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2008-08-17 13:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 8/17/08, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>  > "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes:
>
> > ...
>
> >> The problem is "narrow rules" may change over time in a way that git
>  >> may handle it wrong. Assume that you have a directory with two files:
>  >> a and b. You first narrow checkout a (which would save the rule
>  >> "checkout a"). Then you do "git checkout b". When you update HEAD,
>  >> what should happen?
>  >
>  > I'd expect that this sequence:
>
> > ...
>
> > you can record list of pathspecs (with positive and negative) to implement
>  > that semantics, no?
>
> By the way, I was just mentioning the index extension area as a means to
>  store the rules if _you wanted to_.  I do not insist you to actually store
>  the rules, and in fact, I do not know if it is even a good idea to do so.

I was more worried about those rules getting out of control, because
git-checkout is not the only command that can change narrow rules.
After enough commands, the rules can become such a mess that you won't
even want to look at them. I don't do negative rules now, but yes, that's
possible.

>  > Ok.  We would need to use an extra bit for this.
>  >
>  > The bit 0x4000 is the last one available, so we would want to use it as
>  > "this index entry uses more bits than the traditional format" bit, and
>  > define a backward incompatible on-disk index entry format to actually
>  > record CE_NO_CHECKOUT and other flags we will invent in the future.
>  >
>  > Perhaps ondisk_cache_entry structure will have an extra "unsigned int
>  > flags2" after "flags" when that bit is on, and we can have 31 more bits in
>  > flags2, with the highest bit of flags2 signalling the presence of flags3
>  > word in the future, or something like that.
>
>
> It might make sense to do this first as a futureproof, if we really want
>  to go this route.  We can ensure that an index that does use the new flag
>  bits won't be misinterpreted by older git.

The patch is fine. Still, we need to do something to prevent older git
from using the new index format.
-- 
Duy

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3:
  2008-08-17  9:49         ` [RFC PATCH 0/9] Narrow/Sparse checkout round 3: Eric Raible
@ 2008-08-19  9:06           ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2008-08-19  9:06 UTC (permalink / raw)
  To: Eric Raible; +Cc: git

Eric Raible <raible@gmail.com> writes:

> s/but we would need to have/but we may need/
> in the commit message?

Yeah, strictly speaking, perhaps.

One thing that I refuse to believe is that we will need only one more bit,
and that after we assign the 0x4000 bit to that single purpose, the index
will stay that way forever.  So we would need to reserve that bit as the
extension bit in any case.  If we do not have any extension forever, that
means any index entry with the bit set is corrupt, so erroring out would
be the right thing to do anyway ;-).

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-15 14:24 [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode" Nguyễn Thái Ngọc Duy
  2008-08-16 10:31 ` Junio C Hamano
@ 2008-08-19 21:10 ` James Pickens
  2008-08-30  9:21   ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 10+ messages in thread
From: James Pickens @ 2008-08-19 21:10 UTC (permalink / raw)
  To: git

Nguyễn Thái Ngọc Duy <pclouds <at> gmail.com> writes:

> From user POV, we can now checkout a single file or a
> subdirectory (checking out subdirectory non-recursively is
> possible too). You may start with a narrow clone like:

Is there any reason for the change in terminology from "sparse"
to "narrow"?  I understand the difference between "partial"
and "sparse", but I can't tell if there's any difference
between "narrow" and "sparse".  If they are the same thing, then
I think "sparse" is the better term.

> Last bit. "Narrow rules" for --path|--add-path|--remove-path is
> currently wildcards separated by colons. While it does the job,
> it does not allow to checkout easily a subdirectory
> non-recursively. I was thinking about '*' as "match everything
> except slashes" and '**' as "match everything even slashes".

I like this idea - it would make this much more intuitive to use,
since '*' and '**' would work the same as they do in the
shell (for shells that support '**' at least).  I tried the patch
in its current form, and it took me a while to figure out that
paths were non-recursive and '*' was matching everything,
including slashes.

James

* Re: [RFC PATCH 0/9] Narrow/Sparse checkout round 3: "easy mode"
  2008-08-19 21:10 ` James Pickens
@ 2008-08-30  9:21   ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2008-08-30  9:21 UTC (permalink / raw)
  To: James Pickens; +Cc: git

On 8/20/08, James Pickens <jepicken@gmail.com> wrote:
> Nguyễn Thái Ngọc Duy <pclouds <at> gmail.com> writes:
>
>  > From user POV, we can now checkout a single file or a
>  > subdirectory (checking out subdirectory non-recursively is
>  > possible too). You may start with a narrow clone like:
>
>
> Is there any reason for the change in terminology from "sparse"
>  to "narrow"?  I understand the difference between "partial"
>  and "sparse", but I can't tell if there's any difference
>  between "narrow" and "sparse".  If they are the same thing, then
>  I think "sparse" is the better term.

I have no particular preference. It's up to the community to choose the name.

>  > Last bit. "Narrow rules" for --path|--add-path|--remove-path is
>  > currently wildcards separated by colons. While it does the job,
>  > it does not allow to checkout easily a subdirectory
>  > non-recursively. I was thinking about '*' as "match everything
>  > except slashes" and '**' as "match everything even slashes".
>
>
> I like this idea - it would make this much more intuitive to use,
>  since '*' and '**' would work the same as they do in the
>  shell (for shells that support '**' at least).  I tried the patch
>  in its current form, and it took me a while to figure out that
>  paths were non-recursive and '*' was matching everything,
>  including slashes.

I tried over the last few days, but it was not easy: it needed duplicating
the fnmatch code. I may come up with a less powerful syntax for
recursive/non-recursive '*'.
-- 
Duy
