* [RFC PATCH v3 0/8] Sparse checkout
@ 2009-08-11 15:43 Nguyễn Thái Ngọc Duy
  2009-08-11 15:43 ` [RFC PATCH v3 1/8] Prevent diff machinery from examining assume-unchanged entries on worktree Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:43 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

Continuing the endless RFCs of sparse checkout, this series drops the sparse hook
in favor of .git/info/sparse. Changes since the last version:


  Prevent diff machinery from examining assume-unchanged entries on worktree

    "if (ce_uptodate(ce) || CE_VALID)" is updated, as well as the corresponding test


  Avoid writing to buffer in add_excludes_from_file_1()

    Split out from the old second patch, as suggested by Johannes


  Read .gitignore from index if it is assume-unchanged

    read_index_data() is renamed. Commit message mentions add_excludes_from_file()


  excluded_1(): support exclude "directories" in index

    This one is new because the index has no notion of a "directory"; more comments in the patch


  dir.c: export excluded_1() and add_excludes_from_file_1()

    New too, exported for use in unpack-trees.c


  unpack-trees.c: generalize verify_* functions

    Split out of the old third patch for easier review


  Support sparse checkout in unpack_trees() and read-tree

    Read .git/info/sparse instead of .git/hooks/sparse

    
  --sparse for porcelains
    RFC patch

 Documentation/technical/api-directory-listing.txt |    3 +
 builtin-checkout.c                                |    4 +
 builtin-clean.c                                   |    5 +-
 builtin-ls-files.c                                |    4 +-
 builtin-merge.c                                   |    5 +-
 builtin-read-tree.c                               |    4 +-
 cache.h                                           |    3 +
 diff-lib.c                                        |    6 +-
 dir.c                                             |  101 +++++++++++------
 dir.h                                             |    4 +
 git-pull.sh                                       |    6 +-
 t/t1009-read-tree-sparse.sh                       |   47 ++++++++
 t/t3001-ls-files-others-exclude.sh                |   22 ++++
 t/t4039-diff-assume-unchanged.sh                  |   31 ++++++
 t/t7300-clean.sh                                  |   19 ++++
 unpack-trees.c                                    |  121 ++++++++++++++++++++-
 unpack-trees.h                                    |    3 +
 17 files changed, 340 insertions(+), 48 deletions(-)
 create mode 100755 t/t1009-read-tree-sparse.sh
 create mode 100755 t/t4039-diff-assume-unchanged.sh

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [RFC PATCH v3 1/8] Prevent diff machinery from examining assume-unchanged entries on worktree
  2009-08-11 15:43 [RFC PATCH v3 0/8] Sparse checkout Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:43 ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44   ` [RFC PATCH v3 2/8] Avoid writing to buffer in add_excludes_from_file_1() Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:43 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff-lib.c                       |    6 ++++--
 t/t4039-diff-assume-unchanged.sh |   31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 2 deletions(-)
 create mode 100755 t/t4039-diff-assume-unchanged.sh

diff --git a/diff-lib.c b/diff-lib.c
index b7813af..e5b9fe0 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -162,7 +162,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		if (ce_uptodate(ce))
 			continue;
 
-		changed = check_removed(ce, &st);
+		/* If CE_VALID is set, don't look at workdir for file removal */
+		changed = (ce->ce_flags & CE_VALID) ? 0 : check_removed(ce, &st);
 		if (changed) {
 			if (changed < 0) {
 				perror(ce->name);
@@ -337,6 +338,8 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 	struct rev_info *revs = o->unpack_data;
 	int match_missing, cached;
 
+	/* if the entry is not checked out, don't examine work tree */
+	cached = o->index_only || (idx && (idx->ce_flags & CE_VALID));
 	/*
 	 * Backward compatibility wart - "diff-index -m" does
 	 * not mean "do not ignore merges", but "match_missing".
@@ -344,7 +347,6 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 	 * But with the revision flag parsing, that's found in
 	 * "!revs->ignore_merges".
 	 */
-	cached = o->index_only;
 	match_missing = !revs->ignore_merges;
 
 	if (cached && idx && ce_stage(idx)) {
diff --git a/t/t4039-diff-assume-unchanged.sh b/t/t4039-diff-assume-unchanged.sh
new file mode 100755
index 0000000..9d9498b
--- /dev/null
+++ b/t/t4039-diff-assume-unchanged.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description='diff with assume-unchanged entries'
+
+. ./test-lib.sh
+
+# external diff has been tested in t4020-diff-external.sh
+
+test_expect_success 'setup' '
+	echo zero > zero &&
+	git add zero &&
+	git commit -m zero &&
+	echo one > one &&
+	echo two > two &&
+	git add one two &&
+	git commit -m onetwo &&
+	git update-index --assume-unchanged one &&
+	echo borked >> one &&
+	test "$(git ls-files -v one)" = "h one"
+'
+
+test_expect_success 'diff-index does not examine assume-unchanged entries' '
+	git diff-index HEAD^ -- one | grep -q 5626abf0f72e58d7a153368ba57db4c673c0e171
+'
+
+test_expect_success 'diff-files does not examine assume-unchanged entries' '
+	rm one &&
+	test -z "$(git diff-files -- one)"
+'
+
+test_done
-- 
1.6.3.GIT


* [RFC PATCH v3 2/8] Avoid writing to buffer in add_excludes_from_file_1()
  2009-08-11 15:43 ` [RFC PATCH v3 1/8] Prevent diff machinery from examining assume-unchanged entries on worktree Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44   ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44     ` [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

In the next patch, the buffer that is being used within
add_excludes_from_file_1() comes from another function and does not
have extra space to put \n at the end.
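
A minimal sketch of the idea, not the patch itself: treat the end of the
buffer as an implicit newline so the scanner never needs a spare byte.
The name for_each_pattern_line() and the callback type are hypothetical.
Unlike the real code, the sketch reports (pointer, length) pairs instead
of NUL-terminating each entry in place, so it never writes to the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for this sketch; not part of git. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/*
 * Walk `buf` (exactly `size` bytes, no terminator) line by line.
 * The end of the buffer counts as an implicit '\n', so nothing is
 * written to buf. A trailing '\r' is trimmed; empty lines and lines
 * starting with '#' are skipped. Returns the number of lines reported.
 */
size_t for_each_pattern_line(const char *buf, size_t size,
			     line_fn fn, void *ctx)
{
	size_t i, start = 0, count = 0;

	assert(buf || size == 0);
	for (i = 0; i <= size; i++) {
		if (i == size || buf[i] == '\n') {
			size_t len = i - start;

			if (len && buf[i - 1] == '\r')
				len--;	/* drop the CR of a CRLF ending */
			if (len && buf[start] != '#') {
				if (fn)
					fn(buf + start, len, ctx);
				count++;
			}
			start = i + 1;
		}
	}
	return count;
}
```

Reporting lengths sidesteps the question of whether the byte at index
`size` is writable; code that terminates the last entry in place still
needs one byte beyond the data when the input lacks a final newline.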

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index e05b850..1170d64 100644
--- a/dir.c
+++ b/dir.c
@@ -229,10 +229,9 @@ static int add_excludes_from_file_1(const char *fname,
 
 	if (buf_p)
 		*buf_p = buf;
-	buf[size++] = '\n';
 	entry = buf;
-	for (i = 0; i < size; i++) {
-		if (buf[i] == '\n') {
+	for (i = 0; i <= size; i++) {
+		if (i == size || buf[i] == '\n') {
 			if (entry != buf + i && entry[0] != '#') {
 				buf[i - (i && buf[i-1] == '\r')] = 0;
 				add_exclude(entry, base, baselen, which);
-- 
1.6.3.GIT


* [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged
  2009-08-11 15:44   ` [RFC PATCH v3 2/8] Avoid writing to buffer in add_excludes_from_file_1() Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44     ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44       ` [RFC PATCH v3 4/8] excluded_1(): support exclude "directories" in index Nguyễn Thái Ngọc Duy
  2009-08-12  2:51       ` [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged Junio C Hamano
  0 siblings, 2 replies; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

In sparse checkout mode (aka CE_VALID or assume-unchanged), some files
may be missing from the working directory. If any of those files is a
.gitignore, its absence changes which files git excludes.

Because those files are by definition "assume unchanged", we can read
them from the index instead. This makes a populated index a
prerequisite for directory listing. At the moment directory listing is
used by "git clean", "git add", "git ls-files", "git status"/"git
commit" and unpack_trees()-related commands. These commands have been
checked/modified to populate the index before doing directory listing.

add_excludes_from_file() does not enable this feature, because it
is used to read .git/info/exclude and some explicit files specified
by "git ls-files".
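
The fallback order described above can be sketched with a toy model.
Everything here is an assumption made for illustration: struct toy_entry,
TOY_CE_VALID and the toy_* functions are stand-ins, the "index" is a
plain array, and blob contents are inline strings rather than objects
read from the object database.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CE_VALID 0x1u	/* stand-in for git's CE_VALID bit */

/* Toy index entry: path, flags and the blob contents it points to. */
struct toy_entry {
	const char *path;
	unsigned flags;
	const char *blob;
};

/* Sample data for the sketch: only the top-level file is CE_VALID. */
const struct toy_entry toy_sample_index[] = {
	{ ".gitignore", TOY_CE_VALID, "*.o\n" },
	{ "one/.gitignore", 0, "*.1\n" },
};

/* Return the blob of an assume-unchanged entry, or NULL otherwise. */
const char *toy_read_from_index(const struct toy_entry *idx, size_t nr,
				const char *path)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (strcmp(idx[i].path, path))
			continue;
		/* only applies to CE_VALID entries */
		return (idx[i].flags & TOY_CE_VALID) ? idx[i].blob : NULL;
	}
	return NULL;
}

/*
 * `worktree` is the file's contents on disk, or NULL when the file is
 * missing. The index is consulted only when the file is missing and
 * the caller asked for it with `check_index`.
 */
const char *toy_load_excludes(const char *worktree,
			      const struct toy_entry *idx, size_t nr,
			      const char *path, int check_index)
{
	if (worktree)
		return worktree;
	if (!check_index)
		return NULL;
	return toy_read_from_index(idx, nr, path);
}
```

In this model a per-directory .gitignore would be loaded with
check_index set to 1, while an explicitly named exclude file would pass
0 and fail outright when the file cannot be read.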

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/technical/api-directory-listing.txt |    3 +
 builtin-clean.c                                   |    5 +-
 builtin-ls-files.c                                |    4 +-
 dir.c                                             |   66 ++++++++++++++------
 t/t3001-ls-files-others-exclude.sh                |   22 +++++++
 t/t7300-clean.sh                                  |   19 ++++++
 6 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 5bbd18f..7d0e282 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -58,6 +58,9 @@ The result of the enumeration is left in these fields::
 Calling sequence
 ----------------
 
+* Ensure the_index is populated as it may have CE_VALID entries that
+  affect directory listing.
+
 * Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
   sizeof(dir))`.
 
diff --git a/builtin-clean.c b/builtin-clean.c
index 2d8c735..d917472 100644
--- a/builtin-clean.c
+++ b/builtin-clean.c
@@ -71,8 +71,11 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
-	if (!ignored)
+	if (!ignored) {
+		if (read_cache() < 0)
+			die("index file corrupt");
 		setup_standard_excludes(&dir);
+	}
 
 	pathspec = get_pathspec(prefix, argv);
 	read_cache();
diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index f473220..d1a23c4 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -481,6 +481,9 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 		prefix_offset = strlen(prefix);
 	git_config(git_default_config, NULL);
 
+	if (read_cache() < 0)
+		die("index file corrupt");
+
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	if (show_tag || show_valid_bit) {
@@ -508,7 +511,6 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 	pathspec = get_pathspec(prefix, argv);
 
 	/* be nice with submodule paths ending in a slash */
-	read_cache();
 	if (pathspec)
 		strip_trailing_slash_from_submodules();
 
diff --git a/dir.c b/dir.c
index 1170d64..66b485c 100644
--- a/dir.c
+++ b/dir.c
@@ -200,11 +200,36 @@ void add_exclude(const char *string, const char *base,
 	which->excludes[which->nr++] = x;
 }
 
+static void *read_assume_unchanged_from_index(const char *path, size_t *size)
+{
+	int pos, len;
+	unsigned long sz;
+	enum object_type type;
+	void *data;
+	struct index_state *istate = &the_index;
+
+	len = strlen(path);
+	pos = index_name_pos(istate, path, len);
+	if (pos < 0)
+		return NULL;
+	/* only applies to CE_VALID entries */
+	if (!(istate->cache[pos]->ce_flags & CE_VALID))
+		return NULL;
+	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	*size = xsize_t(sz);
+	return data;
+}
+
 static int add_excludes_from_file_1(const char *fname,
 				    const char *base,
 				    int baselen,
 				    char **buf_p,
-				    struct exclude_list *which)
+				    struct exclude_list *which,
+				    int check_index)
 {
 	struct stat st;
 	int fd, i;
@@ -212,20 +237,26 @@ static int add_excludes_from_file_1(const char *fname,
 	char *buf, *entry;
 
 	fd = open(fname, O_RDONLY);
-	if (fd < 0 || fstat(fd, &st) < 0)
-		goto err;
-	size = xsize_t(st.st_size);
-	if (size == 0) {
-		close(fd);
-		return 0;
+	if (fd < 0 || fstat(fd, &st) < 0) {
+		if (0 <= fd)
+			close(fd);
+		if (!check_index ||
+		    (buf = read_assume_unchanged_from_index(fname, &size)) == NULL)
+			return -1;
 	}
-	buf = xmalloc(size+1);
-	if (read_in_full(fd, buf, size) != size)
-	{
-		free(buf);
-		goto err;
+	else {
+		size = xsize_t(st.st_size);
+		if (size == 0) {
+			close(fd);
+			return 0;
+		}
+		buf = xmalloc(size);
+		if (read_in_full(fd, buf, size) != size) {
+			close(fd);
+			return -1;
+		}
+		close(fd);
 	}
-	close(fd);
 
 	if (buf_p)
 		*buf_p = buf;
@@ -240,17 +271,12 @@ static int add_excludes_from_file_1(const char *fname,
 		}
 	}
 	return 0;
-
- err:
-	if (0 <= fd)
-		close(fd);
-	return -1;
 }
 
 void add_excludes_from_file(struct dir_struct *dir, const char *fname)
 {
 	if (add_excludes_from_file_1(fname, "", 0, NULL,
-				     &dir->exclude_list[EXC_FILE]) < 0)
+				     &dir->exclude_list[EXC_FILE], 0) < 0)
 		die("cannot use %s as an exclude file", fname);
 }
 
@@ -301,7 +327,7 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 		strcpy(dir->basebuf + stk->baselen, dir->exclude_per_dir);
 		add_excludes_from_file_1(dir->basebuf,
 					 dir->basebuf, stk->baselen,
-					 &stk->filebuf, el);
+					 &stk->filebuf, el, 1);
 		dir->exclude_stack = stk;
 		current = stk->baselen;
 	}
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index c65bca8..fdd5dd8 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -64,6 +64,8 @@ two/*.4
 echo '!*.2
 !*.8' >one/two/.gitignore
 
+allignores='.gitignore one/.gitignore one/two/.gitignore'
+
 test_expect_success \
     'git ls-files --others with various exclude options.' \
     'git ls-files --others \
@@ -85,6 +87,26 @@ test_expect_success \
        >output &&
      test_cmp expect output'
 
+test_expect_success 'setup sparse gitignore' '
+	git add $allignores &&
+	git update-index --assume-unchanged $allignores &&
+	rm $allignores
+'
+
+test_expect_success \
+    'git ls-files --others with various exclude options.' \
+    'git ls-files --others \
+       --exclude=\*.6 \
+       --exclude-per-directory=.gitignore \
+       --exclude-from=.git/ignore \
+       >output &&
+     test_cmp expect output'
+
+test_expect_success 'restore gitignore' '
+	git checkout $allignores &&
+	rm .git/index
+'
+
 cat > excludes-file <<\EOF
 *.[1-8]
 e*
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 929d5d4..4886d5f 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -22,6 +22,25 @@ test_expect_success 'setup' '
 
 '
 
+test_expect_success 'git clean with assume-unchanged .gitignore' '
+	git update-index --assume-unchanged .gitignore &&
+	rm .gitignore &&
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	git clean &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test ! -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so &&
+	git update-index --no-assume-unchanged .gitignore &&
+	git checkout .gitignore
+'
+
 test_expect_success 'git clean' '
 
 	mkdir -p build docs &&
-- 
1.6.3.GIT


* [RFC PATCH v3 4/8] excluded_1(): support exclude "directories" in index
  2009-08-11 15:44     ` [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44       ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44         ` [RFC PATCH v3 5/8] dir.c: export excluded_1() and add_excludes_from_file_1() Nguyễn Thái Ngọc Duy
  2009-08-12  2:51       ` [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

The index does not really have "directories", so attempts to match "foo/"
against the index will fail unless someone reconstructs directories
from the list of files.

Observing that dtype in this function can never be NULL (otherwise
it would segfault), dtype NULL will be used to say "hey.. you are
matching against index" and behave properly.
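
As an illustration of matching a directory pattern against index paths,
here is a small sketch. path_in_dir() is a hypothetical helper, not a
git function, and it is deliberately stricter than a plain prefix
comparison: it also requires a '/' right after the directory name, so
that "two" does not match "twofoo.t".

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the index path `pathname` lies inside directory `dir`
 * (given without its trailing slash), 0 otherwise. There is no stat()
 * and no d_type lookup; the decision uses the path string alone.
 */
int path_in_dir(const char *pathname, const char *dir)
{
	size_t n;

	assert(pathname && dir);
	n = strlen(dir);
	if (n == 0)
		return 0;
	if (strncmp(pathname, dir, n))
		return 0;		/* not even a prefix */
	return pathname[n] == '/';	/* prefix must end at a path separator */
}
```

With this rule "two/two.t" is inside "two", while "two" itself and
"twofoo.t" are not.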

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
  Having the function segfault when dtype is NULL is nice, but I found
  no other way to sneak the new code in. Defining DT_INDEX may clash
  with existing system definitions.


 dir.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/dir.c b/dir.c
index 66b485c..c990938 100644
--- a/dir.c
+++ b/dir.c
@@ -350,6 +350,12 @@ static int excluded_1(const char *pathname,
 			int to_exclude = x->to_exclude;
 
 			if (x->flags & EXC_FLAG_MUSTBEDIR) {
+				if (!dtype) {
+					if (!prefixcmp(pathname, exclude))
+						return to_exclude;
+					else
+						continue;
+				}
 				if (*dtype == DT_UNKNOWN)
 					*dtype = get_dtype(NULL, pathname, pathlen);
 				if (*dtype != DT_DIR)
-- 
1.6.3.GIT


* [RFC PATCH v3 5/8] dir.c: export excluded_1() and add_excludes_from_file_1()
  2009-08-11 15:44       ` [RFC PATCH v3 4/8] excluded_1(): support exclude "directories" in index Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44         ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44           ` [RFC PATCH v3 6/8] unpack-trees.c: generalize verify_* functions Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

These functions are used to handle .gitignore. They are now exported
so that sparse checkout can reuse them.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c |   32 ++++++++++++++++----------------
 dir.h |    4 ++++
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index c990938..bc35586 100644
--- a/dir.c
+++ b/dir.c
@@ -224,12 +224,12 @@ static void *read_assume_unchanged_from_index(const char *path, size_t *size)
 	return data;
 }
 
-static int add_excludes_from_file_1(const char *fname,
-				    const char *base,
-				    int baselen,
-				    char **buf_p,
-				    struct exclude_list *which,
-				    int check_index)
+int add_excludes_from_file_to_list(const char *fname,
+				   const char *base,
+				   int baselen,
+				   char **buf_p,
+				   struct exclude_list *which,
+				   int check_index)
 {
 	struct stat st;
 	int fd, i;
@@ -275,8 +275,8 @@ static int add_excludes_from_file_1(const char *fname,
 
 void add_excludes_from_file(struct dir_struct *dir, const char *fname)
 {
-	if (add_excludes_from_file_1(fname, "", 0, NULL,
-				     &dir->exclude_list[EXC_FILE], 0) < 0)
+	if (add_excludes_from_file_to_list(fname, "", 0, NULL,
+					   &dir->exclude_list[EXC_FILE], 0) < 0)
 		die("cannot use %s as an exclude file", fname);
 }
 
@@ -325,9 +325,9 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 		memcpy(dir->basebuf + current, base + current,
 		       stk->baselen - current);
 		strcpy(dir->basebuf + stk->baselen, dir->exclude_per_dir);
-		add_excludes_from_file_1(dir->basebuf,
-					 dir->basebuf, stk->baselen,
-					 &stk->filebuf, el, 1);
+		add_excludes_from_file_to_list(dir->basebuf,
+					       dir->basebuf, stk->baselen,
+					       &stk->filebuf, el, 1);
 		dir->exclude_stack = stk;
 		current = stk->baselen;
 	}
@@ -337,9 +337,9 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 /* Scan the list and let the last match determine the fate.
  * Return 1 for exclude, 0 for include and -1 for undecided.
  */
-static int excluded_1(const char *pathname,
-		      int pathlen, const char *basename, int *dtype,
-		      struct exclude_list *el)
+int excluded_from_list(const char *pathname,
+		       int pathlen, const char *basename, int *dtype,
+		       struct exclude_list *el)
 {
 	int i;
 
@@ -413,8 +413,8 @@ int excluded(struct dir_struct *dir, const char *pathname, int *dtype_p)
 
 	prep_exclude(dir, pathname, basename-pathname);
 	for (st = EXC_CMDL; st <= EXC_FILE; st++) {
-		switch (excluded_1(pathname, pathlen, basename,
-				   dtype_p, &dir->exclude_list[st])) {
+		switch (excluded_from_list(pathname, pathlen, basename,
+					   dtype_p, &dir->exclude_list[st])) {
 		case 0:
 			return 0;
 		case 1:
diff --git a/dir.h b/dir.h
index a631446..472e11e 100644
--- a/dir.h
+++ b/dir.h
@@ -69,7 +69,11 @@ extern int match_pathspec(const char **pathspec, const char *name, int namelen,
 extern int fill_directory(struct dir_struct *dir, const char **pathspec);
 extern int read_directory(struct dir_struct *, const char *path, int len, const char **pathspec);
 
+extern int excluded_from_list(const char *pathname, int pathlen, const char *basename,
+			      int *dtype, struct exclude_list *el);
 extern int excluded(struct dir_struct *, const char *, int *);
+extern int add_excludes_from_file_to_list(const char *fname, const char *base, int baselen,
+					  char **buf_p, struct exclude_list *which, int check_index);
 extern void add_excludes_from_file(struct dir_struct *, const char *fname);
 extern void add_exclude(const char *string, const char *base,
 			int baselen, struct exclude_list *which);
-- 
1.6.3.GIT


* [RFC PATCH v3 6/8] unpack-trees.c: generalize verify_* functions
  2009-08-11 15:44         ` [RFC PATCH v3 5/8] dir.c: export excluded_1() and add_excludes_from_file_1() Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44           ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44             ` [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c |   23 ++++++++++++++++++-----
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 720f7a1..02ea236 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -445,8 +445,9 @@ static int same(struct cache_entry *a, struct cache_entry *b)
  * When a CE gets turned into an unmerged entry, we
  * want it to be up-to-date
  */
-static int verify_uptodate(struct cache_entry *ce,
-		struct unpack_trees_options *o)
+static int verify_uptodate_1(struct cache_entry *ce,
+				   struct unpack_trees_options *o,
+				   const char *error_msg)
 {
 	struct stat st;
 
@@ -471,7 +472,13 @@ static int verify_uptodate(struct cache_entry *ce,
 	if (errno == ENOENT)
 		return 0;
 	return o->gently ? -1 :
-		error(ERRORMSG(o, not_uptodate_file), ce->name);
+		error(error_msg, ce->name);
+}
+
+static int verify_uptodate(struct cache_entry *ce,
+			   struct unpack_trees_options *o)
+{
+	return verify_uptodate_1(ce, o, ERRORMSG(o, not_uptodate_file));
 }
 
 static void invalidate_ce_path(struct cache_entry *ce, struct unpack_trees_options *o)
@@ -579,8 +586,9 @@ static int icase_exists(struct unpack_trees_options *o, struct cache_entry *dst,
  * We do not want to remove or overwrite a working tree file that
  * is not tracked, unless it is ignored.
  */
-static int verify_absent(struct cache_entry *ce, const char *action,
-			 struct unpack_trees_options *o)
+static int verify_absent_1(struct cache_entry *ce, const char *action,
+				 struct unpack_trees_options *o,
+				 const char *error_msg)
 {
 	struct stat st;
 
@@ -660,6 +668,11 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	}
 	return 0;
 }
+static int verify_absent(struct cache_entry *ce, const char *action,
+			 struct unpack_trees_options *o)
+{
+	return verify_absent_1(ce, action, o, ERRORMSG(o, would_lose_untracked));
+}
 
 static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
 		struct unpack_trees_options *o)
-- 
1.6.3.GIT


* [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree
  2009-08-11 15:44           ` [RFC PATCH v3 6/8] unpack-trees.c: generalize verify_* functions Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44             ` Nguyễn Thái Ngọc Duy
  2009-08-11 15:44               ` [RFC PATCH v3 8/8] --sparse for porcelains Nguyễn Thái Ngọc Duy
  2009-08-11 21:18               ` [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree skillzero
  0 siblings, 2 replies; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

This patch makes unpack_trees() look at .git/info/sparse [1] to
determine which files should stay in the working directory after
merging, by:

 - setting CE_VALID properly so that other operations correctly ignore
   missing files
 - driving check_updates() to add/remove files in accordance with
   CE_VALID

The feature is disabled by default. Use "read-tree --sparse" to enable it.

[1] .git/info/sparse has the same syntax as .git/info/exclude. Files
that match the patterns will be set as CE_VALID.
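
The flag transitions can be sketched as a pure function over one
entry's flags. This is a simplified model: the TOY_* bits and
toy_apply_sparse() are hypothetical stand-ins for the CE_* flags and for
the per-entry logic, and the verify_uptodate/verify_absent safety checks
and the skipping of unmerged entries are left out.

```c
#include <assert.h>

/* Stand-ins for CE_VALID, CE_UPDATE, CE_REMOVE and CE_WT_REMOVE. */
#define TOY_VALID     0x1u
#define TOY_UPDATE    0x2u
#define TOY_REMOVE    0x4u
#define TOY_WT_REMOVE 0x8u

/*
 * Compute the new flags of one entry. `matches_sparse` is nonzero when
 * the entry's path matches a pattern in the sparse file.
 */
unsigned toy_apply_sparse(unsigned flags, int matches_sparse)
{
	int was_valid = (flags & TOY_VALID) != 0;

	if (matches_sparse)
		flags |= TOY_VALID;
	else
		flags &= ~TOY_VALID;

	/* entries the merge already wants removed are left alone */
	if (flags & TOY_REMOVE)
		return flags;

	/* newly outside the checkout area: remove from worktree only */
	if (!was_valid && (flags & TOY_VALID))
		flags |= TOY_WT_REMOVE;
	/* newly inside the checkout area: write the file out */
	if (was_valid && !(flags & TOY_VALID))
		flags |= TOY_UPDATE;
	/* never write files that sit outside the checkout area */
	if (flags & TOY_VALID)
		flags &= ~TOY_UPDATE;
	return flags;
}
```

For example, an entry with no flags that starts matching a sparse
pattern ends up with TOY_VALID | TOY_WT_REMOVE, and an entry that stops
matching goes from TOY_VALID to TOY_UPDATE.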

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin-read-tree.c         |    4 +-
 cache.h                     |    3 +
 t/t1009-read-tree-sparse.sh |   47 ++++++++++++++++++++
 unpack-trees.c              |   98 ++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h              |    3 +
 5 files changed, 153 insertions(+), 2 deletions(-)
 create mode 100755 t/t1009-read-tree-sparse.sh

diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index 9c2d634..888f136 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -31,7 +31,7 @@ static int list_tree(unsigned char *sha1)
 }
 
 static const char * const read_tree_usage[] = {
-	"git read-tree [[-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>] [-u [--exclude-per-directory=<gitignore>] | -i]]  [--index-output=<file>] <tree-ish1> [<tree-ish2> [<tree-ish3>]]",
+	"git read-tree [[-m [--trivial] [--aggressive] | --reset | --prefix=<prefix>] [-u [--exclude-per-directory=<gitignore>] | -i]] [--sparse] [--index-output=<file>] <tree-ish1> [<tree-ish2> [<tree-ish3>]]",
 	NULL
 };
 
@@ -98,6 +98,8 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 		  PARSE_OPT_NONEG, exclude_per_directory_cb },
 		OPT_SET_INT('i', NULL, &opts.index_only,
 			    "don't check the working tree after merging", 1),
+		OPT_SET_INT(0, "sparse", &opts.apply_sparse,
+			    "apply sparse checkout filter", 1),
 		OPT_END()
 	};
 
diff --git a/cache.h b/cache.h
index 1a2a3c9..dfad54a 100644
--- a/cache.h
+++ b/cache.h
@@ -177,6 +177,9 @@ struct cache_entry {
 #define CE_HASHED    (0x100000)
 #define CE_UNHASHED  (0x200000)
 
+/* Only remove in work directory, not index */
+#define CE_WT_REMOVE (0x400000)
+
 /*
  * Extended on-disk flags
  */
diff --git a/t/t1009-read-tree-sparse.sh b/t/t1009-read-tree-sparse.sh
new file mode 100755
index 0000000..f70852c
--- /dev/null
+++ b/t/t1009-read-tree-sparse.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='sparse checkout tests'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit one &&
+	mkdir two &&
+	test_commit two two/two.t two.t
+'
+
+test_expect_success 'read-tree without .git/info/sparse' '
+	git read-tree --sparse -m -u HEAD &&
+	test -f one.t &&
+	test -f two/two.t
+'
+
+test_expect_success 'read-tree with empty .git/info/sparse' '
+	echo > .git/info/sparse &&
+	git read-tree --sparse -m -u HEAD &&
+	test -f one.t &&
+	test -f two/two.t
+'
+
+test_expect_success 'read-tree --sparse' '
+	echo "one.t" > .git/info/sparse &&
+	git read-tree --sparse -m -u HEAD &&
+	test ! -f one.t &&
+	test -f two/two.t
+'
+
+test_expect_success 'read-tree --sparse foo where foo is "directory"' '
+	echo "two" > .git/info/sparse &&
+	git read-tree --sparse -m -u HEAD &&
+	test -f one.t &&
+	test -f two/two.t
+'
+
+test_expect_success 'read-tree --sparse foo/' '
+	echo "two/" > .git/info/sparse &&
+	git read-tree --sparse -m -u HEAD &&
+	test -f one.t &&
+	test ! -f two/two.t
+'
+
+test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 02ea236..d18d333 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -32,6 +32,12 @@ static struct unpack_trees_error_msgs unpack_plumbing_errors = {
 
 	/* bind_overlap */
 	"Entry '%s' overlaps with '%s'.  Cannot bind.",
+
+	/* sparse_not_uptodate_file */
+	"Entry '%s' not uptodate. Cannot update sparse checkout.",
+
+	/* would_lose_orphaned */
+	"Working tree file '%s' would be %s by sparse checkout update.",
 };
 
 #define ERRORMSG(o,fld) \
@@ -78,7 +84,7 @@ static int check_updates(struct unpack_trees_options *o)
 	if (o->update && o->verbose_update) {
 		for (total = cnt = 0; cnt < index->cache_nr; cnt++) {
 			struct cache_entry *ce = index->cache[cnt];
-			if (ce->ce_flags & (CE_UPDATE | CE_REMOVE))
+			if (ce->ce_flags & (CE_UPDATE | CE_REMOVE | CE_WT_REMOVE))
 				total++;
 		}
 
@@ -92,6 +98,13 @@ static int check_updates(struct unpack_trees_options *o)
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
+		if (ce->ce_flags & CE_WT_REMOVE) {
+			display_progress(progress, ++cnt);
+			if (o->update)
+				unlink_entry(ce);
+			continue;
+		}
+
 		if (ce->ce_flags & CE_REMOVE) {
 			display_progress(progress, ++cnt);
 			if (o->update)
@@ -118,6 +131,74 @@ static int check_updates(struct unpack_trees_options *o)
 	return errs != 0;
 }
 
+static int verify_uptodate_sparse(struct cache_entry *ce, struct unpack_trees_options *o);
+static int verify_absent_sparse(struct cache_entry *ce, const char *action, struct unpack_trees_options *o);
+static int apply_sparse_checkout(struct unpack_trees_options *o)
+{
+	struct index_state *index = &o->result;
+	struct exclude_list el;
+	int i, ret = 0;
+
+	memset(&el, 0, sizeof(el));
+	if (add_excludes_from_file_to_list(git_path("info/sparse"), "", 0, NULL, &el, 0) < 0)
+		return 0;
+
+	for (i = 0; i < index->cache_nr; i++) {
+		struct cache_entry *ce = index->cache[i];
+		const char *basename;
+		int was_valid = ce->ce_flags & CE_VALID;
+
+		if (ce_stage(ce))
+			continue;
+
+		basename = strrchr(ce->name, '/');
+		basename = basename ? basename+1 : ce->name;
+		if (excluded_from_list(ce->name, ce_namelen(ce), basename, NULL, &el) > 0)
+			ce->ce_flags |= CE_VALID;
+		else
+			ce->ce_flags &= ~CE_VALID;
+
+		/*
+		 * We only care about files getting into the checkout area
+		 * If merge strategies want to remove some, go ahead
+		 */
+		if (ce->ce_flags & CE_REMOVE)
+			continue;
+
+		if (!was_valid && (ce->ce_flags & CE_VALID)) {
+			/*
+			 * If CE_UPDATE is set, verify_uptodate() must be called already
+			 * also stat info may have lost after merged_entry() so calling
+			 * verify_uptodate() again may fail
+			 */
+			if (!(ce->ce_flags & CE_UPDATE) && verify_uptodate_sparse(ce, o)) {
+				ret = -1;
+				break;
+			}
+			ce->ce_flags |= CE_WT_REMOVE;
+		}
+		if (was_valid && !(ce->ce_flags & CE_VALID)) {
+			if (verify_absent_sparse(ce, "overwritten", o)) {
+				ret = -1;
+				break;
+			}
+			ce->ce_flags |= CE_UPDATE;
+		}
+
+		/* merge strategies may set CE_UPDATE outside checkout area */
+		if (ce->ce_flags & CE_VALID)
+			ce->ce_flags &= ~CE_UPDATE;
+
+	}
+
+	for (i = 0;i < el.nr;i++)
+		free(el.excludes[i]);
+	if (el.excludes)
+		free(el.excludes);
+
+	return ret;
+}
+
 static inline int call_unpack_fn(struct cache_entry **src, struct unpack_trees_options *o)
 {
 	int ret = o->fn(src, o);
@@ -416,6 +497,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	if (o->trivial_merges_only && o->nontrivial_merge)
 		return unpack_failed(o, "Merge requires file-level merging");
 
+	if (o->apply_sparse && apply_sparse_checkout(o))
+		return unpack_failed(o, NULL);
+
 	o->src_index = NULL;
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index)
@@ -481,6 +565,12 @@ static int verify_uptodate(struct cache_entry *ce,
 	return verify_uptodate_1(ce, o, ERRORMSG(o, not_uptodate_file));
 }
 
+static int verify_uptodate_sparse(struct cache_entry *ce,
+				  struct unpack_trees_options *o)
+{
+	return verify_uptodate_1(ce, o, ERRORMSG(o, sparse_not_uptodate_file));
+}
+
 static void invalidate_ce_path(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	if (ce)
@@ -674,6 +764,12 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	return verify_absent_1(ce, action, o, ERRORMSG(o, would_lose_untracked));
 }
 
+static int verify_absent_sparse(struct cache_entry *ce, const char *action,
+			 struct unpack_trees_options *o)
+{
+	return verify_absent_1(ce, action, o, ERRORMSG(o, would_lose_orphaned));
+}
+
 static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
 		struct unpack_trees_options *o)
 {
diff --git a/unpack-trees.h b/unpack-trees.h
index d19df44..a09077b 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -14,6 +14,8 @@ struct unpack_trees_error_msgs {
 	const char *not_uptodate_dir;
 	const char *would_lose_untracked;
 	const char *bind_overlap;
+	const char *sparse_not_uptodate_file;
+	const char *would_lose_orphaned;
 };
 
 struct unpack_trees_options {
@@ -28,6 +30,7 @@ struct unpack_trees_options {
 		     skip_unmerged,
 		     initial_checkout,
 		     diff_index_cached,
+		     apply_sparse,
 		     gently;
 	const char *prefix;
 	int pos;
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-11 15:44             ` [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree Nguyễn Thái Ngọc Duy
@ 2009-08-11 15:44               ` Nguyễn Thái Ngọc Duy
  2009-08-12  6:33                 ` Junio C Hamano
  2009-08-12  7:31                 ` Johannes Sixt
  2009-08-11 21:18               ` [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree skillzero
  1 sibling, 2 replies; 53+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-08-11 15:44 UTC (permalink / raw)
  To: git, Johannes Schindelin, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy

This series has been of little practical use until now, because no one
would use read-tree to check out. With this patch you can at least
use and test the series for real.
The original porcelain design was "if you have .git/info/sparse,
porcelains will use it; if you don't like that, remove
.git/info/sparse", while plumbing commands have an option to
enable/disable this feature.

And I still like that behavior. How about we enable sparse checkout
by default for porcelains and make a config option to disable it?

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin-checkout.c |    4 ++++
 builtin-merge.c    |    5 ++++-
 git-pull.sh        |    6 +++++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/builtin-checkout.c b/builtin-checkout.c
index 446cac7..cec21ab 100644
--- a/builtin-checkout.c
+++ b/builtin-checkout.c
@@ -30,6 +30,7 @@ struct checkout_opts {
 	int force;
 	int writeout_stage;
 	int writeout_error;
+	int apply_sparse;
 
 	const char *new_branch;
 	int new_branch_log;
@@ -402,6 +403,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 		topts.dir = xcalloc(1, sizeof(*topts.dir));
 		topts.dir->flags |= DIR_SHOW_IGNORED;
 		topts.dir->exclude_per_dir = ".gitignore";
+		topts.apply_sparse = opts->apply_sparse;
 		tree = parse_tree_indirect(old->commit->object.sha1);
 		init_tree_desc(&trees[0], tree->buffer, tree->size);
 		tree = parse_tree_indirect(new->commit->object.sha1);
@@ -594,6 +596,8 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
 		OPT_BOOLEAN('m', "merge", &opts.merge, "merge"),
 		OPT_STRING(0, "conflict", &conflict_style, "style",
 			   "conflict style (merge or diff3)"),
+		OPT_SET_INT(0, "sparse", &opts.apply_sparse,
+			    "apply sparse checkout filter", 1),
 		OPT_END(),
 	};
 	int has_dash_dash;
diff --git a/builtin-merge.c b/builtin-merge.c
index 0b12fb3..c14b91d 100644
--- a/builtin-merge.c
+++ b/builtin-merge.c
@@ -43,7 +43,7 @@ static const char * const builtin_merge_usage[] = {
 
 static int show_diffstat = 1, option_log, squash;
 static int option_commit = 1, allow_fast_forward = 1;
-static int allow_trivial = 1, have_message;
+static int allow_trivial = 1, have_message, apply_sparse;
 static struct strbuf merge_msg;
 static struct commit_list *remoteheads;
 static unsigned char head[20], stash[20];
@@ -172,6 +172,7 @@ static struct option builtin_merge_options[] = {
 	OPT_CALLBACK('m', "message", &merge_msg, "message",
 		"message to be used for the merge commit (if any)",
 		option_parse_message),
+	OPT_SET_INT(0, "sparse", &apply_sparse, "apply sparse checkout filter", 1),
 	OPT__VERBOSITY(&verbosity),
 	OPT_END()
 };
@@ -494,6 +495,7 @@ static int read_tree_trivial(unsigned char *common, unsigned char *head,
 	opts.verbose_update = 1;
 	opts.trivial_merges_only = 1;
 	opts.merge = 1;
+	opts.apply_sparse = apply_sparse;
 	trees[nr_trees] = parse_tree_indirect(common);
 	if (!trees[nr_trees++])
 		return -1;
@@ -646,6 +648,7 @@ static int checkout_fast_forward(unsigned char *head, unsigned char *remote)
 	opts.verbose_update = 1;
 	opts.merge = 1;
 	opts.fn = twoway_merge;
+	opts.apply_sparse = apply_sparse;
 
 	trees[nr_trees] = parse_tree_indirect(head);
 	if (!trees[nr_trees++])
diff --git a/git-pull.sh b/git-pull.sh
index 0f24182..ba583bf 100755
--- a/git-pull.sh
+++ b/git-pull.sh
@@ -20,6 +20,7 @@ strategy_args= diffstat= no_commit= squash= no_ff= log_arg= verbosity=
 curr_branch=$(git symbolic-ref -q HEAD)
 curr_branch_short=$(echo "$curr_branch" | sed "s|refs/heads/||")
 rebase=$(git config --bool branch.$curr_branch_short.rebase)
+sparse=
 while :
 do
 	case "$1" in
@@ -65,6 +66,9 @@ do
 	--no-r|--no-re|--no-reb|--no-reba|--no-rebas|--no-rebase)
 		rebase=false
 		;;
+	--sparse)
+		sparse=--sparse
+		;;
 	-h|--h|--he|--hel|--help)
 		usage
 		;;
@@ -201,5 +205,5 @@ merge_name=$(git fmt-merge-msg $log_arg <"$GIT_DIR/FETCH_HEAD") || exit
 test true = "$rebase" &&
 	exec git-rebase $diffstat $strategy_args --onto $merge_head \
 	${oldremoteref:-$merge_head}
-exec git-merge $diffstat $no_commit $squash $no_ff $log_arg $strategy_args \
+exec git-merge $sparse $diffstat $no_commit $squash $no_ff $log_arg $strategy_args \
 	"$merge_name" HEAD $merge_head $verbosity
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and  read-tree
  2009-08-11 15:44             ` [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree Nguyễn Thái Ngọc Duy
  2009-08-11 15:44               ` [RFC PATCH v3 8/8] --sparse for porcelains Nguyễn Thái Ngọc Duy
@ 2009-08-11 21:18               ` skillzero
  2009-08-11 21:38                 ` Jakub Narebski
  1 sibling, 1 reply; 53+ messages in thread
From: skillzero @ 2009-08-11 21:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Johannes Schindelin, Junio C Hamano

2009/8/11 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> [1] .git/info/sparse has the same syntax as .git/info/exclude. Files
> that match the patterns will be set as CE_VALID.

Does this mean it will only support excluding paths you don't want
rather than letting you only include paths you do want?

I'm currently using your other patch series that lets you include or
exclude paths (via config variable) and I find that I mostly use the
include side of it with only a few excluded paths. This is because I
typically want to include only a small subset of the repository so
using excludes would require a pretty large list and any time somebody
adds new files, I'd have to update the exclude list.

I appreciate the flexibility of the script to control what is included
or excluded, but like some other comments here, I like the simplicity
of having built-in support for including/excluding paths without
having to write a script to do it. Some of my projects run on Windows
so scripting is more difficult there.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and  read-tree
  2009-08-11 21:18               ` [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and read-tree skillzero
@ 2009-08-11 21:38                 ` Jakub Narebski
  2009-08-11 22:03                   ` skillzero
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Narebski @ 2009-08-11 21:38 UTC (permalink / raw)
  To: skillzero
  Cc: Nguyễn Thái Ngọc Duy, git, Johannes Schindelin,
	Junio C Hamano

skillzero@gmail.com writes:
> 2009/8/11 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:

> > [1] .git/info/sparse has the same syntax as .git/info/exclude. Files
> > that match the patterns will be set as CE_VALID.
> 
> Does this mean it will only support excluding paths you don't want
> rather than letting you only include paths you do want?

Errr... what I read is that paths set by .git/info/sparse would be
excluded from checkout (marked as assume-unchanged / CE_VALID).

But if it is the same mechanism as gitignore, then you can use ! 
prefix to set files (patterns) to include, e.g.

  !Documentation/
  *

(I think rules are processed top-down, first matching wins).
 
> I'm currently using your other patch series that lets you include or
> exclude paths (via config variable) and I find that I mostly use the
> include side of it with only a few excluded paths. This is because I
> typically want to include only a small subset of the repository so
> using excludes would require a pretty large list and any time somebody
> adds new files, I'd have to update the exclude list.

Not true, see above.

-- 
Jakub Narebski

Git User's Survey 2009
http://tinyurl.com/GitSurvey2009

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and  read-tree
  2009-08-11 21:38                 ` Jakub Narebski
@ 2009-08-11 22:03                   ` skillzero
  2009-08-12  1:30                     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: skillzero @ 2009-08-11 22:03 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Nguyễn Thái Ngọc Duy, git, Johannes Schindelin,
	Junio C Hamano

On Tue, Aug 11, 2009 at 2:38 PM, Jakub Narebski<jnareb@gmail.com> wrote:
> skillzero@gmail.com writes:
>> 2009/8/11 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
>
>> > [1] .git/info/sparse has the same syntax as .git/info/exclude. Files
>> > that match the patterns will be set as CE_VALID.
>>
>> Does this mean it will only support excluding paths you don't want
>> rather than letting you only include paths you do want?
>
> Errr... what I read is that paths set by .git/info/sparse would be
> excluded from checkout (marked as assume-unchanged / CE_VALID).
>
> But if it is the same mechanism as gitignore, then you can use !
> prefix to set files (patterns) to include, e.g.
>
>  !Documentation/
>  *
>
> (I think rules are processed top-down, first matching wins).

I wasn't sure because the .gitignore negation stuff mentions negating
a previously ignored pattern. But for sparse patterns, there likely
wouldn't be a previous pattern. Include patterns are a little
different in that if there are no include patterns (but maybe some
exclude patterns), I think the expectation is that everything will be
included (minus excludes), but if you have some include patterns then
only those paths will be included (minus any excludes).

It's great if it already supports includes as well as excludes
(although it's a little confusing to say !Documentation to mean
"include it"), but I wasn't sure from the comment so I was just
asking.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and  read-tree
  2009-08-11 22:03                   ` skillzero
@ 2009-08-12  1:30                     ` Nguyen Thai Ngoc Duy
  2009-08-12  4:59                       ` skillzero
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-12  1:30 UTC (permalink / raw)
  To: skillzero; +Cc: Jakub Narebski, git, Johannes Schindelin, Junio C Hamano

On Wed, Aug 12, 2009 at 5:03 AM, <skillzero@gmail.com> wrote:
> On Tue, Aug 11, 2009 at 2:38 PM, Jakub Narebski<jnareb@gmail.com> wrote:
>> skillzero@gmail.com writes:
>>> 2009/8/11 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
>>
>>> > [1] .git/info/sparse has the same syntax as .git/info/exclude. Files
>>> > that match the patterns will be set as CE_VALID.
>>>
>>> Does this mean it will only support excluding paths you don't want
>>> rather than letting you only include paths you do want?
>>
>> Errr... what I read is that paths set by .git/info/sparse would be
>> excluded from checkout (marked as assume-unchanged / CE_VALID).
>>
>> But if it is the same mechanism as gitignore, then you can use !
>> prefix to set files (patterns) to include, e.g.
>>
>>  !Documentation/
>>  *
>>
>> (I think rules are processed top-down, first matching wins).
>
> I wasn't sure because the .gitignore negation stuff mentions negating
> a previously ignored pattern. But for sparse patterns, there likely
> wouldn't be a previous pattern.

No problem. We put pattern '*' at top (match everything). Previous
pattern issue solved.

> Include patterns are a little
> different in that if there are no include patterns (but maybe some
> exclude patterns), I think the expectation is that everything will be
> included (minus excludes), but if you have some include patterns then
> only those paths will be included (minus any excludes).

Let's say you want to include foo/ and bar/ only, this should work:

*
!foo/
!bar/

Patterns are evaluated from the bottom up. When a path first matches
'bar/', which is a negated pattern, the result is "no, it does not
match" and evaluation stops. When a path matches neither foo/ nor bar/,
it is caught by '*' and the result is "yes, it matches", which means
it is left out of the checkout area. In the end only foo/* and bar/*
survive.

I think it's as easy as writing exclude patterns once you figure out '*'.
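
The bottom-up rule above can be sketched in a few lines of C. This is
only an illustration, not the code from the series: simple_match() and
sparse_excluded() are hypothetical names, and the matcher is assumed to
understand just '*', a trailing-slash directory prefix, and a literal
path, whereas the real exclude machinery handles full wildcard patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified matcher: "*" matches every path, a pattern
 * ending in '/' matches any path below that directory, and anything
 * else must equal the path exactly.
 */
static int simple_match(const char *pattern, const char *path)
{
	size_t len = strlen(pattern);

	if (!strcmp(pattern, "*"))
		return 1;
	if (len && pattern[len - 1] == '/')
		return !strncmp(path, pattern, len);
	return !strcmp(pattern, path);
}

/*
 * Walk the pattern list from the bottom up; the first pattern that
 * matches decides the outcome.  Returns 1 if the path is left out of
 * the checkout area, 0 if it is kept, -1 if no pattern matched.
 */
static int sparse_excluded(const char **patterns, int nr, const char *path)
{
	int i;

	assert(patterns && path);
	for (i = nr - 1; i >= 0; i--) {
		const char *p = patterns[i];
		int negated = (*p == '!');

		if (negated)
			p++;	/* skip the '!' before matching */
		if (simple_match(p, path))
			return negated ? 0 : 1;
	}
	return -1;
}
```

With the list { "*", "!foo/", "!bar/" }, a path such as foo/a.c is kept
and Makefile is left out. If '*' were placed at the bottom instead, this
sketch would report every path as left out, because '*' would be the
first pattern tried.
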
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged
  2009-08-11 15:44     ` [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged Nguyễn Thái Ngọc Duy
  2009-08-11 15:44       ` [RFC PATCH v3 4/8] excluded_1(): support exclude "directories" in index Nguyễn Thái Ngọc Duy
@ 2009-08-12  2:51       ` Junio C Hamano
  2009-08-13  6:37         ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2009-08-12  2:51 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Johannes Schindelin

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
> index 5bbd18f..7d0e282 100644
> --- a/Documentation/technical/api-directory-listing.txt
> +++ b/Documentation/technical/api-directory-listing.txt
> @@ -58,6 +58,9 @@ The result of the enumeration is left in these fields::
>  Calling sequence
>  ----------------
>  
> +* Ensure the_index is populated as it may have CE_VALID entries that
> +  affect directory listing.
> +

When you want to enumerate all paths in the work tree, not just
the untracked ones, it used to be possible to first run read_directory()
before calling read_cache().  You are now forbidding this.

I do not think it is hard to resurrect the feature if it is necessary (add
an option to dir_struct and teach dir_add_name() not to ignore paths the
index knows about), and I do not think any of the existing code relies on
it anymore (I think "git add" used to), but there may be some codepath I
forgot about, which is a concern.

> diff --git a/builtin-clean.c b/builtin-clean.c
> index 2d8c735..d917472 100644
> --- a/builtin-clean.c
> +++ b/builtin-clean.c
> @@ -71,8 +71,11 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
>  
>  	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
>  
> -	if (!ignored)
> +	if (!ignored) {
> +		if (read_cache() < 0)
> +			die("index file corrupt");
>  		setup_standard_excludes(&dir);
> +	}
>  
>  	pathspec = get_pathspec(prefix, argv);
>  	read_cache();

Wouldn't it be much cleaner to move the existing read_cache() up, like you
did for ls-files, instead of conditionally reading the index at a random
place in the program sequence depending on the combinations of options?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 7/8] Support sparse checkout in unpack_trees() and  read-tree
  2009-08-12  1:30                     ` Nguyen Thai Ngoc Duy
@ 2009-08-12  4:59                       ` skillzero
  0 siblings, 0 replies; 53+ messages in thread
From: skillzero @ 2009-08-12  4:59 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jakub Narebski, git, Johannes Schindelin, Junio C Hamano

On Tue, Aug 11, 2009 at 6:30 PM, Nguyen Thai Ngoc Duy<pclouds@gmail.com> wrote:

> I think it's as easy as writing exclude patterns once you figure out '*'.

That solves it for me. Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-11 15:44               ` [RFC PATCH v3 8/8] --sparse for porcelains Nguyễn Thái Ngọc Duy
@ 2009-08-12  6:33                 ` Junio C Hamano
  2009-08-12 10:01                   ` Nguyen Thai Ngoc Duy
  2009-08-13  7:20                   ` Nguyen Thai Ngoc Duy
  2009-08-12  7:31                 ` Johannes Sixt
  1 sibling, 2 replies; 53+ messages in thread
From: Junio C Hamano @ 2009-08-12  6:33 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Johannes Schindelin

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> @@ -594,6 +596,8 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
>  		OPT_BOOLEAN('m', "merge", &opts.merge, "merge"),
>  		OPT_STRING(0, "conflict", &conflict_style, "style",
>  			   "conflict style (merge or diff3)"),
> +		OPT_SET_INT(0, "sparse", &opts.apply_sparse,
> +			    "apply sparse checkout filter", 1),

Shouldn't this be BOOLEAN not INT, i.e. "--[no-]sparse"?  That way, you
could enable it simply by the presence of $GIT_DIR/info/sparse.

It could also require core.sparseworktree configuration set to true if we
are really paranoid, but without the actual sparse specification file
flipping that configuration to true would not be useful anyway, so in
practice, giving --sparse-work-tree option to these Porcelain commands
would be no-op, but --no-sparse-work-tree option would be useful to
ignore $GIT_DIR/info/sparse and populate the work tree fully.

Or am I missing something?
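
A rough sketch of the decision described above, assuming a tri-state
command-line value. The enum and want_sparse() are made-up names for
illustration only, and the caller is assumed to have already checked
whether $GIT_DIR/info/sparse exists.

```c
#include <assert.h>

/* Hypothetical tri-state for a "--[no-]sparse" style option. */
enum sparse_opt {
	SPARSE_UNSET = -1,	/* option not given on the command line */
	SPARSE_OFF = 0,		/* --no-... given: ignore the spec file */
	SPARSE_ON = 1		/* positive form given */
};

/*
 * Decide whether the sparse filter should be applied.  The positive
 * form is effectively a no-op: without a spec file there is nothing
 * to apply, and with one the filter is on by default anyway.
 */
static int want_sparse(enum sparse_opt cmdline, int spec_file_exists)
{
	assert(cmdline >= SPARSE_UNSET && cmdline <= SPARSE_ON);
	if (cmdline == SPARSE_OFF)
		return 0;
	return spec_file_exists ? 1 : 0;
}
```

Only the negative form changes the outcome here, which is why it is the
useful half of the option.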

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-11 15:44               ` [RFC PATCH v3 8/8] --sparse for porcelains Nguyễn Thái Ngọc Duy
  2009-08-12  6:33                 ` Junio C Hamano
@ 2009-08-12  7:31                 ` Johannes Sixt
  2009-08-12  9:53                   ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 53+ messages in thread
From: Johannes Sixt @ 2009-08-12  7:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Johannes Schindelin, Junio C Hamano

Nguyễn Thái Ngọc Duy schrieb:
> This series is useless until now because no one would use read-tree to
> checkout. At least with this, you can really use/test the series.
> Porcelain design was originally "if you have .git/info/sparse,
> porcelains will use it, if you don't like that, remove
> .git/info/sparse" while plumbing commands have an option to
> enable/disable this feature.
> 
> And I still like that behavior. How about we enable sparse checkout
> by default for porcelains and make a config option to disable it?

I would enable sparse checkout by default even for plumbing. Whether the
checkout area is sparse should always be governed by .git/info/sparse.
This way, existing scripts and aliases should automatically work in sparse
worktrees.

BTW, the name .git/info/sparse is perhaps a bit too technical in the sense
that only git developers know that this feature runs under the name
"sparse checkout". Perhaps it should be named

   .git/info/indexonly
   .git/info/nocheckout

or so.

-- Hannes

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-12  7:31                 ` Johannes Sixt
@ 2009-08-12  9:53                   ` Nguyen Thai Ngoc Duy
  2009-08-12 15:40                     ` Raja R Harinath
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-12  9:53 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Johannes Schindelin, Junio C Hamano

2009/8/12 Johannes Sixt <j.sixt@viscovery.net>:
> BTW, the name .git/info/sparse is perhaps a bit too technical in the sense
> that only git developers know that this feature runs under the name
> "sparse checkout". Perhaps it should be named
>
>   .git/info/indexonly
>   .git/info/nocheckout
>
> or so.

I did not like the name "sparse" either. Another option is
.git/info/assume-unchanged.
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-12  6:33                 ` Junio C Hamano
@ 2009-08-12 10:01                   ` Nguyen Thai Ngoc Duy
  2009-08-13  7:20                   ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-12 10:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

2009/8/12 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> @@ -594,6 +596,8 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
>>               OPT_BOOLEAN('m', "merge", &opts.merge, "merge"),
>>               OPT_STRING(0, "conflict", &conflict_style, "style",
>>                          "conflict style (merge or diff3)"),
>> +             OPT_SET_INT(0, "sparse", &opts.apply_sparse,
>> +                         "apply sparse checkout filter", 1),
>
> Shouldn't this be BOOLEAN not INT, i.e. "--[no-]sparse"?  That way, you
> could enable it simply by the presence of $GIT_DIR/info/sparse.

This patch was written carelessly. I wanted to have something to test.
If you agree on option name "--sparse" then yes BOOLEAN is better.

> It could also require core.sparseworktree configuration set to true if we
> are really paranoid, but without the actual sparse specification file
> flipping that configuration to true would not be useful anyway, so in
> practice, giving --sparse-work-tree option to these Porcelain commands
> would be no-op, but --no-sparse-work-tree option would be useful to
> ignore $GIT_DIR/info/sparse and populate the work tree fully.
>
> Or am I missing something?

Sounds good (and --sparse-work-tree is apparently better than
--sparse). So let's enable it by default, add --no-sparse-work-tree to
disable it and wait until someone complains, then we'll add
core.sparseworktree. I think core.sparseworktree can also be used to
specify what spec file to be used instead of the default
.git/info/sparse, if users like to switch among some well-defined spec
files.
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-12  9:53                   ` Nguyen Thai Ngoc Duy
@ 2009-08-12 15:40                     ` Raja R Harinath
  2009-08-13  7:37                       ` Johannes Sixt
  0 siblings, 1 reply; 53+ messages in thread
From: Raja R Harinath @ 2009-08-12 15:40 UTC (permalink / raw)
  To: git

Hi,

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> 2009/8/12 Johannes Sixt <j.sixt@viscovery.net>:
>> BTW, the name .git/info/sparse is perhaps a bit too technical in the sense
>> that only git developers know that this feature runs under the name
>> "sparse checkout". Perhaps it should be named
>>
>>   .git/info/indexonly
>>   .git/info/nocheckout
>>
>> or so.
>
> I did not like the name "sparse" either. Another option is
> .git/info/assume-unchanged.

Or .git/info/doppelgangers, or even .git/info/doppelgängers :-)

- Hari

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 3/8] Read .gitignore from index if it is  assume-unchanged
  2009-08-12  2:51       ` [RFC PATCH v3 3/8] Read .gitignore from index if it is assume-unchanged Junio C Hamano
@ 2009-08-13  6:37         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-13  6:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

2009/8/12 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
>> index 5bbd18f..7d0e282 100644
>> --- a/Documentation/technical/api-directory-listing.txt
>> +++ b/Documentation/technical/api-directory-listing.txt
>> @@ -58,6 +58,9 @@ The result of the enumeration is left in these fields::
>>  Calling sequence
>>  ----------------
>>
>> +* Ensure the_index is populated as it may have CE_VALID entries that
>> +  affect directory listing.
>> +
>
> When you want to enumerate all paths in the work tree, not just
> the untracked ones, it used to be possible to first run read_directory()
> before calling read_cache().  You are now forbidding this.

Either I phrased it badly, or I don't follow you. If you don't call
read_cache() before read_directory(), the_index should be empty and
read_assume_unchanged_from_index() will be a no-op. So read_directory()
behavior does not change in this case.

> I do not think it is hard to resurrect the feature if it is necessary (add
> an option to dir_struct and teach dir_add_name() not to ignore paths the
> index knows about), and I do not think any of the existing code relies on
> it anymore (I think "git add" used to), but there may be some codepath I
> forgot about, which is a concern.

Hmm.. "git add" has loaded the index early since the first version of
builtin-add.c. I have checked all code paths that can lead to
read_directory_recursively(). In all cases, the index is loaded before
read_dir..() is called.

>> diff --git a/builtin-clean.c b/builtin-clean.c
>> index 2d8c735..d917472 100644
>> --- a/builtin-clean.c
>> +++ b/builtin-clean.c
>> @@ -71,8 +71,11 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
>>
>>       dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
>>
>> -     if (!ignored)
>> +     if (!ignored) {
>> +             if (read_cache() < 0)
>> +                     die("index file corrupt");
>>               setup_standard_excludes(&dir);
>> +     }
>>
>>       pathspec = get_pathspec(prefix, argv);
>>       read_cache();
>
> Wouldn't it be much cleaner to move the existing read_cache() up, like you
> did for ls-files, instead of conditionally reading the index at a random
> place in the program sequence depending on the combinations of options?

Agreed. read_cache() is called right below anyway.
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-12  6:33                 ` Junio C Hamano
  2009-08-12 10:01                   ` Nguyen Thai Ngoc Duy
@ 2009-08-13  7:20                   ` Nguyen Thai Ngoc Duy
  2009-08-13  9:58                     ` Jakub Narebski
  1 sibling, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-13  7:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

2009/8/12 Junio C Hamano <gitster@pobox.com>:
> It could also require core.sparseworktree configuration set to true if we
> are really paranoid, but without the actual sparse specification file
> flipping that configuration to true would not be useful anyway, so in
> practice, giving --sparse-work-tree option to these Porcelain commands
> would be no-op, but --no-sparse-work-tree option would be useful to
> ignore $GIT_DIR/info/sparse and populate the work tree fully.

Only the "ignore $GIT_DIR/info/sparse" part is correct.
"--no-sparse-work-tree" would not clear CE_VALID from all entries in
the index (which is good, if you are using CE_VALID for another purpose).

To quit sparse checkout, you must create an empty
$GIT_DIR/info/sparse, then do "git checkout" or "git read-tree -m -u
HEAD" so that the tree is fully populated, then you can remove
$GIT_DIR/info/sparse. Quite unintuitive..
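
Why an empty spec file repopulates the tree can be seen from the flag
handling in apply_sparse_checkout() in patch 7/8. The sketch below
restates just that per-entry transition. The EX_* values and
sparse_transition() are stand-ins invented for this illustration (the
real flags are defined in cache.h), and the verify_uptodate_sparse() and
verify_absent_sparse() checks are deliberately left out.

```c
#include <assert.h>

/* Illustrative flag values only; not the real CE_* definitions. */
#define EX_VALID     (1u << 0)	/* entry is outside the checkout area */
#define EX_UPDATE    (1u << 1)	/* file must be written to the worktree */
#define EX_WT_REMOVE (1u << 2)	/* file must be removed from the worktree */
#define EX_REMOVE    (1u << 3)	/* entry is being dropped from the index */

/*
 * Compute the new flags of one entry, given whether its path matches
 * a pattern in the sparse spec file (matches_sparse is 0 or 1).
 */
static unsigned sparse_transition(unsigned flags, int matches_sparse)
{
	int was_valid = !!(flags & EX_VALID);

	assert(matches_sparse == 0 || matches_sparse == 1);
	if (matches_sparse)
		flags |= EX_VALID;
	else
		flags &= ~EX_VALID;

	/* Entries the merge wants to remove are left alone. */
	if (flags & EX_REMOVE)
		return flags;

	if (!was_valid && (flags & EX_VALID))
		flags |= EX_WT_REMOVE;	/* leaves the checkout area */
	if (was_valid && !(flags & EX_VALID))
		flags |= EX_UPDATE;	/* enters the checkout area */

	/* Nothing outside the checkout area gets written out. */
	if (flags & EX_VALID)
		flags &= ~EX_UPDATE;
	return flags;
}
```

With an empty spec file no path matches, so matches_sparse is 0 for
every entry and each entry that was marked before ends up with the
update flag set, which is what brings the files back.
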
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-12 15:40                     ` Raja R Harinath
@ 2009-08-13  7:37                       ` Johannes Sixt
  0 siblings, 0 replies; 53+ messages in thread
From: Johannes Sixt @ 2009-08-13  7:37 UTC (permalink / raw)
  To: Raja R Harinath, Nguyen Thai Ngoc Duy
  Cc: git, Johannes Schindelin, Junio C Hamano

Raja R Harinath schrieb:
> Hi,
> 
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> 
>> 2009/8/12 Johannes Sixt <j.sixt@viscovery.net>:
>>> BTW, the name .git/info/sparse is perhaps a bit too technical in the sense
>>> that only git developers know that this feature runs under the name
>>> "sparse checkout". Perhaps it should be named
>>>
>>>   .git/info/indexonly
>>>   .git/info/nocheckout
>>>
>>> or so.
>> I did not like the name "sparse" either. Another option is
>> .git/info/assume-unchanged.
> 
> Or .git/info/doppelgangers, or even .git/info/doppelgängers :-)

Heh!

   .git/info/phantoms
   git checkout --no-phantoms
   git read-tree --phantoms

-- Hannes

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-13  7:20                   ` Nguyen Thai Ngoc Duy
@ 2009-08-13  9:58                     ` Jakub Narebski
  2009-08-13 12:38                       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Narebski @ 2009-08-13  9:58 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git, Johannes Schindelin

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> 2009/8/12 Junio C Hamano <gitster@pobox.com>:

> > It could also require core.sparseworktree configuration set to true if we
> > are really paranoid, but without the actual sparse specification file
> > flipping that configuration to true would not be useful anyway, so in
> > practice, giving --sparse-work-tree option to these Porcelain commands
> > would be no-op, but --no-sparse-work-tree option would be useful to
> > ignore $GIT_DIR/info/sparse and populate the work tree fully.
> 
> Only part "ignore $GIT_DIR/info/sparse" is correct.
> "--no-sparse-work-tree" would not clear CE_VALID from all entries in
> index (which is good, if you are using CE_VALID for another purpose).
> 
> To quit sparse checkout, you must create an empty
> $GIT_DIR/info/sparse, then do "git checkout" or "git read-tree -m -u
> HEAD" so that the tree is fully populated, then you can remove
> $GIT_DIR/info/sparse. Quite unintuitive..

Hmmm... this looks like an argument either for introducing a --full
option to git-checkout (ignore the CE_VALID bit, check out everything,
and clear CE_VALID (?))...

...or for going with a _separate_ bit for partial checkout, like in the
very first version of this series, which otherwise functions like
CE_VALID, or is just used to mark that CE_VALID was set using sparse.

Food for thought.
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-13  9:58                     ` Jakub Narebski
@ 2009-08-13 12:38                       ` Nguyen Thai Ngoc Duy
  2009-08-14 20:23                         ` Jakub Narebski
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-13 12:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, git, Johannes Schindelin

On 8/13/09, Jakub Narebski <jnareb@gmail.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>  > 2009/8/12 Junio C Hamano <gitster@pobox.com>:
>
>  > > It could also require core.sparseworktree configuration set to true if we
>  > > are really paranoid, but without the actual sparse specification file
>  > > flipping that configuration to true would not be useful anyway, so in
>  > > practice, giving --sparse-work-tree option to these Porcelain commands
>  > > would be no-op, but --no-sparse-work-tree option would be useful to
>  > > ignore $GIT_DIR/info/sparse and populate the work tree fully.
>  >
>  > Only part "ignore $GIT_DIR/info/sparse" is correct.
>  > "--no-sparse-work-tree" would not clear CE_VALID from all entries in
>  > index (which is good, if you are using CE_VALID for another purpose).
>  >
>  > To quit sparse checkout, you must create an empty
>  > $GIT_DIR/info/sparse, then do "git checkout" or "git read-tree -m -u
>  > HEAD" so that the tree is fully populated, then you can remove
>  > $GIT_DIR/info/sparse. Quite unintuitive..
>
>
> Hmmm... this looks like either argument for introducing --full option
>  to git-checkout (ignore CE_VALID bit, checkout everything, and clean
>  CE_VALID (?))...
>
>  ...or for going with _separate_ bit for partial checkout, like in the
>  very first version of this series, which otherwise functions like
>  CE_VALID, or is just used to mark that CE_VALID was set using sparse.

In my opinion, making an empty .git/info/sparse to fully populate
worktree is not too bad. I wanted to have plumbing-level support in
git so that you could try sparse checkout on your projects (possibly
with a few additional scripts to make your life easier). Then good
Porcelain UI may emerge later (or in worst case, people would roll
their own sparse checkout).
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-13 12:38                       ` Nguyen Thai Ngoc Duy
@ 2009-08-14 20:23                         ` Jakub Narebski
  2009-08-15  2:01                           ` Junio C Hamano
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Narebski @ 2009-08-14 20:23 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git, Johannes Schindelin

On Thursday, 13 August 2009 14:38, Nguyen Thai Ngoc Duy wrote:
> On 8/13/09, Jakub Narebski <jnareb@gmail.com> wrote:
>> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>>> 2009/8/12 Junio C Hamano <gitster@pobox.com>:
>>
>>>> It could also require core.sparseworktree configuration set to true if we
>>>> are really paranoid, but without the actual sparse specification file
>>>> flipping that configuration to true would not be useful anyway, so in
>>>> practice, giving --sparse-work-tree option to these Porcelain commands
>>>> would be no-op, but --no-sparse-work-tree option would be useful to
>>>> ignore $GIT_DIR/info/sparse and populate the work tree fully.
>>>
>>> Only part "ignore $GIT_DIR/info/sparse" is correct.
>>> "--no-sparse-work-tree" would not clear CE_VALID from all entries in
>>> index (which is good, if you are using CE_VALID for another purpose).
>>>
>>> To quit sparse checkout, you must create an empty
>>> $GIT_DIR/info/sparse, then do "git checkout" or "git read-tree -m -u
>>> HEAD" so that the tree is fully populated, then you can remove
>>> $GIT_DIR/info/sparse. Quite unintuitive..
>>
>>
>> Hmmm... this looks like either argument for introducing --full option
>>  to git-checkout (ignore CE_VALID bit, checkout everything, and clean
>>  CE_VALID (?))...
>>
>>  ...or for going with _separate_ bit for partial checkout, like in the
>>  very first version of this series, which otherwise functions like
>>  CE_VALID, or is just used to mark that CE_VALID was set using sparse.
> 
> In my opinion, making an empty .git/info/sparse to fully populate
> worktree is not too bad. I wanted to have plumbing-level support in
> git so that you could try sparse checkout on your projects (possibly
> with a few additional scripts to make your life easier). Then good
> Porcelain UI may emerge later (or in worst case, people would roll
> their own sparse checkout).

Deciding whether sparse checkout should use CE_VALID only, or whether it
should (as in the very first version of the series) use an additional
flag, either CE_NO_CHECKOUT or CE_VALID_IS_USED_HERE_FOR_SPARSE_CHECKOUT ;-),
is a design decision about *plumbing-level* support.

Note that shallow clone, while using the same mechanism as the grafts file,
nevertheless uses a separate file; so perhaps sparse checkout, while using
the same mechanism as --assume-unchanged, should use an additional flag.
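
To make the separate-flag idea concrete, here is a minimal sketch. It is
not code from the series: the flag values are made up, CE_NO_CHECKOUT is
only the name floated above, and leave_sparse() and skip_worktree_check()
are hypothetical helpers. It shows that leaving sparse mode can clear the
sparse bit without touching entries marked assume-unchanged by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; CE_NO_CHECKOUT is a hypothetical flag. */
#define CE_VALID       0x1u	/* assume-unchanged */
#define CE_NO_CHECKOUT 0x2u	/* path is sparse'd away */

struct entry {
	const char *name;
	unsigned int flags;
};

/* Leave sparse mode: clear only the sparse bit on every entry. */
static void leave_sparse(struct entry *e, int n)
{
	int i;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++)
		e[i].flags &= ~CE_NO_CHECKOUT;
}

/* Either bit is a reason not to look at the worktree copy. */
static int skip_worktree_check(const struct entry *e)
{
	assert(e != NULL);
	return (e->flags & (CE_VALID | CE_NO_CHECKOUT)) != 0;
}
```

With a single shared bit, the equivalent of leave_sparse() could not tell
which entries the user had marked for another purpose.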


BTW, you might want to use GIT_SPARSE_FILE, similar to GIT_INDEX_FILE;
note that plumbing has neither .gitignore nor .git/info/exclude
hardcoded... well, except for --exclude-standard.  This way a full
checkout would be as simple as using

  $ GIT_SPARSE_FILE= git checkout -- .
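
A rough sketch of how such a lookup could behave, assuming the semantics
implied above: unset means "use the default file", set-but-empty means "no
sparse file at all". GIT_SPARSE_FILE is only a proposal here, and
resolve_sparse_file() and current_sparse_file() are hypothetical names,
not existing git functions.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Pick the sparse specification file.
 *   env_value == NULL  -> variable unset, fall back to default_path
 *   env_value == ""    -> explicitly disabled, return NULL
 *   otherwise          -> use the given path
 */
static const char *resolve_sparse_file(const char *env_value,
				       const char *default_path)
{
	assert(default_path != NULL);
	if (!env_value)
		return default_path;
	if (!*env_value)
		return NULL;
	return env_value;
}

/* Convenience wrapper reading the (proposed) environment variable. */
static const char *current_sparse_file(const char *default_path)
{
	return resolve_sparse_file(getenv("GIT_SPARSE_FILE"), default_path);
}
```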

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-14 20:23                         ` Jakub Narebski
@ 2009-08-15  2:01                           ` Junio C Hamano
  2009-08-15 23:37                             ` Jakub Narebski
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2009-08-15  2:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Nguyen Thai Ngoc Duy, git, Johannes Schindelin

Jakub Narebski <jnareb@gmail.com> writes:

>>> Hmmm... this looks like either argument for introducing --full option
>>>  to git-checkout (ignore CE_VALID bit, checkout everything, and clean
>>>  CE_VALID (?))...
>>>
>>>  ...or for going with _separate_ bit for partial checkout, like in the
>>>  very first version of this series, which otherwise functions like
>>>  CE_VALID, or is just used to mark that CE_VALID was set using sparse.

How would a separate bit help?  Just like you need to clear CE_VALID bit
to revert the index into a normal (or "non sparse") state somehow, you
would need to have a way to clear that separate bit anyway.

A separate bit would help only if you want to handle assume-unchanged and
sparse checkout independently. But my impression was that the recent lstat
reduction effort addressed the issue assume-unchanged was invented to
work around in the first place.

Cf. http://thread.gmane.org/gmane.comp.version-control.git/123218/focus=123252

There is no reason to use assume-unchanged to tell git not to lstat to see
if a path is up-to-date by promising that you are not going to touch it
after you checked it out.

So I do not understand why you would want a separate bit, nor why you
think a separate bit would help when changing the index state from sparse
to non-sparse (or vice versa).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-15  2:01                           ` Junio C Hamano
@ 2009-08-15 23:37                             ` Jakub Narebski
  2009-08-16  8:14                               ` Johannes Schindelin
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Narebski @ 2009-08-15 23:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git, Johannes Schindelin

On Sat, 15 Aug 2009, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>>>> Hmmm... this looks like either argument for introducing --full option
>>>>  to git-checkout (ignore CE_VALID bit, checkout everything, and clean
>>>>  CE_VALID (?))...
>>>>
>>>>  ...or for going with _separate_ bit for partial checkout, like in the
>>>>  very first version of this series, which otherwise functions like
>>>>  CE_VALID, or is just used to mark that CE_VALID was set using sparse.
> 
> How would a separate bit help?  Just like you need to clear CE_VALID bit
> to revert the index into a normal (or "non sparse") state somehow, you
> would need to have a way to clear that separate bit anyway.
> 
> A separate bit would help only if you want to handle assume-unchanged and
> sparse checkout independently. But my impression was that the recent lstat
> reduction effort addressed the issue assume-unchanged was invented to
> work around in the first place.

Well, if we assume that we don't need (don't want) to handle
assume-unchanged and sparse checkout independently, then of course the
idea of having separate or additional bit for sparse doesn't make sense.
 
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-15 23:37                             ` Jakub Narebski
@ 2009-08-16  8:14                               ` Johannes Schindelin
  2009-08-17  9:08                                 ` Johannes Schindelin
  2009-08-17 16:01                                 ` Jakub Narebski
  0 siblings, 2 replies; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-16  8:14 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, git

Hi,

On Sun, 16 Aug 2009, Jakub Narebski wrote:

> On Sat, 15 Aug 2009, Junio C Hamano wrote:
> > Jakub Narebski <jnareb@gmail.com> writes:
> > 
> >>>> Hmmm... this looks like either argument for introducing --full 
> >>>> option to git-checkout (ignore CE_VALID bit, checkout everything, 
> >>>> and clean CE_VALID (?))...
> >>>>
> >>>>  ...or for going with _separate_ bit for partial checkout, like in 
> >>>>  the very first version of this series, which otherwise functions 
> >>>>  like CE_VALID, or is just used to mark that CE_VALID was set using 
> >>>>  sparse.
> > 
> > How would a separate bit help?  Just like you need to clear CE_VALID 
> > bit to revert the index into a normal (or "non sparse") state somehow, 
> > you would need to have a way to clear that separate bit anyway.
> > 
> > A separate bit would help only if you want to handle assume-unchanged 
> > and sparse checkout independently. But my impression was that the 
> > recent lstat reduction effort addressed the issue assume-unchanged 
> > was invented to work around in the first place.
> 
> Well, if we assume that we don't need (don't want) to handle 
> assume-unchanged and sparse checkout independently, then of course the 
> idea of having separate or additional bit for sparse doesn't make sense.

For the shallow/graft issue, we had a similar discussion.  Back then, I 
was convinced that shallow commits and grafted commits were something 
fundamentally different, and my recent patch to pack-objects shows that: 
shallow commits do not have the real parents in the current repository, 
and that makes them different from other grafted commits.

Now, if you want to say that assume-unchanged and sparse are two 
fundamentally different things, I would be interested in some equally 
convincing argument as for the shallow/graft issue.

There is a fundamental difference, I grant you that: the working directory 
does not contain the "sparse'd away" files while the same is not true for 
assume-unchanged files.

But does that matter?  The corresponding files are still in the index and 
the repository.

IOW under what circumstances would you want to be able to discern between 
assume-unchanged and "sparse'd away" files in the working directory?

I could _imagine_ that you'd want a tool that allows you to change the 
focus of the sparse checkout together with the working directory.  
Example: you have a sparse checkout of Documentation/ and now you want to 
have t/, too.  Just changing .git/info/sparse will not be enough.

The question is if the tool to change the "sparseness" [*1*] should not 
change .git/info/sparse itself; if it does not, it would be good to be 
able to discern between the "assume-unchanged" and "sparse'd away" files.

Although it might be enough to traverse the index and check the presence 
of the assume-unchanged files in the working directory to determine which 
files are sparse, and which ones are merely assume-unchanged.
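
As a sketch of that traversal (with a made-up CE_VALID value and a
hypothetical classify() helper, not code from the series), each
assume-unchanged entry is probed with lstat(): present in the working
directory means "merely assume-unchanged", absent means "sparse'd away".

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

#define CE_VALID 0x1u	/* illustrative value */

enum entry_kind { ENTRY_NORMAL, ENTRY_ASSUME_UNCHANGED, ENTRY_SPARSE };

struct entry {
	const char *name;
	unsigned int flags;
};

/*
 * Guess why an entry carries CE_VALID by looking at the worktree.
 * Any lstat() failure is treated as "file is missing".
 */
static enum entry_kind classify(const struct entry *e)
{
	struct stat st;

	assert(e != NULL && e->name != NULL);
	if (!(e->flags & CE_VALID))
		return ENTRY_NORMAL;
	if (lstat(e->name, &st) == 0)
		return ENTRY_ASSUME_UNCHANGED;
	return ENTRY_SPARSE;
}
```

This is only a heuristic: a path that is meant to be sparse'd away but
still has a working directory copy would be classified as merely
assume-unchanged.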

Ciao,
Dscho

Footnote [*1*]: I think we need some nice and clear nomenclature here.  
Any English wizards with a good taste of naming things?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-16  8:14                               ` Johannes Schindelin
@ 2009-08-17  9:08                                 ` Johannes Schindelin
  2009-08-17 12:49                                   ` Nguyen Thai Ngoc Duy
  2009-08-17 15:41                                   ` Junio C Hamano
  2009-08-17 16:01                                 ` Jakub Narebski
  1 sibling, 2 replies; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17  9:08 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, git

Hi,

On Sun, 16 Aug 2009, Johannes Schindelin wrote:

> [...] if you want to say that assume-unchanged and sparse are two 
> fundamentally different things, I would be interested in some equally 
> convincing argument as for the shallow/graft issue.
> 
> There is a fundamental difference, I grant you that: the working 
> directory does not contain the "sparse'd away" files while the same is 
> not true for assume-unchanged files.
> 
> But does that matter?  The corresponding files are still in the index 
> and the repository.
> 
> IOW under what circumstances would you want to be able to discern 
> between assume-unchanged and "sparse'd away" files in the working 
> directory?
> 
> I could _imagine_ that you'd want a tool that allows you to change the 
> focus of the sparse checkout together with the working directory.  
> Example: you have a sparse checkout of Documentation/ and now you want 
> to have t/, too.  Just changing .git/info/sparse will not be enough.
> 
> The question is if the tool to change the "sparseness" [*1*] should not 
> change .git/info/sparse itself; if it does not, it would be good to be 
> able to discern between the "assume-unchanged" and "sparse'd away" 
> files.
> 
> Although it might be enough to traverse the index and check the presence 
> of the assume-unchanged files in the working directory to determine 
> which files are sparse, and which ones are merely assume-unchanged.
> 
> Ciao,
> Dscho
> 
> Footnote [*1*]: I think we need some nice and clear nomenclature here.  
> Any English wizards with a good taste of naming things?

Turns out that somebody on IRC had a problem that requires
sparse'd-out files which _do_ have working directory copies.

So just having the assume-unchanged bit may not be enough.

The scenario is this: the repository contains a file that users are 
supposed to change, but not commit to (only the super-intelligent inventor 
of this scenario is allowed to).  As this repository is originally a 
subversion one, there is no problem: people just do not switch branches.

But this guy uses git-svn, so he does switch branches, and to avoid 
committing the file by mistake, he marked it assume-unchanged.  Only that 
a branch switch overwrites the local changes.

I suggested the use of the sparse feature, and mark this file (and this 
file alone) as sparse'd-out.

Is this an intended usage scenario?  Then we cannot reuse the 
assume-unchanged bit [*1*].

Ciao,
Dscho

Footnote [*1*]: in this particular scenario, we could still discern 
between sparse'd-out and regular assume-unchanged file, because 
.git/info/sparse knows about the file.  But the design is now brittle, and 
it is not hard at all to come up with a situation where it breaks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17  9:08                                 ` Johannes Schindelin
@ 2009-08-17 12:49                                   ` Nguyen Thai Ngoc Duy
  2009-08-17 13:35                                     ` Johannes Schindelin
  2009-08-17 15:41                                   ` Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-17 12:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, Junio C Hamano, git

On Mon, Aug 17, 2009 at 4:08 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Turns out that somebody on IRC had a problem that requires to have
> sparse'd out files which _do_ have working directory copies.
>
> So just having the assume-unchanged bit may not be enough.
>
> The scenario is this: the repository contains a file that users are
> supposed to change, but not commit to (only the super-intelligent inventor
> of this scenario is allowed to).  As this repository is originally a
> subversion one, there is no problem: people just do not switch branches.
>
> But this guy uses git-svn, so he does switch branches, and to avoid
> committing the file by mistake, he marked it assume-unchanged.

Hmm.. never thought of this use before. If he does not want to commit
by mistake, should he add to-be-committed changes to index and do "git
commit" without "-a" (even better, do "git diff --cached" first)?

> Only that a branch switch overwrites the local changes.

I don't think branch switch overwrites changes in this case. Whenever
Git is to touch worktree files, it ignores assumed-unchanged bit and
does lstat() to make sure worktree files are up to date.

> I suggested the use of the sparse feature, and mark this file (and this
> file alone) as sparse'd-out.

Sparse checkout only removes a file if its assume-unchanged bit
changes from 0 to 1. If it's already 1, it does not care whether there
is a corresponding file in worktree. So something like this should
work:

git checkout my-branch
git update-index --assume-unchanged that-special-file
echo that-special-file > .git/info/sparse
# edit that-special-file
git commit -a
# do whatever you want, git pull/checkout/read-tree... won't touch
# that-special-file because it's assume-unchanged already

Too subtle?

Anyway I would not recommend this. The versions of that-special-file
in the worktree and in the index will diverge. When you unmark
assume-unchanged (be it sparse checkout or plain assume-unchanged),
you may have already forgotten what changes you made to this file and
"git diff" would not help.

> Is this an intended usage scenario?  Then we cannot reuse the
> assume-unchanged bit [*1*].

It'd be great if people tell us all the scenarios they have. My use
could be too limited.
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 12:49                                   ` Nguyen Thai Ngoc Duy
@ 2009-08-17 13:35                                     ` Johannes Schindelin
  2009-08-17 14:41                                       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17 13:35 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jakub Narebski, Junio C Hamano, git

Hi,

On Mon, 17 Aug 2009, Nguyen Thai Ngoc Duy wrote:

> On Mon, Aug 17, 2009 at 4:08 PM, Johannes
> Schindelin<Johannes.Schindelin@gmx.de> wrote:
> > Turns out that somebody on IRC had a problem that requires to have 
> > sparse'd out files which _do_ have working directory copies.
> >
> > So just having the assume-unchanged bit may not be enough.
> >
> > The scenario is this: the repository contains a file that users are 
> > supposed to change, but not commit to (only the super-intelligent 
> > inventor of this scenario is allowed to).  As this repository is 
> > originally a subversion one, there is no problem: people just do not 
> > switch branches.
> >
> > But this guy uses git-svn, so he does switch branches, and to avoid 
> > committing the file by mistake, he marked it assume-unchanged.
> 
> Hmm.. never thought of this use before. If he does not want to commit by 
> mistake, should he add to-be-committed changes to index and do "git 
> commit" without "-a" (even better, do "git diff --cached" first)?

You probably agree that this would be a _very_ fragile setup.  Very easy 
to make mistakes.

But we try to get away from that, don't we?  Git has had a reputation for 
being easy to fsck up for long enough.

> > Only that a branch switch overwrites the local changes.
> 
> I don't think branch switch overwrites changes in this case. Whenever
> Git is to touch worktree files, it ignores assumed-unchanged bit and
> does lstat() to make sure worktree files are up to date.

Well, it does there, thankyouverymuch.

The problem of course is that the other branch has an ancient version of 
that file (which should _not_ overwrite the current, modified version!), 
i.e. "git diff HEAD..other -- file" does not come empty.

As 'file' is assume-unchanged, zinnnng, the file gets "updated".

> > I suggested the use of the sparse feature, and mark this file (and 
> > this file alone) as sparse'd-out.
> 
> Sparse checkout only removes a file if its assume-unchanged bit
> changes from 0 to 1.

The problem is not removing, but overwriting.

And in this respect, 'assume-unchanged' is a very different beast from 
'sparse'.  I am growing more and more convinced that you cannot just reuse 
the assume-unchanged bit.

> If it's already 1, it does not care whether there is a corresponding 
> file in worktree. So something like this should work:
> 
> git checkout my-branch
> git update-index --assume-unchanged that-special-file
> echo that-special-file > .git/info/sparse
> # edit that-special-file
> git commit -a
> # do whatever you want, git pull/checkout/read-tree... won't touch
> # that-special-file because it's assume-unchanged already

... except if you changed .git/info/sparse and a formerly sparse'd-out 
file is overwritten by "pull".  Not good.

> Anyway I would not recommend this. The versions of that-special-file in 
> the worktree and in the index will diverge. When you unmark assume-unchanged 
> (be it sparse checkout or plain assume-unchanged), you may have already 
> forgotten what changes you made to this file and "git diff" would not help.

My point is that we should not take the current implementation as Dictated 
By The Dear Lord, but change it if the limitation is too severe.

And I do contend that 'assume-unchanged' is dissimilar enough from 
'sparse' to merit a change.

> > Is this an intended usage scenario?  Then we cannot reuse the 
> > assume-unchanged bit [*1*].
> 
> It'd be great if people tell us all the scenarios they have. My use 
> could be too limited.

The use case I would have is where a collaborator wants to work only on 
one subdirectory and the top-level directory.  All other subdirectories 
are of no interest to him.

Another use case: documentation.  I do not have that use case yet, but I 
know about people who do.  Specifying what you _want_ to have checked out 
is much more straight-forward here than the opposite.

Ciao,
Dscho


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 13:35                                     ` Johannes Schindelin
@ 2009-08-17 14:41                                       ` Nguyen Thai Ngoc Duy
  2009-08-17 15:19                                         ` Johannes Schindelin
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-17 14:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, Junio C Hamano, git

On Mon, Aug 17, 2009 at 8:35 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Mon, 17 Aug 2009, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Aug 17, 2009 at 4:08 PM, Johannes
>> Schindelin<Johannes.Schindelin@gmx.de> wrote:
>> > Turns out that somebody on IRC had a problem that requires to have
>> > sparse'd out files which _do_ have working directory copies.
>> >
>> > So just having the assume-unchanged bit may not be enough.
>> >
>> > The scenario is this: the repository contains a file that users are
>> > supposed to change, but not commit to (only the super-intelligent
>> > inventor of this scenario is allowed to).  As this repository is
>> > originally a subversion one, there is no problem: people just do not
>> > switch branches.
>> >
>> > But this guy uses git-svn, so he does switch branches, and to avoid
>> > committing the file by mistake, he marked it assume-unchanged.
>>
>> Hmm.. never thought of this use before. If he does not want to commit by
>> mistake, should he add to-be-committed changes to index and do "git
>> commit" without "-a" (even better, do "git diff --cached" first)?
>
> You probably agree that this would be a _very_ fragile setup.  Very easy
> to make mistakes.
>
> But we try to get away from that, don't we?  Git has had a reputation for
> being easy to fsck up for long enough.

Well.. of course I don't want Git to keep that reputation :-)

>> > Only that a branch switch overwrites the local changes.
>>
>> I don't think branch switch overwrites changes in this case. Whenever
>> Git is to touch worktree files, it ignores assumed-unchanged bit and
>> does lstat() to make sure worktree files are up to date.
>
> Well, it does there, thankyouverymuch.
>
> The problem of course is that the other branch has an ancient version of
> that file (which should _not_ overwrite the current, modified version!),
> i.e. "git diff HEAD..other -- file" does not come empty.
>
> As 'file' is assume-unchanged, zinnnng, the file gets "updated".

Then it is a bug. Assume-unchanged as in reading is good.
Assume-unchanged in writing sounds scary. Something like this should
fix it (not well tested though). It's on top of my series, but you can
adapt it to 'next' or 'master' easily.

diff --git a/unpack-trees.c b/unpack-trees.c
index eb47676..7b9ddf6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -538,7 +538,9 @@ static int verify_uptodate_1(struct cache_entry *ce,
 {
 	struct stat st;

-	if (o->index_only || o->reset || ce_uptodate(ce))
+	if (o->index_only || o->reset ||
+	    /* we are going to update worktree, don't trust ce_uptodate if it is CE_VALID'd */
+	    (!(ce->ce_flags & CE_VALID) && ce_uptodate(ce)))
 		return 0;

 	if (!lstat(ce->name, &st)) {


>> > I suggested the use of the sparse feature, and mark this file (and
>> > this file alone) as sparse'd-out.
>>
>> Sparse checkout only removes a file if its assume-unchanged bit
>> changes from 0 to 1.
>
> The problem is not removing, but overwriting.
>
> And in this respect, 'assume-unchanged' is a very different beast from
> 'sparse'.  I am growing more and more convinced that you cannot just reuse
> the assume-unchanged bit.

And the assume-unchanged bit could get lost during index merging, which
may cause unexpected effects if sparse checkout is based on
assume-unchanged. Let me think more of it tonight.

>> If it's already 1, it does not care whether there is a corresponding
>> file in worktree. So something like this should work:
>>
>> git checkout my-branch
>> git update-index --assume-unchanged that-special-file
>> echo that-special-file > .git/info/sparse
>> # edit that-special-file
>> git commit -a
>> # do whatever you want, git pull/checkout/read-tree... won't touch
>> # that-special-file because it's assume-unchanged already
>
> ... except if you changed .git/info/sparse and a formerly sparse'd-out
> file is overwritten by "pull".  Not good.

Again, I think it's a bug.

>> > Is this an intended usage scenario?  Then we cannot reuse the
>> > assume-unchanged bit [*1*].
>>
>> It'd be great if people tell us all the scenarios they have. My use
>> could be too limited.
>
> The use case I would have is where a collaborator wants to work only on
> one subdirectory and the top-level directory.  All other subdirectories
> are of no interest to him.
>
> Another use case: documentation.  I do not have that use case yet, but I
> know about people who do.

Translators usually checkout one or two files (I am Vietnamese
Translation Coordinator of GNOME, but well... I check them all out. I
suppose "normal" translators would not want to do like I do.)

>  Specifying what you _want_ to have checked out
> is much more straight-forward here than the opposite.

I think it depends on type of projects. For documentation projects,
you may want a few files. For software projects, usually you need
everything _except_ a few big directories. For WebKit, it's a bunch of
test data that I don't care about. Firmware in hardware-related
projects or media files in game projects fall in the same category. I
don't have strong opinion on this. Either include or exclude is fine
to me.
-- 
Duy

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 14:41                                       ` Nguyen Thai Ngoc Duy
@ 2009-08-17 15:19                                         ` Johannes Schindelin
  2009-08-17 16:13                                           ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17 15:19 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jakub Narebski, Junio C Hamano, git

Hi,

On Mon, 17 Aug 2009, Nguyen Thai Ngoc Duy wrote:

> On Mon, Aug 17, 2009 at 8:35 PM, Johannes
> Schindelin<Johannes.Schindelin@gmx.de> wrote:
>
> > The problem of course is that the other branch has an ancient version 
> > of that file (which should _not_ overwrite the current, modified 
> > version!), i.e. "git diff HEAD..other -- file" does not come empty.
> >
> > As 'file' is assume-unchanged, zinnnng, the file gets "updated".
> 
> Then it is a bug. Assume-unchanged as in reading is good.
> Assume-unchanged in writing sounds scary. Something like this should
> fix it (not well tested though). It's on top of my series, but you can
> adapt it to 'next' or 'master' easily.

No.

The purpose of 'assume-unchanged' is to tell Git that it has no business 
checking that the file is unchanged.  It should _assume_ that it is 
unchanged.  That's what this flag says.

So do you agree that assume-unchanged is not quite similar enough to sparse 
to use the same bit?

> > Another use case: documentation.  I do not have that use case yet, but 
> > I know about people who do.
> 
> Translators usually checkout one or two files (I am Vietnamese 
> Translation Coordinator of GNOME, but well... I check them all out. I 
> suppose "normal" translators would not want to do like I do.)

Exactly.

echo /Documentation/ > .git/info/sparse

Remember: the documentation contributors are the least programming-savvy 
contributors of any project.

> >  Specifying what you _want_ to have checked out is much more 
> > straight-forward here than the opposite.
> 
> I think it depends on type of projects. For documentation projects, you 
> may want a few files. For software projects, usually you need everything 
> _except_ a few big directories. For WebKit, it's a bunch of test data 
> that I don't care about. Firmware in hardware-related projects or media 
> files in game projects fall in the same category. I don't have strong 
> opinion on this. Either include or exclude is fine to me.

Okay, let me just ask: if you have a sparse checkout, what would you think 
I mean when I talk about the "sparse files"?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17  9:08                                 ` Johannes Schindelin
  2009-08-17 12:49                                   ` Nguyen Thai Ngoc Duy
@ 2009-08-17 15:41                                   ` Junio C Hamano
  2009-08-17 16:06                                     ` Nguyen Thai Ngoc Duy
                                                       ` (2 more replies)
  1 sibling, 3 replies; 53+ messages in thread
From: Junio C Hamano @ 2009-08-17 15:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, Nguyen Thai Ngoc Duy, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The scenario is this: the repository contains a file that users are 
> supposed to change, but not commit to (only the super-intelligent inventor 
> of this scenario is allowed to).  As this repository is originally a 
> subversion one, there is no problem: people just do not switch branches.
>
> But this guy uses git-svn, so he does switch branches, and to avoid 
> committing the file by mistake, he marked it assume-unchanged.  Only that 
> a branch switch overwrites the local changes.

If it is a problem that a branch switch overwrites the local changes in
assume-unchanged file, perhaps that is what this person needs to change?

Let's step back a bit and think.

Local changes in git do not belong to any particular branch.  They belong
to the work tree and the index.  Hence you (1) can switch from branch A to
branch B iff the branches do not have difference in the path with local
changes, and (2) have to stash save, switch branches and then stash pop if
you have local changes to paths that are different between branches you
are switching between.
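
As a rough illustration of rule (1) above, here is a minimal sketch. It is not git's code: `can_switch` and the dict-based "trees" (path mapped to a blob id) are hypothetical stand-ins chosen only for this example.

```python
def can_switch(local_changes, tree_a, tree_b):
    """Return True if a switch from tree_a to tree_b keeps local changes safe.

    local_changes: set of paths modified in the work tree or index.
    tree_a, tree_b: dicts mapping path -> blob id; hypothetical stand-ins
    for the trees of the two branches.
    """
    for path in local_changes:
        # A path that is missing on one side counts as a difference too.
        if tree_a.get(path) != tree_b.get(path):
            return False
    return True


branch_a = {"Makefile": "1111", "README": "2222"}
branch_b = {"Makefile": "1111", "README": "3333"}

print(can_switch({"Makefile"}, branch_a, branch_b))
print(can_switch({"README"}, branch_a, branch_b))
```

When the function returns False, case (2) applies: stash, switch, then pop the stash.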

How should assume-unchanged play with this philosophy?

I'd say that assume-unchanged is a promise you make git that you won't
change these paths, and in return to the promise git will give you faster
response by not running lstat on them.  Having changes in such paths is
your problem and you deserve these changes to be lost.  At least, that is
the interpretation according to the original assume-unchanged semantics.

If some paths should not be committed, I'd say it should be handled by a
pre commit hook, and not assume-unchanged.

Is checking with "diff --cached" on the paths and either erroring out (or
better yet resetting the problematic paths in the index) an option?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-16  8:14                               ` Johannes Schindelin
  2009-08-17  9:08                                 ` Johannes Schindelin
@ 2009-08-17 16:01                                 ` Jakub Narebski
  1 sibling, 0 replies; 53+ messages in thread
From: Jakub Narebski @ 2009-08-17 16:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, git

On Sun, 16 Aug 2009, Johannes Schindelin wrote:
> On Sun, 16 Aug 2009, Jakub Narebski wrote:
>> On Sat, 15 Aug 2009, Junio C Hamano wrote:
>>> Jakub Narebski <jnareb@gmail.com> writes:
>>> 
>>>>>> Hmmm... this looks like either argument for introducing --full 
>>>>>> option to git-checkout (ignore CE_VALID bit, checkout everything, 
>>>>>> and clean CE_VALID (?))...
>>>>>>
>>>>>>  ...or for going with _separate_ bit for partial checkout, like in 
>>>>>>  the very first version of this series, which otherwise functions 
>>>>>>  like CE_VALID, or is just used to mark that CE_VALID was set using 
>>>>>>  sparse.
>>> 
>>> How would a separate bit help?  Just like you need to clear CE_VALID 
>>> bit to revert the index into a normal (or "non sparse") state somehow, 
>>> you would need to have a way to clear that separate bit anyway.
>>> 
>>> A separate bit would help only if you want to handle assume-unchanged 
>>> and sparse checkout independently. But my impression was that the 
>>> recent lstat reduction effort addressed the issue assume-unchanged 
>>> were invented to work around in the first place.
>> 
>> Well, if we assume that we don't need (don't want) to handle 
>> assume-unchanged and sparse checkout independently, then of course the 
>> idea of having separate or additional bit for sparse doesn't make sense.
> 
> For the shallow/graft issue, we had a similar discussion.  Back then, I 
> was convinced that shallow commits and grafted commits were something 
> fundamentally different, and my recent patch to pack-objects shows that: 
> shallow commits do not have the real parents in the current repository, 
> and that makes them different from other grafted commits.
> 
> Now, if you want to say that assume-unchanged and sparse are two 
> fundamentally different things, I would be interested in some equally 
> convincing argument as for the shallow/graft issue.
> 
> There is a fundamental difference, I grant you that: the working directory 
> does not contain the "sparse'd away" files while the same is not true for 
> assume-unchanged files.
> 
> But does that matter?  The corresponding files are still in the index and 
> the repository.
> 
> IOW under what circumstances would you want to be able to discern between 
> assume-unchanged and "sparse'd away" files in the working directory?

From what I understand, assume-unchanged is a performance optimization.
Sparse checkout is about files which are (assumed to) not be in working
directory, which means that they have to be assume-unchanged for git to
not try to access working area version of files which aren't there.

$GIT_DIR/info/sparse (or how it would be named; the name 'sparse' 
doesn't tell us whether patterns are about the files that are checked
out, or are about files which are not present in working directory)
is about specifying which files to checkout with "git checkout --sparse"
(or core.sparse / checkout.sparse = true).

> I could _imagine_ that you'd want a tool that allows you to change the 
> focus of the sparse checkout together with the working directory.  
> Example: you have a sparse checkout of Documentation/ and now you want to 
> have t/, too.  Just changing .git/info/sparse will not be enough.
> 
> The question is if the tool to change the "sparseness" [*1*] should not 
> change .git/info/sparse itself; if it does not, it would be good to be 
> able to discern between the "assume-unchanged" and "sparse'd away" files.
> 
> Although it might be enough to traverse the index and check the presence 
> of the assume-unchanged files in the working directory to determine which 
> files are sparse, and which ones are merely assume-unchanged.

There are quite a few possibilities: file can be marked "sparse" in
index (which also implies marking it "assume-unchanged", if 
"assume-unchanged" doesn't work alone as "sparse" index bit) or not,
file can match 'no-checkout' pattern in $GIT_DIR/info/sparse or not,
file can be present in working directory or not:

 * match no-checkout
   - assume-unchanged
     + present in working directory
     + absent from working directory
   - no assume-unchanged
     + present
     + absent
 * doesn't match no-checkout
   - assume-unchanged
     + present
     + absent
   - no assume-unchanged
     + present
     + absent

> Footnote [*1*]: I think we need some nice and clear nomenclature here.  
> Any English wizards with a good taste of naming things?
 
English is not my native language, but what about:

 $GIT_DIR/info/
    no-checkout
    exclude-checkout
    workdir-exclude
    ignore-change
    assume-unchanged
    ghosts
  
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 15:41                                   ` Junio C Hamano
@ 2009-08-17 16:06                                     ` Nguyen Thai Ngoc Duy
  2009-08-17 16:19                                     ` Johannes Schindelin
  2009-08-17 16:46                                     ` Junio C Hamano
  2 siblings, 0 replies; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-17 16:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Jakub Narebski, git

On Mon, Aug 17, 2009 at 10:41 PM, Junio C Hamano<gitster@pobox.com> wrote:
> How should assume-unchanged play with this philosophy?
>
> I'd say that assume-unchanged is a promise you make git that you won't
> change these paths, and in return to the promise git will give you faster
> response by not running lstat on them.  Having changes in such paths is
> your problem and you deserve these changes to be lost.  At least, that is
> the interpretation according to the original assume-unchanged semantics.

But commit 5f73076 ("Assume unchanged" git) says [1] it favors safety
over performance? Otherwise I'd need to resurrect no-checkout bit.

[1] excerpt from the mentioned commit:
--<--
    Index entries marked with CE_VALID bit are assumed to be
    unchanged most of the time.  However, there are cases that
    CE_VALID bit is ignored for the sake of safety and usability:

     - while "git-read-tree -m" or git-apply need to make sure
       that the paths involved in the merge do not have local
       modifications.  This sacrifices performance for safety.

     - when git-checkout-index -f -q -u -a tries to see if it needs
       to checkout the paths.  Otherwise you can never check
       anything out ;-).

     - when git-update-index --really-refresh (a new flag) tries to
       see if the index entry is up to date.  You can start with
       everything marked as CE_VALID and run this once to drop
       CE_VALID bit for paths that are modified.
--<--
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 15:19                                         ` Johannes Schindelin
@ 2009-08-17 16:13                                           ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-17 16:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, Junio C Hamano, git

On Mon, Aug 17, 2009 at 10:19 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Mon, 17 Aug 2009, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Aug 17, 2009 at 8:35 PM, Johannes
>> Schindelin<Johannes.Schindelin@gmx.de> wrote:
>>
>> > The problem of course is that the other branch has an ancient version
>> > of that file (which should _not_ overwrite the current, modified
>> > version!), i.e. "git diff HEAD..other -- file" does not come empty.
>> >
>> > As 'file' is assume-unchanged, zinnnng, the file gets "updated".
>>
>> Then it is a bug. Assume-unchanged as in reading is good.
>> Assume-unchanged in writing sounds scary. Something like this should
>> fix it (not well tested though). It's on top of my series, but you can
>> adapt it to 'next' or 'master' easily.
>
> No.
>
> The purpose of 'assume-unchanged' is to tell Git that it has no business
> checking that the file is unchanged.  It should _assume_ that it is
> unchanged.  That's what this flag says.
>
> So do you agree that assume-unchanged is not quite similar enough to sparse
> to use the same bit?

If you define it that way, yes I agree.

>> > Another use case: documentation.  I do not have that use case yet, but
>> > I know about people who do.
>>
>> Translators usually checkout one or two files (I am Vietnamese
>> Translation Coordinator of GNOME, but well... I check them all out. I
>> suppose "normal" translators would not want to do like I do.)
>
> Exactly.
>
> echo /Documentation/ > .git/info/sparse
>
> Remember: the documentation contributors are the least programming-savvy
> contributors of any project.

[wanted to make a joke here, but it seemed destructive, snipped]

>> >  Specifying what you _want_ to have checked out is much more
>> > straight-forward here than the opposite.
>>
>> I think it depends on type of projects. For documentation projects, you
>> may want a few files. For software projects, usually you need everything
>> _except_ a few big directories. For WebKit, it's a bunch of test data
>> that I don't care about. Firmware in hardware-related projects or media
>> files in game projects fall in the same category. I don't have strong
>> opinion on this. Either include or exclude is fine to me.
>
> Okay, let me just ask: if you have a sparse checkout, what would you think
> I mean when I talk about the "sparse files"?

If I have to answer in 2 seconds, "sparse files" are files in working
directory. If I have more time, I tend to think that in "sparse
<something>", something should be a container, an area, therefore
"sparse files" do not make sense to me while "sparse
checkout/worktree" does. So, .git/info/sparse-checkout (with "in"
patterns)?
-- 
Duy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 15:41                                   ` Junio C Hamano
  2009-08-17 16:06                                     ` Nguyen Thai Ngoc Duy
@ 2009-08-17 16:19                                     ` Johannes Schindelin
  2009-08-17 18:39                                       ` Junio C Hamano
  2009-08-17 16:46                                     ` Junio C Hamano
  2 siblings, 1 reply; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17 16:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, Nguyen Thai Ngoc Duy, git

Hi,

On Mon, 17 Aug 2009, Junio C Hamano wrote:

> I'd say that assume-unchanged is a promise you make git that you won't 
> change these paths, and in return to the promise git will give you 
> faster response by not running lstat on them.  Having changes in such 
> paths is your problem and you deserve these changes to be lost.  At 
> least, that is the interpretation according to the original 
> assume-unchanged semantics.

That's why I did not suggest using assume-unchanged (which the guy did 
previously, and was burnt, deservedly, as you say).

However, my illustration of the scenario was only to one end, namely to 
convince all of you that assume-unchanged != sparse.

And maybe to the end to explain that sparse checkout could help this guy.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 15:41                                   ` Junio C Hamano
  2009-08-17 16:06                                     ` Nguyen Thai Ngoc Duy
  2009-08-17 16:19                                     ` Johannes Schindelin
@ 2009-08-17 16:46                                     ` Junio C Hamano
  2009-08-17 21:45                                       ` Johannes Schindelin
  2 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2009-08-17 16:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Jakub Narebski, Nguyen Thai Ngoc Duy, git

Junio C Hamano <gitster@pobox.com> writes:

> Local changes in git do not belong to any particular branch.  They belong
> to the work tree and the index.  Hence you (1) can switch from branch A to
> branch B iff the branches do not have difference in the path with local
> changes, and (2) have to stash save, switch branches and then stash pop if
> you have local changes to paths that are different between branches you
> are switching between.
>
> How should assume-unchanged play with this philosophy?
>
> I'd say that assume-unchanged is a promise you make git that you won't
> change these paths, and in return to the promise git will give you faster
> response by not running lstat on them.  Having changes in such paths is
> your problem and you deserve these changes to be lost.  At least, that is
> the interpretation according to the original assume-unchanged semantics.

Having said that, we could (re)define assume-unchanged to mean "I may or
may not have changes to these paths, but I do not mean to commit them, so
do not show them as modified when I ask you for diff.  But the changes are
precious nevertheless".

I think the writeout codepath pays attention to assume-unchanged bit
already for that reason (CE_MATCH_IGNORE_VALID is all about this issue).

So with that, how should assume-unchanged play with the "local changes
belong to the index and the work tree"?

 - When adding to the index, the changes should be ignored;

 - When checking out of the index?  I.e. the user tells "git checkout
   path" when path is marked as assume-unchanged.  Such an explicit
   request should probably lose the local changes in the work tree.

 - When checking out of a commit?  The same deal.

 - When switching branches?

   - If the branches do not touch assume-unchanged paths, we should keep
     changes _and_ assume-unchanged bit.  I do not know if that is what
     the current code does.

   - If the branches do touch assume-unchanged paths, what should happen?
     We shouldn't blindly overwrite the local changes, so at least we
     should change the code to error out if we do not already do so.  But
     then what?  How does the user deal with this?  Perhaps...

     - Drop assume-unchanged temporarily;
     - Stash save;
     - Switch;
     - Stash pop;
     - Add assume-unchanged again.

     ???
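
For concreteness, the five tentative steps above might map onto commands roughly as follows. This is a sketch only: `switch_commands` is a hypothetical helper that builds the argument lists without running them, and whether this sequence is the right user workflow is exactly the open question. Note that `git stash save` is the spelling used in this thread; newer git versions prefer `git stash push`.

```python
def switch_commands(path, branch):
    """Build (but do not run) the commands for the five steps above.

    path and branch are supplied by the caller; nothing here inspects
    a real repository.
    """
    return [
        ["git", "update-index", "--no-assume-unchanged", path],  # drop the bit
        ["git", "stash", "save"],                                # stash save
        ["git", "checkout", branch],                             # switch
        ["git", "stash", "pop"],                                 # stash pop
        ["git", "update-index", "--assume-unchanged", path],     # set it again
    ]


commands = switch_commands("config.local", "other")
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to `subprocess.run` inside a repository, but the stash step would pick up every local change, not only the one in `path`.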

Is such an updated (or "corrected") assume-unchanged any different from a
sparse checkout?  After all, paths that are not to be checked out in a
sparse checkout are "pretend that the lack of these paths is an illusion--they
are logically there.  I do not intend to commit their removal, and I do not
want to lose the sparseness across branch switch".

There is one nit about this.  If a path is outside the checkout area,
should it unconditionally stay outside the checkout area when you switch
branches?  I may be interested in not checking out Documentation/
subdirectory and that may hold true for all _my_ branches, and it is a
sane thing not to complain "Oops, you actually removed Makefile in
Documentation/ in your work tree in reality, and you are switching to
another branch that has a different Makefile --- it is a delete-modify
conflict you need to resolve, and we won't let you switch branches" in
such a case.

But is that generally true in all "sparse checkout" settings?

It is unfortunate that this message raises more questions than it answers,
but I think a sparse checkout will have to answer them, whether it uses a
bit separate from assume-unchanged or it reuses the assume-unchanged bit.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 16:19                                     ` Johannes Schindelin
@ 2009-08-17 18:39                                       ` Junio C Hamano
  2009-08-17 22:02                                         ` Johannes Schindelin
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2009-08-17 18:39 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Jakub Narebski, Nguyen Thai Ngoc Duy, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> However, my illustration of the scenario was only to one end, namely to 
> convince all of you that assume-unchanged != sparse.
>
> And maybe to the end to explain that sparse checkout could help this guy.

How?  If sparse is _not to check it out_, then that is not what the person
is doing either.  It feels to me that you are suggesting an inappropriate
hack to replace another inappropriate hack, suggesting to use a hacksaw
because an earlier attempt to use a hammer did not quite work to drive the
screw in.

I never said assume-unchanged _is_ sparse.  You cannot mark an index entry
that does not exist, obviously you need more (either the earlier "hook
that tells what should/shouldn't exist", or "the pattern").

But I think the work-tree semantics you need to _implement_ sparse matches
what you would want from assume-unchanged.  Not the original, draconian
one that updates the work tree by saying "you promised me you wouldn't
change them", but the updated one that tells git to pretend that the local
change is not there but still keep the local modification, including
deletion.  The work-tree "local changes" sparse makes is a small subset of
possible local changes assume-unchanged would need to support.  It only
deletes work tree files.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 16:46                                     ` Junio C Hamano
@ 2009-08-17 21:45                                       ` Johannes Schindelin
  0 siblings, 0 replies; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17 21:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, Nguyen Thai Ngoc Duy, git

Hi,

On Mon, 17 Aug 2009, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Local changes in git do not belong to any particular branch.  They 
> > belong to the work tree and the index.  Hence you (1) can switch from 
> > branch A to branch B iff the branches do not have difference in the 
> > path with local changes, and (2) have to stash save, switch branches 
> > and then stash pop if you have local changes to paths that are 
> > different between branches you are switching between.
> >
> > How should assume-unchanged play with this philosophy?
> >
> > I'd say that assume-unchanged is a promise you make git that you won't 
> > change these paths, and in return to the promise git will give you 
> > faster response by not running lstat on them.  Having changes in such 
> > paths is your problem and you deserve these changes to be lost.  At 
> > least, that is the interpretation according to the original 
> > assume-unchanged semantics.
> 
> Having said that, we could (re)define assume-unchanged to mean "I may or
> may not have changes to these paths, but I do not mean to commit them, so
> do not show them as modified when I ask you for diff.  But the changes are
> precious nevertheless"

I am hesitant.  The feature was introduced because of some report that Git 
was too slow.  While the speed has increased dramatically (the report was 
for Windows), we cannot work miracles: the file system layer is just not 
cooperative.

So I could imagine that redefining the meaning of assume-unchanged results 
in a substantially longer runtime again, which some people (yours truly) 
might interpret as regression.

> I think the writeout codepath pays attention to assume-unchanged bit 
> already for that reason (CE_MATCH_IGNORE_VALID is all about this issue).

If I were that reporter, I would not be happy, I guess.  Basically, when 
new files come in and I marked all the files as "assume unchanged; I know 
what I'm doing!" Git would tell me "no you're an idiot, dummy, I know 
better, and I will check all over again!".

> So with that, how should assume-unchanged play with the "local changes 
> belong to the index and the work tree"?
> 
>  - When adding to the index, the changes should be ignored;
> 
>  - When checking out of the index?  I.e. the user tells "git checkout
>    path" when path is marked as assume-unchanged.  Such an explicit
>    request should probably lose the local changes in the work tree.
> 
>  - When checking out of a commit?  The same deal.
> 
>  - When switching branches?
> 
>    - If the branches do not touch assume-unchanged paths, we should keep
>      changes _and_ assume-unchanged bit.  I do not know if that is what
>      the current code does.
> 
>    - If the branches do touch assume-unchanged paths, what should happen?
>      We shouldn't blindly overwrite the local changes, so at least we
>      should change the code to error out if we do not already do so.  But
>      then what?  How does the user deal with this?  Perhaps...
> 
>      - Drop assume-unchanged temporarily;
>      - Stash save;
>      - Switch;
>      - Stash pop;
>      - Add assume-unchanged again.
> 
>      ???

In my book all this is overly complicated.  If I tell Git to assume a file 
is unchanged, it is not Git's business to question me.

> Is such an updated (or "corrected") assume-unchanged any different from a
> sparse checkout?  After all, paths that are not to be checked out in a
> sparse checkout are "pretend that the lack of these paths is an illusion--they
> are logically there.  I do not intend to commit their removal, and I do not
> want to lose the sparseness across branch switch".
> 
> There is one nit about this.  If a path is outside the checkout area,
> should it unconditionally stay outside the checkout area when you switch
> branches?  I may be interested in not checking out Documentation/
> subdirectory and that may hold true for all _my_ branches, and it is a
> sane thing not to complain "Oops, you actually removed Makefile in
> Documentation/ in your work tree in reality, and you are switching to
> another branch that has a different Makefile --- it is a delete-modify
> conflict you need to resolve, and we won't let you switch branches" in
> such a case.
> 
> But is that generally true in all "sparse checkout" settings?
> 
> It is unfortunate that this message raises more questions than it answers,
> but I think a sparse checkout will have to answer them, whether it uses a
> bit separate from assume-unchanged or it reuses the assume-unchanged bit.

I think you will come around and agree that the original, very simple 
therefore powerful, concept of "assume-unchanged" should be, well, 
unchanged, and not be bent to half-fit the original intention and half-fit 
the sparse intention.

Rather, I agree with Nguyễn that the no-checkout bit (which is definitely 
free to behave differently from assume-unchanged) is needed.

Maybe I contradict myself here with what I said after the third iteration 
of the sparse checkout series, but that only proves that I am able to 
learn.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 18:39                                       ` Junio C Hamano
@ 2009-08-17 22:02                                         ` Johannes Schindelin
  2009-08-17 23:02                                           ` skillzero
  0 siblings, 1 reply; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, Nguyen Thai Ngoc Duy, git

Hi,

On Mon, 17 Aug 2009, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > However, my illustration of the scenario was only to one end, namely 
> > to convince all of you that assume-unchanged != sparse.
> >
> > And maybe to the end to explain that sparse checkout could help this 
> > guy.
> 
> How?  If sparse is _not to check it out_, then that is not what the 
> person is doing either.  It feels to me that you are suggesting an 
> inappropriate hack to replace another inappropriate hack, suggesting to 
> use a hacksaw because an earlier attempt to use a hammer did not quite 
> work to drive the screw in.

Not exactly.

What does "sparse checkout" mean, really?  It means that Git should only 
check out a part of the tracked files, and not even so much as look 
outside.  It means to me that everything outside of that focus is clearly 
to be handled as all the other untracked data.

And here comes the problem: if something is treated untracked because it 
was outside of the sparse checkout, then I want it to be treated as 
untracked _even if_ I happened to broaden the checkout by editing 
.git/info/sparse.  The file did not just magically become subject to 
overwriting just because I edited .git/info/sparse (which could be a 
simple mistake).  So the index _needs_ to know that the sparse'd-out 
attribute is something completely different from the assume-unchanged 
attribute, even if Git should _handle_ the files with those attributes 
pretty similarly _most_ of the time.

> I never said assume-unchanged _is_ sparse.  You cannot mark an index 
> entry that does not exist, obviously you need more (either the earlier 
> "hook that tells what should/shouldn't exist", or "the pattern").

Right.

> But I think the work-tree semantics you need to _implement_ sparse 
> matches what you would want from assume-unchanged.  Not the original, 
> draconian one that updates the work tree by saying "you promised me you 
> wouldn't change them", but the updated one that tells git to pretend 
> that the local change is not there but still keep the local 
> modification, including deletion.  The work-tree "local changes" sparse 
> makes is a small subset of possible local changes assume-unchanged would 
> need to support.  It only deletes work tree files.

As I tried to convince you already, it is not wise to mix up the two 
meanings.  They _are_ different: in one case, we _have_ a file, and we 
even _expect_ the file to actually have the same contents as what is 
recorded in the index.  In the other case, we do _not_ have a file, so we 
do _not_ even expect the file to have the same contents.

In fact, in the latter case (the sparse case) we do not want to look for 
the file; not for the reason that we expect the contents to be the same 
anyway, but because we expect it not even to be there!

So while the _technical_ side is pretty much the same (most of the time; I 
illustrated a corner case, and it is very easy to think of other corner 
cases that might even be inadvertent, all the more reason to protect the 
user) -- don't look for the file -- the _semantics_ are _very_ different.

And you see that they are different when all of a sudden you cannot take 
the _absence_ of the file as the indicator for "assume-unchanged" and 
"sparse".

In fact, with the semantics implied by the label 'assume-unchanged', it 
could well be argued that making the file _absent_ (for the sparse 
checkout) is a dirty trick.  This is not what "assume that the file is 
unchanged" implies at all.

So let's just keep the semantics utterly simple and stupid, and have an

- assume-unchanged bit, which assumes that a file is there, but that the 
  contents need not be checked for performance reasons, and

- a no-checkout bit, which assumes that the user never checked out that 
  file (if it exists, it comes from somewhere else, and needs to be 
  protected like untracked files that would be overwritten by a branch 
  switch).
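
One possible reading of these two bits, as a sketch: the names `EntryFlag` and `on_branch_switch` and the returned action strings are invented for this example and do not correspond to git's index flags or code paths.

```python
from enum import Flag, auto


class EntryFlag(Flag):
    NONE = 0
    ASSUME_UNCHANGED = auto()  # file is there; skip the content check
    NO_CHECKOUT = auto()       # user never checked this file out


def on_branch_switch(flags, exists_in_worktree):
    """Decide what a branch switch does with one path that differs."""
    if flags & EntryFlag.NO_CHECKOUT:
        if exists_in_worktree:
            # The file came from somewhere else: protect it like an
            # untracked file that would be overwritten.
            return "refuse"
        return "skip"  # stay outside the checkout area
    if flags & EntryFlag.ASSUME_UNCHANGED:
        return "update-without-check"
    return "check-then-update"


print(on_branch_switch(EntryFlag.NO_CHECKOUT, True))
print(on_branch_switch(EntryFlag.ASSUME_UNCHANGED, True))
```

The point of keeping two bits is visible in the first branch: presence of a no-checkout file is treated as a reason for caution, while presence of an assume-unchanged file is simply expected.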

I hope this explanation was clear.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 22:02                                         ` Johannes Schindelin
@ 2009-08-17 23:02                                           ` skillzero
  2009-08-17 23:16                                             ` Johannes Schindelin
  0 siblings, 1 reply; 53+ messages in thread
From: skillzero @ 2009-08-17 23:02 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Jakub Narebski, Nguyen Thai Ngoc Duy, git

On Mon, Aug 17, 2009 at 3:02 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:

> And here comes the problem: if something is treated untracked because it
> was outside of the sparse checkout, then I want it to be treated as
> untracked _even if_ I happened to broaden the checkout by editing
> .git/info/sparse.  The file did not just magically become subject to
> overwriting just because I edited .git/info/sparse (which could be a
> simple mistake).

Maybe I'm misunderstanding what you're saying, but why would you want
a file that's become part of the checkout by editing .git/info/sparse
to still be treated as untracked?

If I have a file that's excluded via .git/info/sparse then I edit
.git/info/sparse to include it and switch to a branch that doesn't
have that file, I'd expect that file to be deleted from the working
copy if the content matches what's in the repository. If it's modified
then I'd expect the branch switch to fail (like it would without a
sparse checkout).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 23:02                                           ` skillzero
@ 2009-08-17 23:16                                             ` Johannes Schindelin
  2009-08-18  0:17                                               ` Jakub Narebski
  2009-08-18  0:23                                               ` skillzero
  0 siblings, 2 replies; 53+ messages in thread
From: Johannes Schindelin @ 2009-08-17 23:16 UTC (permalink / raw)
  To: skillzero; +Cc: Junio C Hamano, Jakub Narebski, Nguyen Thai Ngoc Duy, git


Hi,

On Mon, 17 Aug 2009, skillzero@gmail.com wrote:

> On Mon, Aug 17, 2009 at 3:02 PM, Johannes
> Schindelin<Johannes.Schindelin@gmx.de> wrote:
> 
> > And here comes the problem: if something is treated untracked because 
> > it was outside of the sparse checkout, then I want it to be treated as 
> > untracked _even if_ I happened to broaden the checkout by editing 
> > .git/info/sparse.  The file did not just magically become subject to 
> > overwriting just because I edited .git/info/sparse (which could be a 
> > simple mistake).
> 
> Maybe I'm misunderstanding what you're saying, but why would you want a 
> file that's become part of the checkout by editing .git/info/sparse to 
> still be treated as untracked?
> 
> If I have a file that's excluded via .git/info/sparse then I edit
> .git/info/sparse to include it and switch to a branch that doesn't have 
> that file, I'd expect that file to be deleted from the working copy if 
> the content matches what's in the repository. If it's modified then I'd 
> expect the branch switch to fail (like it would without a sparse 
> checkout).

First things first: with sparse checkout, you should not check out 
_anything_ outside of the focus of the sparse checkout.

So I contend that you would only end up with a sparse'd-out file 
that was formerly tracked if you did something wrong.  That should not 
happen.

Even if it does: all the more reason to have a flag that indicates that
this file is not sparse'd-out -- contradicting .git/info/sparse.

The thing is: we need a way to determine quickly and without any 
ambiguity whether a file is tracked, assumed unchanged, or sparse'd-out 
(which Nguyễn calls no-checkout).

And if we change .git/info/sparse, that state _must not_ change.  We did 
not touch the file by editing .git/info/sparse, so the state must be 
unchanged.
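
A minimal sketch of that requirement, assuming the state is recorded per
index entry and only re-evaluated by an explicit step. `Entry`,
`apply_patterns` and the use of simple globs that name the paths inside
the checkout are all assumptions for illustration, not the format or
code of this series:

```python
from fnmatch import fnmatch

class Entry:
    def __init__(self, path):
        self.path = path
        self.no_checkout = False  # recorded per entry, as in an index

def apply_patterns(entries, patterns):
    """Explicit step (think: a checkout) that re-evaluates the patterns."""
    for entry in entries:
        entry.no_checkout = not any(fnmatch(entry.path, p) for p in patterns)

entries = [Entry("src/a.c"), Entry("doc/b.txt")]
patterns = ["src/*"]
apply_patterns(entries, patterns)

# Editing the pattern list alone does not touch the recorded state;
# doc/b.txt stays no-checkout until apply_patterns() runs again.
patterns.append("doc/*")
```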

Whether "git checkout" should realize that a checked out file (which has 
no changes, mind you!) needs to be deleted and marked no-checkout is a 
different question.

Ciao,
Dscho



* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 23:16                                             ` Johannes Schindelin
@ 2009-08-18  0:17                                               ` Jakub Narebski
  2009-08-18  0:34                                                 ` skillzero
  2009-08-18  0:49                                                 ` [RFC PATCH v3 8/8] --sparse for porcelains Jakub Narebski
  2009-08-18  0:23                                               ` skillzero
  1 sibling, 2 replies; 53+ messages in thread
From: Jakub Narebski @ 2009-08-18  0:17 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: skillzero, Junio C Hamano, Nguyen Thai Ngoc Duy, git

Johannes Schindelin wrote:

> The thing is: we need a way to determine quickly and without any 
> ambiguity whether a file is tracked, assumed unchanged, or sparse'd-out 
> (which Nguyễn calls no-checkout).

Let's reiterate: "assume-unchanged" is about telling git that it should
assume, for performance reasons, that the state of a file in the working
directory is the same as its state in the index.  But, from what was said
in this thread, there are situations where git ignores this performance
hack for correctness reasons.

The "no-checkout" bit is about telling git that the file is not present
in the working directory, and that it has to use the version from the
index.  Then there is the question of what happens if there is a file in
the working area (e.g. from applying a patch) which corresponds to a
"no-checkout" file in the index (corresponds because of rename
detection).

> And if we change .git/info/sparse, that state _must not_ change.  We did 
> not touch the file by editing .git/info/sparse, so the state must be 
> unchanged.

I think this situation (and the issue of correctness vs "assume-unchanged"
mentioned above) hints that "no-checkout" and "assume-unchanged" should
be separate bits, even if both tell git to use version from index.
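
A minimal sketch of what two separate bits could look like. The flag
names, their values and `may_stat_worktree` are hypothetical and do not
correspond to git's actual index flags:

```python
from enum import IntFlag

class EntryFlag(IntFlag):
    ASSUME_UNCHANGED = 1  # performance hint; may be overridden for correctness
    NO_CHECKOUT = 2       # file is intentionally absent from the worktree

def may_stat_worktree(flags):
    """Hypothetical rule: a correctness-driven look at the worktree file
    is allowed unless the entry is marked no-checkout."""
    return not (flags & EntryFlag.NO_CHECKOUT)
```

With separate bits an entry can carry either flag or both, and the
stricter "no-checkout" meaning wins whenever it is set.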

There is, e.g., the question of whether "git grep" should search
"no-checkout" files; in the "assume-unchanged" case it should, I think,
search the index version.


P.S. I wonder if it would be worth resurrecting the series adding support
for directories in the index (which can help performance and the 'empty
directories' issue)...  It would help, I think, with sparse checkout.

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-17 23:16                                             ` Johannes Schindelin
  2009-08-18  0:17                                               ` Jakub Narebski
@ 2009-08-18  0:23                                               ` skillzero
  1 sibling, 0 replies; 53+ messages in thread
From: skillzero @ 2009-08-18  0:23 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Jakub Narebski, Nguyen Thai Ngoc Duy, git

On Mon, Aug 17, 2009 at 4:16 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Mon, 17 Aug 2009, skillzero@gmail.com wrote:
>
>> On Mon, Aug 17, 2009 at 3:02 PM, Johannes
>> Schindelin<Johannes.Schindelin@gmx.de> wrote:
>>
>> > And here comes the problem: if something is treated untracked because
>> > it was outside of the sparse checkout, then I want it to be treated as
>> > untracked _even if_ I happened to broaden the checkout by editing
>> > .git/info/sparse.  The file did not just magically become subject to
>> > overwriting just because I edited .git/info/sparse (which could be a
>> > simple mistake).
>>
>> Maybe I'm misunderstanding what you're saying, but why would you want a
>> file that's become part of the checkout by editing .git/info/sparse to
>> still be treated as untracked?
>>
>> If I have a file that's excluded via .git/info/sparse then I edit
>> .git/info/sparse to include it and switch to a branch that doesn't have
>> that file, I'd expect that file to be deleted from the working copy if
>> the content matches what's in the repository. If it's modified then I'd
>> expect the branch switch to fail (like it would without a sparse
>> checkout).
>
> First things first: with sparse checkout, you should not check out
> _anything_ outside of the focus of the sparse checkout.
>
> So I contend that you would only end up with a sparse'd-out file
> that was formerly tracked if you did something wrong.  That should not
> happen.

I was thinking if you copied the file there manually and changed
.git/info/sparse to include it. I would expect git checkout, git
status, etc. to act just as if I had never excluded it via
.git/info/sparse, similar to .gitignore and .git/info/exclude.

> The thing is: we need a way to determine quickly and without any
> ambiguity whether a file is tracked, assumed unchanged, or sparse'd-out
> (which Nguyễn calls no-checkout).
>
> And if we change .git/info/sparse, that state _must not_ change.  We did
> not touch the file by editing .git/info/sparse, so the state must be
> unchanged.

I don't know enough to have an opinion on assume-unchanged vs
no-checkout, but if you edit .git/info/sparse it seems like it should
affect whether git cares about a file or not. If a file previously had
the no-checkout bit and you change .git/info/sparse to include the
file, the next time you do something with git, I would expect it to
start caring about that path. For example, I can edit .gitignore and
.git/info/exclude and it notices the next time I use git without
having to do anything special.


* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-18  0:17                                               ` Jakub Narebski
@ 2009-08-18  0:34                                                 ` skillzero
  2009-08-18  1:43                                                   ` Nguyen Thai Ngoc Duy
  2009-08-18  0:49                                                 ` [RFC PATCH v3 8/8] --sparse for porcelains Jakub Narebski
  1 sibling, 1 reply; 53+ messages in thread
From: skillzero @ 2009-08-18  0:34 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Johannes Schindelin, Junio C Hamano, Nguyen Thai Ngoc Duy, git

On Mon, Aug 17, 2009 at 5:17 PM, Jakub Narebski<jnareb@gmail.com> wrote:

> There is e.g. question if "git grep" should search "no-checkout" files;
> in the "assume-unchanged" case it should, I think, search index version.

I would like git grep to not search paths outside the sparse
area (although --no-sparse would be nice for git grep in case you did
want to search everything). The main reason I want sparse checkouts is
for performance reasons. For example, git grep can take 10 minutes on
my full repository so excluding paths outside the sparse area would
reduce that to a few seconds.


* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-18  0:17                                               ` Jakub Narebski
  2009-08-18  0:34                                                 ` skillzero
@ 2009-08-18  0:49                                                 ` Jakub Narebski
  1 sibling, 0 replies; 53+ messages in thread
From: Jakub Narebski @ 2009-08-18  0:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: skillzero, Junio C Hamano, Nguyen Thai Ngoc Duy, git

Jakub Narebski wrote:
> Johannes Schindelin wrote:
> 
> > The thing is: we need a way to determine quickly and without any 
> > ambiguity whether a file is tracked, assumed unchanged, or sparse'd-out 
> > (which Nguyễn calls no-checkout).
> 
> Let's reiterate: "assume-unchanged" is about telling git that it should
> assume for performance reasons that state of file in working directory
> is the same as state of file in the index.  But, from what was said in
> this thread, there are situations where git for correctness reasons
> ignores performance hack.
> 
> "no-checkout" bit is about telling git that the file is not present
> in working directory, and it has to use version from the index.  Then
> there is a question if there is file in working area (e.g. from applying
> patch) which corresponds to a "no-checkout" file in index (corresponds
> because of rename detection).

Also there is a question if one might want to use them together.  I think
it is not inconceivable ;-)  One might want for example to limit checkout
to some subdirectory, but within that directory one might want to use 
assume-unchanged bit, because filesystem performance sucks (FAT, NFS).
Now couple that with changes in the sparse patterns...

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH v3 8/8] --sparse for porcelains
  2009-08-18  0:34                                                 ` skillzero
@ 2009-08-18  1:43                                                   ` Nguyen Thai Ngoc Duy
  2009-08-18  6:25                                                     ` git find (was: [RFC PATCH v3 8/8] --sparse for porcelains) Jakub Narebski
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-18  1:43 UTC (permalink / raw)
  To: skillzero; +Cc: Jakub Narebski, Johannes Schindelin, Junio C Hamano, git

On Tue, Aug 18, 2009 at 7:34 AM, <skillzero@gmail.com> wrote:
> On Mon, Aug 17, 2009 at 5:17 PM, Jakub Narebski<jnareb@gmail.com> wrote:
>
>> There is e.g. question if "git grep" should search "no-checkout" files;
>> in the "assume-unchanged" case it should, I think, search index version.
>
> I would like git grep to not search paths outside the sparse
> area (although --no-sparse would be nice for git grep in case you did
> want to search everything). The main reason I want sparse checkouts is
> for performance reasons. For example, git grep can take 10 minutes on
> my full repository so excluding paths outside the sparse area would
> reduce that to a few seconds.

That's a porcelain question that I'd leave for now. FWIW you can do
something like this:

git ls-files -v|grep '^H'|cut -c 2-|xargs git grep
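
The filtering step of that pipeline can be sketched in Python. This
relies on "git ls-files -v" printing a one-letter tag, a space and the
path, with lowercase tags marking assume-unchanged entries; that entries
outside the sparse area carry a tag other than 'H' is an assumption
here, and both the function name and the sample lines are made up:

```python
def checked_out_paths(ls_files_v_output):
    """Keep the paths whose status tag is 'H' in `git ls-files -v`
    style output (one "<tag> <path>" entry per line)."""
    paths = []
    for line in ls_files_v_output.splitlines():
        tag, _, path = line.partition(" ")
        if tag == "H":
            paths.append(path)
    return paths

# Made-up sample output; 'h' marks an assume-unchanged entry.
sample = "H Makefile\nh big/blob.bin\nH src/main.c\n"
```

The resulting list is what would be handed to "git grep" as the paths to
search.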

/me misses "cleartool find"
-- 
Duy


* git find (was: [RFC PATCH v3 8/8] --sparse for porcelains)
  2009-08-18  1:43                                                   ` Nguyen Thai Ngoc Duy
@ 2009-08-18  6:25                                                     ` Jakub Narebski
  2009-08-18 14:35                                                       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Narebski @ 2009-08-18  6:25 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: skillzero, Johannes Schindelin, Junio C Hamano, git

On Tue, Aug 18, 2009, Nguyen Thai Ngoc Duy wrote:
> On Tue, Aug 18, 2009 at 7:34 AM, <skillzero@gmail.com> wrote:

> > I would like git grep to not search paths outside the sparse
> > area (although --no-sparse would be nice for git grep in case you did
> > want to search everything). The main reason I want sparse checkouts is
> > for performance reasons. For example, git grep can take 10 minutes on
> > my full repository so excluding paths outside the sparse area would
> > reduce that to a few seconds.
> 
> That's a porcelain question that I'd leave for now. FWIW you can do
> something like this:
> 
> git ls-files -v|grep '^H'|cut -c 2-|xargs git grep
> 
> /me misses "cleartool find"

Well, I also think that it would be nice and useful to have "git find"
in addition to current "git grep".

-- 
Jakub Narebski
Poland


* Re: git find (was: [RFC PATCH v3 8/8] --sparse for porcelains)
  2009-08-18  6:25                                                     ` git find (was: [RFC PATCH v3 8/8] --sparse for porcelains) Jakub Narebski
@ 2009-08-18 14:35                                                       ` Nguyen Thai Ngoc Duy
  2009-08-18 16:00                                                         ` Jakub Narebski
  0 siblings, 1 reply; 53+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2009-08-18 14:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: skillzero, Johannes Schindelin, Junio C Hamano, git

On Tue, Aug 18, 2009 at 1:25 PM, Jakub Narebski<jnareb@gmail.com> wrote:
> Well, I also think that it would be nice and useful to have "git find"
> in addition to current "git grep".

Can you draft how you want "git find" to work? Except for the
"-exec" part, Git already allows us to search using various commands
(ls-files, rev-list, log). I don't think a single "git find" can cover
them all. I was thinking about putting more find-options to search
commands we already have. ls-files would support -exec, for example.

A few things that I'd love to have supported:
 - --depth for ls-files (probably all pathspec-as-argument commands)
 - logical combination of search criteria
 - unified blob locator. git-show understands SHA-1:/path/to/blob
syntax. What if git-log can output using similar syntax, then feed
them to git-grep in order to grep through (across commits)?
-- 
Duy


* Re: git find (was: [RFC PATCH v3 8/8] --sparse for porcelains)
  2009-08-18 14:35                                                       ` Nguyen Thai Ngoc Duy
@ 2009-08-18 16:00                                                         ` Jakub Narebski
  0 siblings, 0 replies; 53+ messages in thread
From: Jakub Narebski @ 2009-08-18 16:00 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: skillzero, Johannes Schindelin, Junio C Hamano, git

On Tue, Aug 18, 2009, Nguyen Thai Ngoc Duy wrote:
> On Tue, Aug 18, 2009 at 1:25 PM, Jakub Narebski<jnareb@gmail.com> wrote:
> >
> > Well, I also think that it would be nice and useful to have "git find"
> > in addition to current "git grep".
> 
> Can you make a draft on how you want "git find" to be? Except the
> "-exec" part, Git allows us to search using various commands
> (ls-files, rev-list, log). I don't think a single "git find" can cover
> them all. I was thinking about putting more find-options to search
> commands we already have. ls-files would support -exec, for example.

Both git-rev-list and git-ls-files are plumbing, not porcelain.  Among
tools / commands you have mentioned only git-log is porcelain.

You need to process output of git-ls-files if you want to use more
complicated search criteria. 

> 
> A few things that I'd love to have supported:
>  - --depth for ls-files (probably all pathspec-as-argument commands)
>  - logical combination of search criteria
>  - unified blob locator. git-show understands SHA-1:/path/to/blob
> syntax. What if git-log can output using similar syntax, then feed
> them to git-grep in order to grep through (across commits)?

Draft specification for git-find.  git-find, like git-grep, searches
the filesystem dimension, and not the time dimension like git-log.

git-find(1)
===========

NAME
----
git-find - Search for files in a repository

SYNOPSIS
--------
'git find' [--cached] [-z|--null] [(<tree> | <path>)...] [<expression>]

OPTIONS
-------
--cached::
        Instead of searching in the working tree files, check
        the blobs registered in the index file.

EXPRESSIONS
-----------
The expression is made up of options (which affect overall operation rather
than the processing of a specific file, and always return true), tests
(which return a true or false value), and actions (which have side effects
and return a true or false value), all separated by operators. `--and` is
assumed where the operator is omitted.  If the expression contains no
actions other than `--prune`, `--print` is performed on all files for which
the expression is true.
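
A simplified sketch of these evaluation rules in Python. The `evaluate`
helper is hypothetical; unlike find(1), it treats actions as plain
callbacks that run once all tests have passed, instead of as expression
terms with their own truth value:

```python
from fnmatch import fnmatch

def evaluate(paths, tests, actions=None):
    """Apply the tests joined by an implicit --and; if no action is
    given, fall back to printing each match (the --print default)."""
    if not actions:
        actions = [print]
    matched = []
    for path in paths:
        if all(test(path) for test in tests):
            for action in actions:
                action(path)
            matched.append(path)
    return matched
```

For example, `evaluate(["a.c", "b.h"], [lambda p: fnmatch(p, "*.c")])`
prints `a.c` and returns `["a.c"]`.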

OPTIONS
~~~~~~~
--max-depth <levels>::
        Descend at most <levels> (a non-negative integer) levels of
        directories below the command line arguments.  `--max-depth 0`
        means only apply the tests and actions to the command line
        arguments.

--min-depth <levels>::
        Do not apply any tests or actions at levels less than <levels>
        (a non-negative integer).  `--min-depth 1` means process all
        files except the command line arguments.
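
The depth rules can be sketched as follows, counting a command line
argument itself as depth 0 as in find(1). The helper names are
hypothetical and paths are assumed to use '/' separators:

```python
from pathlib import PurePosixPath

def depth(path, start):
    """Levels between a starting point and a path below it; the
    starting point itself has depth 0."""
    return len(PurePosixPath(path).relative_to(start).parts)

def within_depth(path, start, min_depth=0, max_depth=None):
    """True if tests and actions should be applied to path."""
    d = depth(path, start)
    return d >= min_depth and (max_depth is None or d <= max_depth)
```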

TESTS
~~~~~
--false::
        Always false.

--true::
        Always true.

--name <pattern>::
--iname <pattern>::
--path <pattern>::
--ipath <pattern>::
        File name (`--name`, `--iname`) or entire path (`--path`,
        `--ipath`) matches glob; the `i` variants ignore case.

--regex <expr>::
--iregex <expr>::
        Entire file name matches regular expression.

--lname <pattern>::
--ilname <pattern>::
        True if the file is a symbolic link whose contents match glob.

--size <n>[<unit>]::
        True if the file uses N units of space, rounding up.

--empty::
        File is empty and is either a regular file or a directory.

--type <C>::
        True if file is of type C: 'd' for directory, 'f' for regular
        file, 'l' for symbolic link, 's' for submodule, 'x' for 
        executable regular file (replaces `-perm` from 'find').
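
These type letters line up with the entry modes git records (040000 for
a tree, 100644 and 100755 for regular files, 120000 for a symlink,
160000 for a gitlink).  A rough sketch of that mapping; the function
name is hypothetical, and treating an executable file as matching both
'f' and 'x' is one reading of the draft, not something it states:

```python
def type_letters(mode):
    """Map a git entry mode to the --type letters it would satisfy."""
    if mode == 0o040000:
        return {"d"}        # tree (directory)
    if mode == 0o120000:
        return {"l"}        # symbolic link
    if mode == 0o160000:
        return {"s"}        # gitlink (submodule)
    if mode == 0o100755:
        return {"f", "x"}   # executable regular file
    if mode == 0o100644:
        return {"f"}        # regular file
    return set()            # unknown mode: matches no type
```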

ACTIONS
~~~~~~~
(--exec | --ok) <command> ;::
        Execute command; true if 0 status is returned.

(--execdir | --okdir) <command> ;::
        Like `--exec`, but the specified command is run from the 
        subdirectory containing the matched file.

--print::
--print0::
--printf <format>::
--fprint <file>::
--fprint0 <file>::
--fprintf <file> <format>::
        True; print the full file name.

--prune::
        True; if the file is a directory, do not descend into it.

--quit::
        Exit immediately.


OPERATORS
~~~~~~~~~
--and::
--or::
--not::
( ... )::
        Specify how multiple expressions are combined using Boolean
        expressions.  `--and` is the default operator.
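
One possible way to model the operators is as combinators over test
functions.  This is a sketch only; `name`, `and_`, `or_` and `not_` are
made-up helpers, not part of any existing git code:

```python
from fnmatch import fnmatch

def name(pattern):
    # Test: glob match on the last path component, like --name.
    return lambda path: fnmatch(path.rsplit("/", 1)[-1], pattern)

def and_(*tests):
    return lambda path: all(t(path) for t in tests)

def or_(*tests):
    return lambda path: any(t(path) for t in tests)

def not_(test):
    return lambda path: not test(path)

# ( --name '*.c' --or --name '*.h' ) --and --not --name 'test_*'
expr = and_(or_(name("*.c"), name("*.h")), not_(name("test_*")))
```

Because `all()` and `any()` stop at the first deciding test, later tests
are skipped in the same way find(1) short-circuits its expressions.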

-- 
Jakub Narebski
Poland


