* [PATCH 00/16] Subtree clone proof of concept
@ 2010-07-31 16:18 Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 01/16] Add core.subtree Nguyễn Thái Ngọc Duy
                   ` (17 more replies)
  0 siblings, 18 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Something to play with so we can evaluate which is the best strategy
for non-full clone (or whatever you call it).

The idea is the same: pack only enough to access a subtree, rewrite
commits on the client side, and rewrite again when pushing. However, I
put git-replace into the mix, so at least the commit SHA-1s look the
same as upstream's. git-subtree is not needed (although it's still an
option).

With this, I can clone Documentation/ from git.git, update and push. I
haven't tested it further. Space consumption is 24MB (58MB for the full
repo). Not really impressive, but if one truly cares about disk
space, one should also use a shallow clone.

Performance suffers because of bulk commit replacement: there is a
split-second delay for every command, which is the price of replacing
24k commits every time. I think the delay could be reduced a little
(with caching or mmap).

Rewriting commits at clone time takes time too. Writing objects
individually takes a lot of space and time, so I now put all new
objects directly into a pack. Rewriting time becomes quite acceptable
(a few seconds), although a deep subtree/repo may take longer.
Rewriting on demand can be considered in such cases.

Repo-care commands like fsck, repack, gc are left out for now.

Finally, it's more of a hack just to see how far I can go. It will
break things.

Nguyễn Thái Ngọc Duy (16):
  Add core.subtree
  list-objects: limit traversing within the given subtree if
    core.subtree is set
  parse_object: keep sha1 even when parsing replaced one
  Allow to invalidate a commit in in-memory object store
  Hook up replace-object to allow bulk commit replacement
  upload-pack: use a separate variable to control whether internal
    rev-list is used
  upload-pack: support subtree pack
  fetch-pack: support --subtree
  subtree: rewrite incoming commits
  clone: support subtree clone with parameter --subtree
  pack-objects: add --subtree (for pushing)
  subtree: rewriting outgoing commits
  Update commit_tree() interface to take base tree too
  commit_tree(): rewriting/replacing new commits
  commit: rewrite outgoing commits
  do not use thin packs and subtree together (just a bad feeling about
    this)

 Makefile               |    2 +
 builtin/clone.c        |   10 +
 builtin/commit-tree.c  |    2 +-
 builtin/commit.c       |    4 +-
 builtin/fetch-pack.c   |    8 +
 builtin/merge.c        |    4 +-
 builtin/notes.c        |    2 +-
 builtin/pack-objects.c |    4 +
 builtin/send-pack.c    |    2 +
 cache.h                |    1 +
 commit.c               |   25 +++-
 commit.h               |    4 +-
 config.c               |    3 +
 environment.c          |    2 +
 list-objects.c         |   23 ++-
 notes-cache.c          |    2 +-
 object.c               |    2 +-
 replace_object.c       |    5 +
 subtree.c              |  534 ++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h              |    4 +
 upload-pack.c          |   28 ++-
 21 files changed, 651 insertions(+), 20 deletions(-)
 create mode 100644 subtree.c
 create mode 100644 subtree.h


* [PATCH 01/16] Add core.subtree
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 02/16] list-objects: limit traversing within the given subtree if core.subtree is set Nguyễn Thái Ngọc Duy
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This variable holds the subtree path. When core_subtree is non-empty,
git may behave very differently.

Perhaps this should not live in .git/config, but rather in .git/subtree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h       |    1 +
 config.c      |    3 +++
 environment.c |    2 ++
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index c9fa3df..04ebe6e 100644
--- a/cache.h
+++ b/cache.h
@@ -551,6 +551,7 @@ extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern const char *core_subtree;
 
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index cdcf583..86ded29 100644
--- a/config.c
+++ b/config.c
@@ -595,6 +595,9 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.subtree"))
+		return git_config_string(&core_subtree, var, value);
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 83d38d3..1365dd0 100644
--- a/environment.c
+++ b/environment.c
@@ -57,6 +57,8 @@ int core_apply_sparse_checkout;
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
 
+const char *core_subtree;
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
-- 
1.7.1.rc1.69.g24c2f7


* [PATCH 02/16] list-objects: limit traversing within the given subtree if core.subtree is set
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 01/16] Add core.subtree Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-01 11:30   ` Ævar Arnfjörð Bjarmason
  2010-08-02  4:21   ` Elijah Newren
  2010-07-31 16:18 ` [PATCH 03/16] parse_object: keep sha1 even when parsing replaced one Nguyễn Thái Ngọc Duy
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 list-objects.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 8953548..1b25b54 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -61,12 +61,15 @@ static void process_tree(struct rev_info *revs,
 			 struct tree *tree,
 			 show_object_fn show,
 			 struct name_path *path,
-			 const char *name)
+			 const char *name,
+			 const char *subtree)
 {
 	struct object *obj = &tree->object;
 	struct tree_desc desc;
 	struct name_entry entry;
 	struct name_path me;
+	const char *slash;
+	int subtree_len;
 
 	if (!revs->tree_objects)
 		return;
@@ -82,13 +85,21 @@ static void process_tree(struct rev_info *revs,
 	me.elem = name;
 	me.elem_len = strlen(name);
 
+	if (subtree) {
+		slash = strchr(subtree, '/');
+		subtree_len = slash ? slash - subtree : strlen(subtree);
+	}
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
-		if (S_ISDIR(entry.mode))
-			process_tree(revs,
-				     lookup_tree(entry.sha1),
-				     show, &me, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			if (!subtree || !strncmp(entry.path, subtree, subtree_len))
+				process_tree(revs,
+					     lookup_tree(entry.sha1),
+					     show, &me, entry.path,
+					     slash && slash[1] ? slash+1 : NULL);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.sha1,
 					show, &me, entry.path);
@@ -164,7 +175,7 @@ void traverse_commit_list(struct rev_info *revs,
 		}
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     NULL, name);
+				     NULL, name, core_subtree);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-- 
1.7.1.rc1.69.g24c2f7
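
The traversal in this patch compares entry.path against the first
component of the subtree path with strncmp() and the component length,
which appears to be a prefix test: an entry named "Documentation-old"
would also pass for the subtree "Documentation/". narrow_tree() in
patch 09 additionally compares the lengths. A minimal sketch of an
exact component match follows; the helper name match_component() is
hypothetical and not part of the series.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Compare a tree entry name against the first component of a
 * slash-separated subtree path. Returns 1 on an exact match and
 * stores in *rest the remainder of the path after the slash, or
 * NULL when nothing is left to descend into. Returns 0 otherwise.
 */
static int match_component(const char *entry, const char *subtree,
			   const char **rest)
{
	const char *slash = strchr(subtree, '/');
	size_t len = slash ? (size_t)(slash - subtree) : strlen(subtree);

	*rest = NULL;
	if (strlen(entry) != len || memcmp(entry, subtree, len))
		return 0;
	if (slash && slash[1])
		*rest = slash + 1;
	return 1;
}
```

A caller would descend into the entry only when this returns 1, and
pass *rest down as the subtree argument for the next level.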


* [PATCH 03/16] parse_object: keep sha1 even when parsing replaced one
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 01/16] Add core.subtree Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 02/16] list-objects: limit traversing within the given subtree if core.subtree is set Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 04/16] Allow to invalidate a commit in in-memory object store Nguyễn Thái Ngọc Duy
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 object.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/object.c b/object.c
index 277b3dd..7adfda7 100644
--- a/object.c
+++ b/object.c
@@ -199,7 +199,7 @@ struct object *parse_object(const unsigned char *sha1)
 			return NULL;
 		}
 
-		obj = parse_object_buffer(repl, type, size, buffer, &eaten);
+		obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
 		if (!eaten)
 			free(buffer);
 		return obj;
-- 
1.7.1.rc1.69.g24c2f7


* [PATCH 04/16] Allow to invalidate a commit in in-memory object store
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 03/16] parse_object: keep sha1 even when parsing replaced one Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 05/16] Hook up replace-object to allow bulk commit replacement Nguyễn Thái Ngọc Duy
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This is needed if object replacement happens at run time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.c |   15 +++++++++++++++
 commit.h |    2 ++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/commit.c b/commit.c
index e9b0750..d1e30b2 100644
--- a/commit.c
+++ b/commit.c
@@ -315,6 +315,21 @@ int parse_commit(struct commit *item)
 	return ret;
 }
 
+int invalidate_commit(struct commit *item)
+{
+	if (!item)
+		return -1;
+
+	if (item->object.parsed) {
+		item->object.parsed = 0;
+		if (item->buffer) {
+			free(item->buffer);
+			item->buffer = NULL;
+		}
+	}
+	return 0;
+}
+
 struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list_p)
 {
 	struct commit_list *new_list = xmalloc(sizeof(struct commit_list));
diff --git a/commit.h b/commit.h
index eb2b8ac..d8c01ea 100644
--- a/commit.h
+++ b/commit.h
@@ -41,6 +41,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size);
 
 int parse_commit(struct commit *item);
 
+int invalidate_commit(struct commit *item);
+
 struct commit_list * commit_list_insert(struct commit *item, struct commit_list **list_p);
 unsigned commit_list_count(const struct commit_list *l);
 struct commit_list * insert_by_date(struct commit *item, struct commit_list **list);
-- 
1.7.1.rc1.69.g24c2f7


* [PATCH 05/16] Hook up replace-object to allow bulk commit replacement
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 04/16] Allow to invalidate a commit in in-memory object store Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-02 19:58   ` Junio C Hamano
  2010-07-31 16:18 ` [PATCH 06/16] upload-pack: use a separate variable to control whether internal rev-list is used Nguyễn Thái Ngọc Duy
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

$GIT_DIR/subtree contains the commit mapping in subtree mode. It is
large enough that putting it in $GIT_DIR/refs/replace may slow git down
significantly. Even with this, there will be a split-second delay for
every git command.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile         |    2 +
 replace_object.c |    5 ++
 subtree.c        |  117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h        |    2 +
 4 files changed, 126 insertions(+), 0 deletions(-)
 create mode 100644 subtree.c
 create mode 100644 subtree.h

diff --git a/Makefile b/Makefile
index f33648d..0d13538 100644
--- a/Makefile
+++ b/Makefile
@@ -525,6 +525,7 @@ LIB_H += sigchain.h
 LIB_H += strbuf.h
 LIB_H += string-list.h
 LIB_H += submodule.h
+LIB_H += subtree.h
 LIB_H += tag.h
 LIB_H += transport.h
 LIB_H += tree.h
@@ -629,6 +630,7 @@ LIB_OBJS += sigchain.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
+LIB_OBJS += subtree.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += trace.o
diff --git a/replace_object.c b/replace_object.c
index eb59604..5fe4099 100644
--- a/replace_object.c
+++ b/replace_object.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "sha1-lookup.h"
 #include "refs.h"
+#include "subtree.h"
 
 static struct replace_object {
 	unsigned char sha1[2][20];
@@ -82,6 +83,7 @@ static void prepare_replace_object(void)
 	if (replace_object_prepared)
 		return;
 
+	prepare_subtree_commit();
 	for_each_replace_ref(register_replace_ref, NULL);
 	replace_object_prepared = 1;
 }
@@ -99,6 +101,9 @@ const unsigned char *lookup_replace_object(const unsigned char *sha1)
 
 	prepare_replace_object();
 
+	if (core_subtree)
+		cur = subtree_lookup_object(cur);
+
 	/* Try to recursively replace the object */
 	do {
 		if (--depth < 0)
diff --git a/subtree.c b/subtree.c
new file mode 100644
index 0000000..601d827
--- /dev/null
+++ b/subtree.c
@@ -0,0 +1,117 @@
+#include "cache.h"
+#include "commit.h"
+#include "tree.h"
+#include "diff.h"
+#include "revision.h"
+#include "refs.h"
+#include "tag.h"
+#include "progress.h"
+#include "pack.h"
+#include "sha1-lookup.h"
+#include "csum-file.h"
+
+static struct replace_object {
+	unsigned char sha1[2][20];
+} **subtree_commit;
+
+static struct replace_object **subtree_commit, **subtree_commit_r;
+static int subtree_commit_nr, subtree_commit_r_nr, subtree_commit_alloc;
+
+static const unsigned char *replace_sha1_access(size_t index, void *table)
+{
+	struct replace_object **replace = table;
+	return replace[index]->sha1[0];
+}
+
+static int subtree_replace_object_pos(struct replace_object **store, int nr,
+				      const unsigned char *sha1)
+{
+	return sha1_pos(sha1, store, nr, replace_sha1_access);
+}
+
+static int subtree_register_object(struct replace_object **store,
+					   int *nr,
+					   const unsigned char *sha1,
+					   struct replace_object *replace,
+					   int ignore_dups)
+{
+	int pos = subtree_replace_object_pos(store, *nr, sha1);
+
+	if (0 <= pos) {
+		if (ignore_dups)
+			free(replace);
+		else {
+			free(store[pos]);
+			store[pos] = replace;
+		}
+		return 1;
+	}
+	pos = -pos - 1;
+	(*nr)++;
+	if (pos < *nr)
+		memmove(store + pos + 1,
+			store + pos,
+			(*nr - pos - 1) *
+			sizeof(*store));
+	store[pos] = replace;
+	return 0;
+}
+
+void prepare_subtree_commit()
+{
+	int fd;
+	struct stat stat;
+	struct replace_object *ro;
+	int ro_size, ro_table_size;
+	char *subtree, *entry;
+
+	if (!core_subtree)
+		return;
+
+	fd = open(git_path("subtree"), O_RDONLY);
+	if (fd == -1)
+		return;
+
+	if (fstat(fd, &stat))
+		die("Could not stat .git/subtree");
+
+	if (stat.st_size % 82)
+		die("Invalid .git/subtree size");
+
+	subtree_commit_alloc = stat.st_size / 82;
+	ro_size = sizeof(struct replace_object) * subtree_commit_alloc;
+	ro_table_size = sizeof(struct replace_object*) * subtree_commit_alloc;
+	subtree_commit_nr = 0;
+	subtree_commit_r_nr = 0;
+
+	subtree_commit = xmalloc(ro_size + ro_table_size*2);
+	subtree_commit_r = (struct replace_object **)(((char*)subtree_commit) + ro_table_size);
+	ro = (struct replace_object *)(((char*)subtree_commit) + 2*ro_table_size);
+
+	entry = subtree = xmmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+	for (entry = subtree; subtree_commit_nr < subtree_commit_alloc; entry += 82, ro++) {
+		if (entry[40] != ' ' || entry[81] != '\n')
+			die("Broken .git/subtree");
+
+		get_sha1_hex(entry,    ro->sha1[0]);
+		get_sha1_hex(entry+41, ro->sha1[1]);
+		if (subtree_register_object(subtree_commit, &subtree_commit_nr,
+						    ro->sha1[0], ro, 1) ||
+		    subtree_register_object(subtree_commit_r, &subtree_commit_r_nr,
+						    ro->sha1[1], ro, 1))
+			die("duplicate replace ref: %s", sha1_to_hex(ro->sha1[0]));
+	}
+	munmap(subtree, stat.st_size);
+	close(fd);
+}
+
+const unsigned char *subtree_lookup_object(const unsigned char *sha1)
+{
+	int pos = subtree_replace_object_pos(subtree_commit,
+					     subtree_commit_nr,
+					     sha1);
+	if (0 <= pos)
+		return subtree_commit[pos]->sha1[1];
+	return sha1;
+}
diff --git a/subtree.h b/subtree.h
new file mode 100644
index 0000000..157153a
--- /dev/null
+++ b/subtree.h
@@ -0,0 +1,2 @@
+void prepare_subtree_commit();
+const unsigned char *subtree_lookup_object(const unsigned char *sha1);
-- 
1.7.1.rc1.69.g24c2f7
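
The mapping file read by prepare_subtree_commit() is a sequence of
fixed 82-byte records: 40 hex digits for the original commit, a space,
40 hex digits for the replacement, and a newline. For the 24k commits
mentioned in the cover letter that is roughly 24,000 * 82 = 1,968,000
bytes parsed by every command. A minimal sketch of decoding one record
follows. Unlike the patch, which relies on git's get_sha1_hex(), it
uses its own hex decoder and checks the result; the helper names are
hypothetical.

```c
#include <stddef.h>

#define HEX_LEN 40
#define RECORD_LEN 82	/* 40 hex + ' ' + 40 hex + '\n' */

/* Convert one hex digit to its value, or -1 if it is not hex. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode 40 hex characters into 20 bytes; 0 on success, -1 on error. */
static int decode_sha1(const char *hex, unsigned char *out)
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hexval(hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/*
 * Hypothetical helper, not part of the patch.
 *
 * Parse one 82-byte record into the original and the replacement
 * SHA-1. Returns 0 on success, -1 if the record is malformed.
 */
static int parse_record(const char *rec, unsigned char *orig,
			unsigned char *repl)
{
	if (rec[HEX_LEN] != ' ' || rec[RECORD_LEN - 1] != '\n')
		return -1;
	if (decode_sha1(rec, orig) || decode_sha1(rec + HEX_LEN + 1, repl))
		return -1;
	return 0;
}
```

The fixed record size is what lets the patch reject a file whose size
is not a multiple of 82 before looking at any record.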


* [PATCH 06/16] upload-pack: use a separate variable to control whether internal rev-list is used
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 05/16] Hook up replace-object to allow bulk commit replacement Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-02  4:25   ` Elijah Newren
  2010-07-31 16:18 ` [PATCH 07/16] upload-pack: support subtree pack Nguyễn Thái Ngọc Duy
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 upload-pack.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index dc464d7..e432e83 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -160,8 +160,9 @@ static void create_pack_file(void)
 	ssize_t sz;
 	const char *argv[10];
 	int arg = 0;
+	int internal_rev_list = shallow_nr;
 
-	if (shallow_nr) {
+	if (internal_rev_list) {
 		memset(&rev_list, 0, sizeof(rev_list));
 		rev_list.proc = do_rev_list;
 		rev_list.out = -1;
@@ -187,7 +188,7 @@ static void create_pack_file(void)
 	argv[arg++] = NULL;
 
 	memset(&pack_objects, 0, sizeof(pack_objects));
-	pack_objects.in = shallow_nr ? rev_list.out : -1;
+	pack_objects.in = internal_rev_list ? rev_list.out : -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
 	pack_objects.git_cmd = 1;
@@ -197,7 +198,7 @@ static void create_pack_file(void)
 		die("git upload-pack: unable to fork git-pack-objects");
 
 	/* pass on revisions we (don't) want */
-	if (!shallow_nr) {
+	if (!internal_rev_list) {
 		FILE *pipe_fd = xfdopen(pack_objects.in, "w");
 		if (!create_full_pack) {
 			int i;
@@ -311,7 +312,7 @@ static void create_pack_file(void)
 		error("git upload-pack: git-pack-objects died with error.");
 		goto fail;
 	}
-	if (shallow_nr && finish_async(&rev_list))
+	if (internal_rev_list && finish_async(&rev_list))
 		goto fail;	/* error was already reported */
 
 	/* flush the data */
-- 
1.7.1.rc1.69.g24c2f7


* [PATCH 07/16] upload-pack: support subtree pack
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 06/16] upload-pack: use a separate variable to control whether internal rev-list is used Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-02  4:27   ` Elijah Newren
  2010-07-31 16:18 ` [PATCH 08/16] fetch-pack: support --subtree Nguyễn Thái Ngọc Duy
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

With core_subtree turned on (capability "subtree", request "subtree"
from fetch-pack), traverse_commit_list will be in "subtree mode",
which will not go farther than the given subtree.

As a result, the pack is broken by design: it only contains enough
blobs/trees/commits to reach the given subtree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 upload-pack.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index e432e83..9b6710a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -160,7 +160,7 @@ static void create_pack_file(void)
 	ssize_t sz;
 	const char *argv[10];
 	int arg = 0;
-	int internal_rev_list = shallow_nr;
+	int internal_rev_list = shallow_nr || core_subtree;
 
 	if (internal_rev_list) {
 		memset(&rev_list, 0, sizeof(rev_list));
@@ -505,6 +505,20 @@ static void receive_needs(void)
 		if (debug_fd)
 			write_in_full(debug_fd, line, len);
 
+		if (!prefixcmp(line, "subtree ")) {
+			int len;
+			char *subtree;
+			if (core_subtree)
+				die("sorry, only one subtree supported");
+			len = strlen(line+8);
+			subtree = malloc(len+1);
+			memcpy(subtree, line+8, len-1);
+			subtree[len-1] = '\0'; /* \n */
+			if (subtree[len-2] != '/')
+				die("subtree request must end with a slash");
+			core_subtree = subtree;
+			continue;
+		}
 		if (!prefixcmp(line, "shallow ")) {
 			unsigned char sha1[20];
 			struct object *object;
@@ -624,7 +638,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed subtree";
 	struct object *o = parse_object(sha1);
 
 	if (!o)
-- 
1.7.1.rc1.69.g24c2f7
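
The request handled in receive_needs() is a line of the form
"subtree <path>/" terminated by a newline, and the path must end with
a slash. The hunk above does not check the malloc() result and indexes
subtree[len-2], which looks unsafe for a very short request. A minimal
sketch of a more defensive parse follows; parse_subtree_request() is a
hypothetical name, and rejecting a bare "/" is an assumption rather
than something the patch specifies.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Given a request line of the form "subtree <path>/\n", return a
 * newly allocated copy of "<path>/" without the newline. Returns
 * NULL if the line is not a subtree request, the path is empty or
 * lacks the trailing slash, or allocation fails.
 */
static char *parse_subtree_request(const char *line)
{
	static const char prefix[] = "subtree ";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *path;
	size_t len;
	char *copy;

	if (strncmp(line, prefix, prefix_len))
		return NULL;
	path = line + prefix_len;
	len = strlen(path);
	if (len && path[len - 1] == '\n')
		len--;			/* drop the trailing newline */
	if (len < 2 || path[len - 1] != '/')
		return NULL;		/* need at least "x/" */
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, path, len);
	copy[len] = '\0';
	return copy;
}
```

The caller owns the returned string and would assign it to
core_subtree, or die() with a message when NULL comes back.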


* [PATCH 08/16] fetch-pack: support --subtree
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 07/16] upload-pack: support subtree pack Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 09/16] subtree: rewrite incoming commits Nguyễn Thái Ngọc Duy
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This option requires a subtree-aware upload-pack. It simply passes the
subtree from the command line (or from $GIT_DIR/config) to upload-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/fetch-pack.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dbd8b7b..7460ecc 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -237,6 +237,8 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	for_each_ref(rev_list_insert_ref, NULL);
 
 	fetching = 0;
+	if (core_subtree)
+		packet_buf_write(&req_buf, "subtree %s\n", core_subtree);
 	for ( ; refs ; refs = refs->next) {
 		unsigned char *remote = refs->old_sha1;
 		const char *remote_hex;
@@ -692,6 +694,8 @@ static struct ref *do_fetch_pack(int fd[2],
 
 	if (is_repository_shallow() && !server_supports("shallow"))
 		die("Server does not support shallow clients");
+	if (core_subtree && !server_supports("subtree"))
+		die("Server does not support subtree");
 	if (server_supports("multi_ack_detailed")) {
 		if (args.verbose)
 			fprintf(stderr, "Server supports multi_ack_detailed\n");
@@ -860,6 +864,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				pack_lockfile_ptr = &pack_lockfile;
 				continue;
 			}
+			if (!prefixcmp(arg, "--subtree=")) {
+				core_subtree = arg + 10;
+				continue;
+			}
 			usage(fetch_pack_usage);
 		}
 		dest = (char *)arg;
-- 
1.7.1.rc1.69.g24c2f7


* [PATCH 09/16] subtree: rewrite incoming commits
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (7 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 08/16] fetch-pack: support --subtree Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-02  4:37   ` Elijah Newren
  2010-07-31 16:18 ` [PATCH 10/16] clone: support subtree clone with parameter --subtree Nguyễn Thái Ngọc Duy
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This adds the main function, subtree_import(), which is intended to be
used by "git clone".

Subtree packs are not complete, so they are barely usable: the git
client will complain about missing objects here and there.
Theoretically, client code could be adapted to only look for objects
within the subtree, but that was painful to try.

Alternatively, subtree_import() rewrites commits to have only the
specified subtree, sealing all broken paths. The git client now happily
works with these new commits.
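
Rewriting a commit here means swapping the tree ID in the raw commit
buffer. A commit object starts with "tree ", 40 hex digits and a
newline, which is why shadow_commit() below reads and writes at
buffer+5. A minimal sketch of that in-place swap follows, with the
header validated before anything is touched (the patch itself reads
buffer+5 before checking buffer for NULL). The helper name
replace_commit_tree() is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Overwrite the tree ID in a raw commit buffer in place. The ID
 * occupies bytes 5..44 of "tree <40 hex digits>\n". Returns 0 on
 * success, -1 if the buffer is too short, does not start with a
 * tree line, or the new ID is not 40 characters long.
 */
static int replace_commit_tree(char *buf, size_t size,
			       const char *new_tree_hex)
{
	if (size < 46 || memcmp(buf, "tree ", 5) || buf[45] != '\n')
		return -1;
	if (strlen(new_tree_hex) != 40)
		return -1;
	memcpy(buf + 5, new_tree_hex, 40);
	return 0;
}
```

Because old and new IDs have the same length, the rest of the commit
(parents, author, message) stays byte-for-byte where it was.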

However, users might not, because these are different commits with
different SHA-1s. They can't use those SHA-1s to communicate within
their team. To work around this, all original commits are replaced by
the new commits using git-replace.

Of course this is still not perfect. Users can pass commit SHA-1s
around, since those stay consistent, but they cannot do the same with
tree SHA-1s.

Rewriting/replacing commits takes time and space. For replacing _all_
commits, the current replace mechanism is not suitable, which is why
subtree_lookup_object() was introduced in previous patches.

For rewriting, writing a huge number of objects individually is slow,
so subtree_import() builds a pack for all new objects. These packs are
not optimized, but they do reduce the wait time for rewriting.
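
The pack that subtree_import() writes begins with a 12-byte header:
the signature "PACK", a version number and the object count, each
stored as a 32-bit big-endian value. The patch writes a count of zero
first and lets fixup_pack_header_footer() fill in the real number
afterwards. A minimal sketch of that header layout follows; the helper
names are hypothetical and the version value 2 is an assumption about
what PACK_VERSION expands to.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/*
 * Hypothetical helper, not part of the patch.
 *
 * Fill a 12-byte pack header: signature, version, object count.
 */
static void write_pack_header(unsigned char *out, uint32_t version,
			      uint32_t nr_objects)
{
	memcpy(out, "PACK", 4);
	put_be32(out + 4, version);
	put_be32(out + 8, nr_objects);
}
```

Writing the bytes explicitly avoids depending on the host byte order,
which is what the htonl() calls in the patch achieve as well.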

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 subtree.c |  244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h |    1 +
 2 files changed, 245 insertions(+), 0 deletions(-)

diff --git a/subtree.c b/subtree.c
index 601d827..8c075be 100644
--- a/subtree.c
+++ b/subtree.c
@@ -115,3 +115,247 @@ const unsigned char *subtree_lookup_object(const unsigned char *sha1)
 		return subtree_commit[pos]->sha1[1];
 	return sha1;
 }
+
+static unsigned long do_compress(void **pptr, unsigned long size)
+{
+	z_stream stream;
+	void *in, *out;
+	unsigned long maxsize;
+
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_DEFAULT_COMPRESSION);
+	maxsize = deflateBound(&stream, size);
+
+	in = *pptr;
+	out = xmalloc(maxsize);
+	*pptr = out;
+
+	stream.next_in = in;
+	stream.avail_in = size;
+	stream.next_out = out;
+	stream.avail_out = maxsize;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		; /* nothing */
+	deflateEnd(&stream);
+
+	return stream.total_out;
+}
+
+static int nr_written;
+static int add_sha1_to_pack(int fd, void *buf, unsigned long size, enum object_type type)
+{
+	unsigned long datalen;
+	unsigned hdrlen;
+	unsigned char header[10];
+
+	datalen = do_compress(&buf, size);
+	hdrlen = encode_in_pack_object_header(type, size, header);
+	write(fd, header, hdrlen);
+	write(fd, buf, datalen);
+	nr_written++;
+	free(buf);
+	return 0;
+}
+
+/*
+ * Take sha1 of a tree, rewrite it to contain only the prefix, and
+ * return the new sha1 in newsha1.
+ *
+ * If fd is zero, write to the object store. If fd is greater than
+ * zero, it's a pack file handle.
+ */
+static int narrow_tree(const unsigned char *sha1, unsigned char *newsha1,
+		       const char *prefix, int fd)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	struct strbuf buffer;
+	const char *slash;
+	int subtree_len;
+	enum object_type type;
+	unsigned long size;
+	char *tree;
+	struct object *obj;
+
+	slash = strchr(prefix, '/');
+	subtree_len = slash ? slash - prefix : strlen(prefix);
+
+	tree = read_sha1_file(sha1, &type, &size);
+	if (type != OBJ_TREE)
+		die("%s is not a tree", sha1_to_hex(sha1));
+
+	init_tree_desc(&desc, tree, size);
+	strbuf_init(&buffer, 1024);
+	while (tree_entry(&desc, &entry)) {
+		if (!S_ISDIR(entry.mode))
+			continue;
+
+		if (subtree_len == strlen(entry.path) &&
+		    !strncmp(entry.path, prefix, subtree_len)) {
+			unsigned char newtree_sha1[20];
+
+			if (slash && slash[1]) /* trailing slash does not count */
+				narrow_tree(entry.sha1, newtree_sha1, prefix+subtree_len+1, fd);
+			else
+				memcpy(newtree_sha1, entry.sha1, 20);
+
+			strbuf_addf(&buffer, "%o %.*s%c", entry.mode, strlen(entry.path), entry.path, '\0');
+			strbuf_add(&buffer, newtree_sha1, 20);
+			break;
+		}
+	}
+	free(tree);
+
+	if (fd == 0) {
+		if (write_sha1_file(buffer.buf, buffer.len, tree_type, newsha1)) {
+			error("Could not write replaced tree for %s", sha1_to_hex(sha1));
+			strbuf_release(&buffer);
+			return 1;
+		}
+		strbuf_release(&buffer);
+		return 0;
+	}
+
+	hash_sha1_file(buffer.buf, buffer.len, tree_type, newsha1);
+	obj = (struct object *)lookup_tree(newsha1);
+	if (fd > 0 &&
+	    !(obj->flags & SEEN) &&
+	    add_sha1_to_pack(fd, buffer.buf, buffer.len, OBJ_TREE)) {
+		error("Could not write replaced tree for %s", sha1_to_hex(sha1));
+		strbuf_release(&buffer);
+		return 1;
+	}
+	obj->flags |= SEEN;
+
+	strbuf_release(&buffer);
+	return 0;
+}
+
+/*
+ * Take sha1 of a commit, rewrite its tree using narrow_tree(), then
+ * add a replace entry to file pointer fp (which is $GIT_DIR/subtree).
+ *
+ * Also update replace-object database so that the given sha1 can be
+ * replaced with the new one right after this function returns.
+ */
+static int shadow_commit(const unsigned char *sha1, const char *prefix, int fd, FILE *fp)
+{
+	unsigned char newsha1[20], treesha1[20];
+	enum object_type type;
+	unsigned long size;
+	void *buffer;
+	struct object *obj;
+	int saved_read_replace_refs = read_replace_refs;
+
+	read_replace_refs = 0;
+	buffer = read_sha1_file(sha1, &type, &size);
+	read_replace_refs = saved_read_replace_refs;
+	get_sha1_hex(buffer+5, treesha1);
+
+	if (!buffer || type != OBJ_COMMIT ||
+	    narrow_tree(treesha1, newsha1, prefix, fd)) {
+		free(buffer);
+		error("Failed to narrow tree for commit %s", sha1_to_hex(sha1));
+		return 1;
+	}
+
+	/* replace new tree in */
+	memcpy((char*)buffer+5, sha1_to_hex(newsha1), 40);
+
+	if (fd == 0) {
+		if (write_sha1_file(buffer, size, commit_type, newsha1)) {
+			free(buffer);
+			error("Could not write replaced commit for %s", sha1_to_hex(sha1));
+			return 1;
+		}
+	}
+	else {
+		hash_sha1_file(buffer, size, commit_type, newsha1);
+		obj = (struct object *)lookup_commit(newsha1);
+		if (fd > 0 &&
+		    !(obj->flags & SEEN) &&
+		    add_sha1_to_pack(fd, buffer, size, OBJ_COMMIT)) {
+			free(buffer);
+			error("Could not write replaced commit for %s", sha1_to_hex(sha1));
+			return 1;
+		}
+		obj->flags |= SEEN;
+	}
+
+	if (fp) {
+		char buf[82];
+		memcpy(buf, sha1_to_hex(sha1), 40);
+		buf[40] = ' ';
+		memcpy(buf+41, sha1_to_hex(newsha1), 40);
+		buf[81] = '\n';
+		fwrite(buf, 82, 1, fp);
+	}
+	free(buffer);
+
+	return 0;
+}
+
+/*
+ * Rewrite all reachable commits in repo using shadow_commit().
+ * Write out the pack that contains new tree/commit objects.
+ */
+void subtree_import()
+{
+	const char *args[] = {"rev-list", "--all", NULL};
+	struct pack_header hdr;
+	struct progress *ps;
+	struct rev_info revs;
+	struct commit *c;
+	unsigned char sha1[20];
+	unsigned commit_nr = 0;
+	char *pack_tmp_name;
+	char tmpname[PATH_MAX];
+	int pack_fd, i;
+	FILE *fp;
+	char cmd[1024];
+
+	/* Packing */
+	init_revisions(&revs, NULL);
+	setup_revisions(2, args, &revs, NULL);
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+	fp = fopen(git_path("subtree"), "w+");
+
+	pack_fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	pack_tmp_name = xstrdup(tmpname);
+
+	hdr.hdr_signature = htonl(PACK_SIGNATURE);
+	hdr.hdr_version = htonl(PACK_VERSION);
+	hdr.hdr_entries = htonl(0);
+	write(pack_fd, &hdr, sizeof(hdr));
+
+	ps = start_progress("Preparing subtree commits", 0);
+	while ((c = get_revision(&revs)) != NULL) {
+		if (shadow_commit(c->object.sha1, core_subtree, pack_fd, fp))
+			die("Failed to shadow commit %s", sha1_to_hex(c->object.sha1));
+		display_progress(ps, ++commit_nr);
+	}
+	stop_progress(&ps);
+	fclose(fp);
+	fixup_pack_header_footer(pack_fd, sha1, pack_tmp_name, nr_written, NULL, 0);
+	close(pack_fd);
+	sprintf(cmd, "git index-pack --stdin < %s", pack_tmp_name);
+	system(cmd);
+	unlink(pack_tmp_name);
+
+	reprepare_packed_git();
+	if (subtree_commit)
+		free(subtree_commit);
+	prepare_subtree_commit();
+
+	/* Invalidate all replaced commits */
+	for (i = 0; i < subtree_commit_nr; i++) {
+		/* lookup_commit() would create new objects, we don't want that */
+		c = (struct commit *)lookup_object(subtree_commit[i]->sha1[0]);
+		if (c)
+			invalidate_commit(c);
+	}
+
+	if (revs.pending.nr)
+		free(revs.pending.objects);
+}
diff --git a/subtree.h b/subtree.h
index 157153a..3512e2a 100644
--- a/subtree.h
+++ b/subtree.h
@@ -1,2 +1,3 @@
 void prepare_subtree_commit();
 const unsigned char *subtree_lookup_object(const unsigned char *sha1);
+void subtree_import();
-- 
1.7.1.rc1.69.g24c2f7
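
For illustration only (not part of the patch): a minimal sketch of the fixed-width record that shadow_commit() appends to $GIT_DIR/subtree, i.e. 40 hex digits of the original commit, a space, 40 hex digits of the rewritten commit, and a newline (82 bytes, no NUL). The names hex_encode() and format_replace_entry() are hypothetical; hex_encode() merely stands in for git's sha1_to_hex().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_RAW_LEN 20
#define SHA1_HEX_LEN 40
#define REPLACE_ENTRY_LEN (SHA1_HEX_LEN + 1 + SHA1_HEX_LEN + 1) /* 82 */

/* Hypothetical stand-in for sha1_to_hex(): writes exactly 40 chars, no NUL. */
static void hex_encode(const unsigned char *sha1, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < SHA1_RAW_LEN; i++) {
		out[2 * i] = digits[sha1[i] >> 4];
		out[2 * i + 1] = digits[sha1[i] & 0x0f];
	}
}

/*
 * Fill buf with one "<old-hex> <new-hex>\n" record.
 * buf is deliberately not NUL-terminated, matching the 82-byte fwrite().
 */
static void format_replace_entry(const unsigned char *old_sha1,
				 const unsigned char *new_sha1,
				 char buf[REPLACE_ENTRY_LEN])
{
	assert(old_sha1 && new_sha1 && buf);
	hex_encode(old_sha1, buf);
	buf[SHA1_HEX_LEN] = ' ';
	hex_encode(new_sha1, buf + SHA1_HEX_LEN + 1);
	buf[REPLACE_ENTRY_LEN - 1] = '\n';
}
```

Because every record has the same length, a reader could in principle locate the n-th entry at byte offset n * 82 without parsing; whether the patch series relies on that is not stated here.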

* [PATCH 10/16] clone: support subtree clone with parameter --subtree
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (8 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 09/16] subtree: rewrite incoming commits Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 11/16] pack-objects: add --subtree (for pushing) Nguyễn Thái Ngọc Duy
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

With all the preparation work, here comes --subtree. So clone away!

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index efb1e6f..43bc34b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,7 @@
 #include "branch.h"
 #include "remote.h"
 #include "run-command.h"
+#include "subtree.h"
 
 /*
  * Overall FIXMEs:
@@ -78,6 +79,8 @@ static struct option builtin_clone_options[] = {
 		   "path to git-upload-pack on the remote"),
 	OPT_STRING(0, "depth", &option_depth, "depth",
 		    "create a shallow clone of that depth"),
+	OPT_STRING(0, "subtree", &core_subtree, "subtree",
+		   "subtree clone"),
 
 	OPT_END()
 };
@@ -515,6 +518,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	strbuf_reset(&value);
 
 	if (path && !is_bundle) {
+		if (core_subtree)
+			die("Local subtree clone does not work (now)");
 		refs = clone_local(path, git_dir);
 		mapped_refs = wanted_peer_refs(refs, refspec);
 	} else {
@@ -623,6 +628,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_disconnect(transport);
 	}
 
+	if (core_subtree) {
+		git_config_set("core.subtree", core_subtree);
+		subtree_import();
+	}
+
 	if (!option_no_checkout) {
 		struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 		struct unpack_trees_options opts;
-- 
1.7.1.rc1.69.g24c2f7

* [PATCH 11/16] pack-objects: add --subtree (for pushing)
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (9 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 10/16] clone: support subtree clone with parameter --subtree Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 12/16] subtree: rewriting outgoing commits Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/pack-objects.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0e81673..5d7b277 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2277,6 +2277,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			grafts_replace_parents = 0;
 			continue;
 		}
+		if (!prefixcmp(arg, "--subtree=")) {
+			core_subtree = arg + 10;
+			continue;
+		}
 		usage(pack_usage);
 	}
 
-- 
1.7.1.rc1.69.g24c2f7

* [PATCH 12/16] subtree: rewriting outgoing commits
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (10 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 11/16] pack-objects: add --subtree (for pushing) Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-02  4:40   ` Elijah Newren
  2010-07-31 16:18 ` [PATCH 13/16] Update commit_tree() interface to take base tree too Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Which is exactly the opposite of rewriting incoming commits.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 subtree.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h |    1 +
 2 files changed, 174 insertions(+), 0 deletions(-)

diff --git a/subtree.c b/subtree.c
index 8c075be..739ff5f 100644
--- a/subtree.c
+++ b/subtree.c
@@ -359,3 +359,176 @@ void subtree_import()
 	if (revs.pending.nr)
 		free(revs.pending.objects);
 }
+
+/*
+ * The opposite of narrow_tree(). Put the subtree back to the original tree.
+ */
+static int widen_tree(const unsigned char *sha1,
+		      unsigned char *newsha1,
+		      const unsigned char *subtree_sha1,
+		      const char *prefix)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	struct strbuf buffer;
+	const char *slash;
+	int subtree_len;
+	enum object_type type;
+	unsigned long size;
+	char *tree;
+
+	slash = strchr(prefix, '/');
+	subtree_len = slash ? slash - prefix : strlen(prefix);
+
+	tree = read_sha1_file(sha1, &type, &size);
+	if (type != OBJ_TREE)
+		die("%s is not a tree", sha1_to_hex(sha1));
+
+	init_tree_desc(&desc, tree, size);
+	strbuf_init(&buffer, 8192);
+	while (tree_entry(&desc, &entry)) {
+		strbuf_addf(&buffer, "%o %.*s%c", entry.mode, strlen(entry.path), entry.path, '\0');
+
+		if (S_ISDIR(entry.mode) &&
+		    subtree_len == strlen(entry.path) &&
+		    !strncmp(entry.path, prefix, subtree_len)) {
+			unsigned char newtree_sha1[20];
+
+			if (slash && slash[1]) /* trailing slash does not count */
+				widen_tree(entry.sha1, newtree_sha1, subtree_sha1,
+					   prefix+subtree_len+1);
+			else
+				/* replace the tree */
+				memcpy(newtree_sha1, subtree_sha1, 20);
+
+			strbuf_add(&buffer, newtree_sha1, 20);
+		}
+		else
+			strbuf_add(&buffer, entry.sha1, 20);
+	}
+	free(tree);
+
+	if (write_sha1_file(buffer.buf, buffer.len, tree_type, newsha1)) {
+		error("Could not write replaced tree for %s", sha1_to_hex(sha1));
+		strbuf_release(&buffer);
+		return 1;
+	}
+	strbuf_release(&buffer);
+	return 0;
+}
+
+static int find_subtree(const unsigned char *sha1, unsigned char *newsha1, const char *prefix)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	const char *slash;
+	enum object_type type;
+	unsigned long size;
+	int subtree_len;
+	char *tree;
+
+	slash = strchr(prefix, '/');
+	subtree_len = slash ? slash - prefix : strlen(prefix);
+
+	tree = read_sha1_file(sha1, &type, &size);
+	if (type != OBJ_TREE)
+		die("%s is not a tree", sha1_to_hex(sha1));
+
+	init_tree_desc(&desc, tree, size);
+	while (tree_entry(&desc, &entry)) {
+		if (!S_ISDIR(entry.mode))
+			continue;
+
+		if (subtree_len == strlen(entry.path) &&
+		    !strncmp(entry.path, prefix, subtree_len)) {
+
+			if (slash && slash[1]) { /* trailing slash does not count */
+				if (find_subtree(entry.sha1, newsha1, prefix+subtree_len+1))
+					return 1;
+			}
+			else
+				memcpy(newsha1, entry.sha1, 20);
+			free(tree);
+			return 0;
+		}
+	}
+	free(tree);
+
+	return 1;
+}
+
+/* The opposite of shadow_commit() */
+static int expose_commit(const unsigned char *sha1, unsigned char *newsha1,
+			 const unsigned char *basesha1,
+			 const char *prefix, FILE *fp)
+{
+	unsigned char treesha1[20], subtree_sha1[20];
+	enum object_type type;
+	unsigned long size, base_size;
+	void *base_buffer, *buffer;
+	int saved_read_replace_refs = read_replace_refs;
+
+	/* Get subtree from the new commit, sha1 */
+	read_replace_refs = 0;
+	buffer = read_sha1_file(sha1, &type, &size);
+	read_replace_refs = saved_read_replace_refs;
+
+	if (!buffer || type != OBJ_COMMIT ||
+	    get_sha1_hex((char *)buffer + 5, treesha1) ||
+	    find_subtree(treesha1, subtree_sha1, prefix)) {
+		free(buffer);
+		error("Failed to find subtree tree in base commit %s", sha1_to_hex(sha1));
+		return 1;
+	}
+
+	/* Get the old base tree from basesha1 */
+	read_replace_refs = 0;
+	base_buffer = read_sha1_file(basesha1, &type, &base_size);
+	read_replace_refs = saved_read_replace_refs;
+
+	if (!base_buffer || type != OBJ_COMMIT ||
+	    get_sha1_hex((char *)base_buffer + 5, treesha1) ||
+	    widen_tree(treesha1, newsha1, subtree_sha1, prefix)) {
+		free(buffer);
+		error("Failed to widen tree for commit %s", sha1_to_hex(sha1));
+		return 1;
+	}
+	free(base_buffer);
+
+	/* replace new tree in */
+	memcpy((char*)buffer+5, sha1_to_hex(newsha1), 40);
+
+	if (write_sha1_file(buffer, size, commit_type, newsha1)) {
+		free(buffer);
+		error("Could not write replaced commit for %s", sha1_to_hex(sha1));
+		return 1;
+	}
+
+	if (fp) {
+		char buf[82];
+		memcpy(buf, sha1_to_hex(newsha1), 40);
+		buf[40] = ' ';
+		memcpy(buf+41, sha1_to_hex(sha1), 40);
+		buf[81] = '\n';
+		fwrite(buf, 82, 1, fp);
+	}
+	free(buffer);
+
+	return 0;
+}
+
+int subtree_export(unsigned char *sha1, unsigned char *basesha1, unsigned char *newsha1)
+{
+	FILE *fp;
+
+	fp = fopen(git_path("subtree"), "a+");
+	if (expose_commit(sha1, newsha1, basesha1, core_subtree, fp))
+		die("Failed to rewrite commit %s", sha1_to_hex(sha1));
+	fclose(fp);
+
+	if (subtree_commit)
+		free(subtree_commit);
+	prepare_subtree_commit();
+
+	return 0;
+}
diff --git a/subtree.h b/subtree.h
index 3512e2a..081838f 100644
--- a/subtree.h
+++ b/subtree.h
@@ -1,3 +1,4 @@
 void prepare_subtree_commit();
 const unsigned char *subtree_lookup_object(const unsigned char *sha1);
 void subtree_import();
+int subtree_export(unsigned char *sha1, unsigned char *basesha1, unsigned char *newsha1);
-- 
1.7.1.rc1.69.g24c2f7
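
For illustration only (not part of the patch): a minimal sketch of the path-component matching that widen_tree() and find_subtree() both perform while descending along the prefix. The helper name match_first_component() is hypothetical. It requires the entry name to have exactly the length of the first prefix component, and it treats a trailing slash as the end of the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if tree entry `name` equals the first component of `prefix`.
 * On a match, *rest points at the remaining components, or is NULL when
 * the prefix ends here (a trailing slash does not count).
 */
static int match_first_component(const char *name, const char *prefix,
				 const char **rest)
{
	const char *slash;
	size_t len;

	assert(name && prefix && rest);
	slash = strchr(prefix, '/');
	len = slash ? (size_t)(slash - prefix) : strlen(prefix);

	*rest = NULL;
	/* The length check keeps "Documentation2" from matching "Documentation". */
	if (strlen(name) != len || strncmp(name, prefix, len))
		return 0;
	if (slash && slash[1])
		*rest = slash + 1;
	return 1;
}
```

With the prefix "Documentation/technical/", the entry "Documentation" matches and leaves "technical/" for the next level; at that level "technical" matches and nothing is left, so the subtree SHA-1 would be substituted (or returned) there.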

* [PATCH 13/16] Update commit_tree() interface to take base tree too
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (11 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 12/16] subtree: rewriting outgoing commits Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 14/16] commit_tree(): rewriting/replacing new commits Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

In subtree mode, you work on narrowed trees and make narrowed
commits. To push upstream, you need to put your updated subtree back
into the full tree. Otherwise upstream would see your commits as
deleting every tree but your subtree, which is not good.

In order to do that, commit_tree() now takes the base tree SHA-1. With
that, it can create upstream-compatible commits. It does not do so
yet, though.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/commit-tree.c |    2 +-
 builtin/commit.c      |    2 +-
 builtin/merge.c       |    4 ++--
 builtin/notes.c       |    2 +-
 commit.c              |    2 +-
 commit.h              |    2 +-
 notes-cache.c         |    2 +-
 7 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index 87f0591..88a6833 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
 	if (strbuf_read(&buffer, 0, 0) < 0)
 		die_errno("git commit-tree: failed to read");
 
-	if (!commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) {
+	if (!commit_tree(buffer.buf, tree_sha1, NULL, parents, commit_sha1, NULL)) {
 		printf("%s\n", sha1_to_hex(commit_sha1));
 		return 0;
 	}
diff --git a/builtin/commit.c b/builtin/commit.c
index 2bb30c0..6b4c678 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1350,7 +1350,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	if (commit_tree(sb.buf, active_cache_tree->sha1, parents, commit_sha1,
+	if (commit_tree(sb.buf, active_cache_tree->sha1, NULL, parents, commit_sha1,
 			fmt_ident(author_name, author_email, author_date,
 				IDENT_ERROR_ON_NO_NAME))) {
 		rollback_index_files();
diff --git a/builtin/merge.c b/builtin/merge.c
index 37ce4f5..8745b54 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -779,7 +779,7 @@ static int merge_trivial(void)
 	parent->next = xmalloc(sizeof(*parent->next));
 	parent->next->item = remoteheads->item;
 	parent->next->next = NULL;
-	commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL);
+	commit_tree(merge_msg.buf, result_tree, NULL, parent, result_commit, NULL);
 	finish(result_commit, "In-index merge");
 	drop_save();
 	return 0;
@@ -808,7 +808,7 @@ static int finish_automerge(struct commit_list *common,
 	}
 	free_commit_list(remoteheads);
 	strbuf_addch(&merge_msg, '\n');
-	commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL);
+	commit_tree(merge_msg.buf, result_tree, NULL, parents, result_commit, NULL);
 	strbuf_addf(&buf, "Merge made by %s.", wt_strategy);
 	finish(result_commit, buf.buf);
 	strbuf_release(&buf);
diff --git a/builtin/notes.c b/builtin/notes.c
index 190005f..8d574eb 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -303,7 +303,7 @@ int commit_notes(struct notes_tree *t, const char *msg)
 		hashclr(prev_commit);
 		parent = NULL;
 	}
-	if (commit_tree(buf.buf + 7, tree_sha1, parent, new_commit, NULL))
+	if (commit_tree(buf.buf + 7, tree_sha1, NULL, parent, new_commit, NULL))
 		die("Failed to commit notes tree to database");
 
 	/* Update notes ref with new commit */
diff --git a/commit.c b/commit.c
index d1e30b2..7121631 100644
--- a/commit.c
+++ b/commit.c
@@ -811,7 +811,7 @@ static const char commit_utf8_warn[] =
 "You may want to amend it after fixing the message, or set the config\n"
 "variable i18n.commitencoding to the encoding your project uses.\n";
 
-int commit_tree(const char *msg, unsigned char *tree,
+int commit_tree(const char *msg, unsigned char *tree, unsigned char *base_tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author)
 {
diff --git a/commit.h b/commit.h
index d8c01ea..7c34368 100644
--- a/commit.h
+++ b/commit.h
@@ -166,7 +166,7 @@ static inline int single_parent(struct commit *commit)
 
 struct commit_list *reduce_heads(struct commit_list *heads);
 
-extern int commit_tree(const char *msg, unsigned char *tree,
+extern int commit_tree(const char *msg, unsigned char *tree, unsigned char *base_tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author);
 
diff --git a/notes-cache.c b/notes-cache.c
index dee6d62..96afb25 100644
--- a/notes-cache.c
+++ b/notes-cache.c
@@ -56,7 +56,7 @@ int notes_cache_write(struct notes_cache *c)
 
 	if (write_notes_tree(&c->tree, tree_sha1))
 		return -1;
-	if (commit_tree(c->validity, tree_sha1, NULL, commit_sha1, NULL) < 0)
+	if (commit_tree(c->validity, tree_sha1, NULL, NULL, commit_sha1, NULL) < 0)
 		return -1;
 	if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL,
 		       0, QUIET_ON_ERR) < 0)
-- 
1.7.1.rc1.69.g24c2f7

* [PATCH 14/16] commit_tree(): rewriting/replacing new commits
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (12 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 13/16] Update commit_tree() interface to take base tree too Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 15/16] commit: rewrite outgoing commits Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/commit.c b/commit.c
index 7121631..258d3fb 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "subtree.h"
 
 int save_commit_buffer = 1;
 
@@ -858,5 +859,12 @@ int commit_tree(const char *msg, unsigned char *tree, unsigned char *base_tree,
 
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
+
+	if (core_subtree && !result) {
+		unsigned char subtree_commit[20];
+		memcpy(subtree_commit, ret, 20);
+		result = subtree_export(subtree_commit, base_tree, ret);
+	}
+
 	return result;
 }
-- 
1.7.1.rc1.69.g24c2f7

* [PATCH 15/16] commit: rewrite outgoing commits
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (13 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 14/16] commit_tree(): rewriting/replacing new commits Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-07-31 16:18 ` [PATCH 16/16] do not use thin packs and subtree together (just a bad feeling about this) Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/commit.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 6b4c678..c551d72 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1350,7 +1350,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	if (commit_tree(sb.buf, active_cache_tree->sha1, NULL, parents, commit_sha1,
+	if (commit_tree(sb.buf, active_cache_tree->sha1,
+			parents ? parents->item->object.sha1 : NULL,
+			parents, commit_sha1,
 			fmt_ident(author_name, author_email, author_date,
 				IDENT_ERROR_ON_NO_NAME))) {
 		rollback_index_files();
-- 
1.7.1.rc1.69.g24c2f7

* [PATCH 16/16] do not use thin packs and subtree together (just a bad feeling about this)
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (14 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 15/16] commit: rewrite outgoing commits Nguyễn Thái Ngọc Duy
@ 2010-07-31 16:18 ` Nguyễn Thái Ngọc Duy
  2010-08-01  4:14 ` [PATCH 00/16] Subtree clone proof of concept Sverre Rabbelier
  2010-08-02  5:18 ` Elijah Newren
  17 siblings, 0 replies; 33+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-07-31 16:18 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/send-pack.c |    2 ++
 upload-pack.c       |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 481602d..fb1ad2b 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -53,6 +53,8 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 	int i;
 
 	i = 4;
+	if (core_subtree)
+		args->use_thin_pack = 0;
 	if (args->use_thin_pack)
 		argv[i++] = "--thin";
 	if (args->use_ofs_delta)
diff --git a/upload-pack.c b/upload-pack.c
index 9b6710a..c65a3cb 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -581,6 +581,9 @@ static void receive_needs(void)
 	if (!use_sideband && daemon_mode)
 		no_progress = 1;
 
+	if (core_subtree)
+		use_thin_pack = 0;
+
 	if (depth == 0 && shallows.nr == 0)
 		return;
 	if (depth > 0) {
-- 
1.7.1.rc1.69.g24c2f7

* Re: [PATCH 00/16] Subtree clone proof of concept
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (15 preceding siblings ...)
  2010-07-31 16:18 ` [PATCH 16/16] do not use thin packs and subtree together (just a bad feeling about this) Nguyễn Thái Ngọc Duy
@ 2010-08-01  4:14 ` Sverre Rabbelier
  2010-08-01  6:58   ` Nguyen Thai Ngoc Duy
  2010-08-02  5:18 ` Elijah Newren
  17 siblings, 1 reply; 33+ messages in thread
From: Sverre Rabbelier @ 2010-08-01  4:14 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Heya,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> With this, I can clone Documentation/ from git.git, update and push.

Very nice!

> Space consumption is 24MB (58MB for full
> repo).  Not really impressive, but if one truly cares about disk
> space, he/she should also use shallow clone.

Can they be combined to create the fabled narrow checkout?

-- 
Cheers,

Sverre Rabbelier

* Re: [PATCH 00/16] Subtree clone proof of concept
  2010-08-01  4:14 ` [PATCH 00/16] Subtree clone proof of concept Sverre Rabbelier
@ 2010-08-01  6:58   ` Nguyen Thai Ngoc Duy
  2010-08-01 20:05     ` Sverre Rabbelier
  0 siblings, 1 reply; 33+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-01  6:58 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: git

2010/8/1 Sverre Rabbelier <srabbelier@gmail.com>:
>> Space consumption is 24MB (58MB for full
>> repo).  Not really impressive, but if one truly cares about disk
>> space, he/she should also use shallow clone.
>
> Can they be combined to create the fabled narrow checkout?

Yes. For the record, --subtree=Documentation/ with --depth=1 made a pack of 5MB.
-- 
Duy

* Re: [PATCH 02/16] list-objects: limit traversing within the given  subtree if core.subtree is set
  2010-07-31 16:18 ` [PATCH 02/16] list-objects: limit traversing within the given subtree if core.subtree is set Nguyễn Thái Ngọc Duy
@ 2010-08-01 11:30   ` Ævar Arnfjörð Bjarmason
  2010-08-01 23:11     ` Nguyen Thai Ngoc Duy
  2010-08-02  4:21   ` Elijah Newren
  1 sibling, 1 reply; 33+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-08-01 11:30 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

> +       int subtree_len;

Shouldn't that be size_t? strlen returns size_t, and strncmp expects
size_t, not int.

* Re: [PATCH 00/16] Subtree clone proof of concept
  2010-08-01  6:58   ` Nguyen Thai Ngoc Duy
@ 2010-08-01 20:05     ` Sverre Rabbelier
  0 siblings, 0 replies; 33+ messages in thread
From: Sverre Rabbelier @ 2010-08-01 20:05 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Heya,

On Sun, Aug 1, 2010 at 01:58, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> 2010/8/1 Sverre Rabbelier <srabbelier@gmail.com>:
>> Can they be combined to create the fabled narrow checkout?
>
> Yes. For the record, --subtree=Documentation/ with --depth=1 made a pack of 5MB.

I hope everybody is paying attention to these patches then! :)

-- 
Cheers,

Sverre Rabbelier

* Re: [PATCH 02/16] list-objects: limit traversing within the given  subtree if core.subtree is set
  2010-08-01 11:30   ` Ævar Arnfjörð Bjarmason
@ 2010-08-01 23:11     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 33+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-01 23:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

2010/8/1 Ævar Arnfjörð Bjarmason <avarab@gmail.com>:
>> +       int subtree_len;
>
> Shouldn't that be size_t? strlen returns size_t, and strncmp expects
> size_t, not int.
>

Hmm.. yeah. The compiler didn't warn me. Anyway subtree_len should be
small enough (i.e. < PATH_MAX) that the type does not really matter.
-- 
Duy

* Re: [PATCH 02/16] list-objects: limit traversing within the given  subtree if core.subtree is set
  2010-07-31 16:18 ` [PATCH 02/16] list-objects: limit traversing within the given subtree if core.subtree is set Nguyễn Thái Ngọc Duy
  2010-08-01 11:30   ` Ævar Arnfjörð Bjarmason
@ 2010-08-02  4:21   ` Elijah Newren
  2010-08-02  6:51     ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 33+ messages in thread
From: Elijah Newren @ 2010-08-02  4:21 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Hi,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  list-objects.c |   23 +++++++++++++++++------
>  1 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/list-objects.c b/list-objects.c
> index 8953548..1b25b54 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -61,12 +61,15 @@ static void process_tree(struct rev_info *revs,
>                         struct tree *tree,
>                         show_object_fn show,
>                         struct name_path *path,
> -                        const char *name)
> +                        const char *name,
> +                        const char *subtree)
>  {
>        struct object *obj = &tree->object;
>        struct tree_desc desc;
>        struct name_entry entry;
>        struct name_path me;
> +       const char *slash;
> +       int subtree_len;

Perhaps slash should be initialized to NULL?  Otherwise I think it
will be used uninitialized.

>
>        if (!revs->tree_objects)
>                return;
> @@ -82,13 +85,21 @@ static void process_tree(struct rev_info *revs,
>        me.elem = name;
>        me.elem_len = strlen(name);
>
> +       if (subtree) {
> +               slash = strchr(subtree, '/');
> +               subtree_len = slash ? slash - subtree : strlen(subtree);
> +       }
> +
>        init_tree_desc(&desc, tree->buffer, tree->size);
>
>        while (tree_entry(&desc, &entry)) {
> -               if (S_ISDIR(entry.mode))
> -                       process_tree(revs,
> -                                    lookup_tree(entry.sha1),
> -                                    show, &me, entry.path);
> +               if (S_ISDIR(entry.mode)) {
> +                       if (!subtree || !strncmp(entry.path, subtree, subtree_len))

Only one subdirectory allowed?  What if someone wants a sparse clone
containing two or more directories?  (Actually, that's not so much of
a "what if" -- it's exactly what I want in about half my usecases for
sparse clones.)

> +                               process_tree(revs,
> +                                            lookup_tree(entry.sha1),
> +                                            show, &me, entry.path,
> +                                            slash && slash[1] ? slash+1 : NULL);

If I read correctly, slash will be used uninitialized here whenever
subtree == NULL.
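
For illustration only: a minimal sketch of the initialization suggested above. It assumes a hypothetical helper, next_subtree(), that computes the argument for the recursive process_tree() call; it is not code from the patch. With slash initialized to NULL, the final expression is well defined even when no subtree limit is set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compute the subtree argument for the recursive
 * process_tree() call. Returns NULL when there is no subtree limit or
 * when the last path component has been reached. *component_len gets
 * the length of the first component (0 when subtree is NULL).
 */
static const char *next_subtree(const char *subtree, size_t *component_len)
{
	const char *slash = NULL;	/* stays NULL when subtree == NULL */

	assert(component_len);
	*component_len = 0;
	if (subtree) {
		slash = strchr(subtree, '/');
		*component_len = slash ? (size_t)(slash - subtree)
				       : strlen(subtree);
	}
	/* A trailing slash does not introduce another component. */
	return slash && slash[1] ? slash + 1 : NULL;
}
```

Using size_t for the component length also sidesteps the int versus size_t question raised elsewhere in this thread.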

* Re: [PATCH 06/16] upload-pack: use a separate variable to control  whether internal rev-list is used
  2010-07-31 16:18 ` [PATCH 06/16] upload-pack: use a separate variable to control whether internal rev-list is used Nguyễn Thái Ngọc Duy
@ 2010-08-02  4:25   ` Elijah Newren
  0 siblings, 0 replies; 33+ messages in thread
From: Elijah Newren @ 2010-08-02  4:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Hi,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  upload-pack.c |    9 +++++----
>  1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index dc464d7..e432e83 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -160,8 +160,9 @@ static void create_pack_file(void)
>        ssize_t sz;
>        const char *argv[10];
>        int arg = 0;
> +       int internal_rev_list = shallow_nr;
<snip>

I've got the exact same changes in one of my in-progress-patches in my
sparse-clone branch.  That is, other than the variable name, but I
like yours better.  Needless to say, I agree with this change.  :-)

* Re: [PATCH 07/16] upload-pack: support subtree pack
  2010-07-31 16:18 ` [PATCH 07/16] upload-pack: support subtree pack Nguyễn Thái Ngọc Duy
@ 2010-08-02  4:27   ` Elijah Newren
  0 siblings, 0 replies; 33+ messages in thread
From: Elijah Newren @ 2010-08-02  4:27 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
<snip>
> diff --git a/upload-pack.c b/upload-pack.c
> index e432e83..9b6710a 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
<snip>
> @@ -505,6 +505,20 @@ static void receive_needs(void)
>                if (debug_fd)
>                        write_in_full(debug_fd, line, len);
>
> +               if (!prefixcmp(line, "subtree ")) {
> +                       int len;
> +                       char *subtree;
> +                       if (core_subtree)
> +                               die("sorry, only one subtree supported");

I'm not sure users would understand this error message; perhaps
something more like "Fetching/cloning from a subtree-sparse repository
not supported"?

* Re: [PATCH 09/16] subtree: rewrite incoming commits
  2010-07-31 16:18 ` [PATCH 09/16] subtree: rewrite incoming commits Nguyễn Thái Ngọc Duy
@ 2010-08-02  4:37   ` Elijah Newren
  0 siblings, 0 replies; 33+ messages in thread
From: Elijah Newren @ 2010-08-02  4:37 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Hi,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> This adds the main function, subtree_import(), which is intended to be
> used by "git clone".
>
> Because subtree packs are not complete. They are barely usable. Git
> client will cry out missing objects here and there... Theoretically,
> client code could be adapted to only look for objects within
> subtree. That was painful to try.

It may have been painful, but personally I think it's still the right
way to do it.  Of course, that's a pretty easy thing for me to say,
since you're pretty far ahead of me and I haven't felt your pain yet.
Maybe I'll change my mind after trying it for a while, but I'm not
convinced just yet.

> +/*
> + * Take sha1 of a tree, rewrite it to only return the prefix and return
> + * the newsha1.
> + *
> + * If if is zero, write to object store. If fd is greater than zero,
> + * it's a pack file handle.

Should the second word of the second paragraph be 'fd' rather than another 'if'?


> +                       if (slash && slash[1]) /* trailing slash does not count */
> +                               narrow_tree(entry.sha1, newtree_sha1, prefix+subtree_len+1, fd);
> +                       else
> +                               memcpy(newtree_sha1, entry.sha1, 20);
> +
> +                       strbuf_addf(&buffer, "%o %.*s%c", entry.mode, strlen(entry.path), entry.path, '\0');

My compiler complains that you didn't typecast the return value from
strlen to an int.


* Re: [PATCH 12/16] subtree: rewriting outgoing commits
  2010-07-31 16:18 ` [PATCH 12/16] subtree: rewriting outgoing commits Nguyễn Thái Ngọc Duy
@ 2010-08-02  4:40   ` Elijah Newren
  0 siblings, 0 replies; 33+ messages in thread
From: Elijah Newren @ 2010-08-02  4:40 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
<snip>
> +       init_tree_desc(&desc, tree, size);
> +       strbuf_init(&buffer, 8192);
> +       while (tree_entry(&desc, &entry)) {
> +               strbuf_addf(&buffer, "%o %.*s%c", entry.mode, strlen(entry.path), entry.path, '\0');

Again, gcc here complains that "subtree.c:390: warning: field
precision should have type ‘int’, but argument 4 has type ‘size_t’" --
typecast the return value of strlen to int?
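
For illustration, a minimal standalone sketch of the cast being
suggested. It uses snprintf() in place of git's strbuf_addf(), and
format_tree_entry() is a hypothetical helper, not a function from the
patch:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one tree entry as "<octal mode> <path>\0"
 * into buf, the same layout the patch builds with strbuf_addf().
 * Returns the number of bytes the entry occupies, including the NUL
 * separator emitted by %c.
 */
static int format_tree_entry(char *buf, size_t size,
			     unsigned int mode, const char *path)
{
	size_t len = strlen(path);

	/* The precision argument of "%.*s" must be an int, not a size_t. */
	assert(len <= INT_MAX);
	return snprintf(buf, size, "%o %.*s%c", mode, (int)len, path, '\0');
}
```

The explicit (int) cast is what silences the warning; the assert only
documents the assumption that a path length fits in an int.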


* Re: [PATCH 00/16] Subtree clone proof of concept
  2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
                   ` (16 preceding siblings ...)
  2010-08-01  4:14 ` [PATCH 00/16] Subtree clone proof of concept Sverre Rabbelier
@ 2010-08-02  5:18 ` Elijah Newren
  2010-08-02  7:10   ` Nguyen Thai Ngoc Duy
  2010-08-02 22:55   ` Nguyen Thai Ngoc Duy
  17 siblings, 2 replies; 33+ messages in thread
From: Elijah Newren @ 2010-08-02  5:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Hi,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> Something to play with so we can evaluate which is the best strategy
> for non-full clone (or whatever you call it).

Very nice, it's awesome you're working on this.  I'm of the same
opinion that Shawn stated earlier, namely that I don't like the route
of rewriting commits on the fly like this (more on that later), but
it's really cool to see some ideas being tried and pushed to their
limits.

> The idea is the same: pack only enough to access a subtree, rewrite
> commits at client side, rewrite again when pushing. However I put
> git-replace into the mix, so at least commit SHA-1 looks as same as from
> upstream. git-subtree is not needed (although it's still an option)
>
> With this, I can clone Documentation/ from git.git, update and push. I

I tried it out, but I seem to be doing something wrong.  I applied
your patches to current master, and tried the following -- am I doing
something wrong or omitting any important steps?

$ git --version
git version 1.7.2.1.22.g236df

$ git clone file://$(pwd)/git fullclone
Cloning into fullclone...
warning: templates not found /home/newren/share/git-core/templates
remote: Counting objects: 96220, done.
remote: Compressing objects: 100% (24925/24925), done.
remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
Resolving deltas: 100% (70575/70575), done.
fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc

$ git clone --subtree=Documentation/ file://$(pwd)/git docclone
Cloning into docclone...
warning: templates not found /home/newren/share/git-core/templates
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed


> haven't tested it further. Space consumption is 24MB (58MB for full
> repo).  Not really impressive, but if one truly cares about disk
> space, he/she should also use shallow clone.

58 MB for full repo?  What are you counting?  For me, I get 25M:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ ls -lh git/.git/objects/pack/*.pack
-r--r--r--. 1 newren newren 25M 2010-08-01 18:05
git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack

Are you counting the full checkout too or something?  If so, that
varies very wildly between systems, making it hard to compare numbers.
 (For me, 'du -hs git/' returns 44 MB.)  I'd like to be able to
duplicate your numbers and investigate further.  It seems to me that
we ought to be able to get that lower.

> Performance is impacted, due to bulk commit replacement. There is a
> split second delay for every command. It's the price of replacing 24k
> commits every time. I think the delay could be improved a little bit
> (caching or mmap..)
>
> Rewriting commits at clone takes time too. Doing individual object
> writing takes lots of space and time. I put all new objects directly
> to a pack now. Rewriting time now becomes quite acceptable (a few
> seconds). Although deep subtree/repo may take longer. Rewriting on
> demand can be considered in such cases.
>
> Repo-care commands like fsck, repack, gc are left out for now.
>
> Finally, it's more of a hack just to see how far I can go. It will
> break things.

I think it's a pretty nifty hack.  It's fun to see.  :-)  However, I
do have a number of reservations about the general strategy:  As
mentioned earlier, I'm not sure I like the on-the-fly commit
rewriting, as mentioned by Shawn in your previous
subtree-for-upload-pack patch series.  You did take care of the
"referring to commit-sha1" issue he brought up by using the replace
mechanism, but I'm still not sure I'm comfortable with it.  The
performance implications also worry me (a lot of the reason for sparse
clones was to improve performance, at least from my view), as does the
fact that it only works on exactly one subtree (at least your current
implementation; most of my usecases involve multiple sibling
subdirectories that I'd like to get), as does the fact that it
(currently) only handles trees and does not handle files (ruling out
the translator usecase I'd like to see covered, e.g. cloning just
po/de.po and its history without all sibling files).

Also, I couldn't tell if your implementation downloaded full commit
information for commits that didn't touch any of the files under the
relevant subtree.  I think it does, but couldn't tell for sure (I
wanted to use a clone and dig into it to find out, but ran into the
problems I mentioned above).  If so, that also worries me a bit -- see
http://article.gmane.org/gmane.comp.version-control.git/152343.

Your implementation also suffers from the same limitations as current
shallow clones.  For example, you can't clone or fetch from a subtree
clone.  That limits collaboration between people needing to work on
the same subset of history, and was a limitation I was hoping to see
fixed, rather than propagated to more features.

I hope I'm not coming across as too critical.  I'm really excited to
see work in this area.  Hopefully I can get more time to pursue my
route a bit further; currently I don't have too much more than a
detailed idea write-up (heavily revised since the previous thread --
thanks for the feedback, btw).  Or maybe you just know how to address
all my concerns and you beat me to the punch.  That'd be awesome.


Elijah


* Re: [PATCH 02/16] list-objects: limit traversing within the given  subtree if core.subtree is set
  2010-08-02  4:21   ` Elijah Newren
@ 2010-08-02  6:51     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 33+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-02  6:51 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

2010/8/2 Elijah Newren <newren@gmail.com>:
>>
>>        if (!revs->tree_objects)
>>                return;
>> @@ -82,13 +85,21 @@ static void process_tree(struct rev_info *revs,
>>        me.elem = name;
>>        me.elem_len = strlen(name);
>>
>> +       if (subtree) {
>> +               slash = strchr(subtree, '/');
>> +               subtree_len = slash ? slash - subtree : strlen(subtree);
>> +       }
>> +
>>        init_tree_desc(&desc, tree->buffer, tree->size);
>>
>>        while (tree_entry(&desc, &entry)) {
>> -               if (S_ISDIR(entry.mode))
>> -                       process_tree(revs,
>> -                                    lookup_tree(entry.sha1),
>> -                                    show, &me, entry.path);
>> +               if (S_ISDIR(entry.mode)) {
>> +                       if (!subtree || !strncmp(entry.path, subtree, subtree_len))
>
> Only one subdirectory allowed?  What if someone wants a sparse clone
> containing two or more directories?  (Actually, that's not so much of
> a "what if" -- it's exactly what I want in about half my usecases for
> sparse clones.)

One is simpler. So one first, multiple may come later.

>> +                               process_tree(revs,
>> +                                            lookup_tree(entry.sha1),
>> +                                            show, &me, entry.path,
>> +                                            slash && slash[1] ? slash+1 : NULL);
>
> If I read correctly, slash will be used uninitialized here whenever
> subtree == NULL.

Yes. Thanks. Will fix.
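
A possible shape for the fix, as a standalone sketch only. The helper
name should_descend() and its interface are hypothetical and not from
the patch. Besides initializing slash, it compares the whole path
component, which is stricter than the strncmp() in the quoted hunk
(that one would also accept "Documentation-old"):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether the directory entry `name` lies on
 * the path to `subtree` (e.g. "Documentation/technical/").
 * Returns 1 to descend, 0 to skip. On return, *rest points at the
 * remainder of the subtree path to pass to the next level, or is NULL
 * when there is no further restriction.
 */
static int should_descend(const char *name, const char *subtree,
			  const char **rest)
{
	const char *slash = NULL;	/* initialized even when subtree is NULL */
	size_t len;

	assert(name && rest);
	*rest = NULL;
	if (!subtree)
		return 1;		/* no restriction: full traversal */

	slash = strchr(subtree, '/');
	len = slash ? (size_t)(slash - subtree) : strlen(subtree);

	/* Compare the whole component, not just a common prefix. */
	if (strlen(name) != len || strncmp(name, subtree, len))
		return 0;

	if (slash && slash[1])		/* trailing slash does not count */
		*rest = slash + 1;
	return 1;
}
```
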
-- 
Duy


* Re: [PATCH 00/16] Subtree clone proof of concept
  2010-08-02  5:18 ` Elijah Newren
@ 2010-08-02  7:10   ` Nguyen Thai Ngoc Duy
  2010-08-02 22:55   ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 33+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-02  7:10 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

2010/8/2 Elijah Newren <newren@gmail.com>:
>> The idea is the same: pack only enough to access a subtree, rewrite
>> commits at client side, rewrite again when pushing. However I put
>> git-replace into the mix, so at least commit SHA-1 looks as same as from
>> upstream. git-subtree is not needed (although it's still an option)
>>
>> With this, I can clone Documentation/ from git.git, update and push. I
>
> I tried it out, but I seem to be doing something wrong.  I applied
> your patches to current master, and tried the following -- am I doing
> something wrong or omitting any important steps?
>
> $ git --version
> git version 1.7.2.1.22.g236df
>
> $ git clone file://$(pwd)/git fullclone
> Cloning into fullclone...
> warning: templates not found /home/newren/share/git-core/templates
> remote: Counting objects: 96220, done.
> remote: Compressing objects: 100% (24925/24925), done.
> remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
> Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
> Resolving deltas: 100% (70575/70575), done.
> fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc

This one looks like the uninitialized case you pointed out in
process_tree(). No, I did not try a full clone on my patched git :-P

> $ git clone --subtree=Documentation/ file://$(pwd)/git docclone
> Cloning into docclone...
> warning: templates not found /home/newren/share/git-core/templates
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed

Not sure. Does file:// use receive-pack/upload-pack? I tested it over
local ssh. Will try again soon.

>> haven't tested it further. Space consumption is 24MB (58MB for full
>> repo).  Not really impressive, but if one truly cares about disk
>> space, he/she should also use shallow clone.
>
> 58 MB for full repo?  What are you counting?  For me, I get 25M:
>
> $ git clone git://git.kernel.org/pub/scm/git/git.git
> $ ls -lh git/.git/objects/pack/*.pack
> -r--r--r--. 1 newren newren 25M 2010-08-01 18:05
> git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack
>
> Are you counting the full checkout too or something?  If so, that
> varies very wildly between systems, making it hard to compare numbers.
>  (For me, 'du -hs git/' returns 44 MB.)  I'd like to be able to
> duplicate your numbers and investigate further.  It seems to me that
> we ought to be able to get that lower.

It's my git.git, which probably has more topic branches plus junk. If
you are only interested in the numbers, playing with git pack-objects is
enough. You need the changes in list-objects.c and
builtin/pack-objects.c, then you can run

git pack-objects --stdout --subtree=foo/ > temp.pack

and examine it with verify-pack.

>> Finally, it's more of a hack just to see how far I can go. It will
>> break things.
>
> I think it's a pretty nifty hack.  It's fun to see.  :-)  However, I
> do have a number of reservations about the general strategy:  As
> mentioned earlier, I'm not sure I like the on-the-fly commit
> rewriting, as mentioned by Shawn in your previous
> subtree-for-upload-pack patch series.  You did take care of the
> "referring to commit-sha1" issue he brought up by using the replace
> mechanism, but I'm still not sure I'm comfortable with it.  The
> performance implications also worry me (a lot of the reason for sparse
> clones was to improve performance, at least from my view), as does the
> fact that it only works on exactly one subtree (at least your current
> implementation; most of my usecases involve multiple sibling
> subdirectories that I'd like to get), as does the fact that it
> (currently) only handles trees and does not handle files (ruling out
> the translator usecase I'd like to see covered, e.g. cloning just
> po/de.po and its history without all sibling files).

And it's also fun to try. I'd like to try it on larger repos, but I
have quite limited network access until October.

> Also, I couldn't tell if your implementation downloaded full commit
> information for commits that didn't touch any of the files under the
> relevant subtree.  I think it does, but couldn't tell for sure (I
> wanted to use a clone and dig into it to find out, but ran into the
> problems I mentioned above).  If so, that also worries me a bit -- see
> http://article.gmane.org/gmane.comp.version-control.git/152343.

It does. Yes, that's also something to think about.

> Your implementation also suffers from the same limitations as current
> shallow clones.  For example, you can't clone or fetch from a subtree
> clone.  That limits collaboration between people needing to work on
> the same subset of history, and was a limitation I was hoping to see
> fixed, rather than propagated to more features.

I agree. Being able to fetch from an incomplete repo is very nice.
Though I admit I don't know how to do it. I think sparse clone would
suffer the same, wouldn't it?

> I hope I'm not coming across as too critical.  I'm really excited to
> see work in this area.  Hopefully I can get more time to pursue my
> route a bit further; currently I don't have too much more than a
> detailed idea write-up (heavily revised since the previous thread --
> thanks for the feedback, btw).  Or maybe you just know how to address
> all my concerns and you beat me to the punch.  That'd be awesome.

I look forward to seeing sparse clone realized, although I think that
would be painful :-)
-- 
Duy


* Re: [PATCH 05/16] Hook up replace-object to allow bulk commit replacement
  2010-07-31 16:18 ` [PATCH 05/16] Hook up replace-object to allow bulk commit replacement Nguyễn Thái Ngọc Duy
@ 2010-08-02 19:58   ` Junio C Hamano
  2010-08-02 22:42     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2010-08-02 19:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

I really do not like the use of "replace" for the purpose of narrow
clones.  While "replace" is about fixing a mistake by tweaking trees, a
desire to have a narrow clone at this moment is _not_ a mistake.  You may
want to have wider or full clone of the project tomorrow.  You may want to
push the result of committing on top of such a narrowed clone back to a
full repository.  My gut feeling is that that use of "replace" to stub out
the objects that you do not currently have would make it a nightmare when
you would want to widen (especially to widen over the wire while pushing
into a full repository on the other end), although I haven't looked at all
the patches in the series.

Can you back up a bit and give us a high-level overview of how various
operations in a narrowed clone should work, and how you achieve that
design goal?

Let's take an example of starting from git.git and narrow-clone only its
Documentation/ (as you seem to have used as a guinea-pig) subdirectory.
For the sake of simplicity, let's say the upstream project has only one
commit.

One plausible approach would be to have the commit, its top level tree
object, its Documentation/ tree object and all the blobs below that level,
while other blobs and trees that are reachable from the top level tree
object are left missing, but somehow are marked so that fsck would think
they are OK to be missing.  Your worktree would obviously be narrowed to
the same Documentation/ area, and unlike the narrow checkout codepath, you
do not widen on demand (unless you automatically fetch missing parts of
the tree, which I do not think you should do by default to help people who
work while at 30,000ft).  Instead, any operation that tries to modify
outside the "subtree" area should fail.

When you build a commit that represents a Documentation patch on top of
such a narrowed clone, because you have a full tree of Documentation/
area, you can come up with the updated tree object for that part of the
project.  If "subtree" mode (aka narrowed clone) rejects operation outside
the cloned area, your commit is guaranteed to touch only Documentation/
area and nothing outside.  You therefore should be able to compute the
tree object for the whole repository (i.e. all the other entries in the
top level tree object should be the same as those from HEAD).

Because the index is a flat structure, you would need to fudge the entries
that are missing-but-OK in there somehow, _and_ you would need to be able
to recompute the tree after updating Documentation/ area.  E.g. you may
know ppc/ is tree db31c066 but may not know that it has three blobs
underneath it nor what their object names are, so your index operating in
this mode would need to record (ppc -> db31c066) mapping in order to be
able to recreate the tree object out of it.

Using cache-tree data structure might help in doing this.  It so far has
been an optimization (i.e. when it says it has an up-to-date information,
it does, but if it doesn't you can always recompute what is needed from
the flat index entries), but I would imagine that you can add an "out of
cloned area" bit to cache-tree entries, and mark a subtree that represents
missing parts (e.g. 'ppc/') as such---anything that tries to invalidate
such a cache-tree entry would be an error anyway, and when you need to
write the index out as a tree, such cache-tree entries that record the
trees outside your cloned area can be reused, no?
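
To make the idea concrete, here is a rough sketch of such a bit. The
struct and function names are hypothetical and heavily simplified; this
is not git's actual cache-tree layout or API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-in for a cache-tree entry. */
struct narrow_cache_entry {
	const char *path;	/* e.g. "ppc" or "Documentation" */
	char tree_hex[41];	/* recorded tree name, "" if invalidated */
	int outside_area;	/* set for trees outside the cloned area */
};

/*
 * Invalidate an entry because something below `path` changed.
 * Returns 0 on success, or -1 if the entry is outside the cloned area,
 * where the recorded tree name cannot be recomputed and must be kept.
 */
static int narrow_cache_invalidate(struct narrow_cache_entry *e)
{
	assert(e);
	if (e->outside_area)
		return -1;
	e->tree_hex[0] = '\0';
	return 0;
}

/*
 * When writing the index out as a tree: return the recorded tree name
 * if it can be reused as is, or NULL if it must be recomputed from the
 * flat index entries.
 */
static const char *narrow_cache_reuse(const struct narrow_cache_entry *e)
{
	assert(e);
	return e->tree_hex[0] ? e->tree_hex : NULL;
}
```

With an entry for 'ppc' marked outside_area, an attempted invalidation
is reported as an error and the recorded name (db31c066 in the example
above) stays available for writing the top-level tree.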


* Re: [PATCH 05/16] Hook up replace-object to allow bulk commit  replacement
  2010-08-02 19:58   ` Junio C Hamano
@ 2010-08-02 22:42     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 33+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-02 22:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/8/3 Junio C Hamano <gitster@pobox.com>:
> I really do not like the use of "replace" for the purpose of narrow
> clones.  While "replace" is about fixing a mistake by tweaking trees, a
> desire to have a narrow clone at this moment is _not_ a mistake.  You may
> want to have wider or full clone of the project tomorrow.  You may want to
> push the result of committing on top of such a narrowed clone back to a
> full repository.  My gut feeling is that that use of "replace" to stub out
> the objects that you do not currently have would make it a nightmare when
> you would want to widen (especially to widen over the wire while pushing
> into a full repository on the other end), although I haven't looked at all
> the patches in the series.

Indeed. My intention was "hey this repo is too big, I only need some
pieces of it. Let me grab something and do my work. (Then throw away
the cloned repo)". It's best used together with shallow clone to give
a small download and disk footprint, and a minimal tree to fix
something quickly.

I'm not really sure such repos are sustainable in the long run. And no,
I did not want to widen/narrow the tree (as it was meant to be a
throwaway tree). Now, thinking about widening: the way I do narrow
clone is quite similar to shallow clone. I hope the way a shallow clone
is deepened can be applied to widening a narrow clone.

> Can you back up a bit and give us a high-level overview of how various
> operations in a narrowed clone should work, and how you achieve that
> design goal?

Operations work as normal (as the incomplete clone is augmented to
become "normal"). In order to make it look normal, every time a new
commit comes in (either from another repository, or because the user
creates a new one), the commit needs to be processed/replaced, so that
the repo looks normal from git's perspective.

> Let's take an example of starting from git.git and narrow-clone only its
> Documentation/ (as you seem to have used as a guinea-pig) subdirectory.
> For the sake of simplicity, let's say the upstream project has only one
> commit.
>
> One plausible approach would be to have the commit, its top level tree
> object, its Documentation/ tree object and all the blobs below that level,
> while other blobs and trees that are reachable from the top level tree
> object are left missing, but somehow are marked so that fsck would think
> they are OK to be missing.  Your worktree would obviously be narrowed to
> the same Documentation/ area, and unlike the narrow checkout codepath, you
> do not widen on demand (unless you automatically fetch missing parts of
> the tree, which I do not think you should do by default to help people who
> work while at 30,000ft).  Instead, any operation that tries to modify
> outside the "subtree" area should fail.

Changes outside the subtree area are currently dropped on the floor
rather than rejected. But yes, they should fail.

> When you build a commit that represents a Documentation patch on top of
> such a narrowed clone, because you have a full tree of Documentation/
> area, you can come up with the updated tree object for that part of the
> project.  If "subtree" mode (aka narrowed clone) rejects operation outside
> the cloned area, your commit is guaranteed to touch only Documentation/
> area and nothing outside.  You therefore should be able to compute the
> tree object for the whole repository (i.e. all the other entries in the
> top level tree object should be the same as those from HEAD).

Correct. Except..

> Because the index is a flat structure, you would need to fudge the entries
> that are missing-but-OK in there somehow, _and_ you would need to be able
> to recompute the tree after updating Documentation/ area.  E.g. you may
> know ppc/ is tree db31c066 but may not know that it has three blobs
> underneath it nor what their object names are, so your index operating in
> this mode would need to record (ppc -> db31c066) mapping in order to be
> able to recreate the tree object out of it.

This is where git-replace comes in. I do not want to deal with a full
flat index. Giving pointers to missing objects may make git commands
nervous. I rewrite the commit so that it now only has Documentation/
and nothing else (for which I have all the needed objects). The index
is narrowed too. Because the index (even narrowed) is complete (i.e.
all entries are reachable), most operations should work.

Then, to hide the helper commit from the user, I replace the original
(full) commit with this new commit. So from the outside, git sees the
SHA-1 of the original commit, but its content comes from the helper
one. These helper commits guarantee git won't reach out for missing
objects.
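
As a rough illustration of the lookup this implies, here is a sketch
using a table sorted by original SHA-1 and searched with bsearch(). The
struct and function names are hypothetical; this is not necessarily how
the series or git's replace-object code stores the mapping:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: original commit name -> helper commit name. */
struct replace_pair {
	unsigned char orig[20];
	unsigned char helper[20];
};

/* bsearch() comparator: the key is a raw 20-byte SHA-1. */
static int cmp_pair(const void *key, const void *elem)
{
	const struct replace_pair *p = elem;
	return memcmp(key, p->orig, 20);
}

/*
 * Look up `sha1` in a table sorted by `orig`. Returns the SHA-1 whose
 * content should actually be read: the helper commit if a replacement
 * is recorded, otherwise `sha1` itself.
 */
static const unsigned char *lookup_helper(const struct replace_pair *table,
					  size_t nr,
					  const unsigned char *sha1)
{
	const struct replace_pair *p;

	assert(sha1);
	if (!nr)
		return sha1;
	p = bsearch(sha1, table, nr, sizeof(*table), cmp_pair);
	return p ? p->helper : sha1;
}
```

Each lookup is logarithmic in the number of replaced commits; the cost
of loading and sorting the table itself is not modelled here.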

It's a trade-off. Doing a full index requires much more effort inside
git. Using "git-subtree split", while freeing git developers to do
other things, might be inconvenient for users (without server support,
the full repo must be downloaded, and the rewritten SHA-1s from
git-subtree cannot be used to communicate with coworkers..)

> Using cache-tree data structure might help in doing this.  It so far has
> been an optimization (i.e. when it says it has an up-to-date information,
> it does, but if it doesn't you can always recompute what is needed from
> the flat index entries), but I would imagine that you can add an "out of
> cloned area" bit to cache-tree entries, and mark a subtree that represents
> missing parts (e.g. 'ppc/') as such---anything that tries to invalidate
> such a cache-tree entry would be an error anyway, and when you need to
> write the index out as a tree, such cache-tree entries that record the
> trees outside your cloned area can be reused, no?

That's just a part of the story. Repository integrity has been a
prerequisite in git from the beginning. git-merge operates directly on
trees so cache-tree won't help much. git-commit does a sha1 existence
check on every sha1 before committing, so it needs to be
narrow-clone-aware too. That made me wonder if has_sha1_file was used
elsewhere. Then "git grep has_sha1_file" scared me off and I backed
away.
-- 
Duy


* Re: [PATCH 00/16] Subtree clone proof of concept
  2010-08-02  5:18 ` Elijah Newren
  2010-08-02  7:10   ` Nguyen Thai Ngoc Duy
@ 2010-08-02 22:55   ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 33+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-02 22:55 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

2010/8/2 Elijah Newren <newren@gmail.com>:
>> haven't tested it further. Space consumption is 24MB (58MB for full
>> repo).  Not really impressive, but if one truly cares about disk
>> space, he/she should also use shallow clone.
>
> 58 MB for full repo?  What are you counting?  For me, I get 25M:

My number 24MB was incorrect because process_tree() leaked too many
blobs. It should have been 16MB. Anyway, I have updated my series and
put it here (to spam the git mailing list less)

http://repo.or.cz/w/git/pclouds.git/shortlog/refs/heads/subtree
(caveat: constantly rebased tree)

if you still want to play with it. For number lovers, fetching only
Documentation from linux-2.6.git took 94MB (full repo 366MB). Yeah
Documentation was an easy target.
-- 
Duy


Thread overview: 33+ messages
2010-07-31 16:18 [PATCH 00/16] Subtree clone proof of concept Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 01/16] Add core.subtree Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 02/16] list-objects: limit traversing within the given subtree if core.subtree is set Nguyễn Thái Ngọc Duy
2010-08-01 11:30   ` Ævar Arnfjörð Bjarmason
2010-08-01 23:11     ` Nguyen Thai Ngoc Duy
2010-08-02  4:21   ` Elijah Newren
2010-08-02  6:51     ` Nguyen Thai Ngoc Duy
2010-07-31 16:18 ` [PATCH 03/16] parse_object: keep sha1 even when parsing replaced one Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 04/16] Allow to invalidate a commit in in-memory object store Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 05/16] Hook up replace-object to allow bulk commit replacement Nguyễn Thái Ngọc Duy
2010-08-02 19:58   ` Junio C Hamano
2010-08-02 22:42     ` Nguyen Thai Ngoc Duy
2010-07-31 16:18 ` [PATCH 06/16] upload-pack: use a separate variable to control whether internal rev-list is used Nguyễn Thái Ngọc Duy
2010-08-02  4:25   ` Elijah Newren
2010-07-31 16:18 ` [PATCH 07/16] upload-pack: support subtree pack Nguyễn Thái Ngọc Duy
2010-08-02  4:27   ` Elijah Newren
2010-07-31 16:18 ` [PATCH 08/16] fetch-pack: support --subtree Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 09/16] subtree: rewrite incoming commits Nguyễn Thái Ngọc Duy
2010-08-02  4:37   ` Elijah Newren
2010-07-31 16:18 ` [PATCH 10/16] clone: support subtree clone with parameter --subtree Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 11/16] pack-objects: add --subtree (for pushing) Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 12/16] subtree: rewriting outgoing commits Nguyễn Thái Ngọc Duy
2010-08-02  4:40   ` Elijah Newren
2010-07-31 16:18 ` [PATCH 13/16] Update commit_tree() interface to take base tree too Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 14/16] commit_tree(): rewriting/replacing new commits Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 15/16] commit: rewrite outgoing commits Nguyễn Thái Ngọc Duy
2010-07-31 16:18 ` [PATCH 16/16] do not use thin packs and subtree together (just a bad feeling about this) Nguyễn Thái Ngọc Duy
2010-08-01  4:14 ` [PATCH 00/16] Subtree clone proof of concept Sverre Rabbelier
2010-08-01  6:58   ` Nguyen Thai Ngoc Duy
2010-08-01 20:05     ` Sverre Rabbelier
2010-08-02  5:18 ` Elijah Newren
2010-08-02  7:10   ` Nguyen Thai Ngoc Duy
2010-08-02 22:55   ` Nguyen Thai Ngoc Duy
