* [RFC/PATCH 0/7] Rework git core for native submodules
@ 2013-04-04 18:30 Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 1/7] link.c, link.h: introduce fifth object type Ramkumar Ramachandra
                   ` (10 more replies)
  0 siblings, 11 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Hi,

The purpose of this series is to convince you that we've made a lot of
fundamental mistakes while designing submodules, and that we should
fix them now.  [1/7] argues for a new object type, and this is the
core of the idea.

It's an entire beautiful design + UI/UX package.  To demo it now:

    # Switch your git to https://github.com/artagnon/git/tree/link

    $ git config --global url."git://github.com/".insteadOf gh:
    $ git config --global clone.submodulegitdir /home/<user>/bare
    $ cd /tmp
    $ git clone gh:artagnon/varlog
    $ cd varlog
    $ git clone gh:artagnon/clayoven
    # Notice how it puts clayoven.git in ~/bare
    $ git add clayoven
    # Just works!
    $ git commit -a -m "add subproject clayoven"
    $ git ls-tree HEAD
    # Try to cat-file or show the link object
    $ cd clayoven/lib
    $ git clone gh:artagnon/sandbox
    $ git add sandbox
    # Again, just works!  No cd-to-toplevel nonsense
    $ git ci -a -m "add subproject sandbox"
    $ cd ../..
    $ git rm clayoven
    $ git ci -a -m "remove subproject clayoven"
    # start hacking: read the rest of this email
    # note that git diff is broken

From a design perspective, the goal is to make possible various kinds
of compositions, essentially replacing repo, mr, gitslave,
git-subtree, (the old) git-submodule, and other similar tools.  Not
all submodules are equal: each one will have tweakable parameters
that will change how git-core treats them.

From the UI/UX perspective, the goal is to get existing git commands
to work seamlessly with submodules without introducing any
submodule-specific commands (or at least keeping them to a minimum).  The
deprecation path for git-submodule is clear: first, we have to strip
it down to be a very thin wrapper around existing git commands, and
then announce that it's no longer necessary.

This series is two days of unedited unrebased work ([5/7] is dead code
for instance).  I wrote it in a big hurry, and it's meant to be a
proof-of-concept only.  I discovered lots of core bugs along the way
that need to be fixed first.  Off the top of my head:

1. 'git add' should not go past submodule boundaries.  I should not be
   able to 'git add clayoven/' or 'git add clayoven/LICENSE'.  In
   addition, the shell completion also needs to be fixed.

2. An empty directory containing a .git file is a perfectly valid
   worktree, but does not show up in the superproject's 'git status'
   output.  How can it be treated like an empty directory?

3. sha1_file.c:index_path() should not return paths with trailing
   slashes, I think.

4. There really must be a better way to figure out if I'm in a
   worktree than setup_git_directory_gently().

Also, I'm going to need your help to finish this.  I was trying to
write the 8th patch when I got stuck.  I'm guessing I need to
understand how wt-status extracts the differences between two trees,
and filter it for added/removed link objects.  Then, I have to follow
the example of updating the HEAD ref to update my
refs/modules/<branch>/* refs.

If you think this is all a big waste of time, and that we should focus
on improving git-submodule.sh, you're probably deranged.  Because it's
_very_ clear to me.

Thank you for reading.

Ramkumar Ramachandra (7):
  link.c, link.h: introduce fifth object type
  sha1_file, link: write link objects to the database
  teach ce_compare_gitlink() about OBJ_LINK
  builtin/log: teach show about OBJ_LINK
  edit-link: add new builtin
  clone: introduce clone.submodulegitdir
  sha1_file: write ref_name to link object

 Makefile          |  2 ++
 alloc.c           |  3 ++
 builtin/clone.c   | 29 +++++++++++++++++++
 builtin/log.c     | 17 +++++++++++
 builtin/ls-tree.c |  4 +--
 cache.h           |  3 +-
 environment.c     | 11 -------
 git-edit-link.sh  | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 link.c            | 49 +++++++++++++++++++++++++++++++
 link.h            | 26 +++++++++++++++++
 object.c          |  9 ++++++
 read-cache.c      | 33 ++++++++++++++-------
 sha1_file.c       | 48 +++++++++++++++++++++++++++++-
 13 files changed, 296 insertions(+), 25 deletions(-)
 create mode 100644 git-edit-link.sh
 create mode 100644 link.c
 create mode 100644 link.h

-- 
1.8.2.380.g0d4e79b

* [PATCH 1/7] link.c, link.h: introduce fifth object type
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 2/7] sha1_file, link: write link objects to the database Ramkumar Ramachandra
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Submodules suffer from one major design flaw: they are represented as
commit objects in the tree.  There are several problems with this:

1. Since the object is actually part of the submodule's object
   store (you can't even cat-file the commit), the superproject knows
   very little about the submodule.  We currently work around this by
   having an ugly .gitmodules file in the toplevel directory mapping
   upstream URLs to submodule paths.

2. We are restricted to having a concrete SHA-1 to convey what the
   subproject's HEAD should point to.  As a result, it is impossible
   to have true floating submodules.

3. It is impossible to initialize a nested submodule without
   initializing the containing submodule first.  This is a consequence
   of our .gitmodules hack, and we should really fix it.

4. Always stat'ing all subproject worktrees can be problematic if
   there are very large subprojects; we should be able to turn it off
   for some submodules selectively.  A good future direction would be
   to add more submodule-specific properties, instead of assuming that
   all submodules are equal.  Stuffing more properties into
   .gitmodules is really not a solution.

5. Finally, the git-submodule shell script has lots of horrible warts
   (like cd-to-toplevel for any operation) that are non-trivial to
   fix, and introduces a very unnatural layer of abstraction.

There are various specialized tools like mr, repo, gitslave, and
git-subtree, which offer different compositions solving some problems
while introducing limitations of their own.

We propose to fix the problem for good.  More specifically, introduce
a new object type corresponding to mode 160000 (gitlink).  This new
object will convey all the required information about the individual
submodules to the superproject.  Rework git core to get rid of
.gitmodules and git-submodule altogether.

This patch doesn't do anything by itself: although it is possible to
create link objects using 'git hash-object -t link',
parse_link_buffer() is unimplemented.  In future patches, we intend to
flesh out how core git will handle this object.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Makefile |  2 ++
 alloc.c  |  3 +++
 cache.h  |  3 ++-
 link.c   | 27 +++++++++++++++++++++++++++
 link.h   | 26 ++++++++++++++++++++++++++
 object.c |  9 +++++++++
 6 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 link.c
 create mode 100644 link.h

diff --git a/Makefile b/Makefile
index 0f931a2..cd4b6f9 100644
--- a/Makefile
+++ b/Makefile
@@ -673,6 +673,7 @@ LIB_H += help.h
 LIB_H += http.h
 LIB_H += kwset.h
 LIB_H += levenshtein.h
+LIB_H += link.h
 LIB_H += list-objects.h
 LIB_H += ll-merge.h
 LIB_H += log-tree.h
@@ -801,6 +802,7 @@ LIB_OBJS += hex.o
 LIB_OBJS += ident.o
 LIB_OBJS += kwset.o
 LIB_OBJS += levenshtein.o
+LIB_OBJS += link.o
 LIB_OBJS += list-objects.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
diff --git a/alloc.c b/alloc.c
index aeae55c..1445879 100644
--- a/alloc.c
+++ b/alloc.c
@@ -15,6 +15,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "link.h"
 
 #define BLOCKING 1024
 
@@ -49,6 +50,7 @@ DEFINE_ALLOCATOR(blob, struct blob)
 DEFINE_ALLOCATOR(tree, struct tree)
 DEFINE_ALLOCATOR(commit, struct commit)
 DEFINE_ALLOCATOR(tag, struct tag)
+DEFINE_ALLOCATOR(link, struct link)
 DEFINE_ALLOCATOR(object, union any_object)
 
 static void report(const char *name, unsigned int count, size_t size)
@@ -66,4 +68,5 @@ void alloc_report(void)
 	REPORT(tree);
 	REPORT(commit);
 	REPORT(tag);
+	REPORT(link);
 }
diff --git a/cache.h b/cache.h
index ec2fd7a..ca0583f 100644
--- a/cache.h
+++ b/cache.h
@@ -317,7 +317,7 @@ enum object_type {
 	OBJ_TREE = 2,
 	OBJ_BLOB = 3,
 	OBJ_TAG = 4,
-	/* 5 for future expansion */
+	OBJ_LINK = 5,
 	OBJ_OFS_DELTA = 6,
 	OBJ_REF_DELTA = 7,
 	OBJ_ANY,
@@ -1241,6 +1241,7 @@ extern void *alloc_blob_node(void);
 extern void *alloc_tree_node(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
+extern void *alloc_link_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 
diff --git a/link.c b/link.c
new file mode 100644
index 0000000..bb20a51
--- /dev/null
+++ b/link.c
@@ -0,0 +1,27 @@
+#include "cache.h"
+#include "link.h"
+
+const char *link_type = "link";
+
+struct link *lookup_link(const unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+	if (!obj)
+		return create_object(sha1, OBJ_LINK, alloc_link_node());
+	if (!obj->type)
+		obj->type = OBJ_LINK;
+	if (obj->type != OBJ_LINK) {
+		error("Object %s is a %s, not a link",
+		      sha1_to_hex(sha1), typename(obj->type));
+		return NULL;
+	}
+	return (struct link *) obj;
+}
+
+int parse_link_buffer(struct link *item, void *buffer, unsigned long size)
+{
+	if (item->object.parsed)
+		return 0;
+	item->object.parsed = 1;
+	return 0;
+}
diff --git a/link.h b/link.h
new file mode 100644
index 0000000..64dd19d
--- /dev/null
+++ b/link.h
@@ -0,0 +1,26 @@
+#ifndef LINK_H
+#define LINK_H
+
+#include "object.h"
+
+extern const char *link_type;
+
+struct link {
+	struct object object;
+	const char *upstream_url;
+	const char *checkout_rev;
+	const char *ref_name;
+	unsigned int floating:1;
+	unsigned int statthrough:1;
+};
+
+struct link *lookup_link(const unsigned char *sha1);
+
+int parse_link_buffer(struct link *item, void *buffer, unsigned long size);
+
+/**
+ * Links do not contain references to other objects, but have
+ * structured data that needs parsing.
+ **/
+
+#endif /* LINK_H */
diff --git a/object.c b/object.c
index 20703f5..d3674ea 100644
--- a/object.c
+++ b/object.c
@@ -4,6 +4,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "link.h"
 
 static struct object **obj_hash;
 static int nr_objs, obj_hash_size;
@@ -24,6 +25,7 @@ static const char *object_type_strings[] = {
 	"tree",		/* OBJ_TREE = 2 */
 	"blob",		/* OBJ_BLOB = 3 */
 	"tag",		/* OBJ_TAG = 4 */
+	"link",		/* OBJ_LINK = 5 */
 };
 
 const char *typename(unsigned int type)
@@ -175,6 +177,13 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 			       return NULL;
 			obj = &tag->object;
 		}
+	} else if (type == OBJ_LINK) {
+		struct link *link = lookup_link(sha1);
+		if (link) {
+			if (parse_link_buffer(link, buffer, size))
+			       return NULL;
+			obj = &link->object;
+		}
 	} else {
 		warning("object %s has unknown type id %d", sha1_to_hex(sha1), type);
 		obj = NULL;
-- 
1.8.2.380.g0d4e79b
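
The type-id-to-name mapping that this patch extends can be illustrated
outside of git.  The following is only a minimal standalone sketch: the
sk_ prefixed names are hypothetical and not part of git, and the value
OBJ_COMMIT = 1 comes from git's cache.h rather than from the hunk shown
above.  It mirrors how slot 5, previously reserved "for future
expansion", would now resolve to "link".

```c
#include <stddef.h>

/* Hypothetical mirror of the enum in cache.h after this patch. */
enum sk_object_type {
	SK_OBJ_COMMIT = 1,
	SK_OBJ_TREE = 2,
	SK_OBJ_BLOB = 3,
	SK_OBJ_TAG = 4,
	SK_OBJ_LINK = 5		/* the slot this patch claims */
};

/* Index i holds the name of type id i; index 0 has no name. */
static const char *sk_type_names[] = {
	NULL,
	"commit",	/* SK_OBJ_COMMIT = 1 */
	"tree",		/* SK_OBJ_TREE = 2 */
	"blob",		/* SK_OBJ_BLOB = 3 */
	"tag",		/* SK_OBJ_TAG = 4 */
	"link",		/* SK_OBJ_LINK = 5 */
};

/* Returns the name for a type id, or NULL when the id has no name. */
const char *sk_typename(unsigned int type)
{
	if (type >= sizeof(sk_type_names) / sizeof(sk_type_names[0]))
		return NULL;
	return sk_type_names[type];
}
```

Ids 6 and 7 (the delta types) deliberately have no entry here, so a
lookup for them returns NULL instead of reading past the table.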

* [PATCH 2/7] sha1_file, link: write link objects to the database
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 1/7] link.c, link.h: introduce fifth object type Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-05  7:11   ` Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 3/7] teach ce_compare_gitlink() about OBJ_LINK Ramkumar Ramachandra
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

On a 'git add', instead of returning the SHA-1 of the subproject
commit, write a real link object to the object database.  Also
implement parse_link_buffer() correspondingly.

index_path() determines the upstream_url and checkout_rev from a
pre-cloned submodule.  The checkout_rev is set to the SHA-1 of the
HEAD, and we get a non-floating submodule.

While at it, fix the 'ls-tree' output to correctly show a link object.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 builtin/ls-tree.c |  4 ++--
 link.c            | 22 ++++++++++++++++++++++
 sha1_file.c       | 38 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index fb76e38..ab17fb5 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -6,7 +6,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "tree.h"
-#include "commit.h"
+#include "link.h"
 #include "quote.h"
 #include "builtin.h"
 #include "parse-options.h"
@@ -76,7 +76,7 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen,
 			retval = READ_TREE_RECURSIVE;
 		 *
 		 */
-		type = commit_type;
+		type = link_type;
 	} else if (S_ISDIR(mode)) {
 		if (show_recursive(base, baselen, pathname)) {
 			retval = READ_TREE_RECURSIVE;
diff --git a/link.c b/link.c
index bb20a51..349646d 100644
--- a/link.c
+++ b/link.c
@@ -20,8 +20,30 @@ struct link *lookup_link(const unsigned char *sha1)
 
 int parse_link_buffer(struct link *item, void *buffer, unsigned long size)
 {
+	char *bufptr = buffer;
+	char *tail = buffer + size;
+	char *eol;
+
 	if (item->object.parsed)
 		return 0;
 	item->object.parsed = 1;
+	while (bufptr < tail) {
+		eol = strchr(bufptr, '\n');
+		*eol = '\0';
+		if (!prefixcmp(bufptr, "upstream_url = "))
+			item->upstream_url = xstrdup(bufptr + 15);
+		else if (!prefixcmp(bufptr, "checkout_rev = "))
+			item->checkout_rev = xstrdup(bufptr + 15);
+		else if (!prefixcmp(bufptr, "ref_name = "))
+			item->ref_name = xstrdup(bufptr + 11);
+		else if (!prefixcmp(bufptr, "floating = "))
+			item->floating = atoi(bufptr + 11);
+		else if (!prefixcmp(bufptr, "statthrough = "))
+			item->statthrough = atoi(bufptr + 14);
+		else
+			return error("Parse error in link buffer");
+
+		bufptr = eol + 1;
+	}
 	return 0;
 }
diff --git a/sha1_file.c b/sha1_file.c
index 5f573d9..a8a6d72 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -12,6 +12,7 @@
 #include "pack.h"
 #include "blob.h"
 #include "commit.h"
+#include "link.h"
 #include "run-command.h"
 #include "tag.h"
 #include "tree.h"
@@ -35,6 +36,7 @@
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
 const unsigned char null_sha1[20];
+void *upstream_url = NULL;
 
 /*
  * This is meant to hold a *small* number of objects that you would
@@ -2859,10 +2861,19 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	return ret;
 }
 
+static int parse_origin_url(const char *key, const char *value, void *cb) {
+	if (!strcmp(key, "remote.origin.url"))
+		upstream_url = xstrdup(value);
+	return 0;
+}
+
 int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags)
 {
 	int fd;
 	struct strbuf sb = STRBUF_INIT;
+	char pathbuf[PATH_MAX];
+	const char *submodule_gitdir;
+	unsigned char checkout_rev[20];
 
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
@@ -2888,7 +2899,32 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 		strbuf_release(&sb);
 		break;
 	case S_IFDIR:
-		return resolve_gitlink_ref(path, "HEAD", sha1);
+		/* gitlink.  Prepare and write a new link object to
+		 * the database.
+		 */
+
+		/* Figure out upstream_url */
+		sprintf(pathbuf, "%s/%s", path, ".git");
+		submodule_gitdir = resolve_gitdir(pathbuf);
+		sprintf(pathbuf, "%s/%s", submodule_gitdir, "config");
+		git_config_from_file(parse_origin_url, pathbuf, NULL);
+		if (!upstream_url)
+			die("Unable to read remote.origin.url from submodule");
+
+		/* Figure out checkout_rev */
+		if (resolve_gitlink_ref(path, "HEAD", checkout_rev) < 0)
+			die("Unable to resolve submodule HEAD");
+
+		/* Add fields to the strbuf */
+		strbuf_addf(&sb, "upstream_url = %s\n", (char *) upstream_url);
+		strbuf_addf(&sb, "checkout_rev = %s\n", sha1_to_hex(checkout_rev));
+		if (!(flags & HASH_WRITE_OBJECT))
+			hash_sha1_file(sb.buf, sb.len, link_type, sha1);
+		else if (write_sha1_file(sb.buf, sb.len, link_type, sha1))
+			return error("%s: failed to insert into database",
+				     path);
+		strbuf_release(&sb);
+		break;
 	default:
 		return error("%s: unsupported file type", path);
 	}
-- 
1.8.2.380.g0d4e79b
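
The "key = value" line format that parse_link_buffer() reads can also
be parsed defensively.  What follows is a standalone sketch, not the
code from this series: struct link_fields, parse_link_fields() and the
helpers are hypothetical names, and plain malloc() stands in for git's
xstrdup().  It assumes the same five keys as the patch, and differs in
that it rejects a final line without a newline (where strchr() would
return NULL), never writes into the input buffer, and accepts only "0"
or "1" for the two flags.

```c
#include <stdlib.h>
#include <string.h>

struct link_fields {
	char *upstream_url;
	char *checkout_rev;
	char *ref_name;
	unsigned int floating:1;
	unsigned int statthrough:1;
};

/* If line[0..len) starts with key, return the value's offset; else 0. */
static size_t match_key(const char *line, size_t len, const char *key)
{
	size_t klen = strlen(key);
	return (len >= klen && !memcmp(line, key, klen)) ? klen : 0;
}

/* Replace *dst with a NUL-terminated copy of src[0..n); -1 if malloc fails. */
static int set_string(char **dst, const char *src, size_t n)
{
	char *copy = malloc(n + 1);
	if (!copy)
		return -1;
	memcpy(copy, src, n);
	copy[n] = '\0';
	free(*dst);
	*dst = copy;
	return 0;
}

/* Parse a value that must be exactly "0" or "1"; -1 for anything else. */
static int parse_flag(const char *val, size_t n)
{
	if (n != 1 || (val[0] != '0' && val[0] != '1'))
		return -1;
	return val[0] == '1';
}

void free_link_fields(struct link_fields *f)
{
	free(f->upstream_url);
	free(f->checkout_rev);
	free(f->ref_name);
	memset(f, 0, sizeof(*f));
}

/* Returns 0 on success, -1 on malformed input; buf is left untouched. */
int parse_link_fields(struct link_fields *out, const char *buf, size_t size)
{
	const char *p = buf, *tail = buf + size;

	memset(out, 0, sizeof(*out));
	while (p < tail) {
		const char *eol = memchr(p, '\n', (size_t)(tail - p));
		size_t len, off;
		int flag, rc = 0;

		if (!eol)
			goto fail;	/* last line lacks its newline */
		len = (size_t)(eol - p);
		if ((off = match_key(p, len, "upstream_url = ")))
			rc = set_string(&out->upstream_url, p + off, len - off);
		else if ((off = match_key(p, len, "checkout_rev = ")))
			rc = set_string(&out->checkout_rev, p + off, len - off);
		else if ((off = match_key(p, len, "ref_name = ")))
			rc = set_string(&out->ref_name, p + off, len - off);
		else if ((off = match_key(p, len, "floating = "))) {
			if ((flag = parse_flag(p + off, len - off)) < 0)
				goto fail;
			out->floating = (unsigned int)flag;
		} else if ((off = match_key(p, len, "statthrough = "))) {
			if ((flag = parse_flag(p + off, len - off)) < 0)
				goto fail;
			out->statthrough = (unsigned int)flag;
		} else
			goto fail;	/* unknown key */
		if (rc < 0)
			goto fail;
		p = eol + 1;
	}
	return 0;
fail:
	free_link_fields(out);
	return -1;
}
```

Whether unknown keys should be a hard error or be skipped for forward
compatibility is a design choice this sketch does not try to settle; it
simply follows the patch in treating them as a parse error.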

* [PATCH 3/7] teach ce_compare_gitlink() about OBJ_LINK
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 1/7] link.c, link.h: introduce fifth object type Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 2/7] sha1_file, link: write link objects to the database Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 4/7] builtin/log: teach show " Ramkumar Ramachandra
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

This simply requires parsing out the checkout_rev from the link
object, and comparing its SHA-1 with that of HEAD.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 read-cache.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 5a9704f..f22c1c0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -11,6 +11,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "blob.h"
+#include "link.h"
 #include "resolve-undo.h"
 #include "strbuf.h"
 #include "varint.h"
@@ -128,19 +129,31 @@ static int ce_compare_link(struct cache_entry *ce, size_t expected_size)
 
 static int ce_compare_gitlink(struct cache_entry *ce)
 {
-	unsigned char sha1[20];
+	unsigned char checkout_rev_sha1[20], head_sha1[20];
+	void *buffer;
+	unsigned long size;
+	enum object_type type;
+	struct link link;
 
-	/*
-	 * We don't actually require that the .git directory
-	 * under GITLINK directory be a valid git directory. It
-	 * might even be missing (in case nobody populated that
-	 * sub-project).
-	 *
-	 * If so, we consider it always to match.
+	buffer = read_sha1_file(ce->sha1, &type, &size);
+
+	/* For compatibility with an older version: earlier, gitlinks
+	 * were represented as commit SHA-1s (that wouldn't resolve)
+	 * in the cache.
 	 */
-	if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
+	if (!buffer) {
+		if (resolve_gitlink_ref(ce->name, "HEAD", head_sha1) < 0)
+			return 0;
+		return hashcmp(head_sha1, ce->sha1);
+	}
+
+	memset(&link, 0, sizeof(struct link));
+	if (parse_link_buffer(&link, buffer, size) < 0)
+		die("Cannot continue.");
+	if (resolve_gitlink_ref(ce->name, link.checkout_rev, checkout_rev_sha1) < 0 ||
+		resolve_gitlink_ref(ce->name, "HEAD", head_sha1) < 0)
 		return 0;
-	return hashcmp(sha1, ce->sha1);
+	return hashcmp(head_sha1, checkout_rev_sha1);
 }
 
 static int ce_modified_check_fs(struct cache_entry *ce, struct stat *st)
-- 
1.8.2.380.g0d4e79b
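
For a non-floating link, where checkout_rev is always a full 40-digit
hex SHA-1 (as written by [2/7]), the comparison described in this patch
could also be done without a second ref lookup.  The sketch below is an
alternative under that assumption, not what the patch does: the names
hex_to_sha1() and link_rev_differs() are hypothetical (git itself has
get_sha1_hex() for the decoding step).  It keeps the hashcmp()
convention of returning 0 on a match.

```c
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode exactly 40 hex digits into 20 bytes; -1 on any other input. */
int hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;	/* also stops at an early NUL */
		lo = hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return hex[40] == '\0' ? 0 : -1;
}

/*
 * Returns 0 when the recorded checkout_rev equals the submodule's
 * HEAD, nonzero otherwise.  An unparsable checkout_rev counts as a
 * mismatch.
 */
int link_rev_differs(const char *checkout_rev_hex,
		     const unsigned char *head_sha1)
{
	unsigned char want[20];

	if (hex_to_sha1(checkout_rev_hex, want) < 0)
		return 1;
	return memcmp(want, head_sha1, 20) != 0;
}
```

A floating link would need different handling, since its checkout_rev
is presumably a ref name rather than a SHA-1.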

* [PATCH 4/7] builtin/log: teach show about OBJ_LINK
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (2 preceding siblings ...)
  2013-04-04 18:30 ` [PATCH 3/7] teach ce_compare_gitlink() about OBJ_LINK Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 5/7] edit-link: add new builtin Ramkumar Ramachandra
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

'git show' now works with link objects.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 builtin/log.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/builtin/log.c b/builtin/log.c
index 0f31810..a170df9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -411,6 +411,20 @@ static int show_blob_object(const unsigned char *sha1, struct rev_info *rev)
 	return stream_blob_to_fd(1, sha1, NULL, 0);
 }
 
+static int show_link_object(const unsigned char *sha1, struct rev_info *rev)
+{
+	unsigned long size;
+	enum object_type type;
+	char *buf = read_sha1_file(sha1, &type, &size);
+
+	if (!buf)
+		return error(_("Could not read object %s"), sha1_to_hex(sha1));
+
+	assert(type == OBJ_LINK);
+	printf("%s", buf);
+	return 0;
+}
+
 static int show_tag_object(const unsigned char *sha1, struct rev_info *rev)
 {
 	unsigned long size;
@@ -534,6 +548,9 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk(&rev);
 			break;
+		case OBJ_LINK:
+			ret = show_link_object(o->sha1, NULL);
+			break;
 		default:
 			ret = error(_("Unknown type: %d"), o->type);
 		}
-- 
1.8.2.380.g0d4e79b

* [PATCH 5/7] edit-link: add new builtin
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (3 preceding siblings ...)
  2013-04-04 18:30 ` [PATCH 4/7] builtin/log: teach show " Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 6/7] clone: introduce clone.submodulegitdir Ramkumar Ramachandra
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

This is a WIP.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 git-edit-link.sh | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 git-edit-link.sh

diff --git a/git-edit-link.sh b/git-edit-link.sh
new file mode 100644
index 0000000..3ff0e84
--- /dev/null
+++ b/git-edit-link.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+# Copyright (c) 2013 Ramkumar Ramachandra
+
+dashless=$(basename "$0" | sed -e 's/-/ /')
+USAGE='[--floating] [--ref-name <name>] [--checkout-rev <rev>] [--statthrough] <directory>'
+
+SUBDIRECTORY_OK=Yes
+OPTIONS_SPEC=
+START_DIR=`pwd`
+. git-sh-setup
+. git-sh-i18n
+require_work_tree
+cd_to_toplevel
+
+link_spec="$GIT_DIR/LINK_SPEC"
+
+read_and_verify_link_spec () {
+	test -f "$link_spec" || die "fatal: could not open $link_spec."
+
+	## TODO
+	## Rules:
+	#  upstream_url is a mandatory field; others are optional
+	#  if floating is false, checkout_rev has to be a SHA-1 hex
+	#  ref_name must be a valid name, and not conflict with an existing ref
+
+	return 0;
+}
+
+# Prepare an initial link_spec using the command-line options
+rm -f "$link_spec"
+touch "$link_spec" || die "fatal: could not create $link_spec."
+
+total_argc=$#
+while test $# != 0
+do
+	case "$1" in
+		--checkout-rev)
+			shift
+			cat >>"$link_spec" <<-EOF
+			checkout_rev = $1
+			EOF
+			;;
+		--ref-name)
+			shift
+			cat >>"$link_spec" <<-EOF
+			ref_name = $1
+			EOF
+			;;
+		--floating)
+			cat >>"$link_spec" <<-\EOF
+			floating = 1
+			EOF
+			;;
+		--statthrough)
+			cat >>"$link_spec" <<-\EOF
+			statthrough = 1
+			EOF
+			;;
+		--)
+			shift
+			break
+			;;
+	esac
+	shift
+done
+
+link_directory="$1"
+test -d "$link_directory" || die "fatal: $link_directory is not a valid directory."
+
+cd "$link_directory" && {
+	test -f "$link_directory/.git" &&
+	test "$(git rev-parse --is-inside-work-tree 2>/dev/null)" = true ||
+	die "fatal: $link_directory is not a valid bare working tree."
+
+	# Determine the upstream_url
+	upstream_url=$(git config --get remote.origin.url)
+}
+
+test -z "$upstream_url" &&
+die "fatal: $link_directory does not have a configured upstream remote origin."
+cat >>"$link_spec" <<-EOF
+upstream_url = $upstream_url
+EOF
+
+# Launch the editor
+git_editor "$link_spec" || die "$(gettext "Could not execute editor")"
+read_and_verify_link_spec
-- 
1.8.2.380.g0d4e79b

* [PATCH 6/7] clone: introduce clone.submodulegitdir
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (4 preceding siblings ...)
  2013-04-04 18:30 ` [PATCH 5/7] edit-link: add new builtin Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-05  7:07   ` Ramkumar Ramachandra
  2013-04-04 18:30 ` [PATCH 7/7] sha1_file: write ref_name to link object Ramkumar Ramachandra
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

This configuration variable comes into effect when 'git clone' is
invoked in an existing git repository.  Instead of cloning the
repository as-is, it relocates the gitdir of the repository to the
path specified by this variable.  Arguably, it does the right thing
when working with submodules.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 builtin/clone.c | 29 +++++++++++++++++++++++++++++
 environment.c   | 11 -----------
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index e0aaf13..1b798e6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -43,6 +43,7 @@ static char *option_template, *option_depth;
 static char *option_origin = NULL;
 static char *option_branch = NULL;
 static const char *real_git_dir;
+static const char *submodule_gitdir;
 static char *option_upload_pack = "git-upload-pack";
 static int option_verbosity;
 static int option_progress = -1;
@@ -658,11 +659,22 @@ static void write_refspec_config(const char* src_ref_prefix,
 	strbuf_release(&value);
 }
 
+static int git_clone_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "clone.submodulegitdir")) {
+		git_config_string(&submodule_gitdir, var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
 	int is_bundle = 0, is_local;
 	struct stat buf;
 	const char *repo_name, *repo, *work_tree, *git_dir;
+	char dest_git_dir[PATH_MAX];
+	char cwd[PATH_MAX];
 	char *path, *dir;
 	int dest_exists;
 	const struct ref *refs, *remote_head;
@@ -676,6 +688,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	const char *src_ref_prefix = "refs/heads/";
 	struct remote *remote;
 	int err = 0, complete_refs_before_fetch = 1;
+	int nongit = 1;
 
 	struct refspec *refspec;
 	const char *fetch_pattern;
@@ -683,6 +696,14 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	junk_pid = getpid();
 
 	packet_trace_identity("clone");
+
+	/* setup_git_directory_gently without changing directories */
+	getcwd(cwd, sizeof(cwd) - 1);
+	setup_git_directory_gently(&nongit);
+	chdir(cwd);
+
+	git_config(git_clone_config, NULL);
+
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
 
@@ -736,6 +757,14 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		die(_("destination path '%s' already exists and is not "
 			"an empty directory."), dir);
 
+	if (!nongit && submodule_gitdir) {
+		sprintf(dest_git_dir, "%s/%s.git", real_path(submodule_gitdir), dir);
+		if (!stat(dest_git_dir, &buf) && !is_empty_dir(dest_git_dir))
+			die(_("destination path '%s' already exists and is not "
+					"an empty directory."), dest_git_dir);
+		real_git_dir = dest_git_dir;
+	}
+
 	strbuf_addf(&reflog_msg, "clone: from %s", repo);
 
 	if (option_bare)
diff --git a/environment.c b/environment.c
index e2e75c1..9dce4c7 100644
--- a/environment.c
+++ b/environment.c
@@ -182,8 +182,6 @@ const char *strip_namespace(const char *namespaced_ref)
 	return namespaced_ref + namespace_len;
 }
 
-static int git_work_tree_initialized;
-
 /*
  * Note.  This works only before you used a work tree.  This was added
  * primarily to support git-clone to work in a new repository it just
@@ -191,15 +189,6 @@ static int git_work_tree_initialized;
  */
 void set_git_work_tree(const char *new_work_tree)
 {
-	if (git_work_tree_initialized) {
-		new_work_tree = real_path(new_work_tree);
-		if (strcmp(new_work_tree, work_tree))
-			die("internal error: work tree has already been set\n"
-			    "Current worktree: %s\nNew worktree: %s",
-			    work_tree, new_work_tree);
-		return;
-	}
-	git_work_tree_initialized = 1;
 	work_tree = xstrdup(real_path(new_work_tree));
 }
 
-- 
1.8.2.380.g0d4e79b
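
The destination gitdir in this patch is formed as
"<clone.submodulegitdir>/<dir>.git" with sprintf() into a PATH_MAX
buffer.  A bounded variant might look like the sketch below; the
function name build_submodule_gitdir() is hypothetical, and it assumes
the caller has already resolved the base path and that name has no
trailing slash.

```c
#include <stdio.h>

/*
 * Write "<base>/<name>.git" into out.  Returns 0 on success, or -1 if
 * the result (including its NUL terminator) does not fit in outsz
 * bytes.
 */
int build_submodule_gitdir(char *out, size_t outsz,
			   const char *base, const char *name)
{
	int n = snprintf(out, outsz, "%s/%s.git", base, name);

	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return 0;
}
```

With base "/home/user/bare" and name "clayoven" this yields
"/home/user/bare/clayoven.git", matching the demo in the cover letter,
and an over-long combination is reported instead of overflowing the
buffer.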

* [PATCH 7/7] sha1_file: write ref_name to link object
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (5 preceding siblings ...)
  2013-04-04 18:30 ` [PATCH 6/7] clone: introduce clone.submodulegitdir Ramkumar Ramachandra
@ 2013-04-04 18:30 ` Ramkumar Ramachandra
  2013-04-05  7:03   ` Ramkumar Ramachandra
  2013-04-04 18:40 ` [RFC/PATCH 0/7] Rework git core for native submodules Linus Torvalds
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Great.  Now, we just have to write refs/modules/<branch>/* at
commit-time.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 sha1_file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index a8a6d72..2ea101a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2874,6 +2874,7 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 	char pathbuf[PATH_MAX];
 	const char *submodule_gitdir;
 	unsigned char checkout_rev[20];
+	char *ref_name;
 
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
@@ -2915,9 +2916,18 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 		if (resolve_gitlink_ref(path, "HEAD", checkout_rev) < 0)
 			die("Unable to resolve submodule HEAD");
 
+		/* Construct a ref_name from path */
+		sprintf(pathbuf, "%s", path);
+		pathbuf[strlen(pathbuf) - 1] = '\0'; /* Remove trailing slash */
+		if (strchr(pathbuf, '/'))
+			ref_name = xstrdup(strrchr(pathbuf, '/') + 1);
+		else
+			ref_name = xstrdup(pathbuf);
+
 		/* Add fields to the strbuf */
 		strbuf_addf(&sb, "upstream_url = %s\n", (char *) upstream_url);
 		strbuf_addf(&sb, "checkout_rev = %s\n", sha1_to_hex(checkout_rev));
+		strbuf_addf(&sb, "ref_name = %s\n", ref_name);
 		if (!(flags & HASH_WRITE_OBJECT))
 			hash_sha1_file(sb.buf, sb.len, link_type, sha1);
 		else if (write_sha1_file(sb.buf, sb.len, link_type, sha1))
-- 
1.8.2.380.g0d4e79b
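
The ref_name derivation in this patch takes the last component of the
path and assumes the path always ends in a slash.  A standalone sketch
that does not depend on that assumption is given below; the name
ref_name_from_path() is hypothetical, and malloc() stands in for git's
xstrdup().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of the last component of path,
 * ignoring any trailing slashes.  Returns NULL if the path has no
 * component at all (empty or only slashes) or if malloc() fails.
 */
char *ref_name_from_path(const char *path)
{
	size_t end = strlen(path);
	size_t start;
	char *out;

	while (end > 0 && path[end - 1] == '/')
		end--;			/* drop trailing slashes */
	if (!end)
		return NULL;
	start = end;
	while (start > 0 && path[start - 1] != '/')
		start--;		/* back up to the previous slash */
	out = malloc(end - start + 1);
	if (!out)
		return NULL;
	memcpy(out, path + start, end - start);
	out[end - start] = '\0';
	return out;
}
```

Both "clayoven/" and "clayoven" give "clayoven", and
"clayoven/lib/sandbox/" gives "sandbox".  Note that two submodules with
the same last component at different paths would still map to the same
ref_name under this scheme.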

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (6 preceding siblings ...)
  2013-04-04 18:30 ` [PATCH 7/7] sha1_file: write ref_name to link object Ramkumar Ramachandra
@ 2013-04-04 18:40 ` Linus Torvalds
  2013-04-04 18:52   ` Ramkumar Ramachandra
  2013-04-04 18:47 ` Jonathan Nieder
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 140+ messages in thread
From: Linus Torvalds @ 2013-04-04 18:40 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano

On Thu, Apr 4, 2013 at 11:30 AM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
>
> The purpose of this series is to convince you that we've made a lot of
> fundamental mistakes while designing submodules, and that we should
> fix them now.  [1/7] argues for a new object type, and this is the
> core of the idea.

I don't dispute that a new link object might be a good idea, but
there's no explanation of the actual format of this thing anywhere,
and what the real advantages would be. A clearer "this is the design,
this is the format of the link object, and this is what it buys us"
would be a good idea. Also, one of the arguments against using link
objects originally was that the format wasn't stable, and in
particular the address of the actual submodule repository might differ
for different people. So when adding a new object type, explaining
*why* the format of such an object is globally stable (since it will
be part of the SHA1 of the object) is a big deal.
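
To make the stability concern concrete: git names an object by hashing
a "<type> <size>\0" header followed by the object body, so every byte
of a link object would feed into its name.  The sketch below uses
64-bit FNV-1a purely as a short stand-in for SHA-1 (it is not what git
uses), and object_name() is a hypothetical function.  It only
illustrates that two sites recording different upstream URLs would end
up with different object names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a, 64-bit: a stand-in for SHA-1, used only to keep the sketch short. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return h;
}

/*
 * Name an object the way git does in outline: hash the header
 * "<type> <size>\0" followed by the body.  "link" is the type proposed
 * in this series; the hash function here is NOT git's.
 */
uint64_t object_name(const char *type, const char *body)
{
	char hdr[64];
	size_t body_len;
	int hdr_len;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */

	assert(type != NULL && body != NULL);
	body_len = strlen(body);
	hdr_len = snprintf(hdr, sizeof(hdr), "%s %lu", type,
			   (unsigned long)body_len);
	assert(hdr_len > 0 && (size_t)hdr_len < sizeof(hdr));

	h = fnv1a(h, hdr, (size_t)hdr_len + 1);	/* +1 includes the NUL */
	return fnv1a(h, body, body_len);
}
```

Hashing the same body twice gives the same name, while changing only
the upstream_url line gives a different one.  That is the sense in
which any site-specific field stored in the object becomes part of its
identity.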

            Linus

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (7 preceding siblings ...)
  2013-04-04 18:40 ` [RFC/PATCH 0/7] Rework git core for native submodules Linus Torvalds
@ 2013-04-04 18:47 ` Jonathan Nieder
  2013-04-04 18:58   ` Jonathan Nieder
  2013-04-04 18:55 ` Jonathan Nieder
  2013-04-06 20:10 ` [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
  10 siblings, 1 reply; 140+ messages in thread
From: Jonathan Nieder @ 2013-04-04 18:47 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano, Linus Torvalds

Ramkumar Ramachandra wrote:

> The purpose of this series is to convince you that we've made a lot of
> fundamental mistakes while designing submodules, and that we should
> fix them now.  [1/7] argues for a new object type, and this is the
> core of the idea.

Oh, dear.

Shouldn't it be possible to explain the same thing using a test
script illustrating intended UI?

[...]
>     $ git clone gh:artagnon/varlog
>     $ cd varlog
>     $ git clone gh:artagnon/clayoven
>     # Notice how it puts clayoven.git in ~/bare

I really would like to be able to continue doing something like

	git clone --recurse-submodules git://repo.or.cz/cgit.git
	# never mind!
	rm -fr cgit

without leaving any clutter behind.  I have used systems that kept
state in my home directory before and found them a pain in the neck to
debug.  Others may disagree, though.

[...]
>     # Again, just works!  No cd-to-toplevel nonsense

Didn't Jens mention that git-submodule requiring that one work
at the toplevel is just a (presumably easily fixable) bug?

[...]
> If you think this is all a big waste of time, and that we should focus
> on improving git-submodule.sh, you're probably deranged.  Because it's

I don't think that *you* should focus on improving git-submodule, as
long as you are not using it and dislike its design.  But I do think
it's strange to at the same time

 1) tell me I'm deranged for liking submodules
 2) dismiss other experiments that have been created as alternatives

I like experimentation, which means sometimes having tools whose
purposes overlap, and I like when it's possible to help something
evolve to be better, even far enough to interoperate with or replace
uses of another tool.

I also believe in "live and let live".  That means that even if
someone is a little crazy, if they are not actively harmful, I do not
destroy their tools.

That probably marks me as deranged.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:40 ` [RFC/PATCH 0/7] Rework git core for native submodules Linus Torvalds
@ 2013-04-04 18:52   ` Ramkumar Ramachandra
  2013-04-04 19:04     ` Linus Torvalds
  2013-04-05  6:53     ` Ramkumar Ramachandra
  0 siblings, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 18:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Linus Torvalds wrote:
> I don't dispute that a new link object might be a good idea, but
> there's no explanation of the actual format of this thing anywhere,
> and what the real advantages would be. A clearer "this is the design,
> this is the format of the link object, and this is what it buys us"
> would be a good idea.

Yeah, I need help with that.  I've just stuffed in whatever fields
popped into my mind first.  The current ones are:

1. upstream_url: this records the upstream URL.  No need to keep a .gitmodules.

2. checkout_rev: this records the ref to check out the submodule to.
As opposed to a concrete SHA-1, this allows for more flexibility; you
can put refs/heads/master and have truly floating submodules.

3. ref_name: this specifies what name the ref under
refs/modules/<branch>/ should use.

4. floating: this bit specifies whether to record a concrete SHA-1 in
checkout_rev.

5. statthrough: this bit specifies whether git should stat() through
the worktree.  We can turn it off on big repositories for performance
reasons.
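
As a rough illustration of the five fields, here is a sketch that
parses a link buffer in the "key = value" form that [7/7] writes.  The
struct layout, the fixed buffer sizes, the function names and the
encoding of the two bits as "1"/"0" are all assumptions made for the
example; none of them is settled by the series.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of a link object; names follow the list above. */
struct link_fields {
	char upstream_url[256];
	char checkout_rev[128];
	char ref_name[128];
	int floating;		/* assumed encoding: "1" means set */
	int statthrough;	/* assumed encoding: "1" means set */
};

/* Copy at most dstsize - 1 bytes of a value and NUL-terminate it. */
static void copy_value(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

static int key_is(const char *key, size_t klen, const char *name)
{
	return klen == strlen(name) && memcmp(key, name, klen) == 0;
}

/*
 * Parse "key = value\n" lines from buf into out.  Returns 0 on success
 * and -1 on a malformed line.  Empty lines and unknown keys are skipped.
 */
int parse_link_buffer(const char *buf, struct link_fields *out)
{
	const char *line = buf;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *sep, *val;
		size_t klen, vlen;

		if (!eol)
			eol = line + strlen(line);
		if (eol > line) {
			sep = strstr(line, " = ");
			if (!sep || sep >= eol)
				return -1;	/* no separator on this line */
			klen = (size_t)(sep - line);
			val = sep + 3;
			vlen = (size_t)(eol - val);

			if (key_is(line, klen, "upstream_url"))
				copy_value(out->upstream_url,
					   sizeof(out->upstream_url), val, vlen);
			else if (key_is(line, klen, "checkout_rev"))
				copy_value(out->checkout_rev,
					   sizeof(out->checkout_rev), val, vlen);
			else if (key_is(line, klen, "ref_name"))
				copy_value(out->ref_name,
					   sizeof(out->ref_name), val, vlen);
			else if (key_is(line, klen, "floating"))
				out->floating = (vlen == 1 && val[0] == '1');
			else if (key_is(line, klen, "statthrough"))
				out->statthrough = (vlen == 1 && val[0] == '1');
		}
		line = *eol ? eol + 1 : eol;
	}
	return 0;
}
```

Skipping unknown keys is one possible choice; it would let older
readers tolerate fields added later, at the cost of silently ignoring
typos.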

> Also, one of the arguments against using link
> objects originally was that the format wasn't stable, and in
> particular the address of the actual submodule repository might differ
> for different people. So when adding a new object type, explaining
> *why* the format of such an object is globally stable (since it will
> be part of the SHA1 of the object) is a big deal.

After some discussion, I hope to be able to finalize a list of fields
that will suffice for (nearly) everything.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (8 preceding siblings ...)
  2013-04-04 18:47 ` Jonathan Nieder
@ 2013-04-04 18:55 ` Jonathan Nieder
  2013-04-08 10:10   ` Duy Nguyen
  2013-04-06 20:10 ` [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
  10 siblings, 1 reply; 140+ messages in thread
From: Jonathan Nieder @ 2013-04-04 18:55 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano, Linus Torvalds

Ramkumar Ramachandra wrote:

> 1. 'git add' should not go past submodule boundaries.  I should not be
>    able to 'git add clayoven/' or 'git add clayoven/LICENSE'.  In
>    addition, the shell completion also needs to be fixed.

Yep.  This is a bug.

> 2. An empty directory containing a .git file is a perfectly valid
>    worktree, but does not show up in the superproject's 'git status'
>    output.  How can it be treated like an empty directory?

Stated like that, it doesn't sound like a bug.  Git since very early
has deliberately not tracked files or directories named .git.

Do you need this as a way of importing from a foreign VCS when someone
has accidentally checked in a .git directory along with everything
else?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:47 ` Jonathan Nieder
@ 2013-04-04 18:58   ` Jonathan Nieder
  0 siblings, 0 replies; 140+ messages in thread
From: Jonathan Nieder @ 2013-04-04 18:58 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano, Linus Torvalds

Jonathan Nieder wrote:
> Ramkumar Ramachandra wrote:

>> The purpose of this series is to convince you that we've made a lot of
>> fundamental mistakes while designing submodules, and
[...]
> Shouldn't it be possible to explain the same thing using a test
> script illustrating intended UI?

Sorry, I sent this reply too quickly.  Your explanation to Linus
clarified the idea.

Regards,
Jonathan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:52   ` Ramkumar Ramachandra
@ 2013-04-04 19:04     ` Linus Torvalds
  2013-04-04 19:17       ` Junio C Hamano
                         ` (3 more replies)
  2013-04-05  6:53     ` Ramkumar Ramachandra
  1 sibling, 4 replies; 140+ messages in thread
From: Linus Torvalds @ 2013-04-04 19:04 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano

On Thu, Apr 4, 2013 at 11:52 AM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
>
> 1. upstream_url: this records the upstream URL.  No need to keep a .gitmodules.
>
> 2. checkout_rev: this records the ref to check out the submodule to.
> As opposed to a concrete SHA-1, this allows for more flexibility; you
> can put refs/heads/master and have truly floating submodules.
>
> 3. ref_name: this specifies what name the ref under
> refs/modules/<branch>/ should use.
>
> 4. floating: this bit specifies whether to record a concrete SHA-1 in
> checkout_rev.
>
> 5. statthrough: this bit specifies whether git should stat() through
> the worktree.  We can turn it off on big repositories for performance
> reasons.

So the thing is (and this was pretty much the original basis for
.gitmodules) that pretty much *all* of the above fields are quite
possibly site-specific, rather than globally stable.

So I actually conceptually like (and liked) the notion of a link
object, but I just don't think it is necessarily practically useful,
exactly because different installations of the *same* supermodule
might well want to have different setups wrt these submodule fields.

My gut feel is that yes, .gitmodules was always a bit of a hack, but
it's a *working* hack, and it does have advantages exactly because
it's more fluid than an actual git object (which by definition has to
be set 100% in stone). If there are things you feel it does wrong
(like the "git add" bug that is being discussed elsewhere), I wonder
if it's not best to at least try to fix/extend them in the current
model. The features you seem to be after (ie that whole
floating/refname thing) don't seem fundamentally antithetical to the
current model (a "commit" SHA1 of all zeroes for floating, with a new
refname field in .gitmodules? I dunno)..

                  Linus

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:04     ` Linus Torvalds
@ 2013-04-04 19:17       ` Junio C Hamano
  2013-04-04 19:59         ` Ramkumar Ramachandra
  2013-04-04 20:28         ` Jens Lehmann
  2013-04-04 19:36       ` Ramkumar Ramachandra
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-04 19:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ramkumar Ramachandra, Jens Lehmann, Heiko Voigt, Git List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> ... The features you seem to be after (ie that whole
> floating/refname thing) don't seem fundamentally antithetical to the
> current model (a "commit" SHA1 of all zeroes for floating, with a new
> refname field in .gitmodules? I dunno)..

Just on this part.

I think Heiko and Jens's (by the way, why aren't they on the Cc:
list when this topic is clearly discussing submodules?  Don't we
want to learn how the current submodule subsystem is used to solve
what real-world problems?) .gitmodules updates are going in exactly
that direction.

 - A submodule can be marked as floating in .gitmodules and be
   specified how (typically, "use the tip of this branch in the
   submodule");

 - Running "submodule update" on a floating submodule does not detach
   the submodule working tree to commit in the index of the
   superproject; instead it will use the specified branch tip;

 - A floating submodule records a concrete commit object name in the
   index of the superproject (no need to stuff an unusual SHA-1
   there to signal that the submodule is floating---it is recorded
   in the .gitmodules).  Thanks to this, a release out of the
   top-level can still describe the state of the entire tree;

 - It would be normal for the commit recorded in the index of the
   superproject not to match what is checked out in the submodule
   working tree (i.e. the tip of the branch in the submodule may
   have advanced).  A traditional non-floating submodule has many
   mechanisms to be noisy about this situation to prevent users from
   making incomplete commits, but they may have to be toned down
   or squelched for floating submodules.
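
The behaviour described in the bullets above can be sketched roughly as
follows.  The struct and both function names are hypothetical and do
not correspond to anything in git-submodule; the sketch only captures
the two decisions being discussed, namely what "submodule update"
checks out and when a mismatch deserves a loud warning.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-submodule state, following the bullets above. */
struct submodule_state {
	const char *recorded_commit;	/* commit name in the superproject index */
	const char *float_branch;	/* branch named in .gitmodules, NULL if pinned */
	const char *branch_tip;		/* current tip of that branch, if floating */
};

/* What "submodule update" would check out in this model. */
const char *update_target(const struct submodule_state *s)
{
	assert(s != NULL);
	if (s->float_branch)
		return s->branch_tip;	/* floating: follow the branch tip */
	return s->recorded_commit;	/* pinned: detach at the recorded commit */
}

/*
 * Should a mismatch between the recorded commit and the checked-out
 * commit be reported loudly?  In this sketch only pinned submodules
 * warn; floating ones are expected to drift ahead of the recorded name.
 */
int should_warn_on_mismatch(const struct submodule_state *s,
			    const char *checked_out)
{
	assert(s != NULL && checked_out != NULL);
	if (s->float_branch)
		return 0;
	return strcmp(s->recorded_commit, checked_out) != 0;
}
```

Note that the superproject still records a concrete commit name in both
cases, so a release cut from the top level can describe the whole tree.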

Anything I missed, Jens, Heiko?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:04     ` Linus Torvalds
  2013-04-04 19:17       ` Junio C Hamano
@ 2013-04-04 19:36       ` Ramkumar Ramachandra
  2013-04-04 19:44         ` Linus Torvalds
  2013-04-04 19:42       ` Ramkumar Ramachandra
  2013-04-04 21:20       ` Jens Lehmann
  3 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 19:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Linus Torvalds wrote:
> So the thing is (and this was pretty much the original basis for
> .gitmodules) that pretty much *all* of the above fields are quite
> possibly site-specific, rather than globally stable.
>
> So I actually conceptually like (and liked) the notion of a link
> object, but I just don't think it is necessarily practically useful,
> exactly because different installations of the *same* supermodule
> might well want to have different setups wrt these submodule fields.
>
> My gut feel is that yes, .gitmodules was always a bit of a hack, but
> it's a *working* hack, and it does have advantages exactly because
> it's more fluid than an actual git object (which by definition has to
> be set 100% in stone). If there are things you feel it does wrong
> (like the "git add" bug that is being discussed elsewhere), I wonder
> if it's not best to at least try to fix/extend them in the current
> model. The features you seem to be after (ie that whole
> floating/refname thing) don't seem fundamentally antithetical to the
> current model (a "commit" SHA1 of all zeroes for floating, with a new
> refname field in .gitmodules? I dunno)..

Let's compare the two alternatives: .gitmodules versus link object.
If I want my fork of .gitmodules, I create a commit on top.  If I want
my fork of the link object, I create a link object, plus tree object,
plus commit object on top of that.  But the commit still rebases fine.

On malleability, have you looked at [5/7], where I create edit-link
(dead code; half done)?  The buffer looks just like a .gitmodules
buffer.  Fundamentally, what is the difference between this and a
blob?  git-core can parse it into structured data that it can slurp
easily.

I don't want full float or nothing.  I want in-betweens too, and refs are great.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:04     ` Linus Torvalds
  2013-04-04 19:17       ` Junio C Hamano
  2013-04-04 19:36       ` Ramkumar Ramachandra
@ 2013-04-04 19:42       ` Ramkumar Ramachandra
  2013-04-04 21:20       ` Jens Lehmann
  3 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 19:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Linus Torvalds wrote:
> .... don't seem fundamentally antithetical to the
> current model

I don't think it's fundamentally antithetical either.  This basically
makes the life of git-submodule much simpler, and will eventually make
it obsolete altogether.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:36       ` Ramkumar Ramachandra
@ 2013-04-04 19:44         ` Linus Torvalds
  2013-04-04 19:52           ` Ramkumar Ramachandra
  2013-04-04 20:04           ` Ramkumar Ramachandra
  0 siblings, 2 replies; 140+ messages in thread
From: Linus Torvalds @ 2013-04-04 19:44 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano

On Thu, Apr 4, 2013 at 12:36 PM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
>
> Let's compare the two alternatives: .gitmodules versus link object.
> If I want my fork of .gitmodules, I create a commit on top.

Or you could also just edit and carry a dirty .gitmodules around for
your personal use-case.

I don't know if anybody does that, but it should work fine.

And I don't see what you can do with the link objects that you cannot
do with .gitmodules. That's what it really boils down to. .gitmodules
do actually work. Your extensions would work with them too.

               Linus

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:44         ` Linus Torvalds
@ 2013-04-04 19:52           ` Ramkumar Ramachandra
  2013-04-04 20:08             ` Ramkumar Ramachandra
  2013-04-04 20:04           ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 19:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Linus Torvalds wrote:
> On Thu, Apr 4, 2013 at 12:36 PM, Ramkumar Ramachandra
> <artagnon@gmail.com> wrote:
>>
>> Let's compare the two alternatives: .gitmodules versus link object.
>> If I want my fork of .gitmodules, I create a commit on top.
>
> Or you could also just edit and carry a dirty .gitmodules around for
> your personal use-case.

Just take the link's buffer with you everywhere.  All you have to do
is git edit-link <name> and paste the file's contents there, instead
of opening .gitmodules directly in your editor.

> And I don't see what you can do with the link objects that you cannot
> do with .gitmodules. That's what it really boils down to. .gitmodules
> do actually work. Your extensions would work with them too.

If it came to that, you could write a huge Perl script to solve
everything with a .githack.  It breaks the internal symmetry of the
repository, which is why git-submodule is having such a field day.
I'm trying to prove, in my series, that making fundamental changes
lets us get rid of a huge amount of complexity.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:17       ` Junio C Hamano
@ 2013-04-04 19:59         ` Ramkumar Ramachandra
  2013-04-04 20:28         ` Jens Lehmann
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 19:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Jens Lehmann, Heiko Voigt, Git List

Junio C Hamano wrote:
> I think Heiko and Jens's (by the way, why aren't they on the Cc:
> list when this topic is clearly discussing submodules?  Don't we
> want to learn how the current submodule subsystem is used to solve
> what real-world problems?) .gitmodules updates are going in exactly
> that direction.

Because it's pointless.  We're not discussing a git-submodule
alternative.  We're discussing how to fix git-core so that
git-submodule becomes much simpler; to the extent that it will be
unnecessary soon.

git-submodule is years of hard work and it can do a limited version of
floating with great difficulty.  Mine is two days of work, and can
already do true floating submodules.  What is going on?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:44         ` Linus Torvalds
  2013-04-04 19:52           ` Ramkumar Ramachandra
@ 2013-04-04 20:04           ` Ramkumar Ramachandra
  2013-04-05 16:02             ` Linus Torvalds
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 20:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Linus Torvalds wrote:
> Or you could also just edit and carry a dirty .gitmodules around for
> your personal use-case.

I'm sorry, but a dirty worktree is unnecessarily painful to work with.
I don't think anyone objects to committing, if they can understand
basic rebase.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:52           ` Ramkumar Ramachandra
@ 2013-04-04 20:08             ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 20:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Ramkumar Ramachandra wrote:
> Just take the link's buffer with you everywhere.  All you have to do
> is git edit-link <name> and paste the file's contents there, instead
> of opening .gitmodules directly in your editor.

On this.  The buffer doesn't have to conform to a tight spec: we can
just expose a .gitconfig-like buffer and reduce it to a tight spec
before writing out.
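
A minimal sketch of that "reduce to a tight spec" step, under the
assumption that the canonical form is one "key = value" line per field
with keys sorted and surrounding whitespace trimmed.  The function name
canonicalize_link_buffer(), the limits and the choice of sort order are
all hypothetical.  The point is only that two loosely formatted buffers
with the same content should serialize to identical bytes, and
therefore to the same object name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 16
#define MAX_LEN 256

struct field {
	char key[MAX_LEN];
	char value[MAX_LEN];
};

/* Copy [s, e) into dst with surrounding whitespace trimmed. */
static int trim_copy(char *dst, const char *s, const char *e)
{
	size_t len;

	while (s < e && isspace((unsigned char)*s))
		s++;
	while (e > s && isspace((unsigned char)e[-1]))
		e--;
	len = (size_t)(e - s);
	if (len == 0 || len >= MAX_LEN)
		return -1;
	memcpy(dst, s, len);
	dst[len] = '\0';
	return 0;
}

static int cmp_field(const void *a, const void *b)
{
	return strcmp(((const struct field *)a)->key,
		      ((const struct field *)b)->key);
}

/*
 * Reduce a loosely formatted "key = value" buffer to one canonical form.
 * Returns the number of bytes written to out (excluding the NUL), or -1
 * on error.  Duplicate keys are not detected in this sketch.
 */
int canonicalize_link_buffer(const char *in, char *out, size_t outsize)
{
	struct field fields[MAX_FIELDS];
	size_t n = 0, used = 0, i;
	const char *line = in;

	assert(in != NULL && out != NULL && outsize > 0);
	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *eq, *p;
		int blank = 1;

		if (!eol)
			eol = line + strlen(line);
		for (p = line; p < eol; p++)
			if (!isspace((unsigned char)*p))
				blank = 0;
		if (!blank) {
			eq = memchr(line, '=', (size_t)(eol - line));
			if (!eq || n == MAX_FIELDS)
				return -1;
			if (trim_copy(fields[n].key, line, eq) < 0 ||
			    trim_copy(fields[n].value, eq + 1, eol) < 0)
				return -1;
			n++;
		}
		line = *eol ? eol + 1 : eol;
	}
	qsort(fields, n, sizeof(fields[0]), cmp_field);

	out[0] = '\0';
	for (i = 0; i < n; i++) {
		int w = snprintf(out + used, outsize - used, "%s = %s\n",
				 fields[i].key, fields[i].value);
		if (w < 0 || (size_t)w >= outsize - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

Whatever the eventual rules are, they would have to be fixed before the
format is frozen, since changing the canonical form later would change
the names of existing objects.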

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:17       ` Junio C Hamano
  2013-04-04 19:59         ` Ramkumar Ramachandra
@ 2013-04-04 20:28         ` Jens Lehmann
  1 sibling, 0 replies; 140+ messages in thread
From: Jens Lehmann @ 2013-04-04 20:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Ramkumar Ramachandra, Heiko Voigt, Git List

Am 04.04.2013 21:17, schrieb Junio C Hamano:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
>> ... The features you seem to be after (ie that whole
>> floating/refname thing) don't seem fundamentally antithetical to the
>> current model (a "commit" SHA1 of all zeroes for floating, with a new
>> refname field in .gitmodules? I dunno)..
> 
> Just on this part.
> 
> I think Heiko and Jens's (by the way, why aren't they on the Cc:
> list when this topic is clearly discussing submodules?  Don't we
> want to learn how the current submodule subsystem is used to solve
> what real-world problems?) .gitmodules updates are going in exactly
> that direction.
> 
>  - A submodule can be marked as floating in .gitmodules and be
>    specified how (typically, "use the tip of this branch in the
>    submodule");
> 
>  - Running "submodule update" on a floating submodule does not detach
>    the submodule working tree to commit in the index of the
>    superproject; instead it will use the specified branch tip;
> 
>  - A floating submodule records a concrete commit object name in the
>    index of the superproject (no need to stuff an unusual SHA-1
>    there to signal that the submodule is floating---it is recorded
>    in the .gitmodules).  Thanks to this, a release out of the
>    top-level can still describe the state of the entire tree;
> 
>  - It would be normal for the commit recorded in the index of the
>    superproject not to match what is checked out in the submodule
>    working tree (i.e. the tip of the branch in the submodule may
>    have advanced).  A traditional non-floating submodule has many
>    mechanisms to be noisy about this situation to prevent users from
>    making incomplete commits, but they may have to be toned down
>    or squelched for floating submodules.
> 
> Anything I missed, Jens, Heiko?

Nope, that perfectly sums it up.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 19:04     ` Linus Torvalds
                         ` (2 preceding siblings ...)
  2013-04-04 19:42       ` Ramkumar Ramachandra
@ 2013-04-04 21:20       ` Jens Lehmann
  2013-04-04 21:35         ` Ramkumar Ramachandra
  2013-04-04 22:13         ` Junio C Hamano
  3 siblings, 2 replies; 140+ messages in thread
From: Jens Lehmann @ 2013-04-04 21:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ramkumar Ramachandra, Git List, Junio C Hamano, Heiko Voigt

Am 04.04.2013 21:04, schrieb Linus Torvalds:
> My gut feel is that yes, .gitmodules was always a bit of a hack, but
> it's a *working* hack, and it does have advantages exactly because
> it's more fluid than an actual git object (which by definition has to
> be set 100% in stone).

Exactly. The flexibility of the .gitmodules file will really help us
when it comes to the next feature that submodules are going to learn
after recursive update: automatically initialize and then populate
certain submodules during the clone of the superproject. You have to
be able to configure that per submodule, which needs a new config
option in .gitmodules. Others may follow for different use cases.

While starting to grok submodules I was wondering myself if the data
stored in .gitmodules would better be stored in an extended gitlink
object, but I learned soon that the scope of the data that has to be
stored there was not clear at that time (and still isn't). So I'm
not opposed per se to adding a special object containing all that
information, but I strongly believe we are not even close to
considering such a step (and won't be for quite some time and maybe
never will).

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 21:20       ` Jens Lehmann
@ 2013-04-04 21:35         ` Ramkumar Ramachandra
  2013-04-04 22:13         ` Junio C Hamano
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 21:35 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Linus Torvalds, Git List, Junio C Hamano, Heiko Voigt

Jens Lehmann wrote:
> Exactly. The flexibility of the .gitmodules file will really help us
> when it comes to the next feature that submodules are going to learn
> after recursive update:

That's like saying that the flexibility of a blob is invaluable: let's
throw out all the other objects, and make do with blobs.  Of course we
make mistakes: we didn't put a generation number in the commit object,
for instance (I'm not arguing about whether it's right or wrong: just
that some people think it's a mistake).

> While starting to grok submodules I was wondering myself if the data
> stored in .gitmodules would better be stored in an extended gitlink
> object, but I learned soon that the scope of the data that has to be
> stored there was not clear at that time (and still isn't). So I'm
> not opposed per se to adding a special object containing all that
> information, but I strongly believe we are not even close to
> considering such a step (and won't be for quite some time and maybe
> never will).

Nonsense.  We will think through it before freezing the format, like
we did with the other objects.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 21:20       ` Jens Lehmann
  2013-04-04 21:35         ` Ramkumar Ramachandra
@ 2013-04-04 22:13         ` Junio C Hamano
  2013-04-04 22:18           ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-04 22:13 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Linus Torvalds, Ramkumar Ramachandra, Git List, Heiko Voigt

Jens Lehmann <Jens.Lehmann@web.de> writes:

> While starting to grok submodules I was wondering myself if the data
> stored in .gitmodules would better be stored in an extended gitlink
> object, but I learned soon that the scope of the data that has to be
> stored there was not clear at that time (and still isn't). So I'm
> not opposed per se to adding a special object containing all that
> information, but I strongly believe we are not even close to
> considering such a step (and won't be for quite some time and maybe
> never will).

I actually think the storage is more or less an orthogonal issue.

The format must be defined to be extensible (nobody is perfect and
if you wait for an exhaustive list of attributes that cover all use
cases including the ones that haven't even been invented yet, you
will get nowhere), and designed carefully to reduce the chance of
allowing the extended/optional bit to express the same thing in two
different ways to make sure the object name will not become
unnecessarily unstable, but you can start small, keep adding
optional fields, and be prepared to design an upgrade path when you
need to add new mandatory fields---that cannot be helped whether you
record the information about submodules in .gitmodules or a new
blob-ish object at the location where the submodule tree should
reside in the index and the tree.

However, the current .gitmodules design, even though it originally
was invented as a way to carry information other than what a single
commit object name from an otherwise unrelated project can express
without having to change anything in-core, has a few practical
merits.  The information _about_ submodules is stored separately
(i.e. in the .gitmodules file) from submodules themselves, and it
may be a good thing.

When you are changing information _about_ submodules (e.g. you may
be updating the recommended URL to fetch it from), you can use the
usual tools like "git diff" to see how it changed, just like changes
to any other file.  If the information _about_ a submodule A is
stored at path A, and at the same time you have a working tree that
corresponds to the root of the submodule A at that path, it gets
unclear what "git diff A" should report.  Should it report the
change in the submodule itself, or should it report the change in
the information _about_ the submodule?  By separating these two
concepts to two different places, .gitmodules design solves the
issue nicely.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 22:13         ` Junio C Hamano
@ 2013-04-04 22:18           ` Ramkumar Ramachandra
  2013-04-04 22:26             ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 22:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Junio C Hamano wrote:
> When you are changing information _about_ submodules (e.g. you may
> be updating the recommended URL to fetch it from), you can use the
> usual tools like "git diff" to see how it changed, just like changes
> to any other file.  If the information _about_ a submodule A is
> stored at path A, and at the same time you have a working tree that
> corresponds to the root of the submodule A at that path, it gets
> unclear what "git diff A" should report.  Should it report the
> change in the submodule itself, or should it report the change in
> the information _about_ the submodule?  By separating these two
> concepts to two different places, .gitmodules design solves the
> issue nicely.

git diff-link.  Just turn it into a buffer and diff as usual.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 22:18           ` Ramkumar Ramachandra
@ 2013-04-04 22:26             ` Junio C Hamano
  2013-04-04 22:32               ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-04 22:26 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> When you are changing information _about_ submodules (e.g. you may
>> be updating the recommended URL to fetch it from), you can use the
>> usual tools like "git diff" to see how it changed, just like changes
>> to any other file.  If the information _about_ a submodule A is
>> stored at path A, and at the same time you have a working tree that
>> corresponds to the root of the submodule A at that path, it gets
>> unclear what "git diff A" should report.  Should it report the
>> change in the submodule itself, or should it report the change in
>> the information _about_ the submodule?  By separating these two
>> concepts to two different places, .gitmodules design solves the
>> issue nicely.
>
> git diff-link.  Just turn it into a buffer and diff as usual.

Sounds like you are saying that you can pile a new command on top of
new command to solve what the existing tools people are familiar with
can already solve in a consistent way without adding anything new.
Are you going to duplicate various options to "git diff" and "git
log" in "git diff-link"?  Will you then next need "git log-link"?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 22:26             ` Junio C Hamano
@ 2013-04-04 22:32               ` Ramkumar Ramachandra
  2013-04-04 23:08                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 22:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Junio C Hamano wrote:
> Sounds like you are saying that you can pile a new command on top of
> new command to solve what the existing tools people are familiar with
> can already solve in a consistent way without adding anything new.
> Are you going to duplicate various options to "git diff" and "git
> log" in "git diff-link"?  Will you then next need "git log-link"?

What I'm saying is: As always, we start with plumbing and work our way
up to porcelain.  We do have git diff-files, diff-index, diff-tree, so
I don't see what the problem with diff-link is.  The point is that we
can get an initial scripted version out quickly.

And no, I never suggested a git log-link.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 22:32               ` Ramkumar Ramachandra
@ 2013-04-04 23:08                 ` Junio C Hamano
  2013-04-04 23:14                   ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-04 23:08 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> Sounds like you are saying that you can pile a new command on top of
>> new command to solve what the existing tools people are familiar with
>> can already solve in a consistent way without adding anything new.
>> Are you going to duplicate various options to "git diff" and "git
>> log" in "git diff-link"?  Will you then next need "git log-link"?
>
> What I'm saying is: As always, we start with plumbing and work our way
> up to porcelain.  We do have git diff-files, diff-index, diff-tree, so
> I don't see what the problem with diff-link is.  The point is that we
> can get an initial scripted version out quickly.
>
> And no, I never suggested a git log-link.

"git log -p .gitmodules" would be a way to review what changed in
the information about submodules.  Don't you need "git log-link" for
exactly the same reason why you need "git diff-link" in the first
place?

So you may not have suggested it, but I suspect that was only
because you haven't had enough time to think things through.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 23:08                 ` Junio C Hamano
@ 2013-04-04 23:14                   ` Ramkumar Ramachandra
  2013-04-05 17:07                     ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-04 23:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Junio C Hamano wrote:
> "git log -p .gitmodules" would be a way to review what changed in
> the information about submodules.  Don't you need "git log-link" for
> exactly the same reason why you need "git diff-link" in the first
> place?
>
> So you may not have suggested it, but I suspect that was only
> because you haven't had enough time to think things through.

What is this git log -p .gitmodules doing?  It's walking down the
commit history, and picking out the commits in which that blob
changed.  Then it's diffing the blobs in those commits with each
other.  Why is git log -p <link> any different?  We already know how
to diff blobs, and we just need a way to diff links.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:52   ` Ramkumar Ramachandra
  2013-04-04 19:04     ` Linus Torvalds
@ 2013-04-05  6:53     ` Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05  6:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git List, Junio C Hamano

Ramkumar Ramachandra wrote:
> After some discussion, I hope to be able to finalize a list of fields
> that will suffice for (nearly) everything.

The task is actually much easier than this.  All we have to do is
finalize the list of fields that will mandatorily be written to the
link object.  As I might have indicated in my series, this is:
upstream_url, checkout_rev, and ref_name.  Really, the user only needs
to supply a valid upstream_url: after a clone, everything else can be
inferred (with the exception of a ref_name conflict; I don't like
auto-mangling).

Other fields are like .git/config fields.  We can add new key/value
pairs in the future, without worrying about migration.  A problem
arises only if we want to add a new mandatory field, change the
default value of a key, or deprecate an existing key/value pair.
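
To make the mandatory/optional split concrete, here is a minimal
standalone sketch (not the code from the series) of a parser for the
"key = value" link buffer.  The struct and helper names are
hypothetical.  It assumes that unknown keys are skipped rather than
rejected, which is one way to let new optional pairs appear later
without migration, and that the three mandatory fields must all be
present.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of a link; field names follow the thread. */
struct link_fields {
	char *upstream_url;	/* mandatory */
	char *checkout_rev;	/* mandatory */
	char *ref_name;		/* mandatory */
	int floating;		/* optional, defaults to 0 */
	int statthrough;	/* optional, defaults to 0 */
};

static void free_link_fields(struct link_fields *lf)
{
	free(lf->upstream_url);
	free(lf->checkout_rev);
	free(lf->ref_name);
	memset(lf, 0, sizeof(*lf));
}

/* Copy len bytes of s into a fresh NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
	char *p = malloc(len + 1);
	if (!p)
		return NULL;
	memcpy(p, s, len);
	p[len] = '\0';
	return p;
}

static int key_is(const char *key, size_t klen, const char *name)
{
	return strlen(name) == klen && !memcmp(key, name, klen);
}

/* Replace *slot with a copy of the value; a repeated key overrides. */
static void set_string(char **slot, const char *val, size_t vlen)
{
	free(*slot);
	*slot = dup_range(val, vlen);
}

/* Returns 0 on success, -1 on a malformed line or a missing mandatory key. */
static int parse_link_fields(const char *buf, size_t size,
			     struct link_fields *out)
{
	const char *p = buf, *end = buf + size;

	memset(out, 0, sizeof(*out));
	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		const char *eq, *val;
		size_t klen, vlen;

		if (!eol)
			eol = end;	/* last line without a newline */
		eq = memchr(p, '=', (size_t)(eol - p));
		/* Require the exact " = " separator inside this line. */
		if (!eq || eq == p || eq[-1] != ' ' ||
		    eq + 1 >= eol || eq[1] != ' ')
			goto fail;
		klen = (size_t)(eq - 1 - p);
		val = eq + 2;
		vlen = (size_t)(eol - val);

		if (key_is(p, klen, "upstream_url"))
			set_string(&out->upstream_url, val, vlen);
		else if (key_is(p, klen, "checkout_rev"))
			set_string(&out->checkout_rev, val, vlen);
		else if (key_is(p, klen, "ref_name"))
			set_string(&out->ref_name, val, vlen);
		else if (key_is(p, klen, "floating"))
			out->floating = (vlen == 1 && val[0] == '1');
		else if (key_is(p, klen, "statthrough"))
			out->statthrough = (vlen == 1 && val[0] == '1');
		/* else: unknown optional key, skipped by assumption */

		p = eol < end ? eol + 1 : end;
	}
	if (!out->upstream_url || !out->checkout_rev || !out->ref_name)
		goto fail;
	return 0;
fail:
	free_link_fields(out);
	return -1;
}
```

A caller would hand parse_link_fields() the raw object buffer and its
size, and release the result with free_link_fields().  Whether unknown
keys should be silently skipped or reported is a policy question that
this sketch does not settle.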

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 7/7] sha1_file: write ref_name to link object
  2013-04-04 18:30 ` [PATCH 7/7] sha1_file: write ref_name to link object Ramkumar Ramachandra
@ 2013-04-05  7:03   ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05  7:03 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Ramkumar Ramachandra wrote:
> Great.  Now, we just have to write refs/modules/<branch>/* at
> commit-time.

Actually, we have to update things in refs/modules/ every time we
update things in refs/heads/.  In the case of a 'git branch -M' for
example, refs/heads/<oldname> is rewritten to refs/heads/<newname>:
similarly, refs/modules/<oldname>/ needs to be moved to
refs/modules/<newname>/.

There is one special case worth mentioning.  Let's say I'm committing
changes to link objects to a detached HEAD, b8bb3f.  Then, I write
refs/modules/b8bb3f/ at commit-time.  A subsequent 'checkout -b' that
updates refs/heads/<newbranch> will just have to move
refs/modules/b8bb3f/ to refs/modules/<newbranch>/.

The caveat is that I might commit to the detached HEAD and leave it
hanging around until the next gc.  So, gc will need to remove
refs/modules/b8bb3f/ too, but that's not a pressing issue at the
moment.
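
As a rough illustration of the bookkeeping described above, the
following standalone sketch maps a branch rename onto the
refs/modules/ namespace.  It only manipulates ref names as strings; it
does not touch any ref storage.  The function names are hypothetical,
and so is the layout refs/modules/<branch>/<something>: only the
refs/modules/<branch>/ prefix comes from this mail.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "refs/modules/<branch>/" from "refs/heads/<branch>".
 * Returns 0 on success, -1 if head_ref is not a branch ref or
 * the result does not fit in out.
 */
static int modules_prefix_for_head(const char *head_ref,
				   char *out, size_t outlen)
{
	static const char heads[] = "refs/heads/";
	size_t n = sizeof(heads) - 1;
	int written;

	if (strncmp(head_ref, heads, n) != 0 || head_ref[n] == '\0')
		return -1;
	written = snprintf(out, outlen, "refs/modules/%s/", head_ref + n);
	return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
}

/*
 * Given a ref and a branch rename old_head -> new_head, compute the
 * new name of the ref if it lives under the old branch's modules
 * prefix.  Returns 0 if renamed, 1 if the ref is unaffected, -1 on
 * error.
 */
static int rename_module_ref(const char *ref,
			     const char *old_head, const char *new_head,
			     char *out, size_t outlen)
{
	char old_prefix[256], new_prefix[256];
	size_t old_len;
	int written;

	if (modules_prefix_for_head(old_head, old_prefix, sizeof(old_prefix)) ||
	    modules_prefix_for_head(new_head, new_prefix, sizeof(new_prefix)))
		return -1;
	old_len = strlen(old_prefix);
	/* The trailing slash keeps "topic" from matching "topic2". */
	if (strncmp(ref, old_prefix, old_len) != 0)
		return 1;
	written = snprintf(out, outlen, "%s%s", new_prefix, ref + old_len);
	return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
}
```

The detached-HEAD case would use the same move with
refs/modules/b8bb3f/ as the old prefix; this sketch does not cover it
because it only accepts refs/heads/ names as input.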

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 6/7] clone: introduce clone.submodulegitdir
  2013-04-04 18:30 ` [PATCH 6/7] clone: introduce clone.submodulegitdir Ramkumar Ramachandra
@ 2013-04-05  7:07   ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05  7:07 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Ramkumar Ramachandra wrote:
> diff --git a/builtin/clone.c b/builtin/clone.c
> index e0aaf13..1b798e6 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -658,11 +659,22 @@ static void write_refspec_config(const char* src_ref_prefix,
>         strbuf_release(&value);
>  }
>
> +static int git_clone_config(const char *var, const char *value, void *cb)
> +{
> +       if (!strcmp(var, "clone.submodulegitdir")) {
> +               git_config_string(&submodule_gitdir, var, value);
> +               return 0;
> +       }
> +       return git_default_config(var, value, cb);
> +}

submodule_gitdir can be a human-path, and we will need real_path() to
turn it into a concrete path.  Why doesn't real_path() expand ~ yet?
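
For reference, the kind of expansion meant here can be sketched in a
few lines of standalone C.  This is only an illustration, not a
proposed change to real_path(): expand_tilde() is a hypothetical
helper, it handles only a leading "~" or "~/", and the home directory
is passed in by the caller (for example from getenv("HOME")) instead
of being looked up inside.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Expand a leading "~" or "~/" to the given home directory.
 * Paths that do not start that way (including "~user/...") are
 * returned as a plain copy.  Returns a malloc'd string, or NULL on
 * allocation failure or when expansion is needed but home is NULL.
 */
static char *expand_tilde(const char *path, const char *home)
{
	size_t len;
	char *out;

	if (path[0] != '~' || (path[1] != '/' && path[1] != '\0')) {
		len = strlen(path);
		out = malloc(len + 1);
		if (!out)
			return NULL;
		memcpy(out, path, len + 1);
		return out;
	}
	if (!home)
		return NULL;
	/* Drop the "~" and keep the rest, including the slash. */
	len = strlen(home) + strlen(path + 1);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	strcpy(out, home);
	strcat(out, path + 1);
	return out;
}
```

With clone.submodulegitdir set to "~/bare" and a home of /home/user,
this yields /home/user/bare, which could then be handed to whatever
turns it into a concrete path.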

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/7] sha1_file, link: write link objects to the database
  2013-04-04 18:30 ` [PATCH 2/7] sha1_file, link: write link objects to the database Ramkumar Ramachandra
@ 2013-04-05  7:11   ` Ramkumar Ramachandra
  2013-04-05  7:59     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05  7:11 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Ramkumar Ramachandra wrote:
> diff --git a/link.c b/link.c
> index bb20a51..349646d 100644
> --- a/link.c
> +++ b/link.c
> @@ -20,8 +20,30 @@ struct link *lookup_link(const unsigned char *sha1)
>
>  int parse_link_buffer(struct link *item, void *buffer, unsigned long size)
>  {
> +       char *bufptr = buffer;
> +       char *tail = buffer + size;
> +       char *eol;
> +
>         if (item->object.parsed)
>                 return 0;
>         item->object.parsed = 1;
> +       while (bufptr < tail) {
> +               eol = strchr(bufptr, '\n');
> +               *eol = '\0';
> +               if (!prefixcmp(bufptr, "upstream_url = "))
> +                       item->upstream_url = xstrdup(bufptr + 15);
> +               else if (!prefixcmp(bufptr, "checkout_rev = "))
> +                       item->checkout_rev = xstrdup(bufptr + 15);
> +               else if (!prefixcmp(bufptr, "ref_name = "))
> +                       item->ref_name = xstrdup(bufptr + 11);
> +               else if (!prefixcmp(bufptr, "floating = "))
> +                       item->floating = atoi(bufptr + 11);
> +               else if (!prefixcmp(bufptr, "statthrough = "))
> +                       item->statthrough = atoi(bufptr + 14);
> +               else
> +                       return error("Parse error in link buffer");
> +
> +               bufptr = eol + 1;
> +       }
>         return 0;
>  }

This needs to be replaced by a .git/config parser.  However, I can't
use the parser from config.c as-it-is, because it expects a section
like [core] to be present.  So, we have to refactor it to optionally
parse section-less configs.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/7] sha1_file, link: write link objects to the database
  2013-04-05  7:11   ` Ramkumar Ramachandra
@ 2013-04-05  7:59     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05  7:59 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Ramkumar Ramachandra wrote:
> This needs to be replaced by a .git/config parser.  However, I can't
> use the parser from config.c as-it-is, because it expects a section
> like [core] to be present.  So, we have to refactor it to optionally
> parse section-less configs.

Er, sorry about the thinko: I meant that edit-link should use the
.git/config parser.  This one is just fine.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 20:04           ` Ramkumar Ramachandra
@ 2013-04-05 16:02             ` Linus Torvalds
  2013-04-05 16:37               ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Linus Torvalds @ 2013-04-05 16:02 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Junio C Hamano

On Thu, Apr 4, 2013 at 1:04 PM, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> Linus Torvalds wrote:
>> Or you could also just edit and carry a dirty .gitmodules around for
>> your personal use-case.
>
> I'm sorry, but a dirty worktree is unnecessarily painful to work with.

Bzzt. Wrong.

A dirty worktree is not only easy to work with (I do it all the time,
having random test-patches in my tree that I never even intend to
commit), it's a *requirement*.

One thing that git does really really well is merging. And one of the
reasons why git does merging well (apart from the obvious meta-issue:
it's what I care about) is that it not only has the stable information
in the object database, it also has the staging information in the
index, *and* it has dirty data in the working tree.

You absolutely need all three. Having an "edit" command to edit stable
data (or staging data) is broken. Trust me, I've been there, done
that, got the T-shirt and know it is wrong. The whole "stable objects"
+ "index" + "dirty worktree" is FUNDAMENTALLY the right way to work,
and it *has* to work that way for merges to work well.

The only things that we don't have "dirty data" for in the worktree are
creating commits and tags, but those aren't relevant for the merging
process anyway, in the sense that you never change them for merging,
you create them *after* merging (and this is fundamental, and not just
a git implementation issue).

So you absolutely need a dirty worktree. You need it for testing, and
you need it for merging. Having a model where you don't have an
in-progress entity that works as a temporary is absolutely and
entirely wrong.

               Linus

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-05 16:02             ` Linus Torvalds
@ 2013-04-05 16:37               ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05 16:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git List

Linus Torvalds wrote:
> So you absolutely need a dirty worktree. You need it for testing, and
> you need it for merging. Having a model where you don't have an
> in-progress entity that works as a temporary is absolutely and
> entirely wrong.

I agree entirely.  My comment was just a "by the way", and specific to
how people work with .gitmodules: I didn't imply any strong notions of
Right or Wrong with respect to dirty worktrees in general.  So, yes:
links stage and unstage, just like blobs do.

Oh, and I'm currently writing infrastructure to work with links like
blobs.  Here's a WIP: git cat-link <link> is exactly the same as cat
<file>, to the end user.

-- 8< --
From d8a1de6f9075771dde6f1fde9ffa193dce386a17 Mon Sep 17 00:00:00 2001
From: Ramkumar Ramachandra <artagnon@gmail.com>
Date: Fri, 5 Apr 2013 19:42:56 +0530
Subject: [PATCH] builtin/cat-link: implement new builtin

This is a simple program that calls unpack_trees() with a custom
callback that just prints the contents of whatever objects were
matched using revs.prune_data.  Blobs can be cat'ed directly from the
filesystem, so this program is primarily useful for links; git
cat-link <link> shows it up like a blob.

We will use this program to build edit-link.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Makefile           |  3 +-
 builtin.h          |  1 +
 builtin/cat-link.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 diff-lib.c         | 10 +++----
 diff.h             |  6 ++++
 git.c              |  1 +
 6 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100644 builtin/cat-link.c

diff --git a/Makefile b/Makefile
index cd4b6f9..28194d7 100644
--- a/Makefile
+++ b/Makefile
@@ -349,7 +349,7 @@ GIT-VERSION-FILE: FORCE
 
 # CFLAGS and LDFLAGS are for the users to override from the command line.
 
-CFLAGS = -g -O2 -Wall
+CFLAGS = -g -O0 -Wall
 LDFLAGS =
 ALL_CFLAGS = $(CPPFLAGS) $(CFLAGS)
 ALL_LDFLAGS = $(LDFLAGS)
@@ -893,6 +893,7 @@ BUILTIN_OBJS += builtin/blame.o
 BUILTIN_OBJS += builtin/branch.o
 BUILTIN_OBJS += builtin/bundle.o
 BUILTIN_OBJS += builtin/cat-file.o
+BUILTIN_OBJS += builtin/cat-link.o
 BUILTIN_OBJS += builtin/check-attr.o
 BUILTIN_OBJS += builtin/check-ignore.o
 BUILTIN_OBJS += builtin/check-ref-format.o
diff --git a/builtin.h b/builtin.h
index faef559..be0160d 100644
--- a/builtin.h
+++ b/builtin.h
@@ -49,6 +49,7 @@ extern int cmd_blame(int argc, const char **argv, const char *prefix);
 extern int cmd_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_bundle(int argc, const char **argv, const char *prefix);
 extern int cmd_cat_file(int argc, const char **argv, const char *prefix);
+extern int cmd_cat_link(int argc, const char **argv, const char *prefix);
 extern int cmd_checkout(int argc, const char **argv, const char *prefix);
 extern int cmd_checkout_index(int argc, const char **argv, const char *prefix);
 extern int cmd_check_attr(int argc, const char **argv, const char *prefix);
diff --git a/builtin/cat-link.c b/builtin/cat-link.c
new file mode 100644
index 0000000..14dd92b
--- /dev/null
+++ b/builtin/cat-link.c
@@ -0,0 +1,83 @@
+/*
+ * Copyright (c) 2013 Ramkumar Ramachandra
+ */
+#include "cache.h"
+#include "tree.h"
+#include "cache-tree.h"
+#include "unpack-trees.h"
+#include "commit.h"
+#include "diff.h"
+#include "revision.h"
+
+static int cat_file(struct cache_entry **src, struct unpack_trees_options *o) {
+	int cached, match_missing = 1;
+	unsigned dirty_submodule = 0;
+	unsigned int mode;
+	const unsigned char *sha1;
+	struct cache_entry *idx = src[0];
+	struct cache_entry *tree = src[1];
+	struct rev_info *revs = o->unpack_data;
+	enum object_type type;
+	unsigned long size;
+	char *buf;
+
+	cached = o->index_only;
+	if (ce_path_match(idx ? idx : tree, &revs->prune_data)) {
+		if (get_stat_data(idx, &sha1, &mode, cached, match_missing,
+					&dirty_submodule, NULL) < 0)
+			die("Something went wrong!");
+		buf = read_sha1_file(sha1, &type, &size);
+		printf("%s", buf);
+	}
+	return 0;
+}
+
+int cmd_cat_link(int argc, const char **argv, const char *prefix)
+{
+	struct unpack_trees_options opts;
+	int cached = 1;
+	struct rev_info revs;
+	struct tree *tree;
+	struct tree_desc tree_desc;
+	struct object_array_entry *ent;
+
+	if (argc < 2)
+		die("Usage: git cat-link <link>");
+
+	init_revisions(&revs, prefix);
+	setup_revisions(argc, argv, &revs, NULL); /* For revs.prune_data */
+	add_head_to_pending(&revs);
+
+	/* Hack to diff against index; we create a dummy tree for the
+	   index information */
+	if (!revs.pending.nr) {
+		struct tree *tree;
+		tree = lookup_tree(EMPTY_TREE_SHA1_BIN);
+		add_pending_object(&revs, &tree->object, "HEAD");
+	}
+
+	if (read_cache() < 0)
+		die("read_cache() failed");
+	ent = revs.pending.objects;
+	tree = parse_tree_indirect(ent->item->sha1);
+	if (!tree)
+		return error("bad tree object %s",
+			     ent->name ? ent->name : sha1_to_hex(ent->item->sha1));
+
+	memset(&opts, 0, sizeof(opts));
+	opts.head_idx = 1;
+	opts.index_only = cached;
+	opts.diff_index_cached = cached;
+	opts.merge = 1;
+	opts.fn = cat_file;
+	opts.unpack_data = &revs;
+	opts.src_index = &the_index;
+	opts.dst_index = NULL;
+	opts.pathspec = &revs.diffopt.pathspec;
+	opts.pathspec->recursive = 1;
+	opts.pathspec->max_depth = -1;
+
+	init_tree_desc(&tree_desc, tree->buffer, tree->size);
+	unpack_trees(1, &tree_desc, &opts);
+	return 0;
+}
diff --git a/diff-lib.c b/diff-lib.c
index f35de0f..b0ba136 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -246,11 +246,11 @@ static void diff_index_show_file(struct rev_info *revs,
 		       sha1, sha1_valid, ce->name, dirty_submodule);
 }
 
-static int get_stat_data(struct cache_entry *ce,
-			 const unsigned char **sha1p,
-			 unsigned int *modep,
-			 int cached, int match_missing,
-			 unsigned *dirty_submodule, struct diff_options *diffopt)
+int get_stat_data(struct cache_entry *ce,
+		const unsigned char **sha1p,
+		unsigned int *modep,
+		int cached, int match_missing,
+		unsigned *dirty_submodule, struct diff_options *diffopt)
 {
 	const unsigned char *sha1 = ce->sha1;
 	unsigned int mode = ce->ce_mode;
diff --git a/diff.h b/diff.h
index 78b4091..02ed497 100644
--- a/diff.h
+++ b/diff.h
@@ -326,6 +326,12 @@ extern int diff_result_code(struct diff_options *, int);
 
 extern void diff_no_index(struct rev_info *, int, const char **, int, const char *);
 
+extern int get_stat_data(struct cache_entry *ce,
+			const unsigned char **sha1p,
+			unsigned int *modep,
+			int cached, int match_missing,
+			unsigned *dirty_submodule, struct diff_options *diffopt);
+
 extern int index_differs_from(const char *def, int diff_flags);
 
 extern size_t fill_textconv(struct userdiff_driver *driver,
diff --git a/git.c b/git.c
index 850d3f5..3f3f074 100644
--- a/git.c
+++ b/git.c
@@ -313,6 +313,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "branch", cmd_branch, RUN_SETUP },
 		{ "bundle", cmd_bundle, RUN_SETUP_GENTLY },
 		{ "cat-file", cmd_cat_file, RUN_SETUP },
+		{ "cat-link", cmd_cat_link, RUN_SETUP },
 		{ "check-attr", cmd_check_attr, RUN_SETUP },
 		{ "check-ignore", cmd_check_ignore, RUN_SETUP | NEED_WORK_TREE },
 		{ "check-ref-format", cmd_check_ref_format },
-- 
1.8.2.380.g0d4e79b

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 23:14                   ` Ramkumar Ramachandra
@ 2013-04-05 17:07                     ` Junio C Hamano
  2013-04-05 17:23                       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-05 17:07 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> "git log -p .gitmodules" would be a way to review what changed in
>> the information about submodules.  Don't you need "git log-link" for
>> exactly the same reason why you need "git diff-link" in the first
>> place?
>>
>> So you may not have suggested it, but I suspect that was only
>> because you haven't had enough time to think things through.
>
> What is this git log -p .gitmodules doing?  It's walking down the
> commit history, and picking out the commits in which that blob
> changed.  Then it's diffing the blobs in those commits with each
> other.  Why is git log -p <link> any different?  We already know how
> to diff blobs, and we just need a way to diff links.

You already forget what you invented "git diff-link" as a "solution"
for, perhaps?

By recording the submodules themselves and information _about_ the
submodules separately (the latter is in .gitmodules), "git diff A"
can show the difference in submodule A, while "git diff .gitmodules"
can show a change, which is a possibly in-working-tree-only proposed
change, in information about submodules.

Once you start recording the latter also at path "A", it becomes
unclear what "git diff A" should show.

That is what I said in the message, to which you invented "diff-link"
as a solution to the "unclear"-ness.

Am I misremembering the flow of discussion in this thread?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-05 17:07                     ` Junio C Hamano
@ 2013-04-05 17:23                       ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05 17:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jens Lehmann, Linus Torvalds, Git List, Heiko Voigt

Junio C Hamano wrote:
> Once you start recording the latter also at path "A", it becomes
> unclear what "git diff A" should show.
>
> That is what I said in the message, to which you invented "diff-link"
> as a solution to the "unclear"-ness.

I just thought it would be a stopgap until we get diff to support
links natively.  Obviously, when we get native diff support, 'log -p'
will be able to show differences as well.  As it turns out from my
little experiment with cat-link, it's really easy to get native diff
support, and I'm targeting that directly instead of a scripted
solution.

As for the unclearness issue, it's a little more complicated than
that: a non-floating submodule could've previously been a floating
one, or vice-versa.  As of this moment, I'm only planning to show
differences between link buffers.  In the case when two consecutive
commits change a link checkout_rev (where floating is set to 0), we
can come up with something like the current diff.submodule = log.  I
see no cause to worry about the interface of that now.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:30 [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
                   ` (9 preceding siblings ...)
  2013-04-04 18:55 ` Jonathan Nieder
@ 2013-04-06 20:10 ` Ramkumar Ramachandra
  2013-04-07  3:31   ` Junio C Hamano
  10 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-06 20:10 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Linus Torvalds

Hi again,

So we've thought about it for some time, and I really need you to
start reviewing the code now.

I'll just summarize what we've discussed so far:

1. The malleability argument doesn't hold, because we're proposing a
link object with optional fields.

2. The local-fork argument doesn't hold, because users will be
rebasing changes to the link object in exactly the same way as they
currently do with the blob object .gitmodules.

3. The worktree argument doesn't hold, because we're proposing to
treat the link object as nothing more than a blob object that can be
parsed by git-core.  It will stage and unstage just like a blob.
Sure, it's not accessible directly by the filesystem: so what?  What
difference does `emacsclient .gitmodules` versus `git edit-link
clayoven` make to the end-user?

4. The diff-confusion argument is just another by-the-way, but it
doesn't really hold either.  Currently, we see:

    - Subproject commit b83492
    + Subproject commit 39ab2f

(With diff.submodule set to log, we can actually see the log of the
submodule between these two commits.)  With links, we will see:

    - checkout_rev = b83492
    + checkout_rev = 39ab2f

There's nothing that prevents us from respecting diff.submodule (some
minor glue code will have to be written; that's all).

*. There is actually one thing that .gitmodules does better than
links.  foreach.  It's trivial to implement with .gitmodules and hard
to implement with links: with .gitmodules, the paths of all the
submodules are in one place.  But with links, we'll have to
unpack_trees() every tree in the entire repository, and dig through it
to find all the link objects to initialize.  Basically, inefficient
and inelegant.  However, I don't think this is a big problem in
practice, since this is not exactly a common operation: I'd probably
want to recurse-submodules once at clone time.
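
To show what "dig through every tree" amounts to, here is a toy
sketch of a foreach over links.  It is deliberately standalone: the
toy_tree and toy_entry types are hypothetical stand-ins for real tree
objects, and nothing here calls unpack_trees() or reads an object
database.  The point is only that the walk has to visit every subtree,
where a .gitmodules-based foreach reads one file.

```c
#include <stdio.h>
#include <string.h>

enum entry_type { ENTRY_BLOB, ENTRY_TREE, ENTRY_LINK };

struct toy_tree;

/* Hypothetical stand-in for one tree entry. */
struct toy_entry {
	const char *name;
	enum entry_type type;
	const struct toy_tree *subtree;	/* only used for ENTRY_TREE */
};

struct toy_tree {
	const struct toy_entry *entries;
	size_t nr;
};

typedef void (*link_fn)(const char *path, void *data);

/* Call fn for every link entry reachable from tree, depth first. */
static void foreach_link(const struct toy_tree *tree, const char *prefix,
			 link_fn fn, void *data)
{
	char path[1024];
	size_t i;

	for (i = 0; i < tree->nr; i++) {
		const struct toy_entry *e = &tree->entries[i];
		int n = snprintf(path, sizeof(path), "%s%s%s",
				 prefix, *prefix ? "/" : "", e->name);

		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path too long for this sketch */
		if (e->type == ENTRY_LINK)
			fn(path, data);
		else if (e->type == ENTRY_TREE && e->subtree)
			foreach_link(e->subtree, path, fn, data);
	}
}

/* Example callback: record each link path, one per line. */
struct link_log {
	char buf[4096];
	size_t len;
	int count;
};

static void record_link(const char *path, void *data)
{
	struct link_log *seen = data;
	size_t room = sizeof(seen->buf) - seen->len;
	int n = snprintf(seen->buf + seen->len, room, "%s\n", path);

	if (n > 0 && (size_t)n < room)
		seen->len += (size_t)n;
	seen->count++;
}
```

Calling foreach_link(&root, "", record_link, &seen) on a small tree
visits every directory once, so the cost grows with the size of the
whole tree rather than with the number of submodules.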

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-06 20:10 ` [RFC/PATCH 0/7] Rework git core for native submodules Ramkumar Ramachandra
@ 2013-04-07  3:31   ` Junio C Hamano
  2013-04-07  7:27     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-07  3:31 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Linus Torvalds

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> So we've thought about it for some time, and I really need you to
> start reviewing the code now.
>
> I'll just summarize what we've discussed so far:
> ...

I do not think we have heard anything concrete and usable about what
you are trying to achieve yet.

You may be proposing to discard the baby with the bathwater.  We
haven't seen any evidence that the change is really worth having.  We
do not even know what you are trying to change, other than "I want to
add a new object type to largely replicate what is recorded in
.gitmodules file".  What are you trying to solve?

	I want to have a project for an appliance, that binds two
	projects, the kernel and the appliance's userspace.  The
	usual suspects to use to implement such a project would be
	Git submodule, repo, or Gitslave.

	I want to be able to do X and Y and Z in managing such a
	project.

	If I try to use submodule, I cannot see how I could make it
	do X for _this thing_, and it is not a bug in the
	implementation but is fundamental because of _this and
	that_.  If I try to use repo, ...... the same, and the same
	for Gitslave. ......

	I propose to add a new "gitlink" object recorded in the tree
	and in the index, and the said cases X, Y and Z can be
	solved in _such and such way_.  We cannot solve it without
	having a new "gitlink" object recorded in the tree object
	because of _this and that reason_.

I think it is too premature to discuss _your_ code.  The patches do
not even tell us anything about how much more work is needed to
merely make Git with your patches work properly again.  For one
thing, I suspect that you won't even be able to repack a repository
that has OBJ_LINK only with the patches you posted.

At this point the only thing that we can gain from reading your
patch is that you can write C to do _something_, but that something
is so fuzzily explained that we do not know what to make of that
knowledge that you write good (or bad, we don't know) C.

It would be much more productive to learn what these specific issues
X, Y and Z are, and if the problems you are having with existing
solutions are really fundamental that need changes to object layer
to solve.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07  3:31   ` Junio C Hamano
@ 2013-04-07  7:27     ` Ramkumar Ramachandra
  2013-04-07  9:00       ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07  7:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Linus Torvalds

Junio C Hamano wrote:
> I think it is too premature to discuss _your_ code.  The patches do
> not even tell us anything about how much more work is needed to
> merely make Git with your patches work properly again.  For one
> thing, I suspect that you won't even be able to repack a repository
> that has OBJ_LINK only with the patches you posted.

Let me try to rephrase my original request: I'm an inexperienced
contributor trying to do something very ambitious.  Having authored a
huge part of it, Linus and you understand git-core much better than I
can ever hope to understand.  These are things that you need to tell
me after reading the patches.  I only have a rough idea to make Git
work properly with my patches again: I can't know for sure until I
write all the code.  What's more, your guess will probably be better
than mine after you read the code.

You're asking me to submit a perfect 40 or 50 part series that's a
potential candidate for merging.  I'm sorry to say this, but I'm
incapable of doing that without posting intermediate work and getting
help.  Frankly, it's a very unreasonable expectation; I don't think
anyone except you or Linus can even get close after making such a
fundamental change.

I might end up writing all the code (which I'm perfectly okay with),
and all I'm asking from you is to constantly keep picking my brain (by
reviewing my code and posting good critiques).  Am I being
unreasonable?

> At this point the only thing that we can gain from reading your
> patch is that you can write C to do _something_, but that something
> is so fuzzily explained that we do not know what to make of that
> knowledge that you write good (or bad, we don't know) C.

C is irrelevant here: I'm not asking you for style/structuring tips;
I'm asking you for a critique on the implementation of this idea.
Linus raised some good points after reading [0/7] that I countered,
and there isn't anything else anyone can raise without reading
more.  Code speaks more clearly than the English in [0/7]: you should
be able to deduce a lot more intent and direction.

> It would be much more productive to learn what these specific issues
> X, Y and Z are, and if the problems you are having with existing
> solutions are really fundamental that need changes to object layer
> to solve.

To reiterate: link does not make possible something that is not
fundamentally _possible_ with a .githack and a 100k-line Perl script.
At its core, every variant of submodules does this.  What I'm
essentially proposing: break up the information in .githack into
smaller bits and create a new object type so it can be parsed by
git-core easily.  If you agree that my proposal doesn't make
impossible what was possible earlier, and that it makes life easier
for everyone, we should be good to go.

When the series matures, we can investigate the other implementations
in greater detail so we can pick out more optional fields to add to
the link object before getting it merged.  This is not the right time
to do that: we're currently trying to get git-core working with the
mandatory fields.

> I do not think we have heard anything concrete and usable about what
> you are trying to achieve yet.

I'll try to rephrase your concerns here:

0. We don't know if this approach will yield a mergeable series at
all, because it breaks so many things and is so difficult to complete.

1. We don't know how much work is needed to bring the series to a
point where it is in a mergeable state.  There is no timeline
specified.

2. We can't build an exhaustive list of the problems that this new
approach will solve (ie. we haven't finalized the optional fields).

3. We don't have anything useable yet.

And your non-concerns should be:

1. We know that this approach won't fundamentally prevent us from
solving specific problems that the .githack approach solves.

2. We know that this approach makes life easier for everyone, and
there are significant concrete benefits to teaching git-core about
links.

I agree with all this fully.  I don't have a concrete roadmap; we'll
just have to dive in based on what we've seen so far, and hope that
we're able to finish what we started.  So, my final question is: are
you still not convinced that this approach shows a lot of potential,
and is worth exploring now?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07  7:27     ` Ramkumar Ramachandra
@ 2013-04-07  9:00       ` Junio C Hamano
  2013-04-07 10:58         ` Ramkumar Ramachandra
  2013-04-07 15:51         ` Ramkumar Ramachandra
  0 siblings, 2 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-07  9:00 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Linus Torvalds

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> I think it is too premature to discuss _your_ code.  The patches do
>> not even tell us anything about how much more work is needed to
>> merely make Git with your patches work properly again.  For one
>> thing, I suspect that you won't even be able to repack a repository
>> that has OBJ_LINK only with the patches you posted.
>
> ...
> You're asking me to submit a perfect 40 or 50 part series...

Not at all.  And please do not start _coding_.

When the design is so unclear that a 7-patch series is not
ready to be reviewed, a 50-patch series certainly will not be.  Not
until you can explain what you are trying to solve and convince
others why other less disruptive approaches are fundamentally
unworkable, and why we need to change the object layer.

> To reiterate: link does not make possible something that is not
> fundamentally _possible_ with a .githack and a 100k-line Perl script.
>
> At its core, every variant of submodules does this.  What I'm
> essentially proposing: break up the information in .githack into
> smaller bits and create a new object type so it can be parsed by
> git-core easily.

The .gitmodules file is designed to be easily parsable by the config
infrastructure and implemented as such already, thank-you-very-much.
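
As a minimal sketch of that point, assuming a hand-written .gitmodules
with one illustrative entry (the "clayoven" name and URL are borrowed
from the demo in [0/7], not taken from a real checkout), the stock
`git config -f` plumbing can read the file without any
submodule-specific parser:

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Illustrative .gitmodules with a single, hypothetical entry
printf '[submodule "clayoven"]\n\tpath = clayoven\n\turl = git://github.com/artagnon/clayoven\n' > .gitmodules

# Read individual values through the ordinary config machinery
url=$(git config -f .gitmodules --get submodule.clayoven.url)
path=$(git config -f .gitmodules --get submodule.clayoven.path)
echo "$path -> $url"

# List every submodule path recorded in the file
git config -f .gitmodules --get-regexp '^submodule\..*\.path$'
```

The same calls work from any directory, since `-f` names the file
explicitly and no repository is needed.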

Why do you keep referring to an already working solution with a
derogatory misspelling?  That only gives others the impression that
you do not understand how the current system works, and persuades
them not to waste time responding to you.  Stop it.

> When the series matures, we can investigate the other implementations
> in greater detail so we can pick out more optional fields to add to
> the link object before getting it merged.

Sorry, but that is not how open source works in general, and
certainly not how this project works.

We do not add a disruptive change just for the sake of change, break
a working system, and make extra work for ourselves cleaning up the
fallout (i.e. your "40 to 50 patch series", but honestly speaking I
expect it would be more like four months of work for a full-time
engineer or two), for an unproven design (that has yet to be
illustrated) to solve problems (that have yet to be explained),
without knowing that

 (1) the problems are worth solving;

 (2) the design will solve the problems; and

 (3) solving the same problems without such a disruptive change is
     impossible, or so cumbersome that it will be far larger than
     the work needed to clean-up the fallout of the disruptive
     change.

So what are your X, Y, Z?  You still haven't answered that question.

For that matter, you didn't answer the same question that was more
tersely phrased by Linus in the very first response in the thread:

> Linus Torvalds wrote:
>> I don't dispute that a new link object might be a good idea, but
>> there's no explanation of the actual format of this thing anywhere,
>> and what the real advantages would be. A clearer "this is the design,
>> this is the format of the link object, and this is what it buys us"
>> would be a good idea.
>
> Yeah, I need help with that.  I've just stuffed in whatever fields
> popped into my mind first.  The current ones are:


And what you listed were your back-then-current thinking on "actual
format".

What are the real advantages?  How are they used?  What do they
allow us to do that we cannot do with .gitmodules (or repo or
gitslave for that matter)?  What do they buy us?

What problem are you trying to solve?

I have this suspicion that you do not have to change anything in the
object layer to make Git behave very differently from the current
submodule implementation.  For example, if your gripe were (I am
just speculating without any input from you in this thread) that
each submodule working tree has ".git" at its top and there is no
unified view from the top-level [*1*], we certainly can solve it
without any change to the object layer.

We currently add a cache entry that has the commit object name to
the index from the tree object when we check out the superproject,
and create a separate repository with a working tree when we
instantiate a submodule.
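
The cache entry described above can be sketched with plumbing alone.
This assumes a throwaway repository and, purely for illustration,
reuses the repository's own initial commit as the recorded commit
object name; no real submodule is cloned:

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q super
cd super
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
sha=$(git rev-parse HEAD)

# Record a gitlink: mode 160000 plus a commit object name at path "sub"
git update-index --add --cacheinfo 160000 "$sha" sub

# The index now holds a commit name, not a blob, for that path
entry=$(git ls-files --stage sub)
echo "$entry"
mode=${entry%% *}
recorded=$(git ls-files --stage sub | cut -d' ' -f2)
```

Once committed, `git ls-tree HEAD` lists the same path with type
"commit", which is the in-tree form the current submodule code
relies on.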

This arrangement does not have to be fundamental. It is a design
choice of one particular working tree layout, which is totally local
to individual superproject working tree.

You could arrange a single index (the one in the superproject) to
hold the tree contents from the commit in the submodule, while
noting the original commit object name in a new mandatory extension
section in the index. The index will have a unified view of the
whole tree, and we do not have to have a .git at the root of each
submodule working tree (be it a directory or a gitfile).

I think the message where I talked about the "bind" idea in the list
archive URLs I gave you earlier would give you such a layout, and
you should go read it again to understand how the flow from object
database to index to working tree back to index back to object
database was envisioned to work.  I think the only thing we need to
do differently from that "bind" proposal in the current world order
is not to record the new submodule state in the commit object of the
superproject, but actually create a new commit for the submodule
part and store it in the tree object for the commit in the
superproject (the "bind lines in the superproject commit" was a
hack, only because we didn't have a way to write the submodule
commit object name in the index and in the trees).


[Footnote]

*1* I am not saying that keeping everything down to the leaves of
submodules is necessarily a good idea.  This is only meant as an
illustration of what kind of a system that looks drastically
different from the end-user's point of view you could build, without
breaking the object layer and having to redo everything.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07  9:00       ` Junio C Hamano
@ 2013-04-07 10:58         ` Ramkumar Ramachandra
  2013-04-07 15:51         ` Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 10:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Linus Torvalds

Your overall hostility is unappreciated.  The burden of proof is on
me, while you calmly sit back and criticize anything that breaks the
current working state, and refuse to look at the implementation.
Anyway, here we go again.

Junio C Hamano wrote:
> Not at all.  And please do not start _coding_.

You've successfully killed all my enthusiasm.  Congratulations.

> When the design is so unclear that a 7-patch series is not
> ready to be reviewed, a 50-patch series certainly will not be.  Not
> until you can explain what you are trying to solve and convince
> others why other less disruptive approaches are fundamentally
> unworkable, and why we need to change the object layer.

"So, my final question is: are you still not convinced that this
approach shows a lot of potential, and is worth exploring now?"
"No."

I don't know how many times to repeat this: No, Junio.  A less
disruptive approach is _not_ fundamentally unworkable.  You can spend
the next five years fixing submodule.c/ git-submodule.sh, or you can
take a step back and think about why it's in such pathetic shape right
now.

>> To reiterate: link does not make possible something that is not
>> fundamentally _possible_ with a .githack and a 100k-line Perl script.
>>
>> At its core, every variant of submodules does this.  What I'm
>> essentially proposing: break up the information in .githack into
>> smaller bits and create a new object type so it can be parsed by
>> git-core easily.
>
> The .gitmodules file is designed to be easily parsable by the config
> infrastructure and implemented as such already, thank-you-very-much.

You're missing the point.  Who parses .gitmodules?  submodule.c and
git-submodule.sh, as opposed to a link being parsed by git-core.  How
is it any different?  That's what my series is trying to answer.

> Why do you keep referring to an already working solution with a
> derogatory misspelling?  That only gives others the impression that
> you do not understand how the current system works, and persuades
> them not to waste time responding to you.  Stop it.

I don't see why you have to get offended by my deliberate misspelling:
we're not emotionally attached to software, and I'm merely criticizing
what I think is a bad hack.  I'm not pointing out the concrete
limitations of git-submodule precisely because they can be fixed
without any changes to the object layer: this thread will become a
discussion about how to fix submodule.c/ git-submodule.sh.  You want
floating submodules?  Fine, we'll write a helper script that
auto-commits to the superproject every time the SHA-1 changes.
Everything _can_ be done.

What exactly don't I "understand" about the current system, apart from
the fact that everybody is super-rigid and defensive about what
already works?

Let us take a moment to look at the current state of git-submodule
(note that this is after many years of hard work).  This is just off
the top of my head:

1. To add a submodule, you can't git add.  You need to git submodule
add.  And only from the toplevel directory.  You can't first clone and
then add either: a git submodule add clones, adds lines to
.gitmodules, AND stages everything.

2. There is currently no single command to remove a submodule.  You
have to git rm it, remove the lines in .gitmodules, and remove the
GITDIR from .git/modules.

3. It is currently impossible to git mv a submodule, because of the
amount of gymnastics required to relocate the object store, rewrite
the .gitmodules and stage the correct changes.

4. It is currently impossible to do true floating submodules, because
we're using a commit object in-tree.

5. You have to execute all submodule commands from the toplevel of the worktree.

6. It is currently impossible to initialize a nested submodule without
initializing the container submodule.  If I really want this, I have
to trade-off composability and use repo.

What is going on?  Either the people working on git-submodule are
horribly incompetent, or there's some fundamental problem.  I believe
the problem is the latter and have tried to show that the above quirks
can be fixed in a much simpler way with two days of work.  What part
of this didn't you understand?

> Sorry, but that is not how open source works in general, and
> certainly not how this project works.
>
> We do not add a disruptive change just for the sake of change, break
> a working system, and make extra work for ourselves cleaning up the
> fallout (i.e. your "40 to 50 patch series", but honestly speaking I
> expect it would be more like four months of work for a full-time
> engineer or two), for an unproven design (that has yet to be
> illustrated) to solve problems (that have yet to be explained),
> without knowing that
>
>  (1) the problems are worth solving;
>
>  (2) the design will solve the problems; and
>
>  (3) solving the same problems without such a disruptive change is
>      impossible, or so cumbersome that it will be far larger than
>      the work needed to clean-up the fallout of the disruptive
>      change.
>
> So what are your X, Y, Z?  You still haven't answered that question.

(1) The ones that are currently solved by various existing
implementations.  repo, mr, gitslave, git-subtree and git-submodule.

(2) Currently, I'm targeting making the life of git-submodule.sh
simpler, fixing the UI/UX, and adding a few new features: floating
submodules, refs for submodules, and blocking statthrough.  Isn't this
a definite improvement over the current design?  Why are you asking me
to investigate and solve every problem exhaustively now?

(3) Nothing is impossible.  It's cumbersome, and that's what I'm
trying to answer with my series: a little bit of code written in two
days can simplify a lot of things.  How do I give a definite answer to
this question without submitting two different series: one fixing
submodule.c/ git-submodule.sh to do everything I want, and another to
fix everything using my approach?

> What are the real advantages?  How are they used?  What do they
> allow us to do that we cannot do with .gitmodules (or repo or
> gitslave for that matter)?  What do they buy us?

For the 100th time, there's nothing you _cannot_ do with .gitmodules.
I'm not solving any new problems.  They're all solved by using a
combination of existing tools: each comes with a specific set of
benefits and trade-offs.  I'm trying to engineer a simple and elegant
solution that will solve many of those problems natively in git.  They
buy us simplicity and elegance (I know you especially hate the word
"elegant", but I have no other way to put it).

> I have this suspicion that you do not have to change anything in the
> object layer to make Git behave very differently from the current
> submodule implementation.

Yes, you can!  For the 101st time.

> For example, if your gripe were (I am
> just speculating without any input from you in this thread) that
> each submodule working tree has ".git" at its top and there is no
> unified view from the top-level [*1*], we certainly can solve it
> without any change to the object layer.

I don't know where you got that idea from (certainly not from reading
my series): that is not my gripe at all.  As I've already stated, my
gripe is with how unnecessarily complicated, inelegant, and
featureless submodule.c/ git-submodule.sh is.

> We currently add a cache entry that has the commit object name to
> the index from the tree object when we check out the superproject,
> and create a separate repository with a working tree when we
> instantiate a submodule.

Yes, I'm aware.

> You could arrange a single index (the one in the superproject) to
> hold the tree contents from the commit in the submodule, while
> noting the original commit object name in a new mandatory extension
> section in the index. The index will have a unified view of the
> whole tree, and we do not have to have a .git at the root of each
> submodule working tree (be it a directory or a gitfile).

I think this is a very bad idea, because the toplevel index (and
combined object store) will blow up when we have lots of big
submodules.  One of my goals for the new submodule design is to answer
the scaling problem with ultra-large repositories: the answer is to
break them up into smaller ones and compose them using this beautiful
and powerful mechanism.

> I think the message where I talked about the "bind" idea in the list
> archive URLs I gave you earlier would give you such a layout, and
> you should go read it again to understand how the flow from object
> database to index to working tree back to index back to object
> database was envisioned to work.  I think the only thing we need to
> do differently from that "bind" proposal in the current world order
> is not to record the new submodule state in the commit object of the
> superproject, but actually create a new commit for the submodule
> part and store it in the tree object for the commit in the
> superproject (the "bind lines in the superproject commit" was a
> hack, only because we didn't have a way to write the submodule
> commit object name in the index and in the trees).

I don't see how this is relevant to our discussion, but anyway:

Yes, I read about your bind idea back from 2006.  TL;DR version for
everyone else reading:  Junio proposed that the commit object be
extended in the following way in January 2006.

tree 04803b09c300c8325258ccf2744115acc4c57067
bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
author Junio C Hamano <junio@kernel.org> 1137965565 -0800
committer Junio C Hamano <junio@kernel.org> 1137965565 -0800

The bind lines are referring to tree objects.  There's a reason the
link infrastructure written in 2007 by Linus made no reference to
this: it's a bad idea.  It's a much better idea to compose using
commits.  Even better to compose using a specialized link object.  Why
are you taking one step back from the current implementation?

So, my final question is: what do I have to do to convince you that
this approach shows promise?  Haven't I answered the questions you
keep repeating: "what problem does it solve?" and "why are existing
implementations fundamentally unworkable?".  I really don't know what
more to say, so can you give me a list of concrete actionable items
instead of repeating the same questions?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07  9:00       ` Junio C Hamano
  2013-04-07 10:58         ` Ramkumar Ramachandra
@ 2013-04-07 15:51         ` Ramkumar Ramachandra
  2013-04-07 16:12           ` John Keeping
  2013-04-07 19:26           ` Ramkumar Ramachandra
  1 sibling, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 15:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Linus Torvalds

I suspect you're overly worried about the fallout of such a
disruptive change.  If so, you could've just said: "Ram, I like the
idea.  But what breakages do you estimate we'll have to deal with?"
instead of attacking the idea and repeatedly questioning its purpose.
So, I'll make a rough guess based on the first iteration I intend to
get merged:

- Not all the git submodule subcommands will work. add/ status/ init/
deinit are easy to rewrite, but stuff like --recursive and foreach
might be slightly problematic as I already pointed out earlier.  We'll
have to code depending on how far you think the first iteration should
go.  After a few iterations, we can make 'git submodule' just print
"This command is deprecated.  Please read `man gitsubmodules`."

- All existing repositories with submodules will not be supported.  My
plan to deal with this: Have git-core code detect commit objects
in-tree and disable things like diff.  As soon as the user executes
the first 'git submodule' command, remove all existing submodules,
along with .gitmodules and re-add them as link objects.  Then print a
message saying: "We've just migrated your submodules to the new
format.  Please commit this."

That's really it.  It's certainly not earth-shattering breakage; and I
think the inconvenience it causes is more than compensated by its
beautiful design and UI/UX.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 15:51         ` Ramkumar Ramachandra
@ 2013-04-07 16:12           ` John Keeping
  2013-04-07 16:42             ` Ramkumar Ramachandra
  2013-04-07 19:26           ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: John Keeping @ 2013-04-07 16:12 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Junio C Hamano, Git List, Linus Torvalds

On Sun, Apr 07, 2013 at 09:21:44PM +0530, Ramkumar Ramachandra wrote:
> I suspect you're overly worried about the fallout of such a
> disruptive change.  If so, you could've just said: "Ram, I like the
> idea.  But what breakages do you estimate we'll have to deal with?"
> instead of attacking the idea and repeatedly questioning its purpose.
> So, I'll make a rough guess based on the first iteration I intend to
> get merged:
> 
> - Not all the git submodule subcommands will work. add/ status/ init/
> deinit are easy to rewrite, but stuff like --recursive and foreach
> might be slightly problematic as I already pointed out earlier.  We'll
> have to code depending on how far you think the first iteration should
> go.  After a few iterations, we can make 'git submodule' just print
> "This command is deprecated.  Please read `man gitsubmodules`."
> 
> - All existing repositories with submodules will not be supported.  My
> plan to deal with this: Have git-core code detect commit objects
> in-tree and disable things like diff.  As soon as the user executes
> the first 'git submodule' command, remove all existing submodules,
> along with .gitmodules and re-add them as link objects.  Then print a
> message saying: "We've just migrated your submodules to the new
> format.  Please commit this."

Meaning that every repository using submodules needs to have a flag day
when all of the people using it switch to the new Git version at once?

How happy do you expect users to be if they have to remember to use
different Git versions to work on different repositories because some
have switched and some haven't?

From a user's point of view, the current submodule support mostly works
very well.  Yes there are some annoyances ("you are not at the top
level") and some more advanced features require a bit too much work
(moving a submodule) but in normal usage it works very well in my
experience.

I think you need a much better argument than "it makes the
implementation more beautiful" to convince users that a flag day is
necessary.

> That's really it.  It's certainly not earth-shattering breakage;

For most users, the migration you've outlined above is exactly that.

Even looking just at commits sent to this list, I've seen users on
versions of Git from 1.7.10 to builds from next/pu in just the last
week.  Coordinating a flag day for even a slightly popular repository is
going to cause a lot of pain.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 16:12           ` John Keeping
@ 2013-04-07 16:42             ` Ramkumar Ramachandra
  2013-04-07 17:02               ` John Keeping
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 16:42 UTC (permalink / raw)
  To: John Keeping; +Cc: Junio C Hamano, Git List, Linus Torvalds

John Keeping wrote:
> Meaning that every repository using submodules needs to have a flag day
> when all of the people using it switch to the new Git version at once?

No, I would be totally against a migration that involves a flag-day.
What I meant is that having old-style submodule side-by-side with
new-style submodules is confusing (think about people using an older
version and getting confused), and that we should disallow it.  Users
will still be able to use existing repositories with new versions of
git with a few caveats:
1. They won't be able to add new submodules without migrating all
existing submodules.
2. git ls-tree will show the in-tree object incorrectly as a link (ie.
not commit).

That's about it, I think.  Obviously, everyone working on the
repository has to upgrade to a new version of git before they can use
new-style submodules.

> I think you need a much better argument than "it makes the
> implementation more beautiful" to convince users that a flag day is
> necessary.

There is no flag day necessary, and that is not my argument at all:
new-style submodules bring lots of new functionality to the table.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 16:42             ` Ramkumar Ramachandra
@ 2013-04-07 17:02               ` John Keeping
  2013-04-07 17:22                 ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: John Keeping @ 2013-04-07 17:02 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Junio C Hamano, Git List, Linus Torvalds

On Sun, Apr 07, 2013 at 10:12:28PM +0530, Ramkumar Ramachandra wrote:
> John Keeping wrote:
> > Meaning that every repository using submodules needs to have a flag day
> > when all of the people using it switch to the new Git version at once?
> 
> No, I would be totally against a migration that involves a flag-day.
> What I meant is that having old-style submodule side-by-side with
> new-style submodules is confusing (think about people using an older
> version and getting confused), and that we should disallow it.  Users
> will still be able to use existing repositories with new versions of
> git with a few caveats:
> 1. They won't be able to add new submodules without migrating all
> existing submodules.
> 2. git ls-tree will show the in-tree object incorrectly as a link (ie.
> not commit).
> 
> That's about it, I think.  Obviously, everyone working on the
> repository has to upgrade to a new version of git before they can use
> new-style submodules.

So not a flag day, but still some point at which the repository
transitions to "will not work with Git older than version X".  And if
you need to add a new submodule then you cannot delay that transition
any longer.

> > I think you need a much better argument than "it makes the
> > implementation more beautiful" to convince users that a flag day is
> > necessary.
> 
> There is no flag day necessary, and that is not my argument at all:
> new-style submodules bring lots of new functionality to the table.

I haven't seen anywhere a concise list of what functionality this is.
Do you have a simple bulleted list of what new features this would
allow?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 17:02               ` John Keeping
@ 2013-04-07 17:22                 ` Ramkumar Ramachandra
  2013-04-07 17:52                   ` John Keeping
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 17:22 UTC (permalink / raw)
  To: John Keeping; +Cc: Junio C Hamano, Git List, Linus Torvalds

John Keeping wrote:
> So not a flag day, but still some point at which the repository
> transitions to "will not work with Git older than version X".  And if
> you need to add a new submodule then you cannot delay that transition
> any longer.

Yes, that is true.  I don't see any way out of this.

> I haven't seen anywhere a concise list of what functionality this is.
> Do you have a simple bulleted list of what new features this would
> allow?

Sure, I'll write it out for you from an end-user perspective:

0. Great UI/UX.  No more cd-to-toplevel, and a beautiful set of native
commands that are consistent with the overall design of git-core.
Which means: clone (to put something in an unstaged place), add (to
stage), and commit (to commit the change).  There's now exactly one
place in your worktree (which is represented as one file in git; think
of it as a sort of symlink) to look in for all the information.  git
cat-link <link> to figure out its parameters, git edit-link to edit
its parameters: no more "find the matching pwd in .gitmodules in
toplevel".  To remove a submodule, just git rm.  And git mv works!

1. True floating submodules.  You can have a submodule checked out at
`master` or `v3.1`: no more detached HEADs in submodules unless you
want fixed submodules.  No additional cruft required to do the
floating: the information is native, in a link object.

2. Initializing a nested submodule without having to initialize the
outer one: no more repo XML nonsense.  And it's composable: you don't
need to put the information about all submodules in one central place.

3. Ability to have very many large submodule repositories without the
performance hit.  It makes sense to block stat() from going through
when you have floating submodules.  This means that many levels of
nesting are very easily possible.

4. It's suddenly much easier to add new features to this
implementation.  You don't need to do the kind of gymnastics you'd
have to do if you were hacking on submodule.c/ git-submodule.sh.

This is basically how "great design" plays out.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 17:22                 ` Ramkumar Ramachandra
@ 2013-04-07 17:52                   ` John Keeping
  2013-04-07 18:07                     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: John Keeping @ 2013-04-07 17:52 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Junio C Hamano, Git List, Linus Torvalds

On Sun, Apr 07, 2013 at 10:52:50PM +0530, Ramkumar Ramachandra wrote:
> Sure, I'll write it out for you from an end-user perspective:

To play Devil's Advocate for a bit...

> 0. Great UI/UX.  No more cd-to-toplevel, and a beautiful set of native
> commands that are consistent with the overall design of git-core.
> Which means: clone (to put something in an unstaged place), add (to
> stage), and commit (to commit the change).  There's now exactly one
> place in your worktree (which is represented as one file in git; think
> of it as a sort of symlink) to look in for all the information.  git
> cat-link <link> to figure out its parameters, git edit-link to edit
> its parameters: no more "find the matching pwd in .gitmodules in
> toplevel".  To remove a submodule, just git rm.  And git mv works!

Presumably now without .git/config support, so I can't override the
checked-in settings without my own custom branch.  Even carrying a dirty
working tree seems problematic here since a checked-out link object is a
directory, which can't have information like the remote URL in it.

> 1. True floating submodules.  You can have a submodule checked out at
> `master` or `v3.1`: no more detached HEADs in submodules unless you
> want fixed submodules.  No additional cruft required to do the
> floating: the information is native, in a link object.

Can't I do that now with "submodule.<name>.branch" and "git submodule
update --remote --rebase" and friends?
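
For readers unfamiliar with that setting, a minimal sketch of the
existing mechanism follows.  The submodule name "sub" and the branch
"master" are assumptions for illustration; only the .gitmodules entry
is written here, and the update command is shown as a comment because
it needs a superproject with an initialized submodule.

```shell
# Work in a scratch directory so no real repository is touched.
cd "$(mktemp -d)"

# Record which upstream branch the (hypothetical) submodule "sub" follows.
git config -f .gitmodules submodule.sub.branch master

# Read the value back to confirm what was written.
branch=$(git config -f .gitmodules submodule.sub.branch)
echo "sub follows: $branch"

# In a superproject where "sub" is actually initialized, the update
# described above would then be:
#   git submodule update --remote --rebase
```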

> 2. Initializing a nested submodule without having to initialize the
> outer one: no more repo XML nonsense.  And it's composable: you don't
> need to put the information about all submodules in one central place.

How does this interact when there is the following structure:

    super
    `-- sub
        `-- subsub   (specified by sub)

and subsub is specified as a submodule in *both* super and sub but with
different settings.  Do I get different behaviour depending on $PWD?

> 3. Ability to have very many large submodule repositories without the
> performance hit.  It makes sense to block stat() from going through
> when you have floating submodules.  This means that many levels of
> nesting are very easily possible.

Can't I already control this to some degree?  Certainly the following
commands take different amounts of time to run:

    git status
    git -c status.submodulesummary=true status

> 4. It's suddenly much easier to add new features to this
> implementation.  You don't need to do the kind of gymnastics you'd
> have to do if you were hacking on submodule.c/ git-submodule.sh.
> 
> This is basically how "great design" plays out.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 17:52                   ` John Keeping
@ 2013-04-07 18:07                     ` Ramkumar Ramachandra
  2013-04-07 18:21                       ` John Keeping
  2013-04-07 18:22                       ` Ramkumar Ramachandra
  0 siblings, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 18:07 UTC (permalink / raw)
  To: John Keeping; +Cc: Junio C Hamano, Git List, Linus Torvalds

John Keeping wrote:
> On Sun, Apr 07, 2013 at 10:52:50PM +0530, Ramkumar Ramachandra wrote:
>> Sure, I'll write it out for you from an end-user perspective:
>
> To play Devil's Advocate for a bit...

Yes!

>> 0. Great UI/UX.  No more cd-to-toplevel, and a beautiful set of native
>> commands that are consistent with the overall design of git-core.
>> Which means: clone (to put something in an unstaged place), add (to
>> stage), and commit (to commit the change).  There's now exactly one
>> place in your worktree (which is represented as one file in git; think
>> of it a sort of symlink)  to look in for all the information.  git
>> cat-link <link> to figure out its parameters, git edit-link to edit
>> its parameters: no more "find the matching pwd in .gitmodules in
>> toplevel".  To remove a submodule, just git rm.  And git mv works!
>
> Presumably now without .git/config support, so I can't override the
> checked-in settings without my own custom branch.  Even carrying a dirty
> working tree seems problematic here since a checked-out link object is a
> directory, which can't have information like the remote URL in it.

Sure you can have a dirty worktree.  It's just like .gitmodules:
there's zero difference but for the fact that .gitmodules is
accessible directly via your filesystem, while links are not.

>> 1. True floating submodules.  You can have a submodule checked out at
>> `master` or `v3.1`: no more detached HEADs in submodules unless you
>> want fixed submodules.  No additional cruft required to do the
>> floating: the information is native, in a link object.
>
> Can't I do that now with "submodule.<name>.branch" and "git submodule
> update --remote --rebase" and friends?

Yes, but that is not true floating: you shouldn't have to be sorry and
rebase.  In new-style submodules, they're first class citizens (ie.
true): you can just replace the SHA-1 with a ref in the link.

>> 2. Initializing a nested submodule without having to initialize the
>> outer one: no more repo XML nonsense.  And it's composable: you don't
>> need to put the information about all submodules in one central place.
>
> How does this interact when there is the following structure:
>
>     super
>     `-- sub
>         `-- subsub   (specified by sub)
>
> and subsub is specified as a submodule in *both* super and sub but with
> different settings.  Do I get different behaviour depending on $PWD?

This is a very fringe case that I haven't thought about.  I don't know
how it will behave: I haven't built it yet (and don't have the entire
implementation in my head yet).

>> 3. Ability to have very many large submodule repositories without the
>> performance hit.  It makes sense to block stat() from going through
>> when you have floating submodules.  This means that many levels of
>> nesting are very easily possible.
>
> Can't I already control this to some degree?  Certainly the following
> commands take different amounts of time to run:
>
>     git status
>     git -c status.submodulesummary=true status

You can't control the most fundamental thing, stat(): this is the
primary killer of performance on a large worktree.  There is currently
no way to block stat(): new-style submodules offers a way to configure
which submodules to block the stat() on.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:07                     ` Ramkumar Ramachandra
@ 2013-04-07 18:21                       ` John Keeping
  2013-04-07 18:34                         ` Jens Lehmann
  2013-04-07 18:37                         ` Ramkumar Ramachandra
  2013-04-07 18:22                       ` Ramkumar Ramachandra
  1 sibling, 2 replies; 140+ messages in thread
From: John Keeping @ 2013-04-07 18:21 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Junio C Hamano, Git List, Linus Torvalds

On Sun, Apr 07, 2013 at 11:37:02PM +0530, Ramkumar Ramachandra wrote:
> John Keeping wrote:
> > On Sun, Apr 07, 2013 at 10:52:50PM +0530, Ramkumar Ramachandra wrote:
> >> Sure, I'll write it out for you from an end-user perspective:
> >
> > To play Devil's Advocate for a bit...
> 
> Yes!
> 
> >> 0. Great UI/UX.  No more cd-to-toplevel, and a beautiful set of native
> >> commands that are consistent with the overall design of git-core.
> >> Which means: clone (to put something in an unstaged place), add (to
> >> stage), and commit (to commit the change).  There's now exactly one
> >> place in your worktree (which is represented as one file in git; think
> >> of it a sort of symlink)  to look in for all the information.  git
> >> cat-link <link> to figure out its parameters, git edit-link to edit
> >> its parameters: no more "find the matching pwd in .gitmodules in
> >> toplevel".  To remove a submodule, just git rm.  And git mv works!
> >
> > Presumably now without .git/config support, so I can't override the
> > checked-in settings without my own custom branch.  Even carrying a dirty
> > working tree seems problematic here since a checked-out link object is a
> > directory, which can't have information like the remote URL in it.
> 
> Sure you can have a dirty worktree.  It's just like .gitmodules:
> there's zero difference but for the fact that .gitmodules is
> accessible directly via your filesystem, while links are not.

I can't see how this gets me a dirty working tree.  Since the link needs
to be stored somewhere, I assume it's in the index; so I can have staged
changes, but not unstaged changes.

> >> 1. True floating submodules.  You can have a submodule checked out at
> >> `master` or `v3.1`: no more detached HEADs in submodules unless you
> >> want fixed submodules.  No additional cruft required to do the
> >> floating: the information is native, in a link object.
> >
> > Can't I do that now with "submodule.<name>.branch" and "git submodule
> > update --remote --rebase" and friends?
> 
> Yes, but that is not true floating: you shouldn't have to be sorry and
> rebase.  In new-style submodules, they're first class citizens (ie.
> true): you can just replace the SHA-1 with a ref in the link.

But what happens if I make any changes on top?  With --rebase and
--merge I can specify exactly what I want to happen (and obviously if I
don't have any changes then whichever I choose simply sets my branch to
the upstream ref).

> >> 2. Initializing a nested submodule without having to initialize the
> >> outer one: no more repo XML nonsense.  And it's composable: you don't
> >> need to put the information about all submodules in one central place.
> >
> > How does this interact when there is the following structure:
> >
> >     super
> >     `-- sub
> >         `-- subsub   (specified by sub)
> >
> > and subsub is specified as a submodule in *both* super and sub but with
> > different settings.  Do I get different behaviour depending on $PWD?
> 
> This is a very fringe case that I haven't thought about.  I don't know
> how it will behave: I haven't built it yet (and don't have the entire
> implementation in my head yet).
> 
> >> 3. Ability to have very many large submodule repositories without the
> >> performance hit.  It makes sense to block stat() from going through
> >> when you have floating submodules.  This means that many levels of
> >> nesting are very easily possible.
> >
> > Can't I already control this to some degree?  Certainly the following
> > commands take different amounts of time to run:
> >
> >     git status
> >     git -c status.submodulesummary=true status
> 
> You can't control the most fundamental thing, stat(): this is the
> primary killer of performance on a large worktree.  There is currently
> no way to block stat(): new-style submodules offers a way to configure
> which submodules to block the stat() on.

So it would be something like per-submodule --untracked-files and
--ignore-submodules settings?  I can see that being useful.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:07                     ` Ramkumar Ramachandra
  2013-04-07 18:21                       ` John Keeping
@ 2013-04-07 18:22                       ` Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 18:22 UTC (permalink / raw)
  To: John Keeping; +Cc: Junio C Hamano, Git List, Linus Torvalds

Ramkumar Ramachandra wrote:
> You can't control the most fundamental thing, stat(): this is the
> primary killer of performance on a large worktree.  There is currently
> no way to block stat(): new-style submodules offers a way to configure
> which submodules to block the stat() on.

Let me try to put this in simpler language.  When you run 'git status'
on your toplevel repository, old-style submodules runs git status on
each of the submodule repositories also: this is because submodules
were traditionally fixed; therefore, if you forget to commit some
changes in submodules before you make the superproject commit, you
might break the build.  In new-style submodules, you can prevent this
from happening on a link-specific basis.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:21                       ` John Keeping
@ 2013-04-07 18:34                         ` Jens Lehmann
  2013-04-07 18:44                           ` Ramkumar Ramachandra
  2013-04-07 18:59                           ` John Keeping
  2013-04-07 18:37                         ` Ramkumar Ramachandra
  1 sibling, 2 replies; 140+ messages in thread
From: Jens Lehmann @ 2013-04-07 18:34 UTC (permalink / raw)
  To: John Keeping
  Cc: Ramkumar Ramachandra, Junio C Hamano, Git List, Linus Torvalds

Am 07.04.2013 20:21, schrieb John Keeping:
> On Sun, Apr 07, 2013 at 11:37:02PM +0530, Ramkumar Ramachandra wrote:
>> John Keeping wrote:
>>> On Sun, Apr 07, 2013 at 10:52:50PM +0530, Ramkumar Ramachandra wrote:
>>>> 3. Ability to have very many large submodule repositories without the
>>>> performance hit.  It makes sense to block stat() from going through
>>>> when you have floating submodules.  This means that many levels of
>>>> nesting are very easily possible.
>>>
>>> Can't I already control this to some degree?  Certainly the following
>>> commands take different amounts of time to run:
>>>
>>>     git status
>>>     git -c status.submodulesummary=true status
>>
>> You can't control the most fundamental thing, stat(): this is the
>> primary killer of performance on a large worktree.  There is currently
>> no way to block stat(): new-style submodules offers a way to configure
>> which submodules to block the stat() on.
> 
> So it would be something like per-submodule --untracked-files and
> --ignore-submodules settings?  I can see that being useful.

Ram is plain wrong here (just like he is on "git rm" and "git mv",
even though the latter is currently still in pu). This use case has
been handled by submodules for years now. Take a look at the "ignore"
setting in .gitmodules, which gives you full control of the stat()s
in submodules; in addition you have the repo-wide option
diff.ignoreSubmodules.

The whole feature list is full of red herrings like this which
have nothing to do with the advantages of a new object, but talk
about UI issues which are easy to solve in both worlds.
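
A minimal sketch of the two settings named above, run in a scratch
repository.  The submodule name "sub" is an assumption for
illustration, and the values "all" and "dirty" are just two of the
documented choices (none, untracked, dirty, all).

```shell
# Scratch repository so no real configuration is touched.
cd "$(mktemp -d)"
git init -q .

# Per-submodule setting, checked in via .gitmodules ("sub" is hypothetical).
git config -f .gitmodules submodule.sub.ignore all

# Repository-wide default for --ignore-submodules in git diff.
git config diff.ignoreSubmodules dirty

# Read both values back.
per_submodule=$(git config -f .gitmodules submodule.sub.ignore)
repo_wide=$(git config diff.ignoreSubmodules)
echo "submodule.sub.ignore=$per_submodule diff.ignoreSubmodules=$repo_wide"
```

The per-submodule entry travels with the superproject because
.gitmodules is a tracked file, while diff.ignoreSubmodules stays in
the local configuration.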

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:21                       ` John Keeping
  2013-04-07 18:34                         ` Jens Lehmann
@ 2013-04-07 18:37                         ` Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 18:37 UTC (permalink / raw)
  To: John Keeping; +Cc: Junio C Hamano, Git List, Linus Torvalds

John Keeping wrote:
> I can't see how this gets me a dirty working tree.  Since the link needs
> to be stored somewhere, I assume it's in the index; so I can have staged
> changes, but not unstaged changes.

If you have changes in the index, your worktree is classified as
"dirty".  See git-sh-setup.sh:require_clean_work_tree().  Yes, you
can't have unstaged changes, only untracked changes.  Linus: I made a
mistake here.
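
The distinction can be sketched with the two plumbing checks that
require_clean_work_tree() is built on, as far as I can tell from
git-sh-setup.sh.  The scratch repository, file name and identity
below are assumptions for illustration only.

```shell
# Scratch repository with one commit.
cd "$(mktemp -d)"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
echo one > file
git add file
git commit -q -m "initial"

# Stage a change: the index differs from HEAD, the worktree matches the index.
echo two > file
git add file

# Unstaged changes: worktree compared against the index.
if git diff-files --quiet --ignore-submodules; then unstaged=no; else unstaged=yes; fi

# Staged changes: index compared against HEAD.
if git diff-index --cached --quiet --ignore-submodules HEAD --; then staged=no; else staged=yes; fi

echo "unstaged=$unstaged staged=$staged"
```

A worktree with only staged changes fails the second check, which is
why it still counts as "dirty".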

> But what happens if I make any changes on top?  With --rebase and
> --merge I can specify exactly what I want to happen (and obviously if I
> don't have any changes then whichever I choose simply sets my branch to
> the upstream ref).

Hm, that is a useful thing to support: we'll have to record both the
SHA-1 and the ref.

> So it would be something like per-submodule --untracked-files and
> --ignore-submodules settings?  I can see that being useful.

Yes.  Never mind; Jens just pointed out that I'm wrong about this.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:34                         ` Jens Lehmann
@ 2013-04-07 18:44                           ` Ramkumar Ramachandra
  2013-04-07 20:15                             ` Jens Lehmann
  2013-04-07 18:59                           ` John Keeping
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 18:44 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jens Lehmann wrote:
> Ram is plain wrong here (just like he is on "git rm" and "git mv",
> even though the latter is currently still in pu). This use case is
> handled by submodules for years now. Take a look at the "ignore"
> setting in .gitmodules which give you full control of the stat()s
> in submodules, in addition you have the repo-wide option
> diff.ignoreSubmodules.

Oh, I didn't know about ignore in .gitmodules.  Sorry about that.

> The whole feature list is full of red herrings like this which
> have nothing to do with the advantages of a new object, but talk
> about UI issues which are easy to solve in both worlds.

Really?  git-submodule.sh was written in 2007, and does not have git
mv or cd-to-toplevel restriction removed to date.  What does that say
about git-submodule?

I specifically said end-user's perspective.  Why exactly would I be
talking about the advantages of the link object?

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:34                         ` Jens Lehmann
  2013-04-07 18:44                           ` Ramkumar Ramachandra
@ 2013-04-07 18:59                           ` John Keeping
  2013-04-07 19:06                             ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: John Keeping @ 2013-04-07 18:59 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Ramkumar Ramachandra, Junio C Hamano, Git List, Linus Torvalds

On Sun, Apr 07, 2013 at 08:34:27PM +0200, Jens Lehmann wrote:
> The whole feature list is full of red herrings like this which
> have nothing to do with the advantages of a new object, but talk
> about UI issues which are easy to solve in both worlds.

With the clarifications Ram's provided in this thread, I think there are
also some important regressions in functionality in his proposal (at
least as it currently stands), particularly losing the .gitconfig
overrides.

The only proposed change that seems to me to be impossible with the
current .gitmodules approach is the "submodule in a non-initialized
submodule" feature, but I've never seen anyone ask for that and it seems
likely to open a whole can of worms where the behaviour is likely to
vary with $PWD.  The current hierarchical approach provides sensible
encapsulation of repositories and is simple to understand: once you're
in a repository nothing above its root directory affects you.

It doesn't seem to me that "it's harder than I'd like to add a feature I
want" is a good reason to subject all users of submodules to a lot of
pain migrating to some new implementation that doesn't work the way
they're used to and which will mean they have to deal with complaints
when people using an older version of Git can't clone their repository
(and I doubt we want this mailing list to be flooded with such
complaints either).

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:59                           ` John Keeping
@ 2013-04-07 19:06                             ` Ramkumar Ramachandra
  2013-04-07 19:17                               ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 19:06 UTC (permalink / raw)
  To: John Keeping; +Cc: Jens Lehmann, Junio C Hamano, Git List, Linus Torvalds

John Keeping wrote:
> With the clarifications Ram's provided in this thread, I think there are
> also some important regressions in functionality in his proposal (at
> least as it currently stands), particularly losing the .gitconfig
> overrides.

If we want the entire feature list in the very first iteration, it's
going to be huge.

> The only proposed change that seems to me to be impossible with the
> current .gitmodules approach is the "submodule in a non-initialized
> submodule" feature, but I've never seen anyone ask for that and it seems
> likely to open a whole can of worms where the behaviour is likely to
> vary with $PWD.  The current hierarchical approach provides sensible
> encapsulation of repositories and is simple to understand: once you're
> in a repository nothing above its root directory affects you.

That can be implemented in the current submodule system too, fwiw.

> It doesn't seem to me that "it's harder than I'd like to add a feature I
> want" is a good reason to subject all users of submodules to a lot of
> pain migrating to some new implementation that doesn't work the way
> they're used to and which will mean they have to deal with complaints
> when people using an older version of Git can't clone their repository
> (and I doubt we want this mailing list to be flooded with such
> complaints either).

Like I've said before: there is nothing that _cannot_ be done with the
current submodule system.  To see the real advantages of this new
submodule system, you have to think like a developer, not an end-user.
Focusing just on end-user happiness is a very myopic way to develop
software, and I think the git community is better than that.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 19:06                             ` Ramkumar Ramachandra
@ 2013-04-07 19:17                               ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 19:17 UTC (permalink / raw)
  To: John Keeping; +Cc: Jens Lehmann, Junio C Hamano, Git List, Linus Torvalds

Ramkumar Ramachandra wrote:
>> The only proposed change that seems to me to be impossible with the
>> current .gitmodules approach is the "submodule in a non-initialized
>> submodule" feature, but I've never seen anyone ask for that and it seems
>> likely to open a whole can of worms where the behaviour is likely to
>> vary with $PWD.  The current hierarchical approach provides sensible
>> encapsulation of repositories and is simple to understand: once you're
>> in a repository nothing above its root directory affects you.
>
> That can be implemented in the current submodule system too, fwiw.

And yes, it's going to be very ugly: I imagine special refs pointing
to blobs, or something like that.  Then again, what do end-users care
about?

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 15:51         ` Ramkumar Ramachandra
  2013-04-07 16:12           ` John Keeping
@ 2013-04-07 19:26           ` Ramkumar Ramachandra
       [not found]             ` <CAP8UFD3i2vc3OSAHRERpiPY7cRjqhkqcBN9hVW0QmMksnCPccw@mail.gmail.com>
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 19:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Linus Torvalds

This reminds me of the commit generation numbers thread.

"But how can we determine ancestry?"
"Use the commit timestamp."
"But what if there are clock skews?"
"Put in a slop."

It breaks existing stuff, and it's hard to show any end-user benefit.
I fear this proposal will meet with the same fate.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 18:44                           ` Ramkumar Ramachandra
@ 2013-04-07 20:15                             ` Jens Lehmann
  2013-04-07 20:49                               ` Ramkumar Ramachandra
                                                 ` (2 more replies)
  0 siblings, 3 replies; 140+ messages in thread
From: Jens Lehmann @ 2013-04-07 20:15 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: John Keeping, Junio C Hamano, Git List, Linus Torvalds

Am 07.04.2013 20:44, schrieb Ramkumar Ramachandra:
> Jens Lehmann wrote:
>> The whole feature list is full of red herrings like this which
>> have nothing to do with the advantages of a new object, but talk
>> about UI issues which are easy to solve in both worlds.
> 
> Really?  git-submodule.sh was written in 2007, and does not have git
> mv or cd-to-toplevel restriction removed to date.  What does that say
> about git-submodule?

That there is still some work to do, which I never denied and am
actively working on (see "git mv" support in pu, which tackles one
of the UI issues you mentioned).

> I specifically said end-user's perspective.  Why exactly would I be
> talking about the advantages of the link object?

Because they are all that matters when it comes to deciding if a link
object should be introduced to replace the current model. We should
discuss the differences in the UI that result from introducing such
an object, not the stuff that is still missing from our current
implementation (as that has to be coded either way and can not be
taken in favor of either solution). And we can additionally also
talk about the differences in hacking on git, where I concede that
putting everything into a single object could lead to shorter code
than having to consult a .gitmodules file for that (even though I
believe these arguments are much less important than UI changes).

Just to be sure: I think we agree that both approaches are capable
of allowing all relevant use cases, because they store the same
information?

Disclaimer: I am not opposed to the link object per se, but after
all we are talking about severely changing user visible behavior.
So I want to see striking evidence that we gain something from it,
discussed separately from UI deficiencies of the current code (no
cd-to-toplevel please ;-).

So I started putting together a list of advantages and one of
disadvantages of the new link object compared to the current model.
We can extend and refine that to see what your proposal would mean
for us. After all we are talking about severely changing user
visible behavior, so we need convincing reasons to do that.


Advantages:

* Information is stored in one place, no need to look up stuff in
  another file/blob.

* Easier coding, as we find all information in a single object.

(I did not forget to add the point that you currently need a
checked out work tree to access the .gitmodules file, as there is
ongoing work to read the configuration directly from the database)

(Another advantage would be that it is easier to merge the link
object, but a - still to be coded - .gitmodules aware merge driver
would work just as well)


Disadvantages:

* Changes in user visible behavior, possible compatibility
  problems when Git versions are mixed.

* Special tools are needed to edit submodule information where
  currently a plain editor is sufficient.

* Merge conflicts are harder to resolve and require special git
  commands, solving them in .gitmodules is way more intuitive
  as users are already used to conflict markers.

* A link object has no unstaged counterpart that a file easily
  has. What would that mean for adding a submodule and then
  unstaging it (or how could we add a submodule unstaged, like
  you proposed in another email)?

(I think when we also put the submodule name in the object we
could also retain the ability to repopulate moved submodules
from their old repo, which is found by that name)


I'm not saying that this list is complete, I just wrote down
what came to mind. When we e.g. find workable solutions to the
Disadvantages we can remove them from the list and append them
in parentheses for later reference like I did here. Does that
sound like a plan?

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 20:15                             ` Jens Lehmann
@ 2013-04-07 20:49                               ` Ramkumar Ramachandra
  2013-04-07 21:02                                 ` John Keeping
  2013-04-07 20:57                               ` Ramkumar Ramachandra
  2013-04-08 20:41                               ` Jens Lehmann
  2 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 20:49 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jens Lehmann wrote:
> Just to be sure: I think we agree that both approaches are capable
> of allowing all relevant use cases, because they store the same
> information?

Yes.

> Disclaimer: I am not opposed to the link object per se, but after
> all we are talking about severely changing user visible behavior.
> So I want to see striking evidence that we gain something from it,
> discussed separately from UI deficiencies of the current code (no
> cd-to-toplevel please ;-).

The only mandatory user-visible behavior change is the absence of
.gitmodules.  The git submodule subcommand will have to be present
and made to work, whether we like it or not.

> (I did not forget to add the point that you currently need a
> checked out work tree to access the .gitmodules file, as there is
> ongoing work to read the configuration directly from the database)

Read the configuration from the database?  How?
Also, I want refs quite badly: I really can't stand repo.

> (Another advantage would be that it is easier to merge the link
> object, but a - still to be coded - .gitmodules aware merge driver
> would work just as well)

It's very simple to implement: if you turn it into a blob, you can
diff and merge as usual.

> Disadvantages:
>
> * Changes in user visible behavior, possible compatibility
>   problems when Git versions are mixed.

Agreed.

> * Special tools are needed to edit submodule information where
>   currently a plain editor is sufficient.

Um, I actually really like this.  I don't want to cd-to-toplevel, open
up my .gitmodules and look for the relevant section.  And it's a very
simple tool: see the git cat-file that I posted earlier.

> * merge conflicts are harder to resolve and require special git
>   commands, solving them in .gitmodules is way more intuitive
>   as users are already used to conflict markers.

There shouldn't be that many merge conflicts to begin with!  It
happens because you've stuffed all the information into one gigantic
.gitmodules.  With links, life is *much* easier: you already have a
tight buffer format and a predefined order in which the key/value
pairs will appear.  But yes, we will need to grow git-core to merge
links seamlessly.

> * A link object has no unstaged counterpart that a file easily
>   has. What would that mean for adding a submodule and then
>   unstaging it (or how could we add a submodule unstaged, like
>   you proposed in another email)?

Adding a submodule untracked (not unstaged) is possible, and is
default: git clone gets the submodules, and you have to use git add to
stage it.  I agree that you can't edit-link and have an unstaged
change, but I really don't care about that.

> (I think when we also put the submodule name in the object we
> could also retain the ability to repopulated moved submodules
> from their old repo, which is found by that name)

Hm, considering that the information is not present anywhere
(certainly not in the tree), this is probably a good idea.  We'd have
the history of the submodule's name too.

> I'm not saying that this list is complete, I just wrote down
> what came to mind. When we e.g. find workable solutions to the
> Disadvantages we can remove them from the list and append them
> in parentheses for later reference like I did here. Does that
> sound like a plan?

Yes, good plan.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 20:15                             ` Jens Lehmann
  2013-04-07 20:49                               ` Ramkumar Ramachandra
@ 2013-04-07 20:57                               ` Ramkumar Ramachandra
  2013-04-07 21:23                                 ` Jonathan Nieder
  2013-04-08 20:41                               ` Jens Lehmann
  2 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 20:57 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jens Lehmann wrote:
> * Easier coding, as we find all information in a single object.

It's not just the difference between a single location versus multiple
locations.  It's about the core object code of git parsing links, as
opposed to a fringe submodule.c/ submodule.sh parsing .gitmodules.
When you push git-submodule.sh into core, you'll have to constantly
call functions to parse .gitmodules and get the information.  With
links, all that information is free, provided you've parsed the
object.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 20:49                               ` Ramkumar Ramachandra
@ 2013-04-07 21:02                                 ` John Keeping
  2013-04-07 21:11                                   ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: John Keeping @ 2013-04-07 21:02 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jens Lehmann, Junio C Hamano, Git List, Linus Torvalds

On Mon, Apr 08, 2013 at 02:19:10AM +0530, Ramkumar Ramachandra wrote:
> Jens Lehmann wrote:
> > * A link object has no unstaged counterpart that a file easily
> >   has. What would that mean for adding a submodule and then
> >   unstaging it (or how could we add a submodule unstaged, like
> >   you proposed in another email)?
> 
> Adding a submodule untracked (not unstaged) is possible, and is
> default: git clone gets the submodules, and you have to use git add to
> stage it.  I agree that you can't edit-link and have an unstaged
> change, but I really don't care about that.

I do.  I quite often use "git add -p" to sort things out and submodules
currently fit into that seamlessly: I can add the submodule and then
wait until later to commit it, without needing to either clone and
remember to "submodule add" later or commit and play with rebase.

Losing the ability to do that is a major usability regression as far as
I'm concerned.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 21:02                                 ` John Keeping
@ 2013-04-07 21:11                                   ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 21:11 UTC (permalink / raw)
  To: John Keeping; +Cc: Jens Lehmann, Junio C Hamano, Git List, Linus Torvalds

John Keeping wrote:
> I do.  I quite often use "git add -p" to sort things out and submodules
> currently fit into that seamlessly: I can add the submodule and then
> wait until later to commit it, without needing to either clone and
> remember to "submodule add" later or commit and play with rebase.
>
> Losing the ability to do that is a major usability regression as far as
> I'm concerned.

I didn't realize people cared so deeply about this.  Sure, we can
emulate it: keeping the information in .git/link-specs/ doesn't sound
like a bad idea.  While it may feel like a hack, I think it's the
right approach if we want to support non-fs-backed objects in our
tree.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 20:57                               ` Ramkumar Ramachandra
@ 2013-04-07 21:23                                 ` Jonathan Nieder
  2013-04-07 21:30                                   ` Ramkumar Ramachandra
  2013-04-17 10:37                                   ` Duy Nguyen
  0 siblings, 2 replies; 140+ messages in thread
From: Jonathan Nieder @ 2013-04-07 21:23 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jens Lehmann, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Ramkumar Ramachandra wrote:

>             It's about the core object code of git parsing links, as
> opposed to a fringe submodule.c/ submodule.sh parsing .gitmodules.

What's stopping the core object code of git parsing .gitmodules?  What
is the core object code?  How does this compare to other metadata
files like .gitattributes and .gitignore?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
       [not found]             ` <CAP8UFD3i2vc3OSAHRERpiPY7cRjqhkqcBN9hVW0QmMksnCPccw@mail.gmail.com>
@ 2013-04-07 21:24               ` Ramkumar Ramachandra
       [not found]                 ` <CAP8UFD16gwWjE7T75D7kUM-VOXhtZaSRGtEg8fW5kmuKDLTQHQ@mail.gmail.com>
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 21:24 UTC (permalink / raw)
  To: Christian Couder; +Cc: Junio C Hamano, Git List, Linus Torvalds

Christian Couder wrote:
> About generation numbers, please have a look at the thread leading to this
> message:
>
> http://thread.gmane.org/gmane.comp.version-control.git/177146/focus=177586
>
> In short, generation numbers were not such a good idea because there were
> already existing ways to get around the problem and because there was no
> simple way to implement them without breaking other things.

Thanks for the interesting read, Christian.  I didn't follow the
discussion closely, and only have a passing understanding/ interest in
the issue.

> My opinion is that your proposal can only be accepted if it is also a
> solution, or a big step toward a solution, to other difficult problems, like
> for example narrow/subtree clones.

Hm, a link object referring to a tree object, as opposed to a
revision.  I'll think about this for some time.

> So you should try to improve it by looking for other important features it
> could provide in a simple way.
> This would prove, or at least be a good sign, that it is a fundamental
> improvement to add a link object the way you describe it.

I'll look for more submodule-like features to strengthen my case.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 21:23                                 ` Jonathan Nieder
@ 2013-04-07 21:30                                   ` Ramkumar Ramachandra
  2013-04-08  7:48                                     ` Jens Lehmann
  2013-04-17 10:37                                   ` Duy Nguyen
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-07 21:30 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jens Lehmann, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jonathan Nieder wrote:
> What's stopping the core object code of git parsing .gitmodules?

Nothing, except that it's perversely unnatural for object parsing code
to parse something outside the object store.

> What
> is the core object code?

parse_link_buffer(): the conventions have already been set by
parse_blob_buffer(), parse_tree_buffer() etc.

> How does this compare to other metadata
> files like .gitattributes and .gitignore?

.gitignore and .gitattributes are parsed in dir.c, where git "treats"
worktree paths.  It's quite nicely integrated.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 21:30                                   ` Ramkumar Ramachandra
@ 2013-04-08  7:48                                     ` Jens Lehmann
  2013-04-08  8:07                                       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Jens Lehmann @ 2013-04-08  7:48 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Am 07.04.2013 23:30, schrieb Ramkumar Ramachandra:
> Jonathan Nieder wrote:
>> What's stopping the core object code of git parsing .gitmodules?

Just to clarify that: git core already does that. A "git grep
gitmodules_config" shows it is parsed by some git core commands:
checkout, commit, the diff family and fetch. Others will follow
in the recursive update series. And "git mv" support will teach
that command to manipulate the .gitmodules file (and I hope that
a patch teaching "git rm" to remove the section from .gitmodules
will be accepted in the near future).

> Nothing, except that it's perversely unnatural for object parsing code
> to parse something outside the object store.

Hmm, at least the unstaged .gitmodules file has to be parsed from
the file system. And Heiko's current work on parsing .gitmodules
directly from the object store will help here too, right?

>> How does this compare to other metadata
>> files like .gitattributes and .gitignore?
> 
> .gitignore and .gitattributes are parsed in dir.c, where git "treats"
> worktree paths.  It's quite nicely integrated.

And .gitmodules is parsed in submodule.c where Git treats
.gitmodules entries. So I don't see a problem here, what am I
missing?
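
A minimal sketch of the two read paths discussed above, assuming a
scratch repository: the worktree copy of .gitmodules and the committed
blob in the object store.  The submodule name "lib" and the example.com
URL are made up for illustration, and "git config --blob" needs a git
recent enough to have that option.

```shell
#!/bin/sh
# Scratch repository; the submodule name "lib" and its URL are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo

# Record a submodule entry in .gitmodules using git's own config writer.
git config -f .gitmodules submodule.lib.path lib
git config -f .gitmodules submodule.lib.url git://example.com/lib.git
git add .gitmodules
git -c user.name=demo -c user.email=demo@example.com \
	commit -q -m "add .gitmodules"

# 1. Parse the copy on the filesystem (this is what an unstaged edit changes).
from_file=$(git config -f .gitmodules --get submodule.lib.url)

# 2. Parse the committed blob directly from the object store.
from_blob=$(git config --blob HEAD:.gitmodules --get submodule.lib.url)

echo "worktree: $from_file"
echo "HEAD:     $from_blob"
```

Both reads go through the same config parser; only the source of the
bytes differs.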

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  7:48                                     ` Jens Lehmann
@ 2013-04-08  8:07                                       ` Ramkumar Ramachandra
  2013-04-08  8:19                                         ` Jonathan Nieder
  2013-04-08  8:37                                         ` Jonathan Nieder
  0 siblings, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08  8:07 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Jonathan Nieder, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jens Lehmann wrote:
> Hmm, at least the unstaged .gitmodules file has to be parsed from
> the file system.

You seem to be touting it as a distinct advantage.  In my opinion,
.gitmodules is a wart that needs to be done away with: it should _not_
be on the filesystem, just like a commit object isn't on the
filesystem.  Getting links to unstage is two hours of work, tops.  And
I'm the one writing the whole thing, so I don't see what everyone else
is complaining about.

> And Heiko's current work on parsing .gitmodules
> directly from the object store will help here too, right?

Of course, you _can_ parse a blob into a struct.  It's just extremely
gross to treat a blob located in a certain tree path differently from
other blobs.  It's a perverse violation of git's fundamental design,
and I'm strongly against such a change.

What I still fail to understand is why you keep mentioning
work-in-progress.  You've had five years in which you haven't been
able to do things that I did in two days.  Yes, you _can_ keep
.gitmodules and hack around everything, but why do you _want_ to do
that?  Preserving backward compatibility is not *that* important, in
my opinion.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  8:07                                       ` Ramkumar Ramachandra
@ 2013-04-08  8:19                                         ` Jonathan Nieder
  2013-04-08  9:08                                           ` Ramkumar Ramachandra
  2013-04-08  8:37                                         ` Jonathan Nieder
  1 sibling, 1 reply; 140+ messages in thread
From: Jonathan Nieder @ 2013-04-08  8:19 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jens Lehmann, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Hi Ram,

Ramkumar Ramachandra wrote:

>                                                     In my opinion,
> .gitmodules is a wart that needs to be done away with: it should _not_
> be on the filesystem, just like a commit object isn't on the
> filesystem.

What do you think of .gitignore and .gitattributes?  Should they be
somewhere other than the filesystem as well?

[...]
> What I still fail to understand is why you keep mentioning
> work-in-progress.  You've had five years in which you haven't been
> able to do things that I did in two days.

I don't think Jens had any obligation to work on submodules and
nothing else for the last five years. ;-)

If you end up convincing others that your tools are worth working
on and those tools pleasantly take care of the same workflows that
submodules do, then I imagine people will be happy to migrate.

Speaking only for myself, I actually prefer the submodule UI, despite
not being thrilled with the
single-.gitmodules-file-at-the-root-of-the-worktree feature.  So I
will not be working on your proposed redesign, unless it evolves
enough to be as pleasant a UI as (the long proposed UI of) submodules.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  8:07                                       ` Ramkumar Ramachandra
  2013-04-08  8:19                                         ` Jonathan Nieder
@ 2013-04-08  8:37                                         ` Jonathan Nieder
  2013-04-08  9:14                                           ` Ramkumar Ramachandra
  2013-04-08 14:46                                           ` Junio C Hamano
  1 sibling, 2 replies; 140+ messages in thread
From: Jonathan Nieder @ 2013-04-08  8:37 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jens Lehmann, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Ramkumar Ramachandra wrote:
> Jens Lehmann wrote:

>> Hmm, at least the unstaged .gitmodules file has to be parsed from
>> the file system.
>
> You seem to be touting it as a distinct advantage.

To clarify what I said in a side thread: yes, as long as the submodule
metadata includes the hostname I am downloading a library from, having
it in an ordinary file is an advantage.

The problem with URLs (and especially hostnames) is that they change.
When my project's previous domain name is lost because the hosting
company lost interest, I want to be able to grep for all instances of
that domain name in my project's documentation and metadata and change
them all at once with a simple command like the following:

	git grep -l -F -e oldhost.example.com |
	xargs sed -i -e s/oldhost.example.com/newhost.example.com/g

When I clone a project with --no-recurse-submodules, I want to be able
to see what other servers will be contacted when I run "git checkout
--recurse-submodules".  The current .gitmodules file lets me find that
out with a simple, intuitive command:

	cat .gitmodules

I might change some URLs locally, because I know that some project's
upstream has moved.

	git submodule init
	git config --edit
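
A non-interactive sketch of the same two steps, assuming a scratch
repository: list the URLs recorded in .gitmodules, then override one
locally in .git/config.  The submodule name "lib" and both hostnames
are made up for illustration.

```shell
#!/bin/sh
# Scratch repository; "lib" and the example.com hosts are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo

git config -f .gitmodules submodule.lib.path lib
git config -f .gitmodules submodule.lib.url git://oldhost.example.com/lib.git

# List every URL recorded for submodules in .gitmodules.
urls=$(git config -f .gitmodules --get-regexp '^submodule\..*\.url$')
echo "$urls"

# Override the URL locally; this writes to .git/config and leaves the
# tracked .gitmodules file untouched.
git config submodule.lib.url git://newhost.example.com/lib.git
local_url=$(git config --get submodule.lib.url)
echo "local override: $local_url"
```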

On the other hand, the single .gitmodules file will be a pain to merge
if multiple branches modify it.  So I do look forward to a merge
strategy that deals more intelligently with its content, and wouldn't
have minded a design that split this information into multiple files
if we were starting over.

Jonathan

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  8:19                                         ` Jonathan Nieder
@ 2013-04-08  9:08                                           ` Ramkumar Ramachandra
  2013-04-08 10:29                                             ` Duy Nguyen
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08  9:08 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jens Lehmann, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jonathan Nieder wrote:
> What do you think of .gitignore and .gitattributes?  Should they be
> somewhere other than the filesystem as well?

I would argue that .gitignore and .gitattributes are done right.  They
are integrated into a very mature part of git-core very well, and
their nature is fundamentally different from that of .gitmodules.

.gitignore and .gitattributes specify extended globs (see: wildmatch)
rules to apply on the worktree, and can be in multiple places in the
worktree.  They apply strictly on the current worktree; they have
nothing to do with the index, and have no interaction with other
objects in the repository.  Now, you might argue that they should be
part of the tree object, but I will disagree because they don't
operate on concrete entries in the tree but rather extended globs that
match worktree paths.  .gitmodules, on the other hand, specifies
fundamental repository composition: it should be a special object in
the tree precisely because it changes the fundamental meaning of one
concrete tree entry.  It has nothing to do with path treatment in the
worktree, and hence has nothing to do with .gitattributes
or .gitignore.
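
To make the distinction concrete, here is a minimal sketch, assuming a
scratch repository, of what the "one concrete tree entry" looks like
today: a gitlink entry with mode 160000.  The path "sub" is made up,
and the commit id stored in it is just the superproject's own first
commit, used as a stand-in.

```shell
#!/bin/sh
# Scratch repository; the path "sub" is hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo

# An ignore rule is a pattern over worktree paths, not a tree entry.
echo '*.o' >.gitignore
git add .gitignore
git -c user.name=demo -c user.email=demo@example.com commit -q -m "base"

# Any commit id will do for illustration; reuse the one just created.
sha=$(git rev-parse HEAD)

# Stage a gitlink: a tree entry of mode 160000 pointing at a commit.
git update-index --add --cacheinfo 160000 "$sha" sub
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add gitlink"

# The submodule shows up as exactly one entry of type "commit".
entry=$(git ls-tree HEAD sub)
echo "$entry"
```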

> I don't think Jens had any obligation to work on submodules and
> nothing else for the last five years. ;-)

I know.  What I'm saying is that his current approach is just filled
with tons of unnecessary complexity, inelegance, and pain.  This is
evidenced by the fact that the current submodule system is pathetic
after five years of work (and I don't think the developers working on
it were particularly incompetent or lazy).

> If you end up convincing others that your tools are worth working
> on and those tools pleasantly take care of the same workflows that
> submodules do, then I imagine people will be happy to migrate.

Yes, I'm planning a strict superset of the current submodule system
features.  After some thought, I've decided not to have any feature
regressions in my first version for merge (although that means a lot
of work for me).

> Speaking only for myself, I actually prefer the submodule UI, despite
> not being thrilled with the
> single-.gitmodules-file-at-the-root-of-the-worktree feature.  So I
> will not be working on your proposed redesign, unless it evolves
> enough to be as pleasant a UI as (the long proposed UI of) submodules.

I'm very interested in building a pleasant UI.  I've always been a
person who cares deeply about UI: this is evidenced by my recent
remote.pushdefault patch, and my pull.autostash WIP.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  8:37                                         ` Jonathan Nieder
@ 2013-04-08  9:14                                           ` Ramkumar Ramachandra
  2013-04-08 14:46                                           ` Junio C Hamano
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08  9:14 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jens Lehmann, John Keeping, Junio C Hamano, Git List, Linus Torvalds

Jonathan Nieder wrote:
>         git grep -l -F -e oldhost.example.com |
>         xargs sed -i -e s/oldhost.example.com/newhost.example.com/g

Yes, I've had to do this too: in a proxied environment I had to
s/git:\/\//https:\/\//.  So yes, we will have features to operate on
multiple links at the same time.  I'm thinking something fine-grained
that allows you to pick which links to operate on.  It's currently a
vague thought, and I'm not sure what the implementation will look
like.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-04 18:55 ` Jonathan Nieder
@ 2013-04-08 10:10   ` Duy Nguyen
  2013-04-08 10:26     ` [PATCH] t3700 (add): add failing test for add with submodules Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Duy Nguyen @ 2013-04-08 10:10 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Git List, Junio C Hamano, Linus Torvalds

On Fri, Apr 5, 2013 at 5:55 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Ramkumar Ramachandra wrote:
>
>> 1. 'git add' should not go past submodule boundaries.  I should not be
>>    able to 'git add clayoven/' or 'git add clayoven/LICENSE'.  In
>>    addition, the shell completion also needs to be fixed.
>
> Yep.  This is a bug.

I notice that this case is handled by git-add, but there is probably a
bug somewhere. Ram, can you make a test case for this?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 10:10   ` Duy Nguyen
@ 2013-04-08 10:26     ` Ramkumar Ramachandra
  2013-04-08 11:04       ` Duy Nguyen
  2013-04-08 21:30       ` Jeff King
  0 siblings, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 10:26 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jonathan Nieder, Git List, Junio C Hamano, Linus Torvalds

git add currently goes past submodule boundaries.  Document this bug.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 t/t3700-add.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 874b3a6..a1ea050 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -310,4 +310,18 @@ test_expect_success 'git add --dry-run --ignore-missing of non-existing file out
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_failure 'git add should not go past submodule boundaries' '
+	mkdir submodule_dir &&
+	(
+		cd submodule_dir &&
+		git init &&
+		cat >foo <<-\EOF &&
+		Some content
+		EOF
+		git add foo &&
+		git commit -a -m "Add foo"
+	) &&
+	git add submodule_dir/foo
+'
+
 test_done
-- 
1.8.2.373.g961c512

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  9:08                                           ` Ramkumar Ramachandra
@ 2013-04-08 10:29                                             ` Duy Nguyen
  2013-04-08 11:06                                               ` Ramkumar Ramachandra
  2013-04-08 11:10                                               ` Ramkumar Ramachandra
  0 siblings, 2 replies; 140+ messages in thread
From: Duy Nguyen @ 2013-04-08 10:29 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

On Mon, Apr 8, 2013 at 7:08 PM, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> Jonathan Nieder wrote:
>> What do you think of .gitignore and .gitattributes?  Should they be
>> somewhere other than the filesystem as well?
>
> I would argue that .gitignore and .gitattributes are done right.  They
> are integrated into a very mature part of git-core very well, and
> their nature is fundamentally different from that of .gitmodules.

Probably off-topic, but I'm starting to find ".gitignore can be found
in every directory" a burden to day-to-day git operations. So imo it's
not done right entirely ;-)

> .gitignore and .gitattributes specify extended globs (see: wildmatch)
> rules to apply on the worktree, and can be in multiple places in the
> worktree.  They apply strictly on the current worktree; they have
> nothing to do with the index, and have no interaction with other
> objects in the repository.

Index operations sometimes read these .git{ignore,attributes}. I
believe git-archive reads worktree's .gitattributes, so it's not
really just about worktree.

>> I don't think Jens had any obligation to work on submodules and
>> nothing else for the last five years. ;-)
>
> I know.  What I'm saying is that his current approach is just filled
> with tons of unnecessary complexity, inelegance, and pain.  This is
> evidenced by the fact that the current submodule system is pathetic
> after five years of work (and I don't think the developers working on
> it were particularly incompetent or lazy).

I don't follow this thread closely, but I think there's a common
ground where improvements can benefit both approaches. There are a lot
of problems for deep integration and erasing submodule's boundaries
from UI perspective. I think maybe you can work on that first, gain
experience along the way, and maintain the link-object changes
separately. Maybe someday you will manage to switch .gitmodules with
it. Or maybe I'm wrong (partly because I did not read the whole
thread)
-- 
Duy

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 10:26     ` [PATCH] t3700 (add): add failing test for add with submodules Ramkumar Ramachandra
@ 2013-04-08 11:04       ` Duy Nguyen
  2013-04-08 15:07         ` Junio C Hamano
  2013-04-08 21:30       ` Jeff King
  1 sibling, 1 reply; 140+ messages in thread
From: Duy Nguyen @ 2013-04-08 11:04 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Git List, Junio C Hamano, Linus Torvalds

On Mon, Apr 8, 2013 at 8:26 PM, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> +test_expect_failure 'git add should not go past submodule boundaries' '
> +       mkdir submodule_dir &&
> +       (
> +               cd submodule_dir &&
> +               git init &&
> +               cat >foo <<-\EOF &&
> +               Some content
> +               EOF
> +               git add foo &&
> +               git commit -a -m "Add foo"
> +       ) &&
> +       git add submodule_dir/foo

Thanks. I think the last line above should be "test_must_fail git add
...". I'm halfway through fixing it (I think treat_leading_directory is a
bit loose). Will continue tomorrow (or this weekend, depending on
$DAYJOB)
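
A standalone sketch of the scenario in the test, assuming a scratch
directory and no test-lib.sh.  It only records what the installed git
does; whether the add is refused depends on the git version, so no
outcome is asserted here.

```shell
#!/bin/sh
# Standalone reproduction of the test body; directory names follow the patch.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q super
cd super

mkdir submodule_dir
(
	cd submodule_dir &&
	git init -q &&
	echo "Some content" >foo &&
	git add foo &&
	git -c user.name=demo -c user.email=demo@example.com \
		commit -q -m "Add foo"
)

# The behaviour wanted in this thread is a refusal; record what happens.
if git add submodule_dir/foo 2>/dev/null
then
	result="added (boundary crossed)"
else
	result="refused"
fi
echo "git add submodule_dir/foo: $result"
```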
--
Duy

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 10:29                                             ` Duy Nguyen
@ 2013-04-08 11:06                                               ` Ramkumar Ramachandra
  2013-04-08 11:29                                                 ` Duy Nguyen
  2013-04-08 11:10                                               ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 11:06 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

Duy Nguyen wrote:
> Probably off-topic, but I'm starting to find ".gitignore can be found
> in every directory" a burden to day-to-day git operations. So imo it's
> not done right entirely ;-)

Why is it a burden?  I would argue that the tooling support is not yet
there, but git check-ignore is a step in the right direction.  What
alternate design would you propose, just out of curiosity?

> Index operations sometimes read these .git{ignore,attributes}. I
> believe git-archive reads worktree's .gitattributes, so it's not
> really just about worktree.

I should've said "largely, only affects the current worktree".

> I don't follow this thread closely, but I think there's a common
> ground where improvements can benefit both approaches. There are a lot
> of problems for deep integration and erasing submodule's boundaries
> from UI perspective. I think maybe you can work on that first, gain
> experience along the way, and maintain the link-object changes
> separately. Maybe someday you will manage to switch .gitmodules with
> it. Or maybe I'm wrong (partly because I did not read the whole
> thread)

Yes, there is some common ground.  But:

1. The inspiration for fixing fundamental design problems comes from
my redesign.  For instance, I would've never discovered the git add
bug if I'd not attempted to git add (as opposed to the unnatural
abstraction that git submodule add presents).

2. I think it is absolutely imperative that we do the redesign now,
before we've descended too far into the madness that the current
design is.  I think I'm capable of doing the redesign now, with some
help and support from the list.  My attitude doesn't align with the
"I'm feeling lazy; why don't we postpone it?" argument.  Let's finish
what I started now: I'm more than willing to dedicate the next few
months full-time towards finishing this and getting it merged.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 10:29                                             ` Duy Nguyen
  2013-04-08 11:06                                               ` Ramkumar Ramachandra
@ 2013-04-08 11:10                                               ` Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 11:10 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

Duy Nguyen wrote:
> Probably off-topic, but I'm starting to find ".gitignore can be found
> in every directory" a burden to day-to-day git operations. So imo it's
> not done right entirely ;-)

Or are you saying it's hard to implement elegantly and efficiently in
git-core?  If so, I agree wholeheartedly.  I'm not yet sure what to do
about the situation.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 11:06                                               ` Ramkumar Ramachandra
@ 2013-04-08 11:29                                                 ` Duy Nguyen
  2013-04-08 11:53                                                   ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Duy Nguyen @ 2013-04-08 11:29 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

On Mon, Apr 8, 2013 at 9:06 PM, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> Duy Nguyen wrote:
>> Probably off-topic, but I'm starting to find ".gitignore can be found
>> in every directory" a burden to day-to-day git operations. So imo it's
>> not done right entirely ;-)
>
> Why is it a burden?  I would argue that the tooling support is not yet
> there, but git check-ignore is a step in the right direction.  What
> alternate design would you propose, just out of curiosity?

You don't know if .gitignore is there, so you need to check for it in
every directory. If we fixed its location (e.g. worktree's top) we
would not have to look in every directory. Then again it may be a bit
inconvenient that way. If you remove a directory, you also remove
.gitignore rules inside when you distribute .gitignore files.
Otherwise you need to clean up top .gitignore once in a while.
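
A minimal sketch of the per-directory lookup, assuming a scratch
repository with made-up paths and patterns.  "git check-ignore -v"
reports which .gitignore file and line matched each path.

```shell
#!/bin/sh
# Scratch repository; the directory layout and patterns are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo

mkdir -p src/gen
echo '*.log' >.gitignore          # rule at the top of the worktree
echo '*.tmp' >src/gen/.gitignore  # rule distributed into a subdirectory

# For each path, print the source file, line number and pattern that matched.
matches=$(git check-ignore -v build.log src/gen/cache.tmp)
echo "$matches"
```

Removing src/gen also removes its rule, which is the convenience of
distributing .gitignore files; the cost is that every directory has to
be checked for one.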

> 1. The inspiration for fixing fundamental design problems comes from
> my redesign.  For instance, I would've never discovered the git add
> bug if I'd not attempted to git add (as opposed to the unnatural
> abstraction that git submodule add presents).

I actually spotted a similar use of git-add in the test suite [1]. You
see, it's a bug that should be fixed but in that particular case, it's
valid to add something "inside a submodule". I wanted to fix that with
my read_directory rewrite (part of the pathspec stuff) but never got
around to finishing it and eventually gave up, which leads to your next
point..

> 2. I think it is absolutely imperative that we do the redesign now,
> before we've descended too far into the madness that the current
> design is.  I think I'm capable of doing the redesign now, with some
> help and support from the list.  My attitude doesn't align with the
> "I'm feeling lazy; why don't we postpone it?" argument.  Let's finish
> what I started now: I'm more than willing to dedicate the next few
> months full-time towards finishing this and getting it merged.

Good luck. But such a big piece of work usually requires more than one
volunteer. If you haven't convinced (*) the community it's right,
maybe you should take a few days thinking about it again before
implementing.

[1] http://thread.gmane.org/gmane.comp.version-control.git/177454
(*) just a feeling after a quick glance, I may be terribly wrong again
--
Duy

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 11:29                                                 ` Duy Nguyen
@ 2013-04-08 11:53                                                   ` Ramkumar Ramachandra
  2013-04-08 15:06                                                     ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 11:53 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

Duy Nguyen wrote:
> Good luck. But such a big piece of work usually requires more than one
> volunteer. If you haven't convinced (*) the community it's right,
> maybe you should take a few days thinking about it again before
> implementing.

Yes, I'm thinking about it before rushing in to implement it.

There will always be resistance to change, especially when it involves
a change that breaks a working implementation.  If anything, the
resistance is only going to get worse with time, as people pile more
and more hacks on top of the current submodule implementation.  I say:
do it now, before we lose steam.

As far as I can tell, I'm completely unbiased: I have no vested
interests in either implementation, and I just want to see the best
implementation win.  My conviction in the new approach has only
strengthened after discussions on this thread: there must be some
reason for that, no?

Frankly, I was hoping that at least one or two people on the thread
would take my side of the argument (or at least tell me that I'm not
deranged), but that hasn't happened.  Nevertheless, I hope to convince
more people by doing more work and posting a beautifully working
implementation.

I'm already prepared for the worst case: I'll be forced to dump all my
work and be disappointed with the git community.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08  8:37                                         ` Jonathan Nieder
  2013-04-08  9:14                                           ` Ramkumar Ramachandra
@ 2013-04-08 14:46                                           ` Junio C Hamano
  2013-04-08 17:12                                             ` Junio C Hamano
  1 sibling, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 14:46 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Jens Lehmann, John Keeping, Git List,
	Linus Torvalds

Jonathan Nieder <jrnieder@gmail.com> writes:

[snipped everything I agree with...]

> On the other hand, the single .gitmodules file will be a pain to merge
> if multiple branches modify it.  So I do look forward to a merge
> strategy that deals more intelligently with its content, and wouldn't
> have minded a design that split this information into multiple files
> if we were starting over.

I find it a sensible suggestion to have a content-aware merge
driver.  Such a custom merge driver to help merging a structured
datafile in the config format will have other uses when we need to
do more than the current system (outside submodules there will be
other things "frotz" that need "information about frotz" in the
future, and a .gitfrotz file would be one possible way to do so).

I do not think it needs to be split per-submodule.

When a submodule in the common ancestor was at path dirA/, and you
are merging with another branch that moved it to path dirB/, the
contents of .gitmodules file for that module (that is identified by
its <name>) will need a three-way merge of its .path element:

    common ancestor:    submodule.<name>.path = dirA/
    ours:               submodule.<name>.path = dirA/
    theirs:             submodule.<name>.path = dirB/

And your content-aware merge driver should be able to do the
resolving by following the usual three-way merge rules.  We started
from the same dirA/ and only they changed, so the result is dirB/.
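A minimal Python sketch of that three-way rule, applied to one config value at a time. The submodule name "lib", the CONFLICT sentinel and both function names are assumptions made for illustration; this is not an existing git merge driver.

```python
# Sentinel returned when both sides changed the value in different ways.
CONFLICT = object()

def merge_value(base, ours, theirs):
    """Three-way merge of a single config value, e.g. submodule.<name>.path."""
    if ours == theirs:
        return ours          # both sides agree (or neither side changed it)
    if ours == base:
        return theirs        # only they changed it
    if theirs == base:
        return ours          # only we changed it
    return CONFLICT          # both changed it, and differently

def merge_config(base, ours, theirs):
    """Merge three {key: value} dicts key by key; an absent key counts as None."""
    keys = set(base) | set(ours) | set(theirs)
    return {k: merge_value(base.get(k), ours.get(k), theirs.get(k))
            for k in sorted(keys)}

# "lib" is a hypothetical submodule name.
base   = {"submodule.lib.path": "dirA/"}
ours   = {"submodule.lib.path": "dirA/"}
theirs = {"submodule.lib.path": "dirB/"}
print(merge_config(base, ours, theirs))
```

With the inputs above, only "theirs" differs from the common ancestor, so the merged path is dirB/. A real driver would also have to decide how to present a CONFLICT to the user.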

By the way, that's a "merge driver" (which deals with per-path
content merge), not a strategy (which deals with the entire tree
level merge).

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 11:53                                                   ` Ramkumar Ramachandra
@ 2013-04-08 15:06                                                     ` Junio C Hamano
  2013-04-08 16:08                                                       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 15:06 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> As far as I can tell, I'm completely unbiased: I have no vested
> interests in either implementation,...
> ...
> Frankly, I was hoping that at least one or two people on the thread
> would take my side of the argument (or at least tell me that I'm not
> deranged), but that hasn't happened.

Aren't these two quite contradictory?

After listening to what others tell one with an unbiased mind and
finding that nobody agrees with what one initially proposed, an
unbiased person would step back, take a deep breath and think again,
before insulting the intelligence of others with a "disappointed",
like this:

> I'm already prepared for the worst case: I'll be forced to dump all my
> work and be disappointed with the git community.

Would it be possible that (at least some part of, or possibly all
of) your ideas had some merit, but with all your hostility against
the current system and the work that went behind it, you did not
communicate well enough to make others understand you?

What I found very hard to read in this thread was that your messages
all went like this:

 1. In the current system, I have to be at the top level of a
    submodule to work in it (or some other problems).

 2. I will fix it in a more "elegant" way.

 3. I have to have a new object at the submodule path, not the
    current "submodule is a commit bound at the submodule path, and
    information about the submodule is in .gitmodules".

There was very little concrete explanation on how #3 leads to #2,
i.e. the overall design of your new system and how it will work,
other than you would read what we currently write in .gitmodules
from a new kind of object.

When an alternative solution was suggested, all your responses were
full of subjective "inelegant" and "ugly", and at least I couldn't
read much substance in it (here, the usual me might add "maybe
others differ", but after reading this thread, I strongly suspect
that others share this problem).

* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 11:04       ` Duy Nguyen
@ 2013-04-08 15:07         ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 15:07 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Ramkumar Ramachandra, Jonathan Nieder, Git List, Linus Torvalds

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Apr 8, 2013 at 8:26 PM, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
>> +test_expect_failure 'git add should not go past submodule boundaries' '
>> +       mkdir submodule_dir &&
>> +       (
>> +               cd submodule_dir &&
>> +               git init &&
>> +               cat >foo <<-\EOF &&
>> +               Some content
>> +               EOF
>> +               git add foo &&
>> +               git commit -a -m "Add foo"
>> +       ) &&
>> +       git add submodule_dir/foo
>
> Thanks. I think the last line above should be "test_must_fail git add
> ...". I'm halfway through fixing it (I think treat_leading_directory is a
> bit loose). Will continue tomorrow (or this weekend, depending on
> $DAYJOB)

This is a good thing to test (to make sure "add" fails, as you
pointed out). Can you include a fixed one in your series?

Thanks.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 15:06                                                     ` Junio C Hamano
@ 2013-04-08 16:08                                                       ` Ramkumar Ramachandra
  2013-04-08 18:10                                                         ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 16:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Junio C Hamano wrote:
> Would it be possible that (at least some part of, or possibly all
> of) your ideas had some merit, but with all your hostility against
> the current system and the work that went behind it, you did not
> communicate well enough to make others understand you?

Agreed.  My annoyance with the current system did go a little
overboard, and I've been having a splitting headache for the last few
days.

> What I found very hard to read in this thread was that your messages
> all went like this:
>
>  1. In the current system, I have to be at the top level of a
>     submodule to work in it (or some other problems).
>
>  2. I will fix it in a more "elegant" way.
>
>  3. I have to have a new object at the submodule path, not the
>     current "submodule is a commit bound at the submodule path, and
>     information about the submodule is in .gitmodules".
>
> There was very little concrete explanation on how #3 leads to #2,
> i.e. the overall design of your new system and how it will work,
> other than you would read what we currently write in .gitmodules
> from a new kind of object.

I had no way of expressing what I wanted to do except by writing code
when I started off this thread, but am in much better shape now.  Let
me try to explain my fundamental assumptions and code in a concise way
now.

1. Having a toplevel .gitmodules means that any git-core command like
add/ rm/ mv will be burdened with looking for the .gitmodules at the
toplevel of the worktree and editing it appropriately along with
whatever it was built to do (ie. writing to the index and committing
it).  This is highly unnatural.  Putting the information in link
objects means that we get a more natural UI + warts like
cd-to-toplevel disappear with no extra code.

2. If we want to make git-submodule a part of git-core (which I think
everyone agrees with), we will need to make the information in
.gitmodules available more easily to the rest of git-core.  One way to
do it without breaking anything is to unpack the root tree, look for
an entry with the path .gitmodules and handle it differently from other
blobs: ie. parse it into structured data that the rest of git-core can
consume.  However, I think it is very gross as the blob is not
inherently special in any way: it's just incidentally stored at a
specific tree path.  The alternative is to have an inherently special
kind of blob (ie. link object).  In the git-core code, I can simply
match for a link object and operate on it accordingly.  As opposed to
matching a blob object, and its tree path.  Moreover, this means that
the user can simply git edit-link <link> from anywhere in the worktree
instead of having to refer to the appropriate section in the toplevel
.gitmodules.

3. Currently diffing/ merging one huge .gitmodules file is a mess, as
it doesn't have to conform to a strict format.  This means that I can
get conflicts between these two:

    url = gh:artagnon/clayoven
    url =gh:artagnon/clayoven

Moreover, since the fields are not ordered, a simple reordering of the
fields will cause a merge conflict.  The correct way to fix this is to
split up .gitmodules into many logical files, have a git
edit-gitmodules which reduces user input to a strict format, and then
write custom diff/merge drivers.  My proposal involves having a git
edit-link, and teaching git-core to diff/merge appropriately.  The
information is already in logical bits.
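The "reduce to a strict format" step can be sketched in Python as below. This is only an illustration: it handles plain `key = value` lines under `[section "name"]` headers and ignores the rest of the git config syntax (quoting, line continuations, inline comments). The function names are hypothetical.

```python
def parse_gitmodules(text):
    """Parse a simplified .gitmodules text into {section: {key: value}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                       # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip()
    return sections

def canonical(text):
    """Re-emit the text with sorted sections and keys and uniform spacing."""
    out = []
    for name, entries in sorted(parse_gitmodules(text).items()):
        out.append(f"[{name}]")
        out.extend(f"\t{k} = {v}" for k, v in sorted(entries.items()))
    return "\n".join(out) + "\n"

# Same information, different spacing and field order.
a = '[submodule "clayoven"]\n\turl = gh:artagnon/clayoven\n\tpath = clayoven\n'
b = '[submodule "clayoven"]\n\tpath = clayoven\n\turl =gh:artagnon/clayoven\n'
print(canonical(a) == canonical(b))
```

Once both sides are normalized this way, whitespace and ordering differences no longer show up as textual conflicts; only real changes to a value do.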

4. The only seeming disadvantage of not having a file accessible via
the filesystem is that it doesn't behave like a full blob.  But it
does; the code to unstage a link object (emulation) is actually very
simple: I'm currently writing it.

5. Having a first-class link object comes with functional advantages.
It means that I can have a ref pointing to link objects and easily
initialize a nested repository without having to initialize the
containing repository (ie. essentially replacing repo). We can have
true floating submodules, which is really nice in my opinion: you can
fix a library at v3.1 and switch it to v3.2 at some point in the
future without using ugly SHA-1 hexes anywhere.

6. While it is possible to work top-down from the current system, that
approach is clearly taking too long and is too painful.  This explains
why submodules haven't come a long way in the last five years.  With
my approach, I'm trying to make life simpler for everyone: it will
suddenly become much easier to hack on submodules, and it can improve
more rapidly over the next five years.  I'm not thinking about
short-term fixes precisely for this reason: the long-term goal is
worth a little bit of short-term inconvenience.

7. I estimate that replacing the current submodule system without
feature regressions will not take a lot of effort and can be done with
minimal breakages.  It's not a lot of code or anything very complex.
We just have to follow along the lines of how git-core handles blobs,
and write a little bit of code to make links behave like blobs (I'm
halfway done with this already).

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
       [not found]                 ` <CAP8UFD16gwWjE7T75D7kUM-VOXhtZaSRGtEg8fW5kmuKDLTQHQ@mail.gmail.com>
@ 2013-04-08 17:04                   ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 17:04 UTC (permalink / raw)
  To: Christian Couder; +Cc: Junio C Hamano, Git List, Linus Torvalds

Christian Couder wrote:
> What if instead of a git submodule I want to have an hg, or, God/Linus/deity
> forbid, an SVN submodule, inside my git worktree?
> What if I just want a very big movie or .tgz downloaded from somewhere else?

Since the link object is rooted to the tree, it's impossible to have
anything but a working copy in the link directory.  How can I have a
non-git-worktree link directory without breaking checkout?  I think
that making it too generic will make the entire submodule experience
suffer, because the implementation must be coded according to the
lowest-common-denominator.  This is the mistake that the tool mr
makes: since it's so generic, it can't provide very powerful
functionality specifically for git repositories.

I'll try to think of something else.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 14:46                                           ` Junio C Hamano
@ 2013-04-08 17:12                                             ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 17:12 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Jens Lehmann, John Keeping, Git List,
	Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Nieder <jrnieder@gmail.com> writes:
>
> [snipped everything I agree with...]
>
>> On the other hand, the single .gitmodules file will be a pain to merge
>> if multiple branches modify it.  So I do look forward to a merge
>> strategy that deals more intelligently with its content, and wouldn't
>> have minded a design that split this information into multiple files
>> if we were starting over.
>
> I find it a sensible suggestion to have a content-aware merge
> driver.  Such a custom merge driver to help merging a structured
> datafile in the config format will have other uses when we need to
> do more than the current system (outside submodules there will be
> other things "frotz" that need "information about frotz" in the
> future, and a .gitfrotz file would be one possible way to do so).
>
> I do not think it needs to be split per-submodule.

Another thing to think about is what to do when/if we want to
express "this is the default that applies to all submodules".  For
example, a superproject that binds multiple submodules may want to
say "When on this branch, make all submodules also on 'next'".

With a unified single place that holds information about all
submodules, it is trivial to add a "default" section, perhaps like
this:

	[default]
		branch = next
	[submodule "framework"]
		url = ...
		path = framework
	[submodule "common"]
		url = ...
		path = common
		branch = master ;# regardless of other modules...

on top of the "submodule.<name>.branch" mechanism for floating
checkout (the "default" is of course not limited to "branch" but
applies in general).
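A minimal Python sketch of how such a lookup could fall back to the default. The "[default]" section is the hypothetical one proposed above, not an existing git feature, and submodule_setting() is an illustrative name; the url values are left out because they are elided in the example.

```python
# Parsed form of the example above, as {section: {key: value}}.
config = {
    "default": {"branch": "next"},
    'submodule "framework"': {"path": "framework"},
    'submodule "common"': {"path": "common", "branch": "master"},
}

def submodule_setting(config, name, key):
    """Return the per-submodule value, or the [default] value if unset."""
    section = config.get(f'submodule "{name}"', {})
    if key in section:
        return section[key]
    return config.get("default", {}).get(key)

print(submodule_setting(config, "framework", "branch"))  # falls back to default
print(submodule_setting(config, "common", "branch"))     # per-submodule override
```

Here "framework" picks up branch = next from the default, while "common" keeps its own branch = master.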

It is not obvious where such a "default" piece should go once you
start splitting these into per-submodule files, be it a separate but
still in-tree file that is different from the submodule it describes,
or a blob-like object that sits at the path for the submodule in the
tree and in the index as Ram wants to do (as I kept saying, the
storage mechanism is not fundamental).

This is similar to why .gitattributes is easy to work with, I think.
You can describe the information about paths in that file (which
lives at a place different from the paths that are described), and
you can have a catch-all rule in it.

This is a tangent, but you could build a system that attaches
attributes to individual paths and hide the attributes from the
working tree filesystem (think: svn:blah) and have a set of special
commands (think: svn propset, proplist, etc.) to work with them, and
that is an equally valid way to implement attributes (it does not
make .gitattributes a less valid way to do so, though).

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 16:08                                                       ` Ramkumar Ramachandra
@ 2013-04-08 18:10                                                         ` Junio C Hamano
  2013-04-08 19:03                                                           ` Ramkumar Ramachandra
  2013-04-09 11:51                                                           ` Jakub Narębski
  0 siblings, 2 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 18:10 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> 1. Having a toplevel .gitmodules means that any git-core command like
> add/ rm/ mv will be burdened with looking for the .gitmodules at the
> toplevel of the worktree and editing it appropriately along with
> whatever it was built to do (ie. writing to the index and committing
> it).

Burdened is a subjective word.  What's bad about having a single place
you know you can read and find out information about things?  You
have to learn about them to do anything specific to them anyway.

> This is highly unnatural.

Unnatural is a subjective word, and there is no justification I see
here in your message.

> Putting the information in link
> objects means that we get a more natural UI + warts like
> cd-to-toplevel disappear with no extra code.

I do not see how "link objects" _means_ "natural UI", yet, without
an explanation how one leads to the other.

What does cd-to-toplevel have to do with it?  In case you
did not notice, all the core commands internally cd-to-toplevel and
carry the "prefix" information while doing so, and prepend the
prefix to user-supplied paths to find which path the user is talking
about.  So "cd to toplevel before starting to carry the operation out"
is a natural pattern inside Git.  As many people already told you,
"the user has to run 'git submodule' from the top-level of the
submodule working tree" is a simple oversight of the implementation.
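The prefix pattern can be illustrated with a short Python sketch. It models only the idea (remember where the user was, then resolve user-supplied paths relative to the top level); it is not git's actual C implementation, and the function names, the /tmp/varlog layout and the README file are assumptions for illustration.

```python
import posixpath

def prefix_for(toplevel, cwd):
    """Return cwd relative to toplevel, with a trailing slash ('' at the top)."""
    rel = posixpath.relpath(cwd, toplevel)
    return "" if rel == "." else rel + "/"

def repo_path(prefix, user_path):
    """Prepend the prefix to a user-supplied path and normalize the result."""
    return posixpath.normpath(posixpath.join(prefix, user_path))

# The user runs a command from /tmp/varlog/clayoven/lib.
prefix = prefix_for("/tmp/varlog", "/tmp/varlog/clayoven/lib")
print(prefix)
print(repo_path(prefix, "sandbox"))
print(repo_path(prefix, "../README"))
```

The command itself then operates from the top level and sees "clayoven/lib/sandbox" and "clayoven/README", whichever subdirectory the user started in.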

> 2. If we want to make git-submodule a part of git-core (which I think
> everyone agrees with), we will need to make the information in
> .gitmodules available more easily to the rest of git-core.

Care to define "more easily" which is another subjective word?  The
.gitmodules file uses the bog-standard configuration format that can
be easily read with the config.c infrastructure.  It is a separate
matter that git_config() API is cumbersome to use, but improving it
would help not just .gitmodules but also the regular non-submodule
users of Git.  Heiko is working on a topic to read data in that
format from core.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 18:10                                                         ` Junio C Hamano
@ 2013-04-08 19:03                                                           ` Ramkumar Ramachandra
  2013-04-08 19:48                                                             ` Junio C Hamano
  2013-04-09 11:51                                                           ` Jakub Narębski
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 19:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Junio C Hamano wrote:
> Ramkumar Ramachandra <artagnon@gmail.com> writes:
>
>> 1. Having a toplevel .gitmodules means that any git-core command like
>> add/ rm/ mv will be burdened with looking for the .gitmodules at the
>> toplevel of the worktree and editing it appropriately along with
>> whatever it was built to do (ie. writing to the index and committing
>> it).
>
> Burdened is a subjective word.  What's bad about having a single place
> you know you can read and find out information about things?  You
> have to learn about them to do anything specific to them anyway.

"Burden" refers to the extra work of looking for a file in the
worktree, when this is completely unnecessary if you use a link
object.

>> This is highly unnatural.
>
> Unnatural is a subjective word, and there is no justification I see
> here in your message.

git's design follows along the lines of the UNIX philosophy: do one
thing, do it well.  git add/ rm/ mv have a very sharply defined task:
they first lock the index file, read the_index using read_cache(),
build a cache_entry struct using user-supplied data (this might
involve worktree code from dir.c to recurse subdirectories, for
example), write that cache_entry to the_index (removing existing
entries with cache_tree_invalidate_path() if necessary), and finally
write the_index to the index file, releasing the lock.  Would you
agree that any operation that doesn't follow along this line is
unnatural?  What then, does writing a special file in the worktree
(aka .gitmodules) have to do with this entire process?

Does git diff/ commit/ add/ rm or any other command you can think of
rely on a special file in the worktree (aka .gitmodules) to be checked
out?  Then why does git submodule require it?  Isn't this a
requirement that is inconsistent with the rest of git-core?

>> Putting the information in link
>> objects means that we get a more natural UI + warts like
>> cd-to-toplevel disappear with no extra code.
>
> I do not see how "link objects" _means_ "natural UI", yet, without
> an explanation how one leads to the other.

I should've said "means an easy route to get the existing UI to work
with little or no additional code".  Making the submodule information
available to git-core is precisely what leads to this.  In
index_path(), you can inject a case for S_IFDIR to write a link object
to the database, writing the sha1 to the supplied argument.  This is
not unnatural in any way, because we're just following along the lines
of the S_IFLNK codepath, which writes a blob object to the database.
Now index_path() is called by add_to_index(), which is the master
function for adding anything to the index.  Therefore, git add just
works.  git rm is much simpler: it calls remove_file_from_index()
which in turn calls cache_tree_invalidate() and
remove_index_entry_at().  Once the entry is removed from the cache,
our job is done.  The link object will be cleaned up at gc-time.  git
mv is just a combination of git rm and git add: it invalidates an
existing entry and adds a new one with a different name.

There is no special .gitmodules to take care of.

> What does cd-to-toplevel have to do with it?  In case you
> did not notice, all the core commands internally cd-to-toplevel and
> carry the "prefix" information while doing so, and prepend the
> prefix to user-supplied paths to find which path the user is talking
> about.  So "cd to toplevel before starting to carry the operation out"
> is a natural pattern inside Git.  As many people already told you,
> "the user has to run 'git submodule' from the top-level of the
> submodule working tree" is a simple oversight of the implementation.

Yes, I am aware.  I'm piggy-backing on the mature parts of git-core to
get functionality that I would otherwise have to write by hand.  The
current implementation needs to hand-code this, and hasn't done it yet
(presumably because it's non-trivial).

>> 2. If we want to make git-submodule a part of git-core (which I think
>> everyone agrees with), we will need to make the information in
>> .gitmodules available more easily to the rest of git-core.
>
> Care to define "more easily" which is another subjective word?  The
> .gitmodules file uses the bog-standard configuration format that can
> be easily read with the config.c infrastructure.  It is a separate
> matter that git_config() API is cumbersome to use, but improving it
> would help not just .gitmodules but also the regular non-submodule
> users of Git.  Heiko is working on a topic to read data in that
> format from core.

This goes both ways: the information is both easier to read and write.
I can easily create a link object from anywhere: index_path() or
cmd_edit_link().  To do this, I just have to call write_sha1_file()
with the buffer filled out and with the parameter link_type (which is
already defined).  To access the data in a link, I have to fill out a
tree_desc with "HEAD", an unpack_tree_opts with a custom callback, and
pass it to unpack_trees().  An example of a custom callback:
cmd_cat_link() which simply calls get_stat_data() to fill in the
SHA-1, and read_sha1_file() to read that object into a buffer.

I don't have to rely on a worktree with toplevel .gitmodules checked
out.  The information is easily readable/ writeable by default when
I'm working with git, irrespective of the state of my worktree.

Why must we use the git_config() API when it was never designed to do
this?  Why not leverage the mature part of git that was intended for
this, where our new object fits in snugly?

Can you now present a counter-argument about why .gitmodules is
_better_ suited for the task of managing submodules?  (Hint: No, you
can't; because it isn't).  Can you give me an example of one thing
that will become more complex if we were to follow my approach (I've
already acknowledged foreach, so that's out)?  Do you have any
concrete objections to the new design, apart from the fact that it
breaks what already "works"?  Do you acknowledge that the new design
will remove a lot of existing complexity (if "complex" is too
subjective for you: result in a significant negative overall
diffstat)?

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 19:03                                                           ` Ramkumar Ramachandra
@ 2013-04-08 19:48                                                             ` Junio C Hamano
  2013-04-08 19:54                                                               ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 19:48 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Does git diff/ commit/ add/ rm or any other command you can think of
> rely on a special file in the worktree (aka .gitmodules) to be checked
> out?

Try "git add foo~" with usual suspect in .gitignore ;-)

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 19:48                                                             ` Junio C Hamano
@ 2013-04-08 19:54                                                               ` Ramkumar Ramachandra
  2013-04-08 20:30                                                                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 19:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Junio C Hamano wrote:
> Ramkumar Ramachandra <artagnon@gmail.com> writes:
>
>> Does git diff/ commit/ add/ rm or any other command you can think of
>> rely on a special file in the worktree (aka .gitmodules) to be checked
>> out?
>
> Try "git add foo~" with usual suspect in .gitignore ;-)

First, it's not a hard requirement: in the worst case, git add will
add the file even without a -f.  Second, I've already argued about how
I think this is the right design: What part of that do you disagree
with?  What alternate design do you propose?

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 19:54                                                               ` Ramkumar Ramachandra
@ 2013-04-08 20:30                                                                 ` Junio C Hamano
  2013-04-08 21:03                                                                   ` Ramkumar Ramachandra
  2013-04-08 21:59                                                                   ` Ramkumar Ramachandra
  0 siblings, 2 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 20:30 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> Ramkumar Ramachandra <artagnon@gmail.com> writes:
>>
>>> Does git diff/ commit/ add/ rm or any other command you can think of
>>> rely on a special file in the worktree (aka .gitmodules) to be checked
>>> out?
>>
>> Try "git add foo~" with usual suspect in .gitignore ;-)
>
> First, it's not a hard requirement: in the worst case, git add will
> add the file even without a -f.

In the same sense .gitmodules is not a hard requirement, either.  I
use a submodule without .gitmodules in one of my repositories (the
top-level houses the source to generate my dotfiles and is cloned to
my environment at work, but the submodule houses my private files
that live only at home).  The gitlink entry in the index and the
tree and presence of the .git repository in the submodule checkout
(where it exists) is sufficient to make the layout work.

If your complaints were "I cannot make X work with the current
system, even with changes to git-submodule and some core part of the
system, and I think the reason is because the way module information
is stored is in a separate file .gitmodules", with a concrete X,
people who are more versed with the submodule subsystem may be able
to help you come up with a cleaner solution without throwing the
baby out with the bathwater, but I do not think we saw any concrete X
mentioned.

The same sentence followed by "... and with an object of a new type
stored at the path of the submodule, I can make it work by doing A,
B and C", with concrete A, B and C, some people may be interested in
pursuing that avenue with you, but I do not think we saw such
combinations of <X, A, B, C> either.

If all of your argument starts from "I think .gitmodules is ugly
because it is not an object of a separate type stored at the path of
the submodule, and here are the reasons why I think it is ugly", I
have nothing more to say to you.  That "ugly" is at best skewed
aesthetics, and each and every example that comes up in this
discussion, like this "'git add' works with .gitignore", and the one
I sent on ".gitattributes vs .gitmodules on the default" in the
nearby subthread to Jonathan, makes me realize that .gitmodules is
_more_ in line with the rest of the system, not less.


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 20:15                             ` Jens Lehmann
  2013-04-07 20:49                               ` Ramkumar Ramachandra
  2013-04-07 20:57                               ` Ramkumar Ramachandra
@ 2013-04-08 20:41                               ` Jens Lehmann
  2013-04-08 21:36                                 ` Jeff King
  2 siblings, 1 reply; 140+ messages in thread
From: Jens Lehmann @ 2013-04-08 20:41 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: John Keeping, Junio C Hamano, Git List, Linus Torvalds,
	Jonathan Nieder, Nguyen Thai Ngoc Duy, Heiko Voigt

Ok, here comes an updated version of our comparison list, revised
with what I read in recent discussions. As I said earlier, please
speak up if I missed anything (or forgot to add anyone to the CC).

I picked up one advantage ("no need to cd-to-toplevel to edit
.gitmodules"), two new disadvantages ("foreach" and "default
submodule config") and retired one Ram showed a solution for
(the "unstaged gitlink").


Advantages:

* Information is stored in one place, no need to look up stuff in
  another file/blob.

* Easier coding, as we find all information in a single object.

* No need to cd-to-toplevel to change configuration in the
  .gitmodules file; the special tools to edit link information
  will work in any subdirectory.

(We currently need a checked out work tree to access the
.gitmodules file, but there is ongoing work to read the
configuration directly from the database)

(While it is easier to merge the link object, a .gitmodules
aware merge driver would work just as well)
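
As an illustration of reading submodule configuration straight from
the object database, the sketch below assumes a Git that provides
"git config --blob" (an option that appeared after this discussion).
submodule_urls_at is a hypothetical helper name, and the "name url"
output format is chosen only for this example.

```shell
# Sketch only: print "name url" pairs for every submodule recorded in
# the .gitmodules blob of the given revision (default: HEAD), without
# needing a checked out work tree copy of the file.
submodule_urls_at () {
	rev=${1:-HEAD}
	git config --blob "$rev:.gitmodules" --get-regexp '^submodule\..*\.url$' |
	sed -e 's/^submodule\.\(.*\)\.url /\1 /'
}
```

Because the data comes from a blob, this keeps working when
.gitmodules has been deleted or modified in the work tree, which is
the situation the first parenthetical note above is concerned with.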


Disadvantages:

* Changes in user visible behavior, possible compatibility
  problems when Git versions are mixed.

* Special tools are needed to edit submodule information where
  currently a plain editor is sufficient and a standard format
  is used.

* Merge conflicts are harder to resolve and require special git
  commands; solving them in .gitmodules is way more intuitive
  as users are already used to conflict markers.

* "git submodule foreach" becomes harder to implement

* With .gitmodules we lose a central spot where configuration
  concerning many submodules can be stored

(I think when we also put the submodule name in the object we
could also retain the ability to repopulate moved submodules
from their old repo, which is found by that name)

(That a link object has no unstaged counterpart, which a file
easily has, can be fixed by special casing this, e.g. by using
a file in .git/link-specs/)


Hmm, while it is still too early to close the polls, it looks
to me as if most advantages are about easier coding while most
disadvantages hit the user. That makes it more understandable
for me why Ram is so convinced of his approach and why on the
other hand submodule users like myself are rather sceptical. I
think we need some more advantages that users will directly
profit from, the cd-to-toplevel for .gitmodules is definitely
not enough to support the change Ram is proposing. What other
advantages are missing here?


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 20:30                                                                 ` Junio C Hamano
@ 2013-04-08 21:03                                                                   ` Ramkumar Ramachandra
  2013-04-10  7:23                                                                     ` Philip Oakley
  2013-04-08 21:59                                                                   ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 21:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

This is going nowhere.  You're stuck at making the current submodule
system work, not answering my questions, diverting conversation,
repeatedly asking the same stupid questions, labelling everything that
I say "subjective", and refusing to look at the objective counterpart
(aka, the code).  It's clear to me that no matter how many more emails
I write, you're not going to concede.

I'm not interested in wasting any more of my time with this nonsense.

I give up.


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 10:26     ` [PATCH] t3700 (add): add failing test for add with submodules Ramkumar Ramachandra
  2013-04-08 11:04       ` Duy Nguyen
@ 2013-04-08 21:30       ` Jeff King
  2013-04-08 22:03         ` Junio C Hamano
                           ` (2 more replies)
  1 sibling, 3 replies; 140+ messages in thread
From: Jeff King @ 2013-04-08 21:30 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Duy Nguyen, Jonathan Nieder, Git List, Junio C Hamano, Linus Torvalds

On Mon, Apr 08, 2013 at 03:56:49PM +0530, Ramkumar Ramachandra wrote:

> git add currently goes past submodule boundaries.  Document this bug.

It's not just submodules, but we should not recurse into any
sub-repository. If I have an unrelated Meta/ repository, I should not be
able to "git add Meta/foo", whether I have used "git submodule" or not.

This topic came about 2 years ago, and I had a quick-and-dirty patch:

  http://thread.gmane.org/gmane.comp.version-control.git/170937/focus=171040

I do not recall anything about the patch at this point (i.e., whether it
was the right thing), but maybe it is a good starting point for somebody
to look into it.

> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index 874b3a6..a1ea050 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -310,4 +310,18 @@ test_expect_success 'git add --dry-run --ignore-missing of non-existing file out
>  	test_i18ncmp expect.err actual.err
>  '
>  
> +test_expect_failure 'git add should not go past submodule boundaries' '
> +	mkdir submodule_dir &&
> +	(
> +		cd submodule_dir &&
> +		git init &&
> +		cat >foo <<-\EOF &&
> +		Some content
> +		EOF
> +		git add foo &&
> +		git commit -a -m "Add foo"
> +	) &&
> +	git add submodule_dir/foo
> +'

That is not actually a submodule, but rather just a repo that happens to
be inside our working tree. I know the distinction is subtle, but
according to the thread I linked to above, we may actually treat paths
with gitlinked index entries separately already (I did not try it,
though).

-Peff


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 20:41                               ` Jens Lehmann
@ 2013-04-08 21:36                                 ` Jeff King
  0 siblings, 0 replies; 140+ messages in thread
From: Jeff King @ 2013-04-08 21:36 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Ramkumar Ramachandra, John Keeping, Junio C Hamano, Git List,
	Linus Torvalds, Jonathan Nieder, Nguyen Thai Ngoc Duy,
	Heiko Voigt

On Mon, Apr 08, 2013 at 10:41:57PM +0200, Jens Lehmann wrote:

> (While it is easier to merge the link object, a .gitmodules
> aware merge driver would work just as well)

I have not been following this thread that closely, so apologies if I
missed it, but one thing I have not seen mention of is how the extra
information inside the gitlink object will require extra merge effort.

Imagine I have two branches; one updates the submodule's commit pointer,
and the other updates some meta-information about the submodule (e.g.,
it points the URL to a new host). In the current system, one change goes
into .gitmodules, and the other goes into the gitlink path. In a new
combined object, there is a conflict and we must do content-level
merging on it (which presumably would be done with a specialized merge
driver).

So I think that in some cases .gitmodules creates more conflicts
(submodule A and submodule B are touched and have a textual conflict),
and sometimes the combined object would create more conflicts (you touch
two parts of the combined object). The solution in both cases is
a smarter merge driver that understands which parts semantically
conflict and which parts do not.
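
A minimal sketch of what "understands which parts semantically
conflict" could mean, in shell.  It assumes the combined object can be
split into independent named fields (for example a commit pointer and
a URL); that split, and the helper name merge_field, are assumptions
made for illustration and are not part of any posted patch.

```shell
# Sketch only: three-way merge of a single field.  Prints the merged
# value and returns 0, or returns 1 when both sides changed the field
# to different values (a genuine conflict).
merge_field () {
	base=$1 ours=$2 theirs=$3
	if test "$ours" = "$theirs"
	then
		printf '%s\n' "$ours"
	elif test "$ours" = "$base"
	then
		printf '%s\n' "$theirs"
	elif test "$theirs" = "$base"
	then
		printf '%s\n' "$ours"
	else
		return 1
	fi
}
```

In the two-branch scenario above, the commit field changes only on
one side and the URL field only on the other, so merging field by
field succeeds for both, whereas a merge that treats the whole object
as one opaque value sees two different results and must report a
conflict.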

-Peff


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 20:30                                                                 ` Junio C Hamano
  2013-04-08 21:03                                                                   ` Ramkumar Ramachandra
@ 2013-04-08 21:59                                                                   ` Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-08 21:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

Junio C Hamano wrote:
> If all of your argument starts from "I think .gitmodules is ugly
> because it is not an object of a separate type stored at the path of
> the submodule, and here are the reasons why I think it is ugly", I
> have nothing more to say to you.

_This_ is how you summarize the seven points and the follow-up emails
I wrote out for you?  Seriously?

I enjoy good debate, and would've loved to be beaten in argument.  But
that's not what happened here.  I was just too frustrated with your
stupidity to continue.


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 21:30       ` Jeff King
@ 2013-04-08 22:03         ` Junio C Hamano
  2013-04-08 22:07           ` Jeff King
  2013-04-09  9:19         ` Ramkumar Ramachandra
  2013-04-09 11:43         ` Jakub Narębski
  2 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-08 22:03 UTC (permalink / raw)
  To: Jeff King
  Cc: Ramkumar Ramachandra, Duy Nguyen, Jonathan Nieder, Git List,
	Linus Torvalds

Jeff King <peff@peff.net> writes:

> On Mon, Apr 08, 2013 at 03:56:49PM +0530, Ramkumar Ramachandra wrote:
>
>> git add currently goes past submodule boundaries.  Document this bug.
>
> It's not just submodules, but we should not recurse into any
> sub-repository. If I have an unrelated Meta/ repository, I should not be
> able to "git add Meta/foo", whether I have used "git submodule" or not.
>
> This topic came about 2 years ago, and I had a quick-and-dirty patch:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/170937/focus=171040
>
> I do not recall anything about the patch at this point (i.e., whether it
> was the right thing), but maybe it is a good starting point for somebody
> to look into it.
>
>> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
>> index 874b3a6..a1ea050 100755
>> --- a/t/t3700-add.sh
>> +++ b/t/t3700-add.sh
>> @@ -310,4 +310,18 @@ test_expect_success 'git add --dry-run --ignore-missing of non-existing file out
>>  	test_i18ncmp expect.err actual.err
>>  '
>>  
>> +test_expect_failure 'git add should not go past submodule boundaries' '
>> +	mkdir submodule_dir &&
>> +	(
>> +		cd submodule_dir &&
>> +		git init &&
>> +		cat >foo <<-\EOF &&
>> +		Some content
>> +		EOF
>> +		git add foo &&
>> +		git commit -a -m "Add foo"
>> +	) &&
>> +	git add submodule_dir/foo
>> +'
>
> That is not actually a submodule, but rather just a repo that happens to
> be inside our working tree. 

I think we should treat it as a submodule-to-be, waiting for the
user to run "git add submodule_dir".

If it is a file in the working tree of an unrelated and separate
repository, it still is wrong to allow it to be added to our
repository, no?

If we had "git add submodule_dir" before the last "git add", as you
pointed out, we should already error out.

> I know the distinction is subtle, but according to the thread I
> linked to above, we may actually treat paths with gitlinked index
> entries separately already (I did not try it, though).


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 22:03         ` Junio C Hamano
@ 2013-04-08 22:07           ` Jeff King
  0 siblings, 0 replies; 140+ messages in thread
From: Jeff King @ 2013-04-08 22:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Duy Nguyen, Jonathan Nieder, Git List,
	Linus Torvalds

On Mon, Apr 08, 2013 at 03:03:41PM -0700, Junio C Hamano wrote:

> >> +test_expect_failure 'git add should not go past submodule boundaries' '
> >> +	mkdir submodule_dir &&
> >> +	(
> >> +		cd submodule_dir &&
> >> +		git init &&
> >> +		cat >foo <<-\EOF &&
> >> +		Some content
> >> +		EOF
> >> +		git add foo &&
> >> +		git commit -a -m "Add foo"
> >> +	) &&
> >> +	git add submodule_dir/foo
> >> +'
> >
> > That is not actually a submodule, but rather just a repo that happens to
> > be inside our working tree. 
> 
> I think we should treat it as a submodule-to-be, waiting for the
> user to run "git add submodule_dir".
> 
> If it is a file in the working tree of an unrelated and separate
> repository, it still is wrong to allow it to be added to our
> repository, no?

Sorry if I wasn't clear; I absolutely think this test is checking
something reasonable, and we should fix it to pass. I was only referring
to the wording, which is misleading (and hoped that pointing it out
would help whoever works on it in the right direction of where the
problem is).

-Peff


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 21:30       ` Jeff King
  2013-04-08 22:03         ` Junio C Hamano
@ 2013-04-09  9:19         ` Ramkumar Ramachandra
  2013-04-09  9:21           ` [PATCH 0/2] Fix git " Ramkumar Ramachandra
  2013-04-09 16:27           ` [PATCH] t3700 (add): add failing test for add with submodules Jeff King
  2013-04-09 11:43         ` Jakub Narębski
  2 siblings, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09  9:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Duy Nguyen, Jonathan Nieder, Git List, Junio C Hamano, Linus Torvalds

Jeff King wrote:
> That is not actually a submodule, but rather just a repo that happens to
> be inside our working tree. I know the distinction is subtle, but
> according to the thread I linked to above, we may actually treat paths
> with gitlinked index entries separately already (I did not try it,
> though).

Agreed.  treat_gitlink() dies if there is a gitlink cache_entry
matching any of the pathspecs; it does one thing well, and does what
it promises.  However, its core logic in check_path_for_gitlink() can
easily be moved into lstat_cache_matchlen() as that is more generic
(checks index and worktree).  die_if_path_beyond_symlink() is the
perfect example to replicate.  Today, there is one more caller of
die_if_path_beyond_symlink(): check-ignore, so we must patch that too.
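
The worktree half of that check can be sketched in shell as a walk
over the leading directories of a path, looking for a '.git' entry in
each.  This is only an illustration of the idea, not the C code;
gitrepo_leading_path is a hypothetical helper name, and the sketch
assumes a relative, already normalized path.

```shell
# Sketch only: print the first leading directory of a relative path
# that holds a .git entry and return 0; return 1 if there is none.
# The path itself is not inspected, only its leading directories.
gitrepo_leading_path () {
	path=$1
	prefix=
	while :
	do
		case $path in
		*/*) ;;
		*) return 1 ;;
		esac
		component=${path%%/*}
		path=${path#*/}
		prefix=${prefix:+$prefix/}$component
		if test -e "$prefix/.git"
		then
			printf '%s\n' "$prefix"
			return 0
		fi
	done
}
```

With a repository initialized in submodule_dir, calling
"gitrepo_leading_path submodule_dir/foo" prints "submodule_dir",
which is the condition under which git add should refuse the path.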

On a slightly related note, treat_path() also contains the logic for
checking for a git repository in the worktree.  Unfortunately, the
code cannot be reused because it checks for a '.git' in a dirent.

On the wording issue, a submodule is a submodule whether in-index or
otherwise.  I would write two different tests: one for in-worktree
submodule and another for in-index submodule, and name them
appropriately.  Does that make sense?


* [PATCH 0/2] Fix git add with submodules
  2013-04-09  9:19         ` Ramkumar Ramachandra
@ 2013-04-09  9:21           ` Ramkumar Ramachandra
  2013-04-09  9:21             ` [PATCH 1/2] t3700 (add): add two tests for testing " Ramkumar Ramachandra
  2013-04-09  9:21             ` [PATCH 2/2] add: refuse to add paths beyond repository boundaries Ramkumar Ramachandra
  2013-04-09 16:27           ` [PATCH] t3700 (add): add failing test for add with submodules Jeff King
  1 sibling, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09  9:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Junio C Hamano, Git List

Hi,

git add has a bug when operating on submodules.  The following test
fails:

	mkdir submodule_dir &&
	(
		cd submodule_dir &&
		git init &&
		cat >foo <<-\EOF &&
		Some content
		EOF
		git add foo &&
		git commit -a -m "Add foo"
	) &&
	test_must_fail git add submodule_dir/foo

[1/2] adds this failing test along with a passing related test.
[2/2] fixes the failing test.

Ramkumar Ramachandra (2):
  t3700 (add): add two tests for testing add with submodules
  add: refuse to add paths beyond repository boundaries

 builtin/add.c  |  5 +++--
 cache.h        |  2 ++
 pathspec.c     | 12 ++++++++++++
 pathspec.h     |  1 +
 symlinks.c     | 43 +++++++++++++++++++++++++++++++++++++------
 t/t3700-add.sh | 32 ++++++++++++++++++++++++++++++++
 6 files changed, 87 insertions(+), 8 deletions(-)

-- 
1.8.2.1.347.gdd82260.dirty


* [PATCH 1/2] t3700 (add): add two tests for testing add with submodules
  2013-04-09  9:21           ` [PATCH 0/2] Fix git " Ramkumar Ramachandra
@ 2013-04-09  9:21             ` Ramkumar Ramachandra
  2013-04-09  9:21             ` [PATCH 2/2] add: refuse to add paths beyond repository boundaries Ramkumar Ramachandra
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09  9:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Junio C Hamano, Git List

The first test "git add should not go past gitlink boundaries" checks
that paths within a submodule added with 'git submodule add' cannot be
added to the index.  It passes because of treat_gitlink() in
builtin/add.c.

The second test "git add should not go past git repository boundaries"
checks that paths within a git repository in the worktree (not yet
added with 'git submodule add') cannot be added to the index.  It
fails because there is no existing code to check this.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 t/t3700-add.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 874b3a6..1ad2331 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -310,4 +310,36 @@ test_expect_success 'git add --dry-run --ignore-missing of non-existing file out
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'git add should not go past gitlink boundaries' '
+	rm -rf submodule_dir &&
+	mkdir submodule_dir &&
+	(
+		cd submodule_dir &&
+		git init &&
+		git config remote.origin.url "quux" &&
+		cat >foo <<-\EOF &&
+		Some content
+		EOF
+		git add foo &&
+		git commit -a -m "Add foo"
+	) &&
+	git submodule add ./submodule_dir &&
+	test_must_fail git add submodule_dir/foo
+'
+
+test_expect_failure 'git add should not go past git repository boundaries' '
+	rm -rf submodule_dir &&
+	mkdir submodule_dir &&
+	(
+		cd submodule_dir &&
+		git init &&
+		cat >foo <<-\EOF &&
+		Some content
+		EOF
+		git add foo &&
+		git commit -a -m "Add foo"
+	) &&
+	test_must_fail git add submodule_dir/foo
+'
+
 test_done
-- 
1.8.2.1.347.gdd82260.dirty


* [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09  9:21           ` [PATCH 0/2] Fix git " Ramkumar Ramachandra
  2013-04-09  9:21             ` [PATCH 1/2] t3700 (add): add two tests for testing " Ramkumar Ramachandra
@ 2013-04-09  9:21             ` Ramkumar Ramachandra
  2013-04-09 16:50               ` Jeff King
  2013-04-09 17:09               ` Junio C Hamano
  1 sibling, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09  9:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Junio C Hamano, Git List

Currently, git add has the logic for refusing to add paths beyond
gitlinks in treat_gitlink(), which in turn calls
check_path_for_gitlink().  However, this only checks for an in-index
submodule (or gitlink cache_entry).
A path inside a git repository in the worktree still adds fine, and
this is a bug.  The logic for denying it is very similar to denying
adding paths beyond symbolic links: die_if_path_beyond_symlink().
Follow its example and write a die_if_path_beyond_gitrepo() to fix
this bug.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 builtin/add.c  |  5 +++--
 cache.h        |  2 ++
 pathspec.c     | 12 ++++++++++++
 pathspec.h     |  1 +
 symlinks.c     | 43 +++++++++++++++++++++++++++++++++++++------
 t/t3700-add.sh |  2 +-
 6 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ab1c9e8..1538129 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -155,8 +155,8 @@ static void refresh(int verbose, const char **pathspec)
 
 /*
  * Normalizes argv relative to prefix, via get_pathspec(), and then
- * runs die_if_path_beyond_symlink() on each path in the normalized
- * list.
+ * runs die_if_path_beyond_symlink() and die_if_path_beyond_gitrepo()
+ * on each path in the normalized list.
  */
 static const char **validate_pathspec(const char **argv, const char *prefix)
 {
@@ -166,6 +166,7 @@ static const char **validate_pathspec(const char **argv, const char *prefix)
 		const char **p;
 		for (p = pathspec; *p; p++) {
 			die_if_path_beyond_symlink(*p, prefix);
+			die_if_path_beyond_gitrepo(*p, prefix);
 		}
 	}
 
diff --git a/cache.h b/cache.h
index e1e8ce8..987d7f3 100644
--- a/cache.h
+++ b/cache.h
@@ -962,6 +962,8 @@ struct cache_def {
 
 extern int has_symlink_leading_path(const char *name, int len);
 extern int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
+extern int has_gitrepo_leading_path(const char *name, int len);
+extern int threaded_has_gitrepo_leading_path(struct cache_def *, const char *, int);
 extern int check_leading_path(const char *name, int len);
 extern int has_dirs_only_path(const char *name, int len, int prefix_len);
 extern void schedule_dir_for_removal(const char *name, int len);
diff --git a/pathspec.c b/pathspec.c
index 284f397..142631d 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -99,3 +99,15 @@ void die_if_path_beyond_symlink(const char *path, const char *prefix)
 		die(_("'%s' is beyond a symbolic link"), path + len);
 	}
 }
+
+/*
+ * Dies if the given path refers to a file inside a directory with a
+ * git repository in it.
+ */
+void die_if_path_beyond_gitrepo(const char *path, const char *prefix)
+{
+	if (has_gitrepo_leading_path(path, strlen(path))) {
+		int len = prefix ? strlen(prefix) : 0;
+		die(_("'%s' is beyond a git repository"), path + len);
+	}
+}
diff --git a/pathspec.h b/pathspec.h
index db0184a..c201c7b 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -5,5 +5,6 @@ extern char *find_pathspecs_matching_against_index(const char **pathspec);
 extern void add_pathspec_matches_against_index(const char **pathspec, char *seen, int specs);
 extern const char *check_path_for_gitlink(const char *path);
 extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
+extern void die_if_path_beyond_gitrepo(const char *path, const char *prefix);
 
 #endif /* PATHSPEC_H */
diff --git a/symlinks.c b/symlinks.c
index c2b41a8..e551dae 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -54,6 +54,7 @@ static inline void reset_lstat_cache(struct cache_def *cache)
 #define FL_LSTATERR (1 << 3)
 #define FL_ERR      (1 << 4)
 #define FL_FULLPATH (1 << 5)
+#define FL_GITREPO  (1 << 6)
 
 /*
  * Check if name 'name' of length 'len' has a symlink leading
@@ -142,8 +143,22 @@ static int lstat_cache_matchlen(struct cache_def *cache,
 			if (errno == ENOENT)
 				*ret_flags |= FL_NOENT;
 		} else if (S_ISDIR(st.st_mode)) {
-			last_slash_dir = last_slash;
-			continue;
+			/* Check to see if the directory contains a
+			   git repository */
+			struct stat st;
+			struct strbuf dotgitentry = STRBUF_INIT;
+			strbuf_addf(&dotgitentry, "%s/.git", cache->path);
+			if (lstat(dotgitentry.buf, &st) < 0) {
+				if (errno == ENOENT) {
+					strbuf_release(&dotgitentry);
+					last_slash_dir = last_slash;
+					continue;
+				}
+				*ret_flags = FL_LSTATERR;
+			}
+			else
+				*ret_flags = FL_GITREPO;
+			strbuf_release(&dotgitentry);
 		} else if (S_ISLNK(st.st_mode)) {
 			*ret_flags = FL_SYMLINK;
 		} else {
@@ -153,11 +168,11 @@ static int lstat_cache_matchlen(struct cache_def *cache,
 	}
 
 	/*
-	 * At the end update the cache.  Note that max 3 different
-	 * path types, FL_NOENT, FL_SYMLINK and FL_DIR, can be cached
-	 * for the moment!
+	 * At the end update the cache.  Note that max 4 different
+	 * path types, FL_NOENT, FL_SYMLINK, FL_GITREPO and FL_DIR,
+	 * can be cached for the moment!
 	 */
-	save_flags = *ret_flags & track_flags & (FL_NOENT|FL_SYMLINK);
+	save_flags = *ret_flags & track_flags & (FL_NOENT|FL_SYMLINK|FL_GITREPO);
 	if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
 		cache->path[last_slash] = '\0';
 		cache->len = last_slash;
@@ -204,6 +219,14 @@ int threaded_has_symlink_leading_path(struct cache_def *cache, const char *name,
 }
 
 /*
+ * Return non-zero if path 'name' has a leading gitrepo component
+ */
+int threaded_has_gitrepo_leading_path(struct cache_def *cache, const char *name, int len)
+{
+	return lstat_cache(cache, name, len, FL_GITREPO|FL_DIR, USE_ONLY_LSTAT) & FL_GITREPO;
+}
+
+/*
  * Return non-zero if path 'name' has a leading symlink component
  */
 int has_symlink_leading_path(const char *name, int len)
@@ -212,6 +235,14 @@ int has_symlink_leading_path(const char *name, int len)
 }
 
 /*
+ * Return non-zero if path 'name' has a leading gitrepo component
+ */
+int has_gitrepo_leading_path(const char *name, int len)
+{
+	return threaded_has_gitrepo_leading_path(&default_cache, name, len);
+}
+
+/*
  * Return zero if path 'name' has a leading symlink component or
  * if some leading path component does not exists.
  *
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 1ad2331..4714734 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -327,7 +327,7 @@ test_expect_success 'git add should not go past gitlink boundaries' '
 	test_must_fail git add submodule_dir/foo
 '
 
-test_expect_failure 'git add should not go past git repository boundaries' '
+test_expect_success 'git add should not go past git repository boundaries' '
 	rm -rf submodule_dir &&
 	mkdir submodule_dir &&
 	(
-- 
1.8.2.1.347.gdd82260.dirty


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-08 21:30       ` Jeff King
  2013-04-08 22:03         ` Junio C Hamano
  2013-04-09  9:19         ` Ramkumar Ramachandra
@ 2013-04-09 11:43         ` Jakub Narębski
  2013-04-09 11:54           ` Ramkumar Ramachandra
  2 siblings, 1 reply; 140+ messages in thread
From: Jakub Narębski @ 2013-04-09 11:43 UTC (permalink / raw)
  To: Jeff King
  Cc: Ramkumar Ramachandra, Duy Nguyen, Jonathan Nieder, Git List,
	Junio C Hamano, Linus Torvalds

W dniu 08.04.2013 23:30, Jeff King pisze:
> On Mon, Apr 08, 2013 at 03:56:49PM +0530, Ramkumar Ramachandra wrote:
> 
>> git add currently goes past submodule boundaries.  Document this bug.
> 
> It's not just submodules, but we should not recurse into any
> sub-repository. If I have an unrelated Meta/ repository, I should not be
> able to "git add Meta/foo", whether I have used "git submodule" or not.
> 
> This topic came about 2 years ago, and I had a quick-and-dirty patch:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/170937/focus=171040
> 
> I do not recall anything about the patch at this point (i.e., whether it
> was the right thing), but maybe it is a good starting point for somebody
> to look into it.

Hmmm... I used to do (and still do) such a not-recommended thing,
i.e. keeping git/gitweb/TODO etc. in git/gitweb/.git repository,
while having git/gitweb/gitweb.perl in git/.git repository.

So my (admittedly strange) setup will stop working?

-- 
Jakub Narębski


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 18:10                                                         ` Junio C Hamano
  2013-04-08 19:03                                                           ` Ramkumar Ramachandra
@ 2013-04-09 11:51                                                           ` Jakub Narębski
  1 sibling, 0 replies; 140+ messages in thread
From: Jakub Narębski @ 2013-04-09 11:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Duy Nguyen, Jonathan Nieder, Jens Lehmann,
	John Keeping, Git List, Linus Torvalds

Junio C Hamano wrote:
> Ramkumar Ramachandra wrote:

>> 2. If we want to make git-submodule a part of git-core (which I think
>>    everyone agrees with), we will need to make the information in
>>    .gitmodules available more easily to the rest of git-core.

> Care to define "more easily" which is another subjective word?  The
> .gitmodules file uses the bog-standard configuration format that can
> be easily read with the config.c infrastructure.  It is a separate
> matter that git_config() API is cumbersome to use, but improving it
> would help not just .gitmodules but also the regular non-submodule
> users of Git.  There is a topic in the works to read data in that
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> format from core Heiko is working on.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

BTW. this is something that I was missing to implement better
submodule support in gitweb (and thus git-instaweb) than just
marking it as submodule in 'tree' view.

-- 
Jakub Narębski


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-09 11:43         ` Jakub Narębski
@ 2013-04-09 11:54           ` Ramkumar Ramachandra
  2013-04-09 13:49             ` Jakub Narębski
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09 11:54 UTC (permalink / raw)
  To: Jakub Narębski
  Cc: Jeff King, Duy Nguyen, Jonathan Nieder, Git List, Junio C Hamano,
	Linus Torvalds

Jakub Narębski wrote:
> Hmmm... I used to do (and still do) such a not-recommended thing,
> i.e. keeping git/gitweb/TODO etc. in git/gitweb/.git repository,
> while having git/gitweb/gitweb.perl in git/.git repository.

Why don't you put the gitweb/TODO in a different branch in the git.git
repository?  Why do you feel the need to have two different
repositories tracking different files in the same path?

Just out of curiosity, how does stuff work with your setup?  Does the
worktree gitweb/ belong to your gitweb.git repository or git.git
repository?  How do half the git commands work?  For example, won't
git clean -dfx remove the files tracked by your other repository?
Will a conflicting checkout not stomp files tracked by the other
repository?  How are worktree-rules like .gitignore applied?

> So my (admittedly strange) setup will stop working?

Yes.  I would persuade you not to use such a gross setup; this is not
what git was intended for at all.


* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-09 11:54           ` Ramkumar Ramachandra
@ 2013-04-09 13:49             ` Jakub Narębski
  0 siblings, 0 replies; 140+ messages in thread
From: Jakub Narębski @ 2013-04-09 13:49 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jeff King, Duy Nguyen, Jonathan Nieder, Git List, Junio C Hamano,
	Linus Torvalds

On 09.04.2013, Ramkumar Ramachandra wrote:
> Jakub Narębski wrote:

>> Hmmm... I used to do (and still do) such a not-recommended thing,
>> i.e. keeping git/gitweb/TODO etc. in git/gitweb/.git repository,
>> while having git/gitweb/gitweb.perl in git/.git repository.
> 
> Why don't you put the gitweb/TODO in a different branch in the git.git
> repository?  Why do you feel the need to have two different
> repositories tracking different files in the same path?

It is not only gitweb/TODO.  If it were only that file, I could have
used a 'gitweb/todo' branch for it, or something.  But then I would
miss having it beside gitweb.perl, with easy access to it while
working on gitweb.perl.

I want to have various files that I use when working on gitweb.perl
(but should not and would not be in gitweb subsystem in git.git
repository) to be under version control, and be side-by-side near
gitweb.perl.

These are:
* gitweb/TODO - TODO file for gitweb (personal, not often updated),
  and similar gitweb/gitwebs-whats-cooking.txt

  Might be put into 'gitweb/todo/TODO' file and 'gitweb/todo/.git'
  private repository (and perhaps 'gitweb/todo' branch of my clone
  of git.git repository).

* various *_test.perl files, where I test features to be possibly
  put into gitweb, like e.g. chop_str_test.perl or test_find_forks.perl
  (or similar benchmark_find_forks.perl)

* private personal configuration files for testing its output, like
  gitweb/gitweb_config.perl and gitweb/magic.txt !!!

  Those are very much required to reside beside gitweb/gitweb.perl
  because of default GITWEB_CONFIG value.  With those I can simply
  run current gitweb/gitweb.perl (sic!) from its directory while
  I am working on it.

> Just out of curiosity, how does stuff work with your setup?  Does the
> worktree gitweb/ belong to your gitweb.git repository or git.git
> repository?

The 'git/gitweb/' worktree belongs to both repositories (it is 'gitweb/'
in the git.git clone, i.e. git/.git, and it is the top directory of the
git/gitweb/.git repository).

> How do half the git commands work?  For example, won't
> git clean -dfx remove the files tracked by your other repository?

They work, somewhat and with some care.  I don't use "git clean"
for example.

> Will a conflicting checkout not stomp files tracked by the other
> repository?  How are worktree-rules like .gitignore applied?

'git/gitweb/.gitignore' belongs to the 'git/gitweb/.git' repository
and is used to ignore the files tracked by 'git/.git' (with the intent
of marking them untracked but *precious*).  I could have used
'info/exclude' here because this repository is private for the time being.

As the gitweb subsystem in git.git is quite stable, the untracked but
existing 'gitweb/.gitignore' doesn't usually matter, as all files
in 'gitweb/' are already tracked.  Besides, I can always use
"git add -f" to add files to git.git if necessary (e.g. when splitting
gitweb.js etc.).

>> So my (admittedly strange) setup will stop working?
> 
> Yes.  I would persuade you not to use such a gross setup; this is not
> what git was intended for at all.

Why not?

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH] t3700 (add): add failing test for add with submodules
  2013-04-09  9:19         ` Ramkumar Ramachandra
  2013-04-09  9:21           ` [PATCH 0/2] Fix git " Ramkumar Ramachandra
@ 2013-04-09 16:27           ` Jeff King
  1 sibling, 0 replies; 140+ messages in thread
From: Jeff King @ 2013-04-09 16:27 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Duy Nguyen, Jonathan Nieder, Git List, Junio C Hamano, Linus Torvalds

On Tue, Apr 09, 2013 at 02:49:24PM +0530, Ramkumar Ramachandra wrote:

> On the wording issue, a submodule is a submodule whether in-index or
> otherwise.  I would write two different tests: one for in-worktree
> submodule and another for in-index submodule, and name them
> appropriately.  Does that make sense?

Yeah, that makes perfect sense to me.

-Peff

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09  9:21             ` [PATCH 2/2] add: refuse to add paths beyond repository boundaries Ramkumar Ramachandra
@ 2013-04-09 16:50               ` Jeff King
  2013-04-09 17:09               ` Junio C Hamano
  1 sibling, 0 replies; 140+ messages in thread
From: Jeff King @ 2013-04-09 16:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Duy Nguyen, Junio C Hamano, Git List

On Tue, Apr 09, 2013 at 02:51:37PM +0530, Ramkumar Ramachandra wrote:

> Currently, git add has the logic for refusing to add gitlinks using
> treat_path(), which in turn calls check_path_for_gitlink().  However,
> this only checks for an in-index submodule (or gitlink cache_entry).
> A path inside a git repository in the worktree still adds fine, and
> this is a bug.  The logic for denying it is very similar to denying
> adding paths beyond symbolic links: die_if_path_beyond_symlink().
> Follow its example and write a die_if_path_beyond_gitrepo() to fix
> this bug.

Thanks for working on this.

I think the direction is a good one. It does disallow Jakub's crazy
shared-directory setup. I am not too sad to see that disallowed, at
least by default, because there are so many ways to screw yourself while
using it if you are not careful (I tried something similar once, and
gave up because I kept running into problematic cases).

I am not opposed to having an escape hatch to operate in that mode, but
it should be triggered explicitly so it doesn't catch users unaware.

> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> ---
>  builtin/add.c  |  5 +++--
>  cache.h        |  2 ++
>  pathspec.c     | 12 ++++++++++++
>  pathspec.h     |  1 +
>  symlinks.c     | 43 +++++++++++++++++++++++++++++++++++++------
>  t/t3700-add.sh |  2 +-
>  6 files changed, 56 insertions(+), 9 deletions(-)

I am not super-familiar with this part of the code, but having worked on
it once two years ago for the same problem, your solution looks like the
right thing.

> @@ -142,8 +143,22 @@ static int lstat_cache_matchlen(struct cache_def *cache,
>  			if (errno == ENOENT)
>  				*ret_flags |= FL_NOENT;
>  		} else if (S_ISDIR(st.st_mode)) {
> -			last_slash_dir = last_slash;
> -			continue;
> +			/* Check to see if the directory contains a
> +			   git repository */
> +			struct stat st;
> +			struct strbuf dotgitentry = STRBUF_INIT;
> +			strbuf_addf(&dotgitentry, "%s/.git", cache->path);

Can we use mkpath here to avoid an allocation? Or even better,
cache->path is PATH_MAX+1 bytes, and we munge it earlier in the
function. Can we just check the length and stick "/.git" on the end?
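
A rough sketch of the second idea (append "/.git" in place to a
fixed-size buffer, lstat() it, then restore the buffer) might look like
the following.  The helper name dir_has_dotgit() and its return
convention are made up for illustration and are not part of git; the
only assumption taken from the discussion above is that the buffer
holds PATH_MAX+1 bytes.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper, not git code.  'path' is a writable buffer of
 * PATH_MAX + 1 bytes holding a directory name of length 'len'.
 *
 * Returns 1 if "<path>/.git" exists, 0 if it does not, and -1 if the
 * suffix does not fit or lstat() fails for another reason.  The
 * buffer is restored to its original contents before returning.
 */
static int dir_has_dotgit(char *path, size_t len)
{
	static const char suffix[] = "/.git";
	struct stat st;
	int ret;

	/* sizeof(suffix) counts the terminating NUL as well */
	if (len + sizeof(suffix) > PATH_MAX + 1)
		return -1;
	memcpy(path + len, suffix, sizeof(suffix));
	if (!lstat(path, &st))
		ret = 1;
	else
		ret = (errno == ENOENT || errno == ENOTDIR) ? 0 : -1;
	path[len] = '\0';	/* undo the in-place append */
	return ret;
}
```

Compared with the strbuf version quoted above, this avoids the
allocation and the three strbuf_release() calls, at the cost of an
explicit length check.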

> +			if (lstat(dotgitentry.buf, &st) < 0) {
> +				if (errno == ENOENT) {
> +					strbuf_release(&dotgitentry);
> +					last_slash_dir = last_slash;
> +					continue;
> +				}
> +				*ret_flags = FL_LSTATERR;
> +			}
> +			else
> +				*ret_flags = FL_GITREPO;
> +			strbuf_release(&dotgitentry);

In my original patch long ago, Junio asked if we should be checking
is_git_directory() when we find a ".git" entry, to make sure it is not a
false positive. I don't have a strong opinion either way, but if we do
that, we would possibly want to update treat_path to do the same thing
for consistency.
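
To illustrate the false-positive concern, here is a deliberately
simplified check that a ".git" entry at least has the usual layout
(a HEAD file plus objects/ and refs/ directories).  This is only a
sketch: has_entry() and looks_like_git_dir() are hypothetical names,
and git's own is_git_directory() is more thorough than this.  In
particular, this sketch rejects a ".git" that is a gitfile and a HEAD
that is a dangling symlink.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return 1 if "<dir>/<name>" exists and is a
 * directory (want_dir != 0) or a regular file (want_dir == 0),
 * otherwise 0.  stat() is used, so symlinks are followed.
 */
static int has_entry(const char *dir, const char *name, int want_dir)
{
	char buf[4096];
	struct stat st;
	int n = snprintf(buf, sizeof(buf), "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;	/* formatting error or truncated path */
	if (stat(buf, &st))
		return 0;
	return want_dir ? !!S_ISDIR(st.st_mode) : !!S_ISREG(st.st_mode);
}

/* Simplified stand-in for a "is this really a git directory?" test. */
static int looks_like_git_dir(const char *dotgit)
{
	return has_entry(dotgit, "HEAD", 0) &&
	       has_entry(dotgit, "objects", 1) &&
	       has_entry(dotgit, "refs", 1);
}
```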

-Peff

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09  9:21             ` [PATCH 2/2] add: refuse to add paths beyond repository boundaries Ramkumar Ramachandra
  2013-04-09 16:50               ` Jeff King
@ 2013-04-09 17:09               ` Junio C Hamano
  2013-04-09 17:34                 ` Junio C Hamano
  1 sibling, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 17:09 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Currently, git add has the logic for refusing to add gitlinks using
> treat_path(), which in turn calls check_path_for_gitlink().  However,
> this only checks for an in-index submodule (or gitlink cache_entry).
> A path inside a git repository in the worktree still adds fine, and
> this is a bug.  The logic for denying it is very similar to denying
> adding paths beyond symbolic links: die_if_path_beyond_symlink().
> Follow its example and write a die_if_path_beyond_gitrepo() to fix
> this bug.
>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> ---

> @@ -166,6 +166,7 @@ static const char **validate_pathspec(const char **argv, const char *prefix)
>  		const char **p;
>  		for (p = pathspec; *p; p++) {
>  			die_if_path_beyond_symlink(*p, prefix);
> +			die_if_path_beyond_gitrepo(*p, prefix);
>  		}
>  	}
> diff --git a/cache.h b/cache.h
> index e1e8ce8..987d7f3 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -962,6 +962,8 @@ struct cache_def {
>  
>  extern int has_symlink_leading_path(const char *name, int len);
> +extern int has_gitrepo_leading_path(const char *name, int len);

I looked at the output from "grep has_symlink_leading_path" and also
for "die_if_path_beyond"; all of these places are checking "I have
this multi-level path; I want to know if the path is not (should
not be) part of the current project", I think.  Certainly the one in
"update-index" is about the same operation as the "git add" you are
patching.

Isn't it a better approach to first _rename_ the existing function so
that it does not single out the "symlink"-ness of the path?  A symlink in the
middle of such a multi-level path that leads to a place outside the
project is _not_ the only way to step out of our project boundary.  A
directory in the middle of a multi-level path that is the top-level
of the working tree of a foreign project is another way to step out
of our project boundary.  Perhaps

	die_if_path_outside_our_project()
        path_outside_our_project()

And then update the implementation of path_outside_our_project(),
which only took "symlink in the middle" into account so far, and
teach it that such a "top-level of the working tree of a foreign
project" is also stepping out of our project?

That way, you do not have to settle on fixing the bug only in "git
add" and keep the bug in "git update-index", I think.

I think the hit in builtin/apply.c deals with the same "beyond
symlink is outside our project" check and can be updated like so.  I
didn't look at the ones in diff-lib.c and dir.c so you may want to
double check on what they use it for.
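
As a rough illustration of the combined check described above, the
following uncached sketch walks the leading components of a path
relative to the top of the working tree and reports whether any of
them is a symlink or the top level of another working tree (detected
here simply by the presence of a ".git" entry).  The function name
leading_path_leaves_project() is hypothetical; the real code goes
through lstat_cache(), and this sketch neither consults the index nor
validates the ".git" entry it finds.

```c
#include <limits.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical, uncached sketch (not git code).  Return 1 if a
 * leading directory component of the relative path 'name' is a
 * symlink or contains a ".git" entry, 0 otherwise.  The final
 * component itself is never examined.
 */
static int leading_path_leaves_project(const char *name)
{
	char buf[PATH_MAX + 1];
	const char *slash = name;
	struct stat st;

	while ((slash = strchr(slash, '/')) != NULL) {
		size_t len = (size_t)(slash - name);

		slash++;
		if (!len || len + sizeof("/.git") > sizeof(buf))
			return 0;	/* absolute or overlong: out of scope here */
		memcpy(buf, name, len);
		buf[len] = '\0';
		if (lstat(buf, &st))
			return 0;	/* missing component: nothing to step into */
		if (S_ISLNK(st.st_mode))
			return 1;	/* symlink in the middle */
		if (!S_ISDIR(st.st_mode))
			return 0;
		memcpy(buf + len, "/.git", sizeof("/.git"));
		if (!lstat(buf, &st))
			return 1;	/* top level of a foreign working tree */
	}
	return 0;
}
```

With one function answering both questions, every existing caller of
the symlink check would pick up the foreign-repository case without
needing a second call next to it.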

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:09               ` Junio C Hamano
@ 2013-04-09 17:34                 ` Junio C Hamano
  2013-04-09 17:41                   ` Ramkumar Ramachandra
                                     ` (2 more replies)
  0 siblings, 3 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 17:34 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano <gitster@pobox.com> writes:

> Ramkumar Ramachandra <artagnon@gmail.com> writes:
>
>> Currently, git add has the logic for refusing to add gitlinks using
>> treat_path(), which in turn calls check_path_for_gitlink().  However,
>> this only checks for an in-index submodule (or gitlink cache_entry).
>> A path inside a git repository in the worktree still adds fine, and
>> this is a bug.  The logic for denying it is very similar to denying
>> adding paths beyond symbolic links: die_if_path_beyond_symlink().
>> Follow its example and write a die_if_path_beyond_gitrepo() to fix
>> this bug.
>>
>> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
>> ---
>
>> @@ -166,6 +166,7 @@ static const char **validate_pathspec(const char **argv, const char *prefix)
>>  		const char **p;
>>  		for (p = pathspec; *p; p++) {
>>  			die_if_path_beyond_symlink(*p, prefix);
>> +			die_if_path_beyond_gitrepo(*p, prefix);
>>  		}
>>  	}
>> diff --git a/cache.h b/cache.h
>> index e1e8ce8..987d7f3 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -962,6 +962,8 @@ struct cache_def {
>>  
>>  extern int has_symlink_leading_path(const char *name, int len);
>> +extern int has_gitrepo_leading_path(const char *name, int len);
>
> I looked at the output from "grep has_symlink_leading_path" and also
> for "die_if_path_beyond"; all of these places are checking "I have
> this multi-level path; I want to know if the path does not (should
> not) be part of the current project", I think.  Certainly the one in
> the "update-index" is about the same operation as "git add" you are
> patching.
>
> Isn't it a better approach to _rename_ the existing function not to
> single out "symlink"-ness of the path first ?  A symlink in the
> middle of such a multi-level path that leads to a place outside the
> project is _not_ the only way to step out of our project boundary.  A
> directory in the middle of a multi-level path that is the top-level
> of the working tree of a foreign project is another way to step out
> of our project boundary.  Perhaps
>
> 	die_if_path_outside_our_project()
>         path_outside_our_project()
>
> And then update the implementation of path_outside_our_project(),
> which only took "symlink in the middle" into account so far, and
> teach it that such a "top-level of the working tree of a foreign
> project" is also stepping out of our project?
>
> That way, you do not have to settle on fixing the bug only in "git
> add" and keep the bug in "git update-index", I think.
>
> I think the hit in builtin/apply.c deals with the same "beyond
> symlink is outside our project" check and can be updated like so.  I
> didn't look at the ones in diff-lib.c and dir.c so you may want to
> double check on what they use it for.

The first step (renaming and adjusting comments) would look like
this.


 builtin/add.c          |  6 +++---
 builtin/apply.c        |  8 ++++++--
 builtin/check-ignore.c |  2 +-
 builtin/update-index.c |  4 ++--
 cache.h                |  4 ++--
 diff-lib.c             |  2 +-
 dir.c                  |  2 +-
 pathspec.c             |  6 +++---
 pathspec.h             |  2 +-
 preload-index.c        |  2 +-
 symlinks.c             | 10 +++++-----
 t/t0008-ignores.sh     |  2 +-
 12 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ab1c9e8..7cb80ef 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -155,8 +155,8 @@ static void refresh(int verbose, const char **pathspec)
 
 /*
  * Normalizes argv relative to prefix, via get_pathspec(), and then
- * runs die_if_path_beyond_symlink() on each path in the normalized
- * list.
+ * runs die_if_path_outside_our_project() on each path in the
+ * normalized list.
  */
 static const char **validate_pathspec(const char **argv, const char *prefix)
 {
@@ -165,7 +165,7 @@ static const char **validate_pathspec(const char **argv, const char *prefix)
 	if (pathspec) {
 		const char **p;
 		for (p = pathspec; *p; p++) {
-			die_if_path_beyond_symlink(*p, prefix);
+			die_if_path_outside_our_project(*p, prefix);
 		}
 	}
 
diff --git a/builtin/apply.c b/builtin/apply.c
index 5b882d0..d0b408e 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3469,10 +3469,14 @@ static int check_to_create(const char *new_name, int ok_if_exists)
 		 * A leading component of new_name might be a symlink
 		 * that is going to be removed with this patch, but
 		 * still pointing at somewhere that has the path.
-		 * In such a case, path "new_name" does not exist as
+		 * Or it could be the top-level of a working tree of
+		 * a different project that is embedded in our working
+		 * tree.
+		 *
+		 * In such cases, path "new_name" does not exist as
 		 * far as git is concerned.
 		 */
-		if (has_symlink_leading_path(new_name, strlen(new_name)))
+		if (path_outside_our_project(new_name, strlen(new_name)))
 			return 0;
 
 		return EXISTS_IN_WORKTREE;
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 0240f99..bce378d 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -88,7 +88,7 @@ static int check_ignore(const char *prefix, const char **pathspec)
 		full_path = prefix_path(prefix, prefix
 					? strlen(prefix) : 0, path);
 		full_path = check_path_for_gitlink(full_path);
-		die_if_path_beyond_symlink(full_path, prefix);
+		die_if_path_outside_our_project(full_path, prefix);
 		if (!seen[i]) {
 			exclude = last_exclude_matching_path(&check, full_path,
 							     -1, &dtype);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 5c7762e..7c47fa2 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -186,8 +186,8 @@ static int process_path(const char *path)
 	struct cache_entry *ce;
 
 	len = strlen(path);
-	if (has_symlink_leading_path(path, len))
-		return error("'%s' is beyond a symbolic link", path);
+	if (path_outside_our_project(path, len))
+		return error("'%s' is outside our working tree", path);
 
 	pos = cache_name_pos(path, len);
 	ce = pos < 0 ? NULL : active_cache[pos];
diff --git a/cache.h b/cache.h
index e1e8ce8..f6359b5 100644
--- a/cache.h
+++ b/cache.h
@@ -960,8 +960,8 @@ struct cache_def {
 	int prefix_len_stat_func;
 };
 
-extern int has_symlink_leading_path(const char *name, int len);
-extern int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
+extern int path_outside_our_project(const char *name, int len);
+extern int threaded_path_outside_our_project(struct cache_def *, const char *, int);
 extern int check_leading_path(const char *name, int len);
 extern int has_dirs_only_path(const char *name, int len, int prefix_len);
 extern void schedule_dir_for_removal(const char *name, int len);
diff --git a/diff-lib.c b/diff-lib.c
index f35de0f..8aff906 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -32,7 +32,7 @@ static int check_removed(const struct cache_entry *ce, struct stat *st)
 			return -1;
 		return 1;
 	}
-	if (has_symlink_leading_path(ce->name, ce_namelen(ce)))
+	if (path_outside_our_project(ce->name, ce_namelen(ce)))
 		return 1;
 	if (S_ISDIR(st->st_mode)) {
 		unsigned char sub[20];
diff --git a/dir.c b/dir.c
index 91cfd99..b90b57b 100644
--- a/dir.c
+++ b/dir.c
@@ -1479,7 +1479,7 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char
 {
 	struct path_simplify *simplify;
 
-	if (has_symlink_leading_path(path, len))
+	if (path_outside_our_project(path, len))
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
diff --git a/pathspec.c b/pathspec.c
index 284f397..336149f 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -92,10 +92,10 @@ const char *check_path_for_gitlink(const char *path)
  * Dies if the given path refers to a file inside a symlinked
  * directory in the index.
  */
-void die_if_path_beyond_symlink(const char *path, const char *prefix)
+void die_if_path_outside_our_project(const char *path, const char *prefix)
 {
-	if (has_symlink_leading_path(path, strlen(path))) {
+	if (path_outside_our_project(path, strlen(path))) {
 		int len = prefix ? strlen(prefix) : 0;
-		die(_("'%s' is beyond a symbolic link"), path + len);
+		die(_("'%s' is outside the working tree"), path + len);
 	}
 }
diff --git a/pathspec.h b/pathspec.h
index db0184a..ef816a8 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -4,6 +4,6 @@
 extern char *find_pathspecs_matching_against_index(const char **pathspec);
 extern void add_pathspec_matches_against_index(const char **pathspec, char *seen, int specs);
 extern const char *check_path_for_gitlink(const char *path);
-extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
+extern void die_if_path_outside_our_project(const char *path, const char *prefix);
 
 #endif /* PATHSPEC_H */
diff --git a/preload-index.c b/preload-index.c
index 49cb08d..b3e57d4 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -55,7 +55,7 @@ static void *preload_thread(void *_data)
 			continue;
 		if (!ce_path_match(ce, &pathspec))
 			continue;
-		if (threaded_has_symlink_leading_path(&cache, ce->name, ce_namelen(ce)))
+		if (threaded_path_outside_our_project(&cache, ce->name, ce_namelen(ce)))
 			continue;
 		if (lstat(ce->name, &st))
 			continue;
diff --git a/symlinks.c b/symlinks.c
index c2b41a8..baed93f 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -196,19 +196,19 @@ static int lstat_cache(struct cache_def *cache, const char *name, int len,
 #define USE_ONLY_LSTAT  0
 
 /*
- * Return non-zero if path 'name' has a leading symlink component
+ * Return non-zero if path 'name' points outside the working tree
  */
-int threaded_has_symlink_leading_path(struct cache_def *cache, const char *name, int len)
+int threaded_path_outside_our_project(struct cache_def *cache, const char *name, int len)
 {
 	return lstat_cache(cache, name, len, FL_SYMLINK|FL_DIR, USE_ONLY_LSTAT) & FL_SYMLINK;
 }
 
 /*
- * Return non-zero if path 'name' has a leading symlink component
+ * Return non-zero if path 'name' points outside the working tree
  */
-int has_symlink_leading_path(const char *name, int len)
+int path_outside_our_project(const char *name, int len)
 {
-	return threaded_has_symlink_leading_path(&default_cache, name, len);
+	return threaded_path_outside_our_project(&default_cache, name, len);
 }
 
 /*
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 9c1bde1..3881e7d 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -397,7 +397,7 @@ test_expect_success_multi SYMLINKS 'symlink' '' '
 
 test_expect_success_multi SYMLINKS 'beyond a symlink' '' '
 	test_check_ignore "a/symlink/foo" 128 &&
-	test_stderr "fatal: '\''a/symlink/foo'\'' is beyond a symbolic link"
+	test_stderr "fatal: '\''a/symlink/foo'\'' is outside the working tree"
 '
 
 test_expect_success_multi SYMLINKS 'beyond a symlink from subdirectory' '' '

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:34                 ` Junio C Hamano
@ 2013-04-09 17:41                   ` Ramkumar Ramachandra
  2013-04-09 17:54                     ` Junio C Hamano
  2013-04-09 17:41                   ` Junio C Hamano
  2013-04-09 18:32                   ` Jakub Narębski
  2 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09 17:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano wrote:
> The first step (renaming and adjusting comments) would look like
> this.

Thanks for this!  I like the name die_if_path_outside_our_project().
I'll take care of the rest.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:34                 ` Junio C Hamano
  2013-04-09 17:41                   ` Ramkumar Ramachandra
@ 2013-04-09 17:41                   ` Junio C Hamano
  2013-04-09 17:56                     ` Ramkumar Ramachandra
  2013-04-09 18:32                   ` Jakub Narębski
  2 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 17:41 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano <gitster@pobox.com> writes:

>> I looked at the output from "grep has_symlink_leading_path" and also
>> for "die_if_path_beyond"; all of these places are checking "I have
>> this multi-level path; I want to know if the path does not (should
>> not) be part of the current project", I think.  Certainly the one in
>> the "update-index" is about the same operation as "git add" you are
>> patching.
>>
>> Isn't it a better approach to _rename_ the existing function not to
>> single out "symlink"-ness of the path first ?  A symlink in the
>> middle of such a multi-level path that leads to a place outside the
>> project is _not_ the only way to step out of our project boundary.  A
>> directory in the middle of a multi-level path that is the top-level
>> of the working tree of a foreign project is another way to step out
>> of our project boundary.  Perhaps
>>
>> 	die_if_path_outside_our_project()
>>         path_outside_our_project()
>>
>> And then update the implementation of path_outside_our_project(),
>> which only took "symlink in the middle" into account so far, and
>> teach it that such a "top-level of the working tree of a foreign
>> project" is also stepping out of our project?
>>
>> That way, you do not have to settle on fixing the bug only in "git
>> add" and keep the bug in "git update-index", I think.
>>
>> I think the hit in builtin/apply.c deals with the same "beyond
>> symlink is outside our project" check and can be updated like so.  I
>> didn't look at the ones in diff-lib.c and dir.c so you may want to
>> double check on what they use it for.
>
> The first step (renaming and adjusting comments) would look like
> this.

Actually, there is another function, "check_leading_path()", that you
may also want to adjust.

        /*
         * Return zero if path 'name' has a leading symlink component or
         * if some leading path component does not exist.
         *
         * Return -1 if leading path exists and is a directory.
         *
         * Return path length if leading path exists and is neither a
         * directory nor a symlink.
         */
        int check_leading_path(const char *name, int len)
        {
            return threaded_check_leading_path(&default_cache, name, len);
        }

I think what the callers of this function care about is if the name
is a path that should not be added to our index (i.e. points
"outside the repository").  If you had a symlink d that points at e
when our project does have a subdirectory e with file f,

	check_leading_path("d/f")

wants to say "bad", even though the real file pointed at, i.e. "e/f"
is inside our working tree, so "outside our working tree" is not
quite correct in the strict sense (this applies equally to
has_symlink_leading_path), but I think we should treat the case
where "d" (and "d/f") belongs to the working tree of a repository
for a separate project, that is embedded in our working tree the
same way.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:41                   ` Ramkumar Ramachandra
@ 2013-04-09 17:54                     ` Junio C Hamano
  2013-04-09 18:17                       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 17:54 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Sorry for repeated rerolls.  I had missed another instance in t0008
and also the explanation was lacking.

-- >8 --
Subject: [PATCH] symlinks: rename has_symlink_leading_path() to
 path_outside_our_project()

The purpose of the function is to prevent a path from getting added
to our project when its path component steps outside our working
tree, like this:

	ln -s /etc myetc
	git add myetc/passwd

We do not want to end up with "myetc/passwd" in our index.  To make
sure an attempt to add such a path is caught, the implementation
checks if there is any leading symbolic link in the given path
(adding "myetc" itself as a symbolic link to our project is
accepted).

But there are other ways to attempt to add a path that does not
belong to our project, and they need not involve a symbolic link
in the leading path.

Rename the function, and the die_if_path_beyond_symlink() function, to
clarify what they are really checking, not how they are checking it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/add.c          |  6 +++---
 builtin/apply.c        |  8 ++++++--
 builtin/check-ignore.c |  2 +-
 builtin/update-index.c |  4 ++--
 cache.h                |  4 ++--
 diff-lib.c             |  2 +-
 dir.c                  |  2 +-
 pathspec.c             |  6 +++---
 pathspec.h             |  2 +-
 preload-index.c        |  2 +-
 symlinks.c             | 10 +++++-----
 t/t0008-ignores.sh     |  4 ++--
 12 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ab1c9e8..7cb80ef 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -155,8 +155,8 @@ static void refresh(int verbose, const char **pathspec)
 
 /*
  * Normalizes argv relative to prefix, via get_pathspec(), and then
- * runs die_if_path_beyond_symlink() on each path in the normalized
- * list.
+ * runs die_if_path_outside_our_project() on each path in the
+ * normalized list.
  */
 static const char **validate_pathspec(const char **argv, const char *prefix)
 {
@@ -165,7 +165,7 @@ static const char **validate_pathspec(const char **argv, const char *prefix)
 	if (pathspec) {
 		const char **p;
 		for (p = pathspec; *p; p++) {
-			die_if_path_beyond_symlink(*p, prefix);
+			die_if_path_outside_our_project(*p, prefix);
 		}
 	}
 
diff --git a/builtin/apply.c b/builtin/apply.c
index 5b882d0..d0b408e 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3469,10 +3469,14 @@ static int check_to_create(const char *new_name, int ok_if_exists)
 		 * A leading component of new_name might be a symlink
 		 * that is going to be removed with this patch, but
 		 * still pointing at somewhere that has the path.
-		 * In such a case, path "new_name" does not exist as
+		 * Or it could be the top-level of a working tree of
+		 * a different project that is embedded in our working
+		 * tree.
+		 *
+		 * In such cases, path "new_name" does not exist as
 		 * far as git is concerned.
 		 */
-		if (has_symlink_leading_path(new_name, strlen(new_name)))
+		if (path_outside_our_project(new_name, strlen(new_name)))
 			return 0;
 
 		return EXISTS_IN_WORKTREE;
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 0240f99..bce378d 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -88,7 +88,7 @@ static int check_ignore(const char *prefix, const char **pathspec)
 		full_path = prefix_path(prefix, prefix
 					? strlen(prefix) : 0, path);
 		full_path = check_path_for_gitlink(full_path);
-		die_if_path_beyond_symlink(full_path, prefix);
+		die_if_path_outside_our_project(full_path, prefix);
 		if (!seen[i]) {
 			exclude = last_exclude_matching_path(&check, full_path,
 							     -1, &dtype);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 5c7762e..7c47fa2 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -186,8 +186,8 @@ static int process_path(const char *path)
 	struct cache_entry *ce;
 
 	len = strlen(path);
-	if (has_symlink_leading_path(path, len))
-		return error("'%s' is beyond a symbolic link", path);
+	if (path_outside_our_project(path, len))
+		return error("'%s' is outside our working tree", path);
 
 	pos = cache_name_pos(path, len);
 	ce = pos < 0 ? NULL : active_cache[pos];
diff --git a/cache.h b/cache.h
index e1e8ce8..f6359b5 100644
--- a/cache.h
+++ b/cache.h
@@ -960,8 +960,8 @@ struct cache_def {
 	int prefix_len_stat_func;
 };
 
-extern int has_symlink_leading_path(const char *name, int len);
-extern int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
+extern int path_outside_our_project(const char *name, int len);
+extern int threaded_path_outside_our_project(struct cache_def *, const char *, int);
 extern int check_leading_path(const char *name, int len);
 extern int has_dirs_only_path(const char *name, int len, int prefix_len);
 extern void schedule_dir_for_removal(const char *name, int len);
diff --git a/diff-lib.c b/diff-lib.c
index f35de0f..8aff906 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -32,7 +32,7 @@ static int check_removed(const struct cache_entry *ce, struct stat *st)
 			return -1;
 		return 1;
 	}
-	if (has_symlink_leading_path(ce->name, ce_namelen(ce)))
+	if (path_outside_our_project(ce->name, ce_namelen(ce)))
 		return 1;
 	if (S_ISDIR(st->st_mode)) {
 		unsigned char sub[20];
diff --git a/dir.c b/dir.c
index 91cfd99..b90b57b 100644
--- a/dir.c
+++ b/dir.c
@@ -1479,7 +1479,7 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char
 {
 	struct path_simplify *simplify;
 
-	if (has_symlink_leading_path(path, len))
+	if (path_outside_our_project(path, len))
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
diff --git a/pathspec.c b/pathspec.c
index 284f397..336149f 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -92,10 +92,10 @@ const char *check_path_for_gitlink(const char *path)
  * Dies if the given path refers to a file inside a symlinked
  * directory in the index.
  */
-void die_if_path_beyond_symlink(const char *path, const char *prefix)
+void die_if_path_outside_our_project(const char *path, const char *prefix)
 {
-	if (has_symlink_leading_path(path, strlen(path))) {
+	if (path_outside_our_project(path, strlen(path))) {
 		int len = prefix ? strlen(prefix) : 0;
-		die(_("'%s' is beyond a symbolic link"), path + len);
+		die(_("'%s' is outside the working tree"), path + len);
 	}
 }
diff --git a/pathspec.h b/pathspec.h
index db0184a..ef816a8 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -4,6 +4,6 @@
 extern char *find_pathspecs_matching_against_index(const char **pathspec);
 extern void add_pathspec_matches_against_index(const char **pathspec, char *seen, int specs);
 extern const char *check_path_for_gitlink(const char *path);
-extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
+extern void die_if_path_outside_our_project(const char *path, const char *prefix);
 
 #endif /* PATHSPEC_H */
diff --git a/preload-index.c b/preload-index.c
index 49cb08d..b3e57d4 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -55,7 +55,7 @@ static void *preload_thread(void *_data)
 			continue;
 		if (!ce_path_match(ce, &pathspec))
 			continue;
-		if (threaded_has_symlink_leading_path(&cache, ce->name, ce_namelen(ce)))
+		if (threaded_path_outside_our_project(&cache, ce->name, ce_namelen(ce)))
 			continue;
 		if (lstat(ce->name, &st))
 			continue;
diff --git a/symlinks.c b/symlinks.c
index c2b41a8..baed93f 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -196,19 +196,19 @@ static int lstat_cache(struct cache_def *cache, const char *name, int len,
 #define USE_ONLY_LSTAT  0
 
 /*
- * Return non-zero if path 'name' has a leading symlink component
+ * Return non-zero if path 'name' points outside the working tree
  */
-int threaded_has_symlink_leading_path(struct cache_def *cache, const char *name, int len)
+int threaded_path_outside_our_project(struct cache_def *cache, const char *name, int len)
 {
 	return lstat_cache(cache, name, len, FL_SYMLINK|FL_DIR, USE_ONLY_LSTAT) & FL_SYMLINK;
 }
 
 /*
- * Return non-zero if path 'name' has a leading symlink component
+ * Return non-zero if path 'name' points outside the working tree
  */
-int has_symlink_leading_path(const char *name, int len)
+int path_outside_our_project(const char *name, int len)
 {
-	return threaded_has_symlink_leading_path(&default_cache, name, len);
+	return threaded_path_outside_our_project(&default_cache, name, len);
 }
 
 /*
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 9c1bde1..a00ee75 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -397,7 +397,7 @@ test_expect_success_multi SYMLINKS 'symlink' '' '
 
 test_expect_success_multi SYMLINKS 'beyond a symlink' '' '
 	test_check_ignore "a/symlink/foo" 128 &&
-	test_stderr "fatal: '\''a/symlink/foo'\'' is beyond a symbolic link"
+	test_stderr "fatal: '\''a/symlink/foo'\'' is outside the working tree"
 '
 
 test_expect_success_multi SYMLINKS 'beyond a symlink from subdirectory' '' '
@@ -405,7 +405,7 @@ test_expect_success_multi SYMLINKS 'beyond a symlink from subdirectory' '' '
 		cd a &&
 		test_check_ignore "symlink/foo" 128
 	) &&
-	test_stderr "fatal: '\''symlink/foo'\'' is beyond a symbolic link"
+	test_stderr "fatal: '\''symlink/foo'\'' is outside the working tree"
 '
 
 ############################################################################
-- 
1.8.2.1-465-gf55e5b3


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:41                   ` Junio C Hamano
@ 2013-04-09 17:56                     ` Ramkumar Ramachandra
  2013-04-09 18:48                       ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09 17:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano wrote:
> I think what the callers of this function care about is if the name
> is a path that should not be added to our index (i.e. points
> "outside the repository").  If you had a symlink d that points at e
> when our project does have a subdirectory e with file f,
>
>         check_leading_path("d/f")
>
> wants to say "bad", even though the real file pointed at, i.e. "e/f"
> is inside our working tree, so "outside our working tree" is not
> quite correct in the strict sense (this applies equally to
> has_symlink_leading_path), but

Actually, you introduced one naming regression:
has_symlink_leading_path() is a good name for what the function does,
as opposed to die_if_path_outside_our_tree(), which is misleading.
What about die_if_path_contains_links() to encapsulate gitlinks and
symlinks?

> I think we should treat the case
> where "d" (and "d/f") belongs to the working tree of a repository
> for a separate project, that is embedded in our working tree the
> same way.

I'm not too sure about this.  It means that I can have symlinks to
files in various parts of my worktree, but not to directories.  Isn't
this an absurd limitation to impose?  I'm not saying that it's
particularly useful to have a symlink at / pointing to a directory
deeply nested in your repository, but that limitations must have some
concrete rationale.

Anyway, since we're not introducing any regressions (as
has_symlink_leading_path imposes the same absurd limitation), we don't
have to fix this now.  But it's certainly something worth fixing in
the future, I think.


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:54                     ` Junio C Hamano
@ 2013-04-09 18:17                       ` Ramkumar Ramachandra
  2013-04-09 18:50                         ` Junio C Hamano
  2013-04-09 20:31                         ` Junio C Hamano
  0 siblings, 2 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-09 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano wrote:
> But there are other cases to attempt to add a path that do not
> belong to our project, which do not have to involve a symbolic link
> in the leading path.

The reader is now wondering what this could possibly be, and why you
didn't send this patch earlier.  Perhaps clarify with: s/there are
cases/there may be cases/ and append "One such case that we currently
don't handle yet is a path inside another git repository in our
worktree, as demonstrated by test tXXXX.X."


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:34                 ` Junio C Hamano
  2013-04-09 17:41                   ` Ramkumar Ramachandra
  2013-04-09 17:41                   ` Junio C Hamano
@ 2013-04-09 18:32                   ` Jakub Narębski
  2013-04-09 18:51                     ` Junio C Hamano
  2 siblings, 1 reply; 140+ messages in thread
From: Jakub Narębski @ 2013-04-09 18:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramkumar Ramachandra, Jeff King, Duy Nguyen, Git List

On 09.04.2013 19:34, Junio C Hamano wrote:

> -	if (has_symlink_leading_path(path, len))
> -		return error("'%s' is beyond a symbolic link", path);
> +	if (path_outside_our_project(path, len))
> +		return error("'%s' is outside our working tree", path);
>  

Don't we lose important information here?  Or we shouldn't care?

-- 
Jakub Narębski


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 17:56                     ` Ramkumar Ramachandra
@ 2013-04-09 18:48                       ` Junio C Hamano
  2013-04-10 13:38                         ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 18:48 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> I think what the callers of this function care about is if the name
>> is a path that should not be added to our index (i.e. points
>> "outside the repository").  If you had a symlink d that points at e
>> when our project does have a subdirectory e with file f,
>>
>>         check_leading_path("d/f")
>>
>> wants to say "bad", even though the real file pointed at, i.e. "e/f"
>> is inside our working tree, so "outside our working tree" is not
>> quite correct in the strict sense (this applies equally to
>> has_symlink_leading_path), but
>
> Actually, you introduced one naming regression:
> has_symlink_leading_path() is a good name for what the function does,
> as opposed to die_if_path_outside_our_tree(), which is misleading.
> What about die_if_path_contains_links() to encapsulate gitlinks and
> symlinks?

The cases we know that "$d/f" (where $d is a path that is one or
more levels, e.g. "dir", "d/i", or "d/i/r") is bad are when

 - "$d" is a symlink, because what you could add to the index is "$d"
   and nothing underneath it; or

 - "$d" is a directory that is the top level of the working tree
   that is controlled by "$d/.git", because what you could add to the
   index is "$d" and nothing underneath it.

If "$d" were added to our index, the former will make 120000 entry
and the latter will make 160000 entry.  But the user may not want to
add $d ever to our project, so in that case, neither will give us a
symlink or a gitlink.

We should find a word that makes it clear that "this path is beyond
something we _could_ add".  I do not think "link" is a good word for
it.  It shares the same mistake that led to the original misnomer,
i.e. "the case we happened to notice was when we have symlink so
let's name it with 'symlink' somewhere in it."

>> I think we should treat the case
>> where "d" (and "d/f") belongs to the working tree of a repository
>> for a separate project, that is embedded in our working tree the
>> same way.
>
> I'm not too sure about this.  It means that I can have symlinks to
> files in various parts of my worktree, but not to directories.

It does not mean that.  It is valid to do

	ln -s /etc myetc
        git add myetc

It is NOT valid to do

	git add myetc/passwd

One can have symlinks to anywhere all one wants.  We track symlinks.

It is the same for the top-level of the working tree of a separate
project, be it a submodule or not.  It is valid to do

	mkdir foo && (cd foo && git init && >file)
        git add foo

It is NOT valid to do

	git add foo/file
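
The two rules above can be sketched as a small shell function.  This
is only an illustration, not how git implements the check: the name
beyond_boundary is made up here, the path is assumed to be relative to
the top of the working tree, and a directory that contains ".git"
stands in for "top level of another project's working tree".

```shell
#!/bin/sh
# Illustrative sketch only (hypothetical helper, not part of git).
# Walk the leading components of a relative path and report the first
# one that is a symlink or the top of another working tree.
# Prints "symlink <dir>", "repository <dir>", or "ok".
beyond_boundary () {
	rest=$1
	lead=
	while :
	do
		case $rest in
		*/*) ;;
		*) break ;;	# only the final component is left
		esac
		lead=${lead:+$lead/}${rest%%/*}
		rest=${rest#*/}
		if test -L "$lead"
		then
			echo "symlink $lead"
			return 1
		elif test -e "$lead/.git"
		then
			echo "repository $lead"
			return 1
		fi
	done
	echo ok
}
```

With the layouts from the examples above, "beyond_boundary myetc/passwd"
prints "symlink myetc" and "beyond_boundary foo/file" prints
"repository foo", while "beyond_boundary myetc" and "beyond_boundary foo"
both print "ok": the boundary itself can be added, nothing beneath it.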


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:17                       ` Ramkumar Ramachandra
@ 2013-04-09 18:50                         ` Junio C Hamano
  2013-04-09 19:09                           ` Junio C Hamano
  2013-04-09 20:31                         ` Junio C Hamano
  1 sibling, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 18:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> The reader is now wondering what this could possibly be, and why you
> didn't send this patch earlier. 

Because it wasn't written back then?

> Perhaps clarify with: s/there are
> cases/there may be cases/ and append "One such case that we currently
> don't handle yet is a path inside another git repository in our
> worktree, as demonstrated by test tXXXX.X."

I think "we currently don't handle" is a misstatement.  It is not a
bug that we don't.


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:32                   ` Jakub Narębski
@ 2013-04-09 18:51                     ` Junio C Hamano
  2013-04-09 18:58                       ` Jakub Narębski
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 18:51 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Ramkumar Ramachandra, Jeff King, Duy Nguyen, Git List

Jakub Narębski <jnareb@gmail.com> writes:

> On 09.04.2013 19:34, Junio C Hamano wrote:
>
>> -	if (has_symlink_leading_path(path, len))
>> -		return error("'%s' is beyond a symbolic link", path);
>> +	if (path_outside_our_project(path, len))
>> +		return error("'%s' is outside our working tree", path);
>>  
>
> Don't we lose important information here?  Or we shouldn't care?

What important information is it?


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:51                     ` Junio C Hamano
@ 2013-04-09 18:58                       ` Jakub Narębski
  2013-04-09 19:10                         ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Jakub Narębski @ 2013-04-09 18:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramkumar Ramachandra, Jeff King, Duy Nguyen, Git List

Junio C Hamano wrote:
> Jakub Narębski <jnareb@gmail.com> writes:
> 
>> On 09.04.2013 19:34, Junio C Hamano wrote:
>>
>>> -	if (has_symlink_leading_path(path, len))
>>> -		return error("'%s' is beyond a symbolic link", path);
>>> +	if (path_outside_our_project(path, len))
>>> +		return error("'%s' is outside our working tree", path);
>>>  
>>
>> Don't we lose important information here?  Or we shouldn't care?
> 
> What important information is it?

That the cause is a symbolic link (or another git repository, in the future).

-- 
Jakub Narębski


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:50                         ` Junio C Hamano
@ 2013-04-09 19:09                           ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 19:09 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano <gitster@pobox.com> writes:

> Ramkumar Ramachandra <artagnon@gmail.com> writes:
>
>> The reader is now wondering what this could possibly be, and why you
>> didn't send this patch earlier. 
>
> Because it wasn't written back then?
>
>> Perhaps clarify with: s/there are
>> cases/there may be cases/ and append "One such case that we currently
>> don't handle yet is a path inside another git repository in our
>> worktree, as demonstrated by test tXXXX.X."
>
> I think "we currently don't handle" is a misstatement.  It is not a
> bug that we don't.

We can think of it this way.

In your working tree, there is an upper bound for the paths you can
include in your commit.  When you are at the top-level of your
working tree, you do not say "git add ../f" or "git add ../d/f".
The root-level of your working tree is an upper bound and you do not
cross that boundary.

It turns out that there are lower bounds for the paths as well.
When we say "Git tracks symbolic links", anything that appears
beyond a symbolic link is beyond that boundary.  If we track a
symbolic link "l", we can of course add "l". When "l" leads to
a directory somewhere else, the filesystem gives you an illusion
that there are things under "l" (e.g. "l" points at "/etc" and there
is "l/passwd" there), but that is beyond the boundary.  You do not
add "l/passwd".  Otherwise "git add l" would become meaningless.
Does it add the symbolic link itself, or all the files in there,
pretending "l" is actually a directory?  We have chosen to say it is
the former, and apply that rule consistently.

It is the same for "Git tracks submodules", which defines that the
top-level of the submodule working tree as such a lower boundary.
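
The upper bound can be sketched in shell as a purely lexical check.
This is an illustration under stated assumptions, not git's code: the
name above_toplevel is hypothetical, the path is taken relative to the
top level, and only the path string is examined, so the lower bounds
(symbolic links, submodules) are deliberately out of scope here.

```shell
#!/bin/sh
# Illustrative sketch only (hypothetical helper, not part of git).
# Decide from the path string alone whether a relative path climbs
# above the directory it starts from.  Prints "outside" or "inside".
above_toplevel () {
	rest=$1/
	depth=0
	while test -n "$rest"
	do
		part=${rest%%/*}
		rest=${rest#*/}
		case $part in
		'' | .) ;;			# empty or "." changes nothing
		..) depth=$((depth - 1)) ;;	# one level up
		*) depth=$((depth + 1)) ;;	# one level down
		esac
		if test "$depth" -lt 0
		then
			echo outside
			return 1
		fi
	done
	echo inside
}
```

Both "../f" and "../d/f" come out as "outside", while "d/../f" stays
"inside" because it never rises above its starting point.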


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:58                       ` Jakub Narębski
@ 2013-04-09 19:10                         ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 19:10 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Ramkumar Ramachandra, Jeff King, Duy Nguyen, Git List

Jakub Narębski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>> Jakub Narębski <jnareb@gmail.com> writes:
>> 
>>> On 09.04.2013 19:34, Junio C Hamano wrote:
>>>
>>>> -	if (has_symlink_leading_path(path, len))
>>>> -		return error("'%s' is beyond a symbolic link", path);
>>>> +	if (path_outside_our_project(path, len))
>>>> +		return error("'%s' is outside our working tree", path);
>>>>  
>>>
>>> Don't we lose important information here?  Or we shouldn't care?
>> 
>> What important information is it?
>
> That the cause is a symbolic link (or another git repository, in the future).

And in what way is it important?


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:17                       ` Ramkumar Ramachandra
  2013-04-09 18:50                         ` Junio C Hamano
@ 2013-04-09 20:31                         ` Junio C Hamano
  2013-04-10 13:25                           ` Ramkumar Ramachandra
  1 sibling, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2013-04-09 20:31 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> But there are other cases to attempt to add a path that do not
>> belong to our project, which do not have to involve a symbolic link
>> in the leading path.
>
> The reader is now wondering what this could possibly be, and why you
> didn't send this patch earlier.  Perhaps clarify with: s/there are
> cases/there may be cases/ and append "One such case that we currently
> don't handle yet is a path inside another git repository in our
> worktree, as demonstrated by test tXXXX.X."

I _think_ I misread what you meant to say in the above.

We can go either way between "are cases" or "may be cases".  I meant
it as an immediate predecessor ([PATCH 1/n]) to the patch you were
working on ([PATCH 2/n] and later), so in that context, it does not
matter.  [PATCH 2/n] will start as "Now the naming is saner, let's
start noticing when the user gives a path that is beyond our project
boundary because it is under control of another repository by adding
necessary logic to that function."

And I also misread "we currently don't handle" above as "but we
really should allow adding d/f when d is at the top of the working
tree of another project", but that was not what you meant to say.
Instead, "We do not notice such a bad case in today's code yet" was
what you meant.  But if we are to use "there are cases" in [1/n] and
start [2/n] with "Now we have renamed, let's do this", then we do
not have to bother saying anything in [1/n] about the upcoming
change in [2/n], especially when the patches come back-to-back in a
single series.


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-08 21:03                                                                   ` Ramkumar Ramachandra
@ 2013-04-10  7:23                                                                     ` Philip Oakley
  0 siblings, 0 replies; 140+ messages in thread
From: Philip Oakley @ 2013-04-10  7:23 UTC (permalink / raw)
  To: Ramkumar Ramachandra, Junio C Hamano
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Git List, Linus Torvalds

From: "Ramkumar Ramachandra" <artagnon@gmail.com>
Sent: Monday, April 08, 2013 10:03 PM
> This is going nowhere.  You're stuck at making the current submodule
> system work, not answering my questions, diverting conversation,
> repeatedly asking the same stupid questions, labelling everything that
> I say "subjective", and refusing to look at the objective counterpart
> (aka, the code).  It's clear to me that no matter how many more emails
> I write, you're not going to concede.
>
> I'm not interested in wasting any more of my time with this nonsense.
>
> I give up.
> --
Please don't "give up". It is a bit of a 'wicked' problem [1].

Yes to taking a rest, stepping back and trying to summarise/review what 
was discussed.

I couldn't keep up with all the discussion, and I doubt many others kept 
up, especially those who have been frustrated in their (mis-) use of 
submodules. Do remember that Junio has multiple roles which belie the 
softness of the word 'maintainer'. It includes "Defender of the 
Heritage" in the same way that keepers of ancient monuments will want 
visitors to enjoy the site, but rail against a garish new stainless 
steel and glass entrance to the Colosseum (choose your local heritage 
site) (see [1] again).

I get confused (about sub-modules) with msysgit where git.git is a 
sub-module, and is the fastest moving (an inversion of control issue), 
and when hacking at (just) the msys level when the git sub-module isn't 
in sync.

In many ways sub-module tracking is like file renames and empty 
directories (both of which come up a lot). The submodule meta 
information issue has great similarity to the empty directories issue. 
It's about meta information, not about content (which is certified 
verified by sha1), and about how users know what is going on and get a 
(natural) feeling of control (without upsetting other users/controllers).

regards

Philip
[now to schedule some time to do the catch up reading. $dayjob beckons]

[1] www.poppendieck.com/wicked.htm 


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 20:31                         ` Junio C Hamano
@ 2013-04-10 13:25                           ` Ramkumar Ramachandra
  2013-04-10 16:25                             ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-10 13:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano wrote:
> And I also misread "we currently don't handle" above as "but we
> really should allow adding d/f when d is at the top of the working
> tree of another project", but that was not what you meant to say.
> Instead, "We do not notice such a bad case in today's code yet" was
> what you meant.  But if we are to use "there are cases" in [1/n] and
> start [2/n] with "Now we have renamed, let's do this", then we do
> not have to bother saying anything in [1/n] about the upcoming
> change in [2/n], especially when the patches come back-to-back in a
> single series.

Exactly.  Yeah, I don't think your patch makes sense as a standalone
anyway: I'll use appropriate wording when I roll the series, so it
follows nicely.


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-09 18:48                       ` Junio C Hamano
@ 2013-04-10 13:38                         ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-10 13:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Duy Nguyen, Git List

Junio C Hamano wrote:
> One can have symlinks to anywhere all one wants.  We track symlinks.
> [...]

Yes, I know.  We store symlinks as blobs containing one line, the path
to the file, without a trailing newline.  And we have a mode for it to
distinguish it from regular files.

What I meant is:

    echo "baz" >newfile
    cd foo/bar/quux
    ln -s ../../../newfile
    cd ../../..                    # Back to toplevel
    git add foo/bar/quux/newfile

This is allowed.  While:

    cd foo/bar/quux
    echo "baz" >newfile
    cd ../../..                    # Back to toplevel
    ln -s foo/bar/quux
    git add quux/newfile

is disallowed.  Then again, if we replace the last line with:

    cd quux
    git add newfile

it works.

Notice that both symlinks are pointing to paths inside our repository,
and the only difference is that the second example attempts to add a
path with a symlink as the non-final component.  The path is not
pointing "outside" our repository, as the function name would
indicate.
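
To make that last point concrete, here is a rough shell sketch that
resolves symlinks physically and asks whether the directory holding a
path still lives under the top level.  It is an assumption-laden
illustration, not something git does (git rejects any leading symlink
without resolving it): resolves_inside is a made-up name, and the
current directory is assumed to be the top of the working tree.

```shell
#!/bin/sh
# Illustrative sketch only (hypothetical helper, not part of git).
# Resolve the directory part of a relative path with symlinks followed
# and report whether it is physically under the current directory.
# Prints "inside", "outside", or "unknown" (directory does not exist).
resolves_inside () {
	top=$(pwd -P) &&
	dir=$(cd -P "$(dirname "$1")" 2>/dev/null && pwd -P) || {
		echo unknown
		return 2
	}
	case $dir/ in
	"$top"/*) echo inside ;;
	*)
		echo outside
		return 1
		;;
	esac
}
```

In the second layout above, "resolves_inside quux/newfile" prints
"inside", even though the path has a symlink as a leading component.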

Anyway, it's just a minor detail that would be nice to fix in the
future.  Nothing urgent.


* Re: [PATCH 2/2] add: refuse to add paths beyond repository boundaries
  2013-04-10 13:25                           ` Ramkumar Ramachandra
@ 2013-04-10 16:25                             ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-10 16:25 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jeff King, Duy Nguyen, Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Exactly.  Yeah, I don't think your patch makes sense as a standalone
> anyway.

Yes, it was purely a preparatory step.


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-07 21:23                                 ` Jonathan Nieder
  2013-04-07 21:30                                   ` Ramkumar Ramachandra
@ 2013-04-17 10:37                                   ` Duy Nguyen
  2013-04-17 11:06                                     ` Ramkumar Ramachandra
  2013-04-17 16:01                                     ` Junio C Hamano
  1 sibling, 2 replies; 140+ messages in thread
From: Duy Nguyen @ 2013-04-17 10:37 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

On Mon, Apr 8, 2013 at 7:23 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Ramkumar Ramachandra wrote:
>
>>             It's about the core object code of git parsing links, as
>> opposed to a fringe submodule.c/ submodule.sh parsing .gitmodules.
>
> What's stopping the core object code of git parsing .gitmodules?  What
> is the core object code?  How does this compare to other metadata
> files like .gitattributes and .gitignore?

Somewhat related to the topic. Why can't .gitattributes be used for
storing what's currently in .gitmodules?
--
Duy


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 10:37                                   ` Duy Nguyen
@ 2013-04-17 11:06                                     ` Ramkumar Ramachandra
  2013-04-17 11:27                                       ` Duy Nguyen
  2013-04-17 16:01                                     ` Junio C Hamano
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-17 11:06 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

Duy Nguyen wrote:
> Somewhat related to the topic. Why can't .gitattributes be used for
> storing what's currently in .gitmodules?

It can.  It's just a small syntax change from "key = value" attributes
inside a toplevel [submodule <name>] section separated by newlines, to
a path marked with multiple "key=value" attributes separated by
whitespace.  However, we don't want to make this change because these
submodule attributes are somewhat "different" from .gitattributes
attributes.
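
As a rough sketch of how small that syntax change is, the shell
function below rewrites .gitmodules-style sections into one
"path key=value ..." line per submodule.  The output format and the
name modules_to_attrs are hypothetical; git does not read submodule
settings from .gitattributes, and real attribute values cannot
contain whitespace.

```shell
#!/bin/sh
# Illustrative sketch only (hypothetical helper and output format).
# Turn each [submodule "<name>"] section into a single line of the
# form "<path> key=value key=value ...".
modules_to_attrs () {
	awk '
	function emit() {
		if (path != "")
			print path attrs
		path = ""
		attrs = ""
	}
	/^[ \t]*\[submodule / { emit(); next }
	/=/ {
		key = $0
		sub(/[ \t]*=.*/, "", key)
		sub(/^[ \t]*/, "", key)
		val = $0
		sub(/^[^=]*=[ \t]*/, "", val)
		if (key == "path")
			path = val
		else
			attrs = attrs " " key "=" val
	}
	END { emit() }
	' "$@"
}
```

For a section with "path = clayoven", "url = gh:artagnon/clayoven" and
"update = rebase", "modules_to_attrs .gitmodules" would print
"clayoven url=gh:artagnon/clayoven update=rebase".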

Roughly speaking, the current .gitmodules design treats submodule
directories as "directories with special attributes", with two
differences: these directories have a special mode in the index, and a
commit object is created in the database to represent the "partial
state" of this submodule.  If you think about it, the information
stored in the commit object is no less/ no more important than the
path-attribute mapping in .gitmodules.  I was arguing for using a
special OBJ_LINK to represent the full state of the submodule, and
doing away with the attributes altogether, but not everyone agrees.


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 11:06                                     ` Ramkumar Ramachandra
@ 2013-04-17 11:27                                       ` Duy Nguyen
  2013-04-17 11:56                                         ` Ramkumar Ramachandra
       [not found]                                         ` <CALkWK0m9QmZaSDruY=+2F-Kkw+fd6E1TYC TBpVQHRJrzq2VjCQ@mail.gmail.com>
  0 siblings, 2 replies; 140+ messages in thread
From: Duy Nguyen @ 2013-04-17 11:27 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

On Wed, Apr 17, 2013 at 9:06 PM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
> Duy Nguyen wrote:
>> Somewhat related to the topic. Why can't .gitattributes be used for
>> storing what's currently in .gitmodules?
>
> It can.  It's just a small syntax change from "key = value" attributes
> inside a toplevel [submodule <name>] section separated by newlines, to
> a path marked with multiple "key=value" attributes separated by
> whitespace.  However, we don't want to make this change because these
> submodule attributes are somewhat "different" from .gitattributes
> attributes.
>
> Roughly speaking, the current .gitmodules design treats submodule
> directories as "directories with special attributes", with two
> differences: these directories have a special mode in the index, and a
> commit object is created in the database to represent the "partial
> state" of this submodule.

That was my thinking. .gitmodules would break if a user moves the
submodule manually (or even if .gitattributes is used)

> If you think about it, the information
> stored in the commit object is no less/ no more important than the
> path-attribute mapping in .gitmodules.  I was arguing for using a
> special OBJ_LINK to represent the full state of the submodule, and
> doing away with the attributes altogether, but not everyone agrees.

Count me among those who disagree. url feels like a local thing that should
not stay in object database (another way of looking at it is like an
email address: the primary one fixed in stone in commits with .mailmap
for future substitution). Other attributes like .update,
.fetchRecursiveSubmodules... definitely should not be stored in object
database. I think if they are stored in the submodule's config file,
then the manual move problem above will go away.

And if you're dead set on storing some submodule state in object
database, why not reuse the tag object with some new header lines?
--
Duy


* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 11:27                                       ` Duy Nguyen
@ 2013-04-17 11:56                                         ` Ramkumar Ramachandra
  2013-04-17 12:06                                           ` Duy Nguyen
       [not found]                                         ` <CALkWK0m9QmZaSDruY=+2F-Kkw+fd6E1TYC TBpVQHRJrzq2VjCQ@mail.gmail.com>
  1 sibling, 1 reply; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-17 11:56 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

Duy Nguyen wrote:
> Count me among those who disagree. url feels like a local thing that should
> not stay in object database (another way of looking at it is like an
> email address: the primary one fixed in stone in commits with .mailmap
> for future substitution).

We've been over this several times in earlier emails.  That's like
saying that a blob should not be stored in the object database,
because it is not "fixed in stone" (my OBJ_LINK is just a special kind
of blob, as I've repeated many times already).  I don't rely on what I
"feel", which is why I started out by posting an implementation: the
implementation seems to indicate that getting an OBJ_LINK will
simplify a lot of things.  And that is my primary criterion for
deciding: if the implementation is simple and elegant, it must clearly
be doing something right.

Again, I'm not saying that my approach is Correct and Final.  What I'm
saying is: "Here's what I've done.  Something interesting is going on.
 It's probably worth a look?"

> Other attributes like .update,
> .fetchRecursiveSubmodules... definitely should not be stored in object
> database.

"Coffee and other beverages definitely should be served cold."
All very nice to say, but I don't see any rationale.

> I think if they are stored in the submodule's config file,
> then the manual move problem above will go away.

What?  The submodule's .git/config?  Why should a submodule repository
know that it is being used as a submodule?  What inherent properties
of a git repository change if it is being used as a submodule?

> And if you're dead set on storing some submodule state in object
> database,

I'm not.  I'm just saying that it seems to be an interesting
alternative approach.  Considering that nobody else brought up a real
alternative approach, and chose to just keep defending .gitmodules to
the death, it's the only other approach we have.

> why not reuse tag object with some nea header lines?

Or a unified blob, which is currently what we have.  The point is to
have structured parseable information that the object-parsing code of
git can easily slurp and give to the rest of git-core.

Please clear your reading backlog to avoid bringing up the same points
over and over again.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 11:56                                         ` Ramkumar Ramachandra
@ 2013-04-17 12:06                                           ` Duy Nguyen
  2013-04-17 12:14                                             ` Ramkumar Ramachandra
  0 siblings, 1 reply; 140+ messages in thread
From: Duy Nguyen @ 2013-04-17 12:06 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

On Wed, Apr 17, 2013 at 9:56 PM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
>> why not reuse tag object with some nea header lines?
>
> Or a unified blob, which is currently what we have.  The point is to
> have structured parseable information that the object-parsing code of
> git can easily slurp and give to the rest of git-core.

I think you misunderstood. I meant instead of introducing new object
type OBJ_LINK, you can reuse tag object and add new header lines for
your purposes.

> Please clear your reading backlog to avoid bringing up the same points
> over and over again.

Yep. I'll shut up until it's cleared.
--
Duy

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 12:06                                           ` Duy Nguyen
@ 2013-04-17 12:14                                             ` Ramkumar Ramachandra
  0 siblings, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-17 12:14 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

Duy Nguyen wrote:
> On Wed, Apr 17, 2013 at 9:56 PM, Ramkumar Ramachandra
> <artagnon@gmail.com> wrote:
>>> why not reuse tag object with some nea header lines?
>>
>> Or a unified blob, which is currently what we have.  The point is to
>> have structured parseable information that the object-parsing code of
>> git can easily slurp and give to the rest of git-core.
>
> I think you misunderstood. I meant instead of introducing new object
> type OBJ_LINK, you can reuse tag object and add new header lines for
> your purposes.

Oh, I interpreted your typo "nea" as "neat", when you meant "new".
Yeah, it's worth exploring: I don't know what backward compatibility
benefits it will yield yet.

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 10:37                                   ` Duy Nguyen
  2013-04-17 11:06                                     ` Ramkumar Ramachandra
@ 2013-04-17 16:01                                     ` Junio C Hamano
  1 sibling, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2013-04-17 16:01 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Jens Lehmann,
	John Keeping, Git List, Linus Torvalds

Duy Nguyen <pclouds@gmail.com> writes:

> Somewhat related to the topic. Why can't .gitattributes be used for
> storing what's currently in .gitmodules?

You _could_ use gitattributes to encode, but it goes against what a
gitattributes file does or is for.  It is a mechanism to associate
groups of paths (that may not even exist) to a set of attributes.
You could list a single pattern that happens to match a single path
and at the implementation level you may be able to make it work, but
at the design/philosophical level, it is wrong.

We need info on each submodule and we need to key it with the name
of the submodule, not with its path.  At any given time, a single
submodule lives at (at most) one path, so you could still use path
as a key in the .gitattributes, but when you need to move the
submodule path, you would need to update the entry for the submodule
in the .gitattributes file by finding a pattern that matches the old
path and making it a pattern that matches the new path.

We have a much more suitable file format that we use to associate
various values to keys: the config format.  Also having a file that
is only about submodules and nothing else means we could write a
content-aware smart ll-merge driver that can take advantage of the
knowledge that it is written in the config format and it talks about
submodules.

The answer to the "why can't" question is "no".  No, there is no reason
why you can't use it. We don't do it, because it just does not make
sense.
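
A minimal sketch of the name-keyed layout described above, assuming a
hypothetical .gitmodules entry for a submodule named "clayoven" (the
name, path, and URL are made up for illustration, and the helper
functions are not part of git).  Python's configparser happens to cope
with this simple fragment, but it is not a full parser for git's config
syntax.  The point is only that moving the submodule changes one value
while the key, the submodule name, stays the same:

```python
import configparser

# Hypothetical .gitmodules content; name, path and URL are illustrative only.
GITMODULES = """\
[submodule "clayoven"]
path = clayoven
url = git://github.com/artagnon/clayoven
"""


def load(text):
    # Restrict delimiters to "=" so that colons inside URLs are left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return parser


def section_for(name):
    # Entries are keyed by submodule name, not by path.
    return f'submodule "{name}"'


def path_of(parser, name):
    return parser[section_for(name)]["path"]


def move(parser, name, new_path):
    # The key (the name) is untouched; only the "path" value changes.
    parser[section_for(name)]["path"] = new_path


modules = load(GITMODULES)
move(modules, "clayoven", "vendor/clayoven")
print(modules.sections())
print(path_of(modules, "clayoven"))
```

With a path-keyed file, the same move would instead mean finding and
rewriting whichever pattern matched the old path.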

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
       [not found]                                         ` <CALkWK0m9QmZaSDruY=+2F-Kkw+fd6E1TYC TBpVQHRJrzq2VjCQ@mail.gmail.com>
@ 2013-04-17 23:17                                           ` Philip Oakley
  2013-04-18  7:50                                             ` Ramkumar Ramachandra
  2013-04-19 17:08                                             ` Jens Lehmann
  0 siblings, 2 replies; 140+ messages in thread
From: Philip Oakley @ 2013-04-17 23:17 UTC (permalink / raw)
  To: Ramkumar Ramachandra, Duy Nguyen
  Cc: Jonathan Nieder, Jens Lehmann, John Keeping, Junio C Hamano,
	Git List, Linus Torvalds

From: "Ramkumar Ramachandra" <artagnon@gmail.com>
Sent: Wednesday, April 17, 2013 12:56 PM
>
> We've been over this several times in earlier emails.  [...]

> Again, I'm not saying that my approach is Correct and Final.  What I'm
> saying is: "Here's what I've done.  Something interesting is going on.
> It's probably worth a look?"
>
[...]
>     The point is to
> have structured parseable information that the object-parsing code of
> git can easily slurp and give to the rest of git-core.
>
> Please clear your reading backlog to avoid bringing up the same points
> over and over again.
> --

Ram,
The email thread is pretty long with a lot of to and fro, which makes it
difficult to catch up on (too much $dayjob+$family vs $sparetime).

Would it be possible to summarise the key points and proposals of where 
the subject is now?

Submodules do need 'fixing', as does agreeing on the problem and
abuse cases.

Philip 

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 23:17                                           ` Philip Oakley
@ 2013-04-18  7:50                                             ` Ramkumar Ramachandra
  2013-04-19 17:08                                             ` Jens Lehmann
  1 sibling, 0 replies; 140+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-18  7:50 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Duy Nguyen, Jonathan Nieder, Jens Lehmann, John Keeping,
	Junio C Hamano, Git List, Linus Torvalds

Philip Oakley wrote:
> Would it be possible to summarise the key points and proposals of where the
> subject is now?

Sure.

If you want an update on the current approach, wait for a v2; I'm
cooking it for some time, and getting some resulting ideas merged into
upstream early (look for clone.submoduleGitDir on the list, for
instance).  When upstream is in better shape to ease in a better
fundamental design, I'll post my v2 to the list.  I'll refrain from
posting any updates now, because I don't think the resulting
discussion will generate any value.

If you want to know what this thread was about, I think [1] and [2]
summarize my arguments quite well.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/220047/focus=220436
[2]: http://thread.gmane.org/gmane.comp.version-control.git/220047/focus=220495

* Re: [RFC/PATCH 0/7] Rework git core for native submodules
  2013-04-17 23:17                                           ` Philip Oakley
  2013-04-18  7:50                                             ` Ramkumar Ramachandra
@ 2013-04-19 17:08                                             ` Jens Lehmann
  1 sibling, 0 replies; 140+ messages in thread
From: Jens Lehmann @ 2013-04-19 17:08 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Ramkumar Ramachandra, Duy Nguyen, Jonathan Nieder, John Keeping,
	Junio C Hamano, Git List, Linus Torvalds, Heiko Voigt

Am 18.04.2013 01:17, schrieb Philip Oakley:
> Would it be possible to summarise the key points and proposals of where the subject is now?

Here you go, time to post our third iteration of the comparison
list, containing two updates:

- "easier coding" was removed from the advantages

- "git submodule foreach" was retired from the disadvantages

As in the first two versions, the issues in parentheses were
brought up but dismissed, and are kept only for reference
together with the reason why they aren't relevant anymore. Only
those preceded by a '*' are still considered valid.


Advantages:

* Information is stored in one place, no need to look up stuff in
  another file/blob.

* No need to cd-to-toplevel to change configuration in the
  .gitmodules file, the special tools to edit link information
  will work in any subdirectory.

(It is anything but clear that this approach will lead to "easier
coding", some parts of the code - like rm and mv - will profit
from that while others won't, e.g. we have to implement the link
object manipulation tools that are not needed for .gitmodules
and we get another indirection retrieving the submodule commit
from the link object. And then there is the fact that the new
code would have to catch up with functionality already coded
using .gitmodules, like the status/diff ignore and the fetch
flags).

(We currently need a checked out work tree to access the
.gitmodules file, but there is ongoing work to read the
configuration directly from the database)

(While it is easier to merge the link object, a .gitmodules
aware merge driver would work just as well)


Disadvantages:

* Changes in user visible behavior, compatibility problems when
  Git versions are mixed.

* Special tools are needed to edit submodule information where
  currently a plain editor is sufficient and a standard format
  is used.

* merge conflicts are harder to resolve and require special git
  commands, solving them in .gitmodules is way more intuitive
  as users are already used to conflict markers.

* Along with .gitmodules we lose the central spot where
  configuration concerning many submodules can be stored.

("'git submodule foreach' becomes harder to implement" is not the
case, as that command currently also walks all tree objects and
does not read the list of submodules from the .gitmodules file)

(When we also put the submodule name in the link object we could
also retain the ability to repopulate moved submodules from
their old repo, which is found by that name)

(That a link object has no unstaged counterpart, which a file
easily has, can be fixed by special-casing this, e.g. by using a
file in .git/link-specs/)


As no new arguments have been brought up, it all boils down to a
change that'll hurt users badly and won't fix any issue relevant
to them. It'll bring them a flag day after which the .gitmodules
is gone and they'll have to learn new tools to update and merge
the submodule metadata (and not only the users, GUIs have to
follow and implement support for something which currently is a
perfectly normal merge conflict in a file). You'd have to smoke
really weird stuff to even consider such a change under these
circumstances (or you don't care one bit about your users).

> Submodules do need 'fixing', as does agreeing on the problem and abuse cases.

Sure, but almost all problems I know about are work tree related,
so changing the internal representation buys us nothing here. It
will not magically do a bisect over submodules or recursively
update submodule work trees, and all that stuff won't be easier to
code either just because we have to get the information from a new
object instead of a gitlink/.gitmodules combo.

Let's just close this case and get back to working on things that
users will actually profit from.
