All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] submodule recursion in git-archive
@ 2013-11-26  0:04 Nick Townsend
  2013-11-26 15:17 ` René Scharfe
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Townsend @ 2013-11-26  0:04 UTC (permalink / raw)
  To: gitster, git; +Cc: Jens Lehmann, Jeff King

All,
My first git patch - so shout out if I’ve got the etiquette wrong! Or of course if I’ve missed something.
I googled around looking for solutions to my problem but just came up with a few shell-scripts
that didn’t quite get the functionality I needed.
The first patch fixes some typos that crept in to existing doc and declarations. It is required
for the second which actually implements the changes.

All comments gratefully received!

Regards
Nick Townsend

Subject: [PATCH 1/2] submodule: add_submodule_odb() usability

Although add_submodule_odb() is documented as being
externally usable, it is declared static and also
has incorrect documentation.

This commit fixes those and makes no changes to
existing code using them. All tests still pass.
---
 Documentation/technical/api-ref-iteration.txt | 4 ++--
 submodule.c                                   | 2 +-
 submodule.h                                   | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
index aa1c50f..cbee624 100644
--- a/Documentation/technical/api-ref-iteration.txt
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -50,10 +50,10 @@ submodules object database. You can do this by a code-snippet like
 this:
 
 	const char *path = "path/to/submodule"
-	if (!add_submodule_odb(path))
+	if (add_submodule_odb(path))
 		die("Error submodule '%s' not populated.", path);
 
-`add_submodule_odb()` will return an non-zero value on success. If you
+`add_submodule_odb()` will return a zero value on success. If you
 do not do this you will get an error for each ref that it does not point
 to a valid object.
 
diff --git a/submodule.c b/submodule.c
index 1905d75..1ea46be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
 		die(_("staging updated .gitmodules failed"));
 }
 
-static int add_submodule_odb(const char *path)
+int add_submodule_odb(const char *path)
 {
 	struct strbuf objects_directory = STRBUF_INIT;
 	struct alternate_object_database *alt_odb;
diff --git a/submodule.h b/submodule.h
index 7beec48..3e3cdca 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
 		struct string_list *needs_pushing);
 int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
+int add_submodule_odb(const char *path);
 
 #endif
-- 
1.8.3.4 (Apple Git-47)

Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive

When using git-archive to produce a dump of a
repository, the existing code does not recurse
into a submodule when it encounters it in the tree
traversal. These changes add a command line flag
that permits this.

Note that the submodules must be updated in the
repository, otherwise this cannot take place.

The feature is disabled for remote repositories as
the git_work_tree fails. This is a possible future
enhancement.

Two additional fields are added to archiver_args:
  * recurse  - a boolean indicator
  * treepath - the path part of the tree-ish
               eg. the 'www' in HEAD:www

The latter is used within the archive writer to
determin the correct path for the submodule .git
file.

Signed-off-by: Nick Townsend <nick.townsend@mac.com>
---
 Documentation/git-archive.txt |  9 +++++++++
 archive.c                     | 38 ++++++++++++++++++++++++++++++++++++--
 archive.h                     |  2 ++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index b97aaab..b4df735 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
 	      [-o <file> | --output=<file>] [--worktree-attributes]
+	      [--recursive|--recurse-submodules]
 	      [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
 	      [<path>...]
 
@@ -51,6 +52,14 @@ OPTIONS
 --prefix=<prefix>/::
 	Prepend <prefix>/ to each filename in the archive.
 
+--recursive::
+--recurse-submodules::
+	Archive entries in submodules. Errors occur if the submodules
+	have not been initialized and updated.
+	Run `git submodule update --init --recursive` immediately after
+	the clone is finished to avoid this.
+	This option is not available with remote repositories.
+
 -o <file>::
 --output=<file>::
 	Write the archive to <file> instead of stdout.
diff --git a/archive.c b/archive.c
index 346f3b2..f6313c9 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,7 @@
 #include "archive.h"
 #include "parse-options.h"
 #include "unpack-trees.h"
+#include "submodule.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [options] <tree-ish> [<path>...]"),
@@ -131,13 +132,32 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
 		args->convert = ATTR_TRUE(check[1].value);
 	}
 
+	if (S_ISGITLINK(mode) && args->recurse) {
+		const char *work_tree = get_git_work_tree();
+		if (!work_tree) {
+			  die("Can't go recursive when no work dir");
+		}
+		static struct strbuf dotgit = STRBUF_INIT;
+		strbuf_reset(&dotgit);
+		strbuf_grow(&dotgit, PATH_MAX);
+		strbuf_addstr(&dotgit, work_tree);
+		strbuf_addch(&dotgit, '/');
+		if (args->treepath) {
+			  strbuf_addstr(&dotgit, args->treepath);
+			  strbuf_addch(&dotgit, '/');
+		}
+		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
+		if (add_submodule_odb(dotgit.buf))
+			  die("Can't add submodule: %s", dotgit.buf);
+		strbuf_release(&dotgit);
+	}
 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
 		if (args->verbose)
 			fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
 		err = write_entry(args, sha1, path.buf, path.len, mode);
 		if (err)
 			return err;
-		return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
+		return (S_ISGITLINK(mode) && !args->recurse) ? 0: READ_TREE_RECURSIVE;
 	}
 
 	if (args->verbose)
@@ -256,10 +276,16 @@ static void parse_treeish_arg(const char **argv,
 	const struct commit *commit;
 	unsigned char sha1[20];
 
+	const char *colon = strchr(name, ':');
+
+	/* Store the path on the ref for later (required for --recursive) */
+	char *treepath = NULL;
+	if (colon) {
+		treepath = strdup(colon+1);
+	}
 	/* Remotes are only allowed to fetch actual refs */
 	if (remote) {
 		char *ref = NULL;
-		const char *colon = strchr(name, ':');
 		int refnamelen = colon ? colon - name : strlen(name);
 
 		if (!dwim_ref(name, refnamelen, sha1, &ref))
@@ -296,9 +322,11 @@ static void parse_treeish_arg(const char **argv,
 		tree = parse_tree_indirect(tree_sha1);
 	}
 	ar_args->tree = tree;
+	ar_args->treepath = treepath;
 	ar_args->commit_sha1 = commit_sha1;
 	ar_args->commit = commit;
 	ar_args->time = archive_time;
+
 }
 
 #define OPT__COMPR(s, v, h, p) \
@@ -318,6 +346,7 @@ static int parse_archive_args(int argc, const char **argv,
 	const char *exec = NULL;
 	const char *output = NULL;
 	int compression_level = -1;
+	int recurse = 0;
 	int verbose = 0;
 	int i;
 	int list = 0;
@@ -331,6 +360,8 @@ static int parse_archive_args(int argc, const char **argv,
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
 			N_("read .gitattributes in working directory")),
+		OPT_BOOL(0, "recursive", &recurse, N_("include submodules in archive")),
+		OPT_BOOL(0, "recurse-submodules", &recurse, N_("include submodules in archive")),
 		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
 		OPT__COMPR('0', &compression_level, N_("store only"), 0),
 		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
@@ -355,6 +386,8 @@ static int parse_archive_args(int argc, const char **argv,
 
 	argc = parse_options(argc, argv, NULL, opts, archive_usage, 0);
 
+	if (is_remote && recurse)
+		die("Cannot include submodules with option --remote");
 	if (remote)
 		die("Unexpected option --remote");
 	if (exec)
@@ -393,6 +426,7 @@ static int parse_archive_args(int argc, const char **argv,
 					format, compression_level);
 		}
 	}
+	args->recurse = recurse;
 	args->verbose = verbose;
 	args->base = base;
 	args->baselen = strlen(base);
diff --git a/archive.h b/archive.h
index 4a791e1..577238d 100644
--- a/archive.h
+++ b/archive.h
@@ -7,10 +7,12 @@ struct archiver_args {
 	const char *base;
 	size_t baselen;
 	struct tree *tree;
+	const char *treepath;
 	const unsigned char *commit_sha1;
 	const struct commit *commit;
 	time_t time;
 	struct pathspec pathspec;
+	unsigned int recurse : 1;
 	unsigned int verbose : 1;
 	unsigned int worktree_attributes : 1;
 	unsigned int convert : 1;
-- 
1.8.3.4 (Apple Git-47)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-26  0:04 [PATCH] submodule recursion in git-archive Nick Townsend
@ 2013-11-26 15:17 ` René Scharfe
  2013-11-26 18:57   ` Jens Lehmann
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: René Scharfe @ 2013-11-26 15:17 UTC (permalink / raw)
  To: Nick Townsend, gitster, git; +Cc: Jens Lehmann, Jeff King

Am 26.11.2013 01:04, schrieb Nick Townsend:
> My first git patch - so shout out if I’ve got the etiquette wrong! Or
> of course if I’ve missed something.

Thanks for the patches!  Please send only one per message (the second
one as a reply to the first one, or both as replies to a cover letter),
though -- that makes commenting on them much easier.

Side note: Documentation/SubmittingPatches doesn't mention that (yet),
AFAICS.

> Subject: [PATCH 1/2] submodule: add_submodule_odb() usability
> 
> Although add_submodule_odb() is documented as being
> externally usable, it is declared static and also
> has incorrect documentation.
> 
> This commit fixes those and makes no changes to
> existing code using them. All tests still pass.

Sign-off missing (see Documentation/SubmittingPatches).

> ---
>  Documentation/technical/api-ref-iteration.txt | 4 ++--
>  submodule.c                                   | 2 +-
>  submodule.h                                   | 1 +
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
> index aa1c50f..cbee624 100644
> --- a/Documentation/technical/api-ref-iteration.txt
> +++ b/Documentation/technical/api-ref-iteration.txt
> @@ -50,10 +50,10 @@ submodules object database. You can do this by a code-snippet like
>  this:
>  
>  	const char *path = "path/to/submodule"
> -	if (!add_submodule_odb(path))
> +	if (add_submodule_odb(path))
>  		die("Error submodule '%s' not populated.", path);
>  
> -`add_submodule_odb()` will return an non-zero value on success. If you
> +`add_submodule_odb()` will return a zero value on success. If you

"return zero on success" instead?

>  do not do this you will get an error for each ref that it does not point
>  to a valid object.
>  
> diff --git a/submodule.c b/submodule.c
> index 1905d75..1ea46be 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
>  		die(_("staging updated .gitmodules failed"));
>  }
>  
> -static int add_submodule_odb(const char *path)
> +int add_submodule_odb(const char *path)
>  {
>  	struct strbuf objects_directory = STRBUF_INIT;
>  	struct alternate_object_database *alt_odb;
> diff --git a/submodule.h b/submodule.h
> index 7beec48..3e3cdca 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
>  		struct string_list *needs_pushing);
>  int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
>  void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
> +int add_submodule_odb(const char *path);
>  
>  #endif

> Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive
> 
> When using git-archive to produce a dump of a
> repository, the existing code does not recurse
> into a submodule when it encounters it in the tree
> traversal. These changes add a command line flag
> that permits this.
> 
> Note that the submodules must be updated in the
> repository, otherwise this cannot take place.
> 
> The feature is disabled for remote repositories as
> the git_work_tree fails. This is a possible future
> enhancement.

Hmm, curious.  Why does it fail?  I guess that happens with bare
repositories, only, right?  (Which are the most likely kind of remote
repos to encounter, of course.)

> Two additional fields are added to archiver_args:
>   * recurse  - a boolean indicator
>   * treepath - the path part of the tree-ish
>                eg. the 'www' in HEAD:www
> 
> The latter is used within the archive writer to
> determin the correct path for the submodule .git
> file.
> 
> Signed-off-by: Nick Townsend <nick.townsend@mac.com>
> ---
>  Documentation/git-archive.txt |  9 +++++++++
>  archive.c                     | 38 ++++++++++++++++++++++++++++++++++++--
>  archive.h                     |  2 ++
>  3 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index b97aaab..b4df735 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -11,6 +11,7 @@ SYNOPSIS
>  [verse]
>  'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>  	      [-o <file> | --output=<file>] [--worktree-attributes]
> +	      [--recursive|--recurse-submodules]

I'd expect git archive --recurse to add subdirectories and their
contents, which it does right now, and --no-recurse to only archive the
specified objects, which is not implemented.  IAW: I wouldn't normally
associate an option with that name with submodules.  Would
--recurse-submodules alone suffice?

Side note: With only one of the options defined you could shorten them
on the command line to e.g. --rec; with both you'd need to type at least
--recursi or --recurse to disambiguate -- even though they ultimately do
the same.

>  	      [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
>  	      [<path>...]
>  
> @@ -51,6 +52,14 @@ OPTIONS
>  --prefix=<prefix>/::
>  	Prepend <prefix>/ to each filename in the archive.
>  
> +--recursive::
> +--recurse-submodules::
> +	Archive entries in submodules. Errors occur if the submodules
> +	have not been initialized and updated.
> +	Run `git submodule update --init --recursive` immediately after
> +	the clone is finished to avoid this.
> +	This option is not available with remote repositories.
> +
>  -o <file>::
>  --output=<file>::
>  	Write the archive to <file> instead of stdout.
> diff --git a/archive.c b/archive.c
> index 346f3b2..f6313c9 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -5,6 +5,7 @@
>  #include "archive.h"
>  #include "parse-options.h"
>  #include "unpack-trees.h"
> +#include "submodule.h"
>  
>  static char const * const archive_usage[] = {
>  	N_("git archive [options] <tree-ish> [<path>...]"),
> @@ -131,13 +132,32 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>  		args->convert = ATTR_TRUE(check[1].value);
>  	}
>  
> +	if (S_ISGITLINK(mode) && args->recurse) {
> +		const char *work_tree = get_git_work_tree();
> +		if (!work_tree) {
> +			  die("Can't go recursive when no work dir");
> +		}

Style nit: No curly braces around single-line statements, please.

Perhaps mention submodules in the error message?

It would be nicer to get_git_work_tree right after the parameters have
been parsed and before any archive contents have been written and error
out early.

> +		static struct strbuf dotgit = STRBUF_INIT;

We avoid declarations after statements because older compilers don't
support it.

You release the memory at the end of this block; that means there's no
advantage in making this strbuf static.  Allocating and freeing the
memory for the path of each submodule shouldn't cause any performance
issues, so please just drop static from the declaration.

> +		strbuf_reset(&dotgit);

... and then you don't need to reset anymore.

> +		strbuf_grow(&dotgit, PATH_MAX);

I'd drop that as well; the number of submodules should be low enough
that the possibly avoided reallocations by giving this hint shouldn't be
noticeable.

> +		strbuf_addstr(&dotgit, work_tree);
> +		strbuf_addch(&dotgit, '/');
> +		if (args->treepath) {
> +			  strbuf_addstr(&dotgit, args->treepath);
> +			  strbuf_addch(&dotgit, '/');
> +		}
> +		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
> +		if (add_submodule_odb(dotgit.buf))
> +			  die("Can't add submodule: %s", dotgit.buf);

Hmm, I wonder if we can traverse the tree and load all submodule object
databases before traversing it again to actually write file contents.
That would spare the user from getting half of an archive together with
that error message.

> +		strbuf_release(&dotgit);
> +	}
>  	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
>  		if (args->verbose)
>  			fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
>  		err = write_entry(args, sha1, path.buf, path.len, mode);
>  		if (err)
>  			return err;
> -		return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
> +		return (S_ISGITLINK(mode) && !args->recurse) ? 0: READ_TREE_RECURSIVE;

This line is longer than 80 characters, which we tend to avoid.  How
about this?

		if (ISGITLINK(mode) && !args->recurse)
			return 0;
		return READ_TREE_RECURSIVE;

>  	}
>  
>  	if (args->verbose)
> @@ -256,10 +276,16 @@ static void parse_treeish_arg(const char **argv,
>  	const struct commit *commit;
>  	unsigned char sha1[20];
>  
> +	const char *colon = strchr(name, ':');
> +
> +	/* Store the path on the ref for later (required for --recursive) */
> +	char *treepath = NULL;
> +	if (colon) {
> +		treepath = strdup(colon+1);
> +	}

Style: No curly braces, space around operators, use wrapper functions
for simple memory error handling:

	if (colon)
		treepath = xstrdup(colon + 1);

>  	/* Remotes are only allowed to fetch actual refs */
>  	if (remote) {
>  		char *ref = NULL;
> -		const char *colon = strchr(name, ':');
>  		int refnamelen = colon ? colon - name : strlen(name);
>  
>  		if (!dwim_ref(name, refnamelen, sha1, &ref))
> @@ -296,9 +322,11 @@ static void parse_treeish_arg(const char **argv,
>  		tree = parse_tree_indirect(tree_sha1);
>  	}
>  	ar_args->tree = tree;
> +	ar_args->treepath = treepath;
>  	ar_args->commit_sha1 = commit_sha1;
>  	ar_args->commit = commit;
>  	ar_args->time = archive_time;
> +
>  }

Please don't add empty lines before the end of a block.

>  #define OPT__COMPR(s, v, h, p) \
> @@ -318,6 +346,7 @@ static int parse_archive_args(int argc, const char **argv,
>  	const char *exec = NULL;
>  	const char *output = NULL;
>  	int compression_level = -1;
> +	int recurse = 0;
>  	int verbose = 0;
>  	int i;
>  	int list = 0;
> @@ -331,6 +360,8 @@ static int parse_archive_args(int argc, const char **argv,
>  			N_("write the archive to this file")),
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
>  			N_("read .gitattributes in working directory")),
> +		OPT_BOOL(0, "recursive", &recurse, N_("include submodules in archive")),
> +		OPT_BOOL(0, "recurse-submodules", &recurse, N_("include submodules in archive")),
>  		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
>  		OPT__COMPR('0', &compression_level, N_("store only"), 0),
>  		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
> @@ -355,6 +386,8 @@ static int parse_archive_args(int argc, const char **argv,
>  
>  	argc = parse_options(argc, argv, NULL, opts, archive_usage, 0);
>  
> +	if (is_remote && recurse)
> +		die("Cannot include submodules with option --remote");
>  	if (remote)
>  		die("Unexpected option --remote");
>  	if (exec)
> @@ -393,6 +426,7 @@ static int parse_archive_args(int argc, const char **argv,
>  					format, compression_level);
>  		}
>  	}
> +	args->recurse = recurse;
>  	args->verbose = verbose;
>  	args->base = base;
>  	args->baselen = strlen(base);
> diff --git a/archive.h b/archive.h
> index 4a791e1..577238d 100644
> --- a/archive.h
> +++ b/archive.h
> @@ -7,10 +7,12 @@ struct archiver_args {
>  	const char *base;
>  	size_t baselen;
>  	struct tree *tree;
> +	const char *treepath;
>  	const unsigned char *commit_sha1;
>  	const struct commit *commit;
>  	time_t time;
>  	struct pathspec pathspec;
> +	unsigned int recurse : 1;

Name it recurse_submodules?

>  	unsigned int verbose : 1;
>  	unsigned int worktree_attributes : 1;
>  	unsigned int convert : 1;
> -- 1.8.3.4 (Apple Git-47) 

A test script (t/t5005-archive-submodules.sh?) would be nice which
exercises the new option.

René

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-26 15:17 ` René Scharfe
@ 2013-11-26 18:57   ` Jens Lehmann
  2013-11-26 22:18   ` Junio C Hamano
  2013-11-26 22:38   ` Heiko Voigt
  2 siblings, 0 replies; 20+ messages in thread
From: Jens Lehmann @ 2013-11-26 18:57 UTC (permalink / raw)
  To: René Scharfe, Nick Townsend, gitster, git; +Cc: Jeff King

Am 26.11.2013 16:17, schrieb René Scharfe:
> Am 26.11.2013 01:04, schrieb Nick Townsend:
>> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
>> index b97aaab..b4df735 100644
>> --- a/Documentation/git-archive.txt
>> +++ b/Documentation/git-archive.txt
>> @@ -11,6 +11,7 @@ SYNOPSIS
>>  [verse]
>>  'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>>  	      [-o <file> | --output=<file>] [--worktree-attributes]
>> +	      [--recursive|--recurse-submodules]
> 
> I'd expect git archive --recurse to add subdirectories and their
> contents, which it does right now, and --no-recurse to only archive the
> specified objects, which is not implemented.  IAW: I wouldn't normally
> associate an option with that name with submodules.  Would
> --recurse-submodules alone suffice?

It should. All new code recursing into submodules should not use
--recursive but always --recurse-submodules, as --recursive means
different things for different commands (the only exception being
"git submodule", as --recursive is obvious here, and "git clone"
for backward compatibility reasons).

But I really like what these patches are aiming at.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-26 15:17 ` René Scharfe
  2013-11-26 18:57   ` Jens Lehmann
@ 2013-11-26 22:18   ` Junio C Hamano
  2013-11-27  0:28     ` René Scharfe
  2013-11-27  3:55     ` Nick Townsend
  2013-11-26 22:38   ` Heiko Voigt
  2 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2013-11-26 22:18 UTC (permalink / raw)
  To: René Scharfe, Jens Lehmann; +Cc: Nick Townsend, git, Jeff King

René Scharfe <l.s.r@web.de> writes:

> Thanks for the patches!  Please send only one per message (the second
> one as a reply to the first one, or both as replies to a cover letter),
> though -- that makes commenting on them much easier.
>
> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
> AFAICS.

OK, how about doing this then?

 Documentation/SubmittingPatches | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 7055576..304b3c0 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -140,7 +140,12 @@ comment on the changes you are submitting.  It is important for
 a developer to be able to "quote" your changes, using standard
 e-mail tools, so that they may comment on specific portions of
 your code.  For this reason, all patches should be submitted
-"inline".  If your log message (including your name on the
+"inline".  A patch series that consists of N commits is sent as N
+separate e-mail messages, or a cover letter message (see below) with
+N separate e-mail messages, each being a response to the cover
+letter.
+
+If your log message (including your name on the
 Signed-off-by line) is not writable in ASCII, make sure that
 you send off a message in the correct encoding.
 

>> The feature is disabled for remote repositories as
>> the git_work_tree fails. This is a possible future
>> enhancement.
>
> Hmm, curious.  Why does it fail?  I guess that happens with bare
> repositories, only, right?  (Which are the most likely kind of remote
> repos to encounter, of course.)

Yeah, I do not think of a reason why it should fail in a bare
repository, either. "git archive" is about writing out the contents
of an already recorded tree, so there shouldn't be a reason to even
call get_git_work_tree() in the first place.

Even if the code is run inside a repository with a working tree,
when producing a tarball out of an ancient commit that had a
submodule not at its current location, --recurse-submodules option
should do the right thing, so asking for working tree location of
that submodule to find its repository is wrong, I think.  It may
happen to find one if the archived revision is close enough to what
is currently checked out, but that may not necessarily be the case.

At that point when the code discovers an S_ISGITLINK entry, it
should have both a pathname to the submodule relative to the
toplevel and the commit object name bound to that submodule
location.  What it should do, when it does not find the repository
at the given path (maybe because there is no working tree, or the
sudmodule directory has moved over time) is roughly:

 - Read from .gitmodules at the top-level from the tree it is
   creating the tarball out of;

 - Find "submodule.$name.path" entry that records that path to the
   submodule; and then

 - Using that $name, find the stashed-away location of the submodule
   repository in $GIT_DIR/modules/$name.

or something like that.

This is a related tangent, but when used in a repository that people
often use as their remote, the repository discovery may have to
interact with the relative URL.  People often ship .gitmodules with

	[submodule "bar"]
        	URL = ../bar.git
		path = barDir

for a top-level project "foo" that can be cloned thusly:

	git clone git://site.xz/foo.git

and host bar.git to be clonable with

	git clone git://site.xz/bar.git barDir/

inside the working tree of the foo project.  In such a case, when
"archive --recurse-submodules" is running, it would find the
repository for the "bar" submodule at "../bar.git", I would think.

So this part needs a bit more thought, I am afraid.

>>  'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>>  	      [-o <file> | --output=<file>] [--worktree-attributes]
>> +	      [--recursive|--recurse-submodules]
>
> I'd expect git archive --recurse to add subdirectories and their
> contents, which it does right now, and --no-recurse to only archive the
> specified objects, which is not implemented.  IAW: I wouldn't normally
> associate an option with that name with submodules.  Would
> --recurse-submodules alone suffice?

Jens already commented on this, and I agree that --recursive should
be dropped from this patch.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] submodule recursion in git-archive
  2013-11-26 15:17 ` René Scharfe
  2013-11-26 18:57   ` Jens Lehmann
  2013-11-26 22:18   ` Junio C Hamano
@ 2013-11-26 22:38   ` Heiko Voigt
  2013-11-27  3:33     ` Nick Townsend
  2 siblings, 1 reply; 20+ messages in thread
From: Heiko Voigt @ 2013-11-26 22:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: Nick Townsend, gitster, git, Jens Lehmann, Jeff King

Hi,

I like where this is going.

On Tue, Nov 26, 2013 at 04:17:43PM +0100, René Scharfe wrote:
> Am 26.11.2013 01:04, schrieb Nick Townsend:
> > +		strbuf_addstr(&dotgit, work_tree);
> > +		strbuf_addch(&dotgit, '/');
> > +		if (args->treepath) {
> > +			  strbuf_addstr(&dotgit, args->treepath);
> > +			  strbuf_addch(&dotgit, '/');
> > +		}
> > +		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
> > +		if (add_submodule_odb(dotgit.buf))
> > +			  die("Can't add submodule: %s", dotgit.buf);
> 
> Hmm, I wonder if we can traverse the tree and load all submodule object
> databases before traversing it again to actually write file contents.
> That would spare the user from getting half of an archive together with
> that error message.

I am not sure whether we should die here. What about submodules that
have not been initialized and or cloned? I think that is a quite regular
use case for example for libraries that not everyone needs or big media
submodules which only the design team uses. How about skipping them (maybe
issuing a warning) by returning 0 here and proceeding?

Cheers Heiko

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-26 22:18   ` Junio C Hamano
@ 2013-11-27  0:28     ` René Scharfe
  2013-11-27  3:28       ` Nick Townsend
  2013-11-27 19:05       ` Junio C Hamano
  2013-11-27  3:55     ` Nick Townsend
  1 sibling, 2 replies; 20+ messages in thread
From: René Scharfe @ 2013-11-27  0:28 UTC (permalink / raw)
  To: Junio C Hamano, Jens Lehmann; +Cc: Nick Townsend, git, Jeff King

Am 26.11.2013 23:18, schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
> 
>> Thanks for the patches!  Please send only one per message (the second
>> one as a reply to the first one, or both as replies to a cover letter),
>> though -- that makes commenting on them much easier.
>>
>> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
>> AFAICS.
> 
> OK, how about doing this then?
> 
>  Documentation/SubmittingPatches | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7055576..304b3c0 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -140,7 +140,12 @@ comment on the changes you are submitting.  It is important for
>  a developer to be able to "quote" your changes, using standard
>  e-mail tools, so that they may comment on specific portions of
>  your code.  For this reason, all patches should be submitted
> -"inline".  If your log message (including your name on the
> +"inline".  A patch series that consists of N commits is sent as N
> +separate e-mail messages, or a cover letter message (see below) with
> +N separate e-mail messages, each being a response to the cover
> +letter.
> +
> +If your log message (including your name on the
>  Signed-off-by line) is not writable in ASCII, make sure that
>  you send off a message in the correct encoding.

OK, but the repetition of "cover letter" and "e-mail messages"
irritates me slightly for some reason.  What about the following?

-- >8 --
Subject: [PATCH] SubmittingPatches: document how to handle multiple patches

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 Documentation/SubmittingPatches |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 7055576..e6d46ed 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -139,8 +139,15 @@ People on the Git mailing list need to be able to read and
 comment on the changes you are submitting.  It is important for
 a developer to be able to "quote" your changes, using standard
 e-mail tools, so that they may comment on specific portions of
-your code.  For this reason, all patches should be submitted
-"inline".  If your log message (including your name on the
+your code.  For this reason, each patch should be submitted
+"inline" in a separate message.
+
+Multiple related patches should be grouped into their own e-mail
+thread to help readers find all parts of the series.  To that end,
+send them as replies to either an additional "cover letter" message
+(see below), the first patch, or the respective preceding patch.
+
+If your log message (including your name on the
 Signed-off-by line) is not writable in ASCII, make sure that
 you send off a message in the correct encoding.
 
-- 
1.7.8

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-27  0:28     ` René Scharfe
@ 2013-11-27  3:28       ` Nick Townsend
  2013-11-27 19:05       ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Nick Townsend @ 2013-11-27  3:28 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Jens Lehmann, git, Jeff King


On 26 Nov 2013, at 16:28, René Scharfe <l.s.r@web.de> wrote:

> Am 26.11.2013 23:18, schrieb Junio C Hamano:
>> René Scharfe <l.s.r@web.de> writes:
>> 
>>> Thanks for the patches!  Please send only one per message (the second
>>> one as a reply to the first one, or both as replies to a cover letter),
>>> though -- that makes commenting on them much easier.
>>> 
>>> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
>>> AFAICS.
>> 
>> OK, how about doing this then?
>> 
>> Documentation/SubmittingPatches | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
>> index 7055576..304b3c0 100644
>> --- a/Documentation/SubmittingPatches
>> +++ b/Documentation/SubmittingPatches
>> @@ -140,7 +140,12 @@ comment on the changes you are submitting.  It is important for
>> a developer to be able to "quote" your changes, using standard
>> e-mail tools, so that they may comment on specific portions of
>> your code.  For this reason, all patches should be submitted
>> -"inline".  If your log message (including your name on the
>> +"inline".  A patch series that consists of N commits is sent as N
>> +separate e-mail messages, or a cover letter message (see below) with
>> +N separate e-mail messages, each being a response to the cover
>> +letter.
>> +
>> +If your log message (including your name on the
>> Signed-off-by line) is not writable in ASCII, make sure that
>> you send off a message in the correct encoding.
> 
> OK, but the repetition of "cover letter" and "e-mail messages"
> irritates me slightly for some reason.  What about the following?
> 
> -- >8 --
> Subject: [PATCH] SubmittingPatches: document how to handle multiple patches
> 
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
> Documentation/SubmittingPatches |   11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7055576..e6d46ed 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -139,8 +139,15 @@ People on the Git mailing list need to be able to read and
> comment on the changes you are submitting.  It is important for
> a developer to be able to "quote" your changes, using standard
> e-mail tools, so that they may comment on specific portions of
> -your code.  For this reason, all patches should be submitted
> -"inline".  If your log message (including your name on the
> +your code.  For this reason, each patch should be submitted
> +"inline" in a separate message.
> +
> +Multiple related patches should be grouped into their own e-mail
> +thread to help readers find all parts of the series.  To that end,
> +send them as replies to either an additional "cover letter" message
> +(see below), the first patch, or the respective preceding patch.
> +
> +If your log message (including your name on the
> Signed-off-by line) is not writable in ASCII, make sure that
> you send off a message in the correct encoding.
> 
> -- 
> 1.7.8
> 
> 
That seems clear to me.
At any rate I’m going to rework this based on the collective input and will submit them again.
Please check my other replies as there are some discussion points!

Nick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-26 22:38   ` Heiko Voigt
@ 2013-11-27  3:33     ` Nick Townsend
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Townsend @ 2013-11-27  3:33 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: René Scharfe, Junio C Hamano, git, Jens Lehmann, Jeff King


On 26 Nov 2013, at 14:38, Heiko Voigt <hvoigt@hvoigt.net> wrote:

> Hi,
> 
> I like where this is going.
> 
> On Tue, Nov 26, 2013 at 04:17:43PM +0100, René Scharfe wrote:
>> Am 26.11.2013 01:04, schrieb Nick Townsend:
>>> +		strbuf_addstr(&dotgit, work_tree);
>>> +		strbuf_addch(&dotgit, '/');
>>> +		if (args->treepath) {
>>> +			  strbuf_addstr(&dotgit, args->treepath);
>>> +			  strbuf_addch(&dotgit, '/');
>>> +		}
>>> +		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
>>> +		if (add_submodule_odb(dotgit.buf))
>>> +			  die("Can't add submodule: %s", dotgit.buf);
>> 
>> Hmm, I wonder if we can traverse the tree and load all submodule object
>> databases before traversing it again to actually write file contents.
>> That would spare the user from getting half of an archive together with
>> that error message.
> 
> I am not sure whether we should die here. What about submodules that
> have not been initialized and or cloned? I think that is a quite regular
> use case for example for libraries that not everyone needs or big media
> submodules which only the design team uses. How about skipping them (maybe
> issuing a warning) by returning 0 here and proceeding?
> 
> Cheers Heiko

I agree that issuing a warning and continuing is best. If the submodule hasn’t been setup
then we should respect that and keep the current behaviour (just archive the directory entry).
There is some further debate to be had about the extent to which this should work with
un-initialized submodules which I’ll discuss in other replies.

Thanks
Nick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-26 22:18   ` Junio C Hamano
  2013-11-27  0:28     ` René Scharfe
@ 2013-11-27  3:55     ` Nick Townsend
  2013-11-27 19:43       ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Nick Townsend @ 2013-11-27  3:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Jens Lehmann, git, Jeff King


On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@pobox.com> wrote:

> René Scharfe <l.s.r@web.de> writes:
> 
>> Thanks for the patches!  Please send only one per message (the second
>> one as a reply to the first one, or both as replies to a cover letter),
>> though -- that makes commenting on them much easier.
>> 
>> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
>> AFAICS.
> 
> OK, how about doing this then?
> 
> Documentation/SubmittingPatches | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7055576..304b3c0 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -140,7 +140,12 @@ comment on the changes you are submitting.  It is important for
> a developer to be able to "quote" your changes, using standard
> e-mail tools, so that they may comment on specific portions of
> your code.  For this reason, all patches should be submitted
> -"inline".  If your log message (including your name on the
> +"inline".  A patch series that consists of N commits is sent as N
> +separate e-mail messages, or a cover letter message (see below) with
> +N separate e-mail messages, each being a response to the cover
> +letter.
> +
> +If your log message (including your name on the
> Signed-off-by line) is not writable in ASCII, make sure that
> you send off a message in the correct encoding.
> 
> 
>>> The feature is disabled for remote repositories as
>>> the git_work_tree fails. This is a possible future
>>> enhancement.
>> 
>> Hmm, curious.  Why does it fail?  I guess that happens with bare
>> repositories, only, right?  (Which are the most likely kind of remote
>> repos to encounter, of course.)
> 
> Yeah, I do not think of a reason why it should fail in a bare
> repository, either. "git archive" is about writing out the contents
> of an already recorded tree, so there shouldn't be a reason to even
> call get_git_work_tree() in the first place.
> 
See below for a discussion of why I use the .git file in the work tree to 
load the objects for the submodule. I also thought it should work in a
remote repository - but I ran it on a properly initialized remote repository and
it failed. Since I didn’t need it for my immediate use-case I just decided to disable 
it with an error. I can look into this further, but we must decide about the question 
below first…

> Even if the code is run inside a repository with a working tree,
> when producing a tarball out of an ancient commit that had a
> submodule not at its current location, --recurse-submodules option
> should do the right thing, so asking for working tree location of
> that submodule to find its repository is wrong, I think.  It may
> happen to find one if the archived revision is close enough to what
> is currently checked out, but that may not necessarily be the case.
> 
> At that point when the code discovers an S_ISGITLINK entry, it
> should have both a pathname to the submodule relative to the
> toplevel and the commit object name bound to that submodule
> location.  What it should do, when it does not find the repository
> at the given path (maybe because there is no working tree, or the
> sudmodule directory has moved over time) is roughly:
> 
> - Read from .gitmodules at the top-level from the tree it is
>   creating the tarball out of;
> 
> - Find "submodule.$name.path" entry that records that path to the
>   submodule; and then
> 
> - Using that $name, find the stashed-away location of the submodule
>   repository in $GIT_DIR/modules/$name.
> 
> or something like that.
> 
> This is a related tangent, but when used in a repository that people
> often use as their remote, the repository discovery may have to
> interact with the relative URL.  People often ship .gitmodules with
> 
> 	[submodule "bar"]
>        	URL = ../bar.git
> 		path = barDir
> 
> for a top-level project "foo" that can be cloned thusly:
> 
> 	git clone git://site.xz/foo.git
> 
> and host bar.git to be clonable with
> 
> 	git clone git://site.xz/bar.git barDir/
> 
> inside the working tree of the foo project.  In such a case, when
> "archive --recurse-submodules" is running, it would find the
> repository for the "bar" submodule at "../bar.git", I would think.
> 
> So this part needs a bit more thought, I am afraid.

I see that there is a lot of potential complexity around setting up a submodule:
* The .gitmodules file can be dirty (easy to flag, but should we allow archive to proceed?)
* Users can mess with settings both prior to git submodule init and before git submodule update.
* What if it’s a raw clone and the user manually changes things between init and update?
* I’m not a git-internals expert but looking through the code I see that you can add additional object
directories and change paths as you show above.

For those reasons I deliberately decided not to reproduce the above logic all by myself.
On the other hand, what it *did* seem to me is that once you have the .git file
then you know you’ve got all that covered. So I just used that. This restricts the function to
working only on a properly setup repository - but that is my use case!

If you think that doing this more extensive setup is even *viable* given the space between
init and update then I”m happy to try it. I didn’t want to start off on a fools errand.

> 
>>> 'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>>> 	      [-o <file> | --output=<file>] [--worktree-attributes]
>>> +	      [--recursive|--recurse-submodules]
>> 
>> I'd expect git archive --recurse to add subdirectories and their
>> contents, which it does right now, and --no-recurse to only archive the
>> specified objects, which is not implemented.  IAW: I wouldn't normally
>> associate an option with that name with submodules.  Would
>> --recurse-submodules alone suffice?
> 
> Jens already commented on this, and I agree that --recursive should
> be dropped from this patch.
I only put —recursive because that is what git-clone has for it’s behaviour wrt submodules.
If that flag is deprecated then I’m fine with using only —recurse-submodules
Perhaps a deprecation flag or note in the code would help?


Overall I’m impressed by the speed and quality of the responses (and the codebase!) so am glad to
move this forward. I look forward to your feedback.

Kind Regards
Nick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-27  0:28     ` René Scharfe
  2013-11-27  3:28       ` Nick Townsend
@ 2013-11-27 19:05       ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2013-11-27 19:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jens Lehmann, Nick Townsend, git, Jeff King

René Scharfe <l.s.r@web.de> writes:

> OK, but the repetition of "cover letter" and "e-mail messages"
> irritates me slightly for some reason.  What about the following?

Looks good to me; will queue, thanks.

> -- >8 --
> Subject: [PATCH] SubmittingPatches: document how to handle multiple patches
>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
>  Documentation/SubmittingPatches |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7055576..e6d46ed 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -139,8 +139,15 @@ People on the Git mailing list need to be able to read and
>  comment on the changes you are submitting.  It is important for
>  a developer to be able to "quote" your changes, using standard
>  e-mail tools, so that they may comment on specific portions of
> -your code.  For this reason, all patches should be submitted
> -"inline".  If your log message (including your name on the
> +your code.  For this reason, each patch should be submitted
> +"inline" in a separate message.
> +
> +Multiple related patches should be grouped into their own e-mail
> +thread to help readers find all parts of the series.  To that end,
> +send them as replies to either an additional "cover letter" message
> +(see below), the first patch, or the respective preceding patch.
> +
> +If your log message (including your name on the
>  Signed-off-by line) is not writable in ASCII, make sure that
>  you send off a message in the correct encoding.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-27  3:55     ` Nick Townsend
@ 2013-11-27 19:43       ` Junio C Hamano
  2013-11-29 22:38         ` Heiko Voigt
  2013-12-03  0:00         ` [PATCH] submodule recursion in git-archive Nick Townsend
  0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2013-11-27 19:43 UTC (permalink / raw)
  To: Nick Townsend; +Cc: René Scharfe, Jens Lehmann, git, Jeff King

Nick Townsend <nick.townsend@mac.com> writes:

> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@pobox.com> wrote:
>
>> Even if the code is run inside a repository with a working tree,
>> when producing a tarball out of an ancient commit that had a
>> submodule not at its current location, --recurse-submodules option
>> should do the right thing, so asking for working tree location of
>> that submodule to find its repository is wrong, I think.  It may
>> happen to find one if the archived revision is close enough to what
>> is currently checked out, but that may not necessarily be the case.
>> 
>> At that point when the code discovers an S_ISGITLINK entry, it
>> should have both a pathname to the submodule relative to the
>> toplevel and the commit object name bound to that submodule
>> location.  What it should do, when it does not find the repository
>> at the given path (maybe because there is no working tree, or the
>> sudmodule directory has moved over time) is roughly:
>> 
>> - Read from .gitmodules at the top-level from the tree it is
>>   creating the tarball out of;
>> 
>> - Find "submodule.$name.path" entry that records that path to the
>>   submodule; and then
>> 
>> - Using that $name, find the stashed-away location of the submodule
>>   repository in $GIT_DIR/modules/$name.
>> 
>> or something like that.
>> 
>> This is a related tangent, but when used in a repository that people
>> often use as their remote, the repository discovery may have to
>> interact with the relative URL.  People often ship .gitmodules with
>> 
>> 	[submodule "bar"]
>>        	URL = ../bar.git
>> 		path = barDir
>> 
>> for a top-level project "foo" that can be cloned thusly:
>> 
>> 	git clone git://site.xz/foo.git
>> 
>> and host bar.git to be clonable with
>> 
>> 	git clone git://site.xz/bar.git barDir/
>> 
>> inside the working tree of the foo project.  In such a case, when
>> "archive --recurse-submodules" is running, it would find the
>> repository for the "bar" submodule at "../bar.git", I would think.
>> 
>> So this part needs a bit more thought, I am afraid.
>
> I see that there is a lot of potential complexity around setting up a submodule:

No question about it.

> * The .gitmodules file can be dirty (easy to flag, but should we
> allow archive to proceed?)

As we are discussing "archive", which takes a tree object from the
top-level project that is recorded in the object database, the
information _about_ the submodule in question should come from the
given tree being archived.  There is no reason for the .gitmodules
file that happens to be sitting in the working tree of the top-level
project to be involved in the decision, so its dirtyness should not
matter, I think.  If the tree being archived has a submodule whose
name is "kernel" at path "linux/" (relative to the top-level
project), its repository should be at .git/modules/kernel in the
layout recent git-submodule prepares, and we should find that
path-and-name mapping from .gitmodules recorded in that tree object
we are archiving. The version that happens to be checked out to the
working tree may have moved the submodule to a new path "linux-3.0/"
and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
but when archiving a tree that has the submodule at "linux/", it
would not help---we would not know to look at "linux-3.0/.git" to
learn that information anyway because .gitmodules in the working
tree would say that the submodule at path "linux-3.0/" is with name
"kernel", and would not tell us anything about "linux/".

> * Users can mess with settings both prior to git submodule init
> and before git submodule update.

I think this is irrelevant for exactly the same reason as above.

What makes this tricker, however, is how to deal with an old-style
repository, where the submodule repositories are embedded in the
working tree that happens to be checked out.  In that case, we may
have to read .gitmodules from two places, i.e.

 (1) We are archiving a tree with a submodule at "linux/";

 (2) We read .gitmodules from that tree and learn that the submodule
     has name "kernel";

 (3) There is no ".git/modules/kernel" because the repository uses
     the old layout (if the user never was interested in this
     submodule, .git/modules/kernel may also be missing, and we
     should tell these two cases apart by checking .git/config to
     see if a corresponding entry for the "kernel" submodule exists
     there);

 (4) In a repository that uses the old layout, there must be the
     repository somewhere embedded in the current working tree (this
     inability to remove is why we use the new layout these days).
     We can learn where it is by looking at .gitmodules in the
     working tree---map the name "kernel" we learned earlier, and
     map it to the current path ("linux-3.0/" if you have been
     following this example so far).

And in that fallback context, I would say that reading from a dirty
(or "messed with by the user") .gitmodules is the right thing to
do.  Perhaps the user may be in the process of moving the submodule
in his working tree with

    $ mv linux-3.0 linux-3.2
    $ git config -f .gitmodules submodule.kernel.path linux-3.2

but hasn't committed the change yet.

> For those reasons I deliberately decided not to reproduce the
> above logic all by myself.

As I already hinted, I agree that the "how to find the location of
submodule repository, given a particular tree in the top-level
project the submodule belongs to and the path to the submodule in
question" deserves a separate thread to discuss with area experts.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] submodule recursion in git-archive
  2013-11-27 19:43       ` Junio C Hamano
@ 2013-11-29 22:38         ` Heiko Voigt
       [not found]           ` <3C71BC83-4DD0-43F8-9E36-88594CA63FC5@mac.com>
  2013-12-03  0:00         ` [PATCH] submodule recursion in git-archive Nick Townsend
  1 sibling, 1 reply; 20+ messages in thread
From: Heiko Voigt @ 2013-11-29 22:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nick Townsend, René Scharfe, Jens Lehmann, git, Jeff King

On Wed, Nov 27, 2013 at 11:43:44AM -0800, Junio C Hamano wrote:
> Nick Townsend <nick.townsend@mac.com> writes:
> > * The .gitmodules file can be dirty (easy to flag, but should we
> > allow archive to proceed?)
> 
> As we are discussing "archive", which takes a tree object from the
> top-level project that is recorded in the object database, the
> information _about_ the submodule in question should come from the
> given tree being archived.  There is no reason for the .gitmodules
> file that happens to be sitting in the working tree of the top-level
> project to be involved in the decision, so its dirtyness should not
> matter, I think.  If the tree being archived has a submodule whose
> name is "kernel" at path "linux/" (relative to the top-level
> project), its repository should be at .git/modules/kernel in the
> layout recent git-submodule prepares, and we should find that
> path-and-name mapping from .gitmodules recorded in that tree object
> we are archiving. The version that happens to be checked out to the
> working tree may have moved the submodule to a new path "linux-3.0/"
> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
> but when archiving a tree that has the submodule at "linux/", it
> would not help---we would not know to look at "linux-3.0/.git" to
> learn that information anyway because .gitmodules in the working
> tree would say that the submodule at path "linux-3.0/" is with name
> "kernel", and would not tell us anything about "linux/".
> 
> > * Users can mess with settings both prior to git submodule init
> > and before git submodule update.
> 
> I think this is irrelevant for exactly the same reason as above.
> 
> What makes this tricker, however, is how to deal with an old-style
> repository, where the submodule repositories are embedded in the
> working tree that happens to be checked out.  In that case, we may
> have to read .gitmodules from two places, i.e.
> 
>  (1) We are archiving a tree with a submodule at "linux/";
> 
>  (2) We read .gitmodules from that tree and learn that the submodule
>      has name "kernel";
> 
>  (3) There is no ".git/modules/kernel" because the repository uses
>      the old layout (if the user never was interested in this
>      submodule, .git/modules/kernel may also be missing, and we
>      should tell these two cases apart by checking .git/config to
>      see if a corresponding entry for the "kernel" submodule exists
>      there);
> 
>  (4) In a repository that uses the old layout, there must be the
>      repository somewhere embedded in the current working tree (this
>      inability to remove is why we use the new layout these days).
>      We can learn where it is by looking at .gitmodules in the
>      working tree---map the name "kernel" we learned earlier, and
>      map it to the current path ("linux-3.0/" if you have been
>      following this example so far).
> 
> And in that fallback context, I would say that reading from a dirty
> (or "messed with by the user") .gitmodules is the right thing to
> do.  Perhaps the user may be in the process of moving the submodule
> in his working tree with
> 
>     $ mv linux-3.0 linux-3.2
>     $ git config -f .gitmodules submodule.kernel.path linux-3.2
> 
> but hasn't committed the change yet.
> 
> > For those reasons I deliberately decided not to reproduce the
> > above logic all by myself.
> 
> As I already hinted, I agree that the "how to find the location of
> submodule repository, given a particular tree in the top-level
> project the submodule belongs to and the path to the submodule in
> question" deserves a separate thread to discuss with area experts.

FYI, I already started to implement this lookup of submodule paths early
this year[1] but have not found the time to proceed on that yet. I am
planning to continue on that topic soonish. We need it to implement a
correct recursive fetch with clone on-demand as a basis for the future
recursive checkout.

During the work on this I hit too many open questions. Thats why I am
currently working on a complete plan[2] so we can discuss and define how
this needs to be implemented. It is an asciidoc document which I will
send out once I am finished with it.

Cheers Heiko

[1] http://article.gmane.org/gmane.comp.version-control.git/217020
[2] https://github.com/hvoigt/git/wiki/submodule-fetch-config

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
  2013-11-27 19:43       ` Junio C Hamano
  2013-11-29 22:38         ` Heiko Voigt
@ 2013-12-03  0:00         ` Nick Townsend
  2013-12-03  0:03           ` Fwd: " Nick Townsend
  1 sibling, 1 reply; 20+ messages in thread
From: Nick Townsend @ 2013-12-03  0:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Jens Lehmann, git, Jeff King


On 27 Nov 2013, at 11:43, Junio C Hamano <gitster@pobox.com> wrote:

> Nick Townsend <nick.townsend@mac.com> writes:
> 
>> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@pobox.com> wrote:
>> 
>>> Even if the code is run inside a repository with a working tree,
>>> when producing a tarball out of an ancient commit that had a
>>> submodule not at its current location, --recurse-submodules option
>>> should do the right thing, so asking for working tree location of
>>> that submodule to find its repository is wrong, I think.  It may
>>> happen to find one if the archived revision is close enough to what
>>> is currently checked out, but that may not necessarily be the case.
>>> 
>>> At that point when the code discovers an S_ISGITLINK entry, it
>>> should have both a pathname to the submodule relative to the
>>> toplevel and the commit object name bound to that submodule
>>> location.  What it should do, when it does not find the repository
>>> at the given path (maybe because there is no working tree, or the
>>> sudmodule directory has moved over time) is roughly:
>>> 
>>> - Read from .gitmodules at the top-level from the tree it is
>>>  creating the tarball out of;
>>> 
>>> - Find "submodule.$name.path" entry that records that path to the
>>>  submodule; and then
>>> 
>>> - Using that $name, find the stashed-away location of the submodule
>>>  repository in $GIT_DIR/modules/$name.
>>> 
>>> or something like that.
>>> 
>>> This is a related tangent, but when used in a repository that people
>>> often use as their remote, the repository discovery may have to
>>> interact with the relative URL.  People often ship .gitmodules with
>>> 
>>> 	[submodule "bar"]
>>>       	URL = ../bar.git
>>> 		path = barDir
>>> 
>>> for a top-level project "foo" that can be cloned thusly:
>>> 
>>> 	git clone git://site.xz/foo.git
>>> 
>>> and host bar.git to be clonable with
>>> 
>>> 	git clone git://site.xz/bar.git barDir/
>>> 
>>> inside the working tree of the foo project.  In such a case, when
>>> "archive --recurse-submodules" is running, it would find the
>>> repository for the "bar" submodule at "../bar.git", I would think.
>>> 
>>> So this part needs a bit more thought, I am afraid.
>> 
>> I see that there is a lot of potential complexity around setting up a submodule:
> 
> No question about it.
> 
>> * The .gitmodules file can be dirty (easy to flag, but should we
>> allow archive to proceed?)
> 
> As we are discussing "archive", which takes a tree object from the
> top-level project that is recorded in the object database, the
> information _about_ the submodule in question should come from the
> given tree being archived.  There is no reason for the .gitmodules
> file that happens to be sitting in the working tree of the top-level
> project to be involved in the decision, so its dirtyness should not
> matter, I think.  If the tree being archived has a submodule whose
> name is "kernel" at path "linux/" (relative to the top-level
> project), its repository should be at .git/modules/kernel in the
> layout recent git-submodule prepares, and we should find that
> path-and-name mapping from .gitmodules recorded in that tree object
> we are archiving. The version that happens to be checked out to the
> working tree may have moved the submodule to a new path "linux-3.0/"
> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
> but when archiving a tree that has the submodule at "linux/", it
> would not help---we would not know to look at "linux-3.0/.git" to
> learn that information anyway because .gitmodules in the working
> tree would say that the submodule at path "linux-3.0/" is with name
> "kernel", and would not tell us anything about "linux/".
> 
>> * Users can mess with settings both prior to git submodule init
>> and before git submodule update.
> 
> I think this is irrelevant for exactly the same reason as above.
> 
> What makes this tricker, however, is how to deal with an old-style
> repository, where the submodule repositories are embedded in the
> working tree that happens to be checked out.  In that case, we may
> have to read .gitmodules from two places, i.e.
> 
> (1) We are archiving a tree with a submodule at "linux/";
> 
> (2) We read .gitmodules from that tree and learn that the submodule
>     has name "kernel";
> 
> (3) There is no ".git/modules/kernel" because the repository uses
>     the old layout (if the user never was interested in this
>     submodule, .git/modules/kernel may also be missing, and we
>     should tell these two cases apart by checking .git/config to
>     see if a corresponding entry for the "kernel" submodule exists
>     there);
> 
> (4) In a repository that uses the old layout, there must be the
>     repository somewhere embedded in the current working tree (this
>     inability to remove is why we use the new layout these days).
>     We can learn where it is by looking at .gitmodules in the
>     working tree---map the name "kernel" we learned earlier, and
>     map it to the current path ("linux-3.0/" if you have been
>     following this example so far).
> 
> And in that fallback context, I would say that reading from a dirty
> (or "messed with by the user") .gitmodules is the right thing to
> do.  Perhaps the user may be in the process of moving the submodule
> in his working tree with
> 
>    $ mv linux-3.0 linux-3.2
>    $ git config -f .gitmodules submodule.kernel.path linux-3.2
> 
> but hasn't committed the change yet.
> 
>> For those reasons I deliberately decided not to reproduce the
>> above logic all by myself.
> 
> As I already hinted, I agree that the "how to find the location of
> submodule repository, given a particular tree in the top-level
> project the submodule belongs to and the path to the submodule in
> question" deserves a separate thread to discuss with area experts.

As per my email to Heiko on this thread, I’m happy to start such 
a discussion - I’ll use your notes as a starting point. I’m much more comfortable
using a wiki for this - is this common or should I start a new mail thread
with RFC in the title or similar?

I did complete my work on my version of git-archive (for internal use) and added some regression tests
for current behaviour. Also the add_submodule_odb patch should IMHO be incorporated
anyway. I’ll resubmit those two for consideration in a new thread.

Kind Regards
Nick Townsend

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Fwd: [PATCH] submodule recursion in git-archive
  2013-12-03  0:00         ` [PATCH] submodule recursion in git-archive Nick Townsend
@ 2013-12-03  0:03           ` Nick Townsend
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Townsend @ 2013-12-03  0:03 UTC (permalink / raw)
  To: git



Begin forwarded message:

> From: Nick Townsend <nick.townsend@mac.com>
> Subject: Re: [PATCH] submodule recursion in git-archive
> Date: 2 December 2013 16:00:50 GMT-8
> To: Junio C Hamano <gitster@pobox.com>
> Cc: René Scharfe <l.s.r@web.de>, Jens Lehmann <Jens.Lehmann@web.de>, git@vger.kernel.org, Jeff King <peff@peff.net>
> 
> 
> On 27 Nov 2013, at 11:43, Junio C Hamano <gitster@pobox.com> wrote:
> 
>> Nick Townsend <nick.townsend@mac.com> writes:
>> 
>>> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@pobox.com> wrote:
>>> 
>>>> Even if the code is run inside a repository with a working tree,
>>>> when producing a tarball out of an ancient commit that had a
>>>> submodule not at its current location, --recurse-submodules option
>>>> should do the right thing, so asking for working tree location of
>>>> that submodule to find its repository is wrong, I think.  It may
>>>> happen to find one if the archived revision is close enough to what
>>>> is currently checked out, but that may not necessarily be the case.
>>>> 
>>>> At that point when the code discovers an S_ISGITLINK entry, it
>>>> should have both a pathname to the submodule relative to the
>>>> toplevel and the commit object name bound to that submodule
>>>> location.  What it should do, when it does not find the repository
>>>> at the given path (maybe because there is no working tree, or the
>>>> sudmodule directory has moved over time) is roughly:
>>>> 
>>>> - Read from .gitmodules at the top-level from the tree it is
>>>> creating the tarball out of;
>>>> 
>>>> - Find "submodule.$name.path" entry that records that path to the
>>>> submodule; and then
>>>> 
>>>> - Using that $name, find the stashed-away location of the submodule
>>>> repository in $GIT_DIR/modules/$name.
>>>> 
>>>> or something like that.
>>>> 
>>>> This is a related tangent, but when used in a repository that people
>>>> often use as their remote, the repository discovery may have to
>>>> interact with the relative URL.  People often ship .gitmodules with
>>>> 
>>>> 	[submodule "bar"]
>>>>      	URL = ../bar.git
>>>> 		path = barDir
>>>> 
>>>> for a top-level project "foo" that can be cloned thusly:
>>>> 
>>>> 	git clone git://site.xz/foo.git
>>>> 
>>>> and host bar.git to be clonable with
>>>> 
>>>> 	git clone git://site.xz/bar.git barDir/
>>>> 
>>>> inside the working tree of the foo project.  In such a case, when
>>>> "archive --recurse-submodules" is running, it would find the
>>>> repository for the "bar" submodule at "../bar.git", I would think.
>>>> 
>>>> So this part needs a bit more thought, I am afraid.
>>> 
>>> I see that there is a lot of potential complexity around setting up a submodule:
>> 
>> No question about it.
>> 
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>> 
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived.  There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtyness should not
>> matter, I think.  If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving. The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>> 
>>> * Users can mess with settings both prior to git submodule init
>>> and before git submodule update.
>> 
>> I think this is irrelevant for exactly the same reason as above.
>> 
>> What makes this tricker, however, is how to deal with an old-style
>> repository, where the submodule repositories are embedded in the
>> working tree that happens to be checked out.  In that case, we may
>> have to read .gitmodules from two places, i.e.
>> 
>> (1) We are archiving a tree with a submodule at "linux/";
>> 
>> (2) We read .gitmodules from that tree and learn that the submodule
>>    has name "kernel";
>> 
>> (3) There is no ".git/modules/kernel" because the repository uses
>>    the old layout (if the user never was interested in this
>>    submodule, .git/modules/kernel may also be missing, and we
>>    should tell these two cases apart by checking .git/config to
>>    see if a corresponding entry for the "kernel" submodule exists
>>    there);
>> 
>> (4) In a repository that uses the old layout, there must be the
>>    repository somewhere embedded in the current working tree (this
>>    inability to remove is why we use the new layout these days).
>>    We can learn where it is by looking at .gitmodules in the
>>    working tree---map the name "kernel" we learned earlier, and
>>    map it to the current path ("linux-3.0/" if you have been
>>    following this example so far).
>> 
>> And in that fallback context, I would say that reading from a dirty
>> (or "messed with by the user") .gitmodules is the right thing to
>> do.  Perhaps the user may be in the process of moving the submodule
>> in his working tree with
>> 
>>   $ mv linux-3.0 linux-3.2
>>   $ git config -f .gitmodules submodule.kernel.path linux-3.2
>> 
>> but hasn't committed the change yet.
>> 
>>> For those reasons I deliberately decided not to reproduce the
>>> above logic all by myself.
>> 
>> As I already hinted, I agree that the "how to find the location of
>> submodule repository, given a particular tree in the top-level
>> project the submodule belongs to and the path to the submodule in
>> question" deserves a separate thread to discuss with area experts.
> 
> As per my email to Heiko on this thread, I’m happy to start such 
> a discussion - I’ll use your notes as a starting point. I’m much more comfortable
> using a wiki for this - is this common or should I start a new mail thread
> with RFC in the title or similar?
> 
> I did complete my work on my version of git-archive (for internal use) and added some regression tests
> for current behaviour. Also the add_submodule_odb patch should IMHO be incorporated
> anyway. I’ll resubmit those two for consideration in a new thread.
> 
> Kind Regards
> Nick Townsend
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] submodule recursion in git-archive
       [not found]           ` <3C71BC83-4DD0-43F8-9E36-88594CA63FC5@mac.com>
@ 2013-12-03  0:05             ` Nick Townsend
  2013-12-03 18:33             ` Heiko Voigt
  1 sibling, 0 replies; 20+ messages in thread
From: Nick Townsend @ 2013-12-03  0:05 UTC (permalink / raw)
  To: git


From: Nick Townsend <nick.townsend@mac.com>
Subject: Re: [PATCH] submodule recursion in git-archive
Date: 2 December 2013 15:55:36 GMT-8
To: Heiko Voigt <hvoigt@hvoigt.net>
Cc: Junio C Hamano <gitster@pobox.com>, René Scharfe <l.s.r@web.de>, Jens Lehmann <Jens.Lehmann@web.de>, git@vger.kernel.org, Jeff King <peff@peff.net>


On 29 Nov 2013, at 14:38, Heiko Voigt <hvoigt@hvoigt.net> wrote:

> On Wed, Nov 27, 2013 at 11:43:44AM -0800, Junio C Hamano wrote:
>> Nick Townsend <nick.townsend@mac.com> writes:
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>> 
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived.  There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtyness should not
>> matter, I think.  If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving. The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>> 
>>> * Users can mess with settings both prior to git submodule init
>>> and before git submodule update.
>> 
>> I think this is irrelevant for exactly the same reason as above.
>> 
>> What makes this tricker, however, is how to deal with an old-style
>> repository, where the submodule repositories are embedded in the
>> working tree that happens to be checked out.  In that case, we may
>> have to read .gitmodules from two places, i.e.
>> 
>> (1) We are archiving a tree with a submodule at "linux/";
>> 
>> (2) We read .gitmodules from that tree and learn that the submodule
>>     has name "kernel";
>> 
>> (3) There is no ".git/modules/kernel" because the repository uses
>>     the old layout (if the user never was interested in this
>>     submodule, .git/modules/kernel may also be missing, and we
>>     should tell these two cases apart by checking .git/config to
>>     see if a corresponding entry for the "kernel" submodule exists
>>     there);
>> 
>> (4) In a repository that uses the old layout, there must be the
>>     repository somewhere embedded in the current working tree (this
>>     inability to remove is why we use the new layout these days).
>>     We can learn where it is by looking at .gitmodules in the
>>     working tree---map the name "kernel" we learned earlier, and
>>     map it to the current path ("linux-3.0/" if you have been
>>     following this example so far).
>> 
>> And in that fallback context, I would say that reading from a dirty
>> (or "messed with by the user") .gitmodules is the right thing to
>> do.  Perhaps the user may be in the process of moving the submodule
>> in his working tree with
>> 
>>    $ mv linux-3.0 linux-3.2
>>    $ git config -f .gitmodules submodule.kernel.path linux-3.2
>> 
>> but hasn't committed the change yet.
>> 
>>> For those reasons I deliberately decided not to reproduce the
>>> above logic all by myself.
>> 
>> As I already hinted, I agree that the "how to find the location of
>> submodule repository, given a particular tree in the top-level
>> project the submodule belongs to and the path to the submodule in
>> question" deserves a separate thread to discuss with area experts.
> 
> FYI, I already started to implement this lookup of submodule paths early
> this year[1] but have not found the time to proceed on that yet. I am
> planning to continue on that topic soonish. We need it to implement a
> correct recursive fetch with clone on-demand as a basis for the future
> recursive checkout.
> 
> During the work on this I hit too many open questions. Thats why I am
> currently working on a complete plan[2] so we can discuss and define how
> this needs to be implemented. It is an asciidoc document which I will
> send out once I am finished with it.
> 
> Cheers Heiko
> 
> [1] http://article.gmane.org/gmane.comp.version-control.git/217020
> [2] https://github.com/hvoigt/git/wiki/submodule-fetch-config

Heiko
It seems to me that the question that you are trying to solve is
more complex than the problem I faced in git-archive, where we have a
single commit of the top-level repository that we are chasing. 
Perhaps we should split the work into two pieces:

a. Identifying the complete submodule configuration for a single commit, and
b. the complexity of behaviour when fetching and cloning recursively (which 
    of course requires a.)

I’m very happy to work on the first, but the second seems to me to require more
understanding than I currently possess. In order to do this it would help to have a
place to discuss this. I see you have used the wiki of your fork of git on GitHub.
Is that the right place to solicit input?

Kind Regards
Nick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] submodule recursion in git-archive
       [not found]           ` <3C71BC83-4DD0-43F8-9E36-88594CA63FC5@mac.com>
  2013-12-03  0:05             ` Nick Townsend
@ 2013-12-03 18:33             ` Heiko Voigt
  2013-12-09 20:55               ` [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache Heiko Voigt
  1 sibling, 1 reply; 20+ messages in thread
From: Heiko Voigt @ 2013-12-03 18:33 UTC (permalink / raw)
  To: Nick Townsend
  Cc: Junio C Hamano, René Scharfe, Jens Lehmann, git, Jeff King

Hi,

On Mon, Dec 02, 2013 at 03:55:36PM -0800, Nick Townsend wrote:
> 
> On 29 Nov 2013, at 14:38, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > FYI, I already started to implement this lookup of submodule paths early
> > this year[1] but have not found the time to proceed on that yet. I am
> > planning to continue on that topic soonish. We need it to implement a
> > correct recursive fetch with clone on-demand as a basis for the future
> > recursive checkout.
> > 
> > During the work on this I hit too many open questions. Thats why I am
> > currently working on a complete plan[2] so we can discuss and define how
> > this needs to be implemented. It is an asciidoc document which I will
> > send out once I am finished with it.
> > 
> > Cheers Heiko
> > 
> > [1] http://article.gmane.org/gmane.comp.version-control.git/217020
> > [2] https://github.com/hvoigt/git/wiki/submodule-fetch-config
> 
> It seems to me that the question that you are trying to solve is
> more complex than the problem I faced in git-archive, where we have a
> single commit of the top-level repository that we are chasing. 
> Perhaps we should split the work into two pieces:
> 
> a. Identifying the complete submodule configuration for a single commit, and
> b. the complexity of behaviour when fetching and cloning recursively (which 
>     of course requires a.)

You are right the latter (b) is a separate topic. So how about I extract the
submodule config parsing part from the mentioned patch and you can then
use that patch as a basis for your work? As far as I understand you only
need to parse the .gitmodules file for one commit and then lookup the
submodule names from paths right? That would simplify matters and we can
postpone the caching of multiple commits for the time when I continue on b.

> I’m very happy to work on the first, but the second seems to me to require more
> understanding than I currently possess. In order to do this it would help to have a
> place to discuss this. I see you have used the wiki of your fork of git on GitHub.
> Is that the right place to solicit input?

I only used that to collect all information into one place. I am not
sure if thats actually necessary for the .gitmodules parsing you need.

I think we should discuss everything related to the design and patches
here on the list. If you have questions regarding my code I am also
happy to answer that via private mail.

Cheers Heiko

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache
  2013-12-03 18:33             ` Heiko Voigt
@ 2013-12-09 20:55               ` Heiko Voigt
  2013-12-09 23:37                 ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Heiko Voigt @ 2013-12-09 20:55 UTC (permalink / raw)
  To: Nick Townsend
  Cc: Junio C Hamano, René Scharfe, Jens Lehmann, git, Jeff King

This submodule configuration cache allows us to lazily read .gitmodules
configurations by commit into a runtime cache which can then be used to
easily lookup values from it. Currently only the values for path or name
are stored but it can be extended for any value needed.

This cache can be used for all purposes which need knowledge about
submodule configurations.

It is expected that .gitmodules files do not change often between
commits. Thats why we lookup the .gitmodules sha1 from a commit and then
either lookup an already parsed configuration or parse and cache an
unknown one for each sha1. The cache is lazily build on demand for each
requested commit.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
On Tue, Dec 03, 2013 at 07:33:01PM +0100, Heiko Voigt wrote:
> On Mon, Dec 02, 2013 at 03:55:36PM -0800, Nick Townsend wrote:
> > It seems to me that the question that you are trying to solve is
> > more complex than the problem I faced in git-archive, where we have a
> > single commit of the top-level repository that we are chasing. 
> > Perhaps we should split the work into two pieces:
> > 
> > a. Identifying the complete submodule configuration for a single commit, and
> > b. the complexity of behaviour when fetching and cloning recursively (which 
> >     of course requires a.)
> 
> You are right the latter (b) is a separate topic. So how about I extract the
> submodule config parsing part from the mentioned patch and you can then
> use that patch as a basis for your work? As far as I understand you only
> need to parse the .gitmodules file for one commit and then lookup the
> submodule names from paths right? That would simplify matters and we can
> postpone the caching of multiple commits for the time when I continue on b.

Ok and here is a patch that you can use as basis for your work. I looked
into it and found that its actually easier to use the cache. Storing
everything in a hashmap seems like overkill for your application but it
is prepared for my usecase which actually needs to parse configurations
for multiple commits.

Since this has not been discussed yet some details of my implementation
might change.

The test I implemented is only for demonstration of usage and my quick
manual testing. If we agree on going ahead with my patch I will extend
that to a proper test. Or if anyone has an idea where we can plug that
in to make a useful user interface we can also implement a test based on
that.

Cheers Heiko

 .gitignore                    |   1 +
 Makefile                      |   2 +
 submodule-config-cache.c      |  96 ++++++++++++++++++++++++++++++++
 submodule-config-cache.h      |  33 +++++++++++
 submodule.c                   | 125 ++++++++++++++++++++++++++++++++++++++++++
 submodule.h                   |   4 ++
 test-submodule-config-cache.c |  45 +++++++++++++++
 7 files changed, 306 insertions(+)
 create mode 100644 submodule-config-cache.c
 create mode 100644 submodule-config-cache.h
 create mode 100644 test-submodule-config-cache.c

diff --git a/.gitignore b/.gitignore
index 66199ed..6c91e98 100644
--- a/.gitignore
+++ b/.gitignore
@@ -200,6 +200,7 @@
 /test-sha1
 /test-sigchain
 /test-string-list
+/test-submodule-config-cache
 /test-subprocess
 /test-svn-fe
 /test-urlmatch-normalization
diff --git a/Makefile b/Makefile
index af847f8..e3d869b 100644
--- a/Makefile
+++ b/Makefile
@@ -572,6 +572,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
 TEST_PROGRAMS_NEED_X += test-string-list
+TEST_PROGRAMS_NEED_X += test-submodule-config-cache
 TEST_PROGRAMS_NEED_X += test-subprocess
 TEST_PROGRAMS_NEED_X += test-svn-fe
 TEST_PROGRAMS_NEED_X += test-urlmatch-normalization
@@ -872,6 +873,7 @@ LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
+LIB_OBJS += submodule-config-cache.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += trace.o
diff --git a/submodule-config-cache.c b/submodule-config-cache.c
new file mode 100644
index 0000000..7253fad
--- /dev/null
+++ b/submodule-config-cache.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "submodule-config-cache.h"
+#include "strbuf.h"
+#include "hash.h"
+
+void submodule_config_cache_init(struct submodule_config_cache *cache)
+{
+	init_hash(&cache->for_name);
+	init_hash(&cache->for_path);
+}
+
+static int free_one_submodule_config(void *ptr, void *data)
+{
+	struct submodule_config *entry = ptr;
+
+	strbuf_release(&entry->path);
+	strbuf_release(&entry->name);
+	free(entry);
+
+	return 0;
+}
+
+void submodule_config_cache_free(struct submodule_config_cache *cache)
+{
+	/* NOTE: its important to iterate over the name hash here
+	 * since paths might have multiple entries */
+	for_each_hash(&cache->for_name, free_one_submodule_config, NULL);
+	free_hash(&cache->for_path);
+	free_hash(&cache->for_name);
+}
+
+static unsigned int hash_sha1_string(const unsigned char *sha1, const char *string)
+{
+	int c;
+	unsigned int hash, string_hash = 5381;
+	memcpy(&hash, sha1, sizeof(hash));
+
+	/* djb2 hash */
+	while ((c = *string++))
+		string_hash = ((string_hash << 5) + hash) + c; /* hash * 33 + c */
+
+	return hash + string_hash;
+}
+
+void submodule_config_cache_update_path(struct submodule_config_cache *cache,
+		struct submodule_config *config)
+{
+	void **pos;
+	int hash = hash_sha1_string(config->gitmodule_sha1, config->path.buf);
+	pos = insert_hash(hash, config, &cache->for_path);
+	if (pos) {
+		config->next = *pos;
+		*pos = config;
+	}
+}
+
+void submodule_config_cache_insert(struct submodule_config_cache *cache, struct submodule_config *config)
+{
+	unsigned int hash;
+	void **pos;
+
+	hash = hash_sha1_string(config->gitmodule_sha1, config->name.buf);
+	pos = insert_hash(hash, config, &cache->for_name);
+	if (pos) {
+		config->next = *pos;
+		*pos = config;
+	}
+}
+
+struct submodule_config *submodule_config_cache_lookup_path(struct submodule_config_cache *cache,
+	const unsigned char *gitmodule_sha1, const char *path)
+{
+	unsigned int hash = hash_sha1_string(gitmodule_sha1, path);
+	struct submodule_config *config = lookup_hash(hash, &cache->for_path);
+
+	while (config &&
+		(hashcmp(config->gitmodule_sha1, gitmodule_sha1) ||
+		 strcmp(path, config->path.buf)))
+		config = config->next;
+
+	return config;
+}
+
+struct submodule_config *submodule_config_cache_lookup_name(struct submodule_config_cache *cache,
+	const unsigned char *gitmodule_sha1, const char *name)
+{
+	unsigned int hash = hash_sha1_string(gitmodule_sha1, name);
+	struct submodule_config *config = lookup_hash(hash, &cache->for_name);
+
+	while (config &&
+		(hashcmp(config->gitmodule_sha1, gitmodule_sha1) ||
+		 strcmp(name, config->name.buf)))
+		config = config->next;
+
+	return config;
+}
diff --git a/submodule-config-cache.h b/submodule-config-cache.h
new file mode 100644
index 0000000..7bb79b9
--- /dev/null
+++ b/submodule-config-cache.h
@@ -0,0 +1,33 @@
+#ifndef SUBMODULE_CONFIG_CACHE_H
+#define SUBMODULE_CONFIG_CACHE_H
+
+#include "hash.h"
+#include "strbuf.h"
+
+struct submodule_config_cache {
+	struct hash_table for_path;
+	struct hash_table for_name;
+};
+
+/* one submodule_config_cache entry */
+struct submodule_config {
+	struct strbuf path;
+	struct strbuf name;
+	unsigned char gitmodule_sha1[20];
+	struct submodule_config *next;
+};
+
+void submodule_config_cache_init(struct submodule_config_cache *cache);
+void submodule_config_cache_free(struct submodule_config_cache *cache);
+
+void submodule_config_cache_update_path(struct submodule_config_cache *cache,
+		struct submodule_config *config);
+void submodule_config_cache_insert(struct submodule_config_cache *cache,
+		struct submodule_config *config);
+
+struct submodule_config *submodule_config_cache_lookup_path(struct submodule_config_cache *cache,
+	const unsigned char *gitmodule_sha1, const char *path);
+struct submodule_config *submodule_config_cache_lookup_name(struct submodule_config_cache *cache,
+	const unsigned char *gitmodule_sha1, const char *name);
+
+#endif /* SUBMODULE_CONFIG_CACHE_H */
diff --git a/submodule.c b/submodule.c
index 1905d75..fa9e7ea 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "submodule.h"
+#include "submodule-config-cache.h"
 #include "dir.h"
 #include "diff.h"
 #include "commit.h"
@@ -616,6 +617,130 @@ static int is_submodule_commit_present(const char *path, unsigned char sha1[20])
 	return is_present;
 }
 
+struct parse_submodule_config_parameter {
+	unsigned char *gitmodule_sha1;
+	struct submodule_config_cache *cache;
+};
+
+static int name_and_item_from_var(const char *var, struct strbuf *name, struct strbuf *item)
+{
+	/* find the name and add it */
+	strbuf_addstr(name, var + strlen("submodule."));
+	char *end = strrchr(name->buf, '.');
+	if (!end) {
+		strbuf_release(name);
+		return 0;
+	}
+	*end = '\0';
+	if (((end + 1) - name->buf) < name->len)
+		strbuf_addstr(item, end + 1);
+
+	return 1;
+}
+
+static struct submodule_config *lookup_or_create_by_name(struct submodule_config_cache *cache,
+		unsigned char *gitmodule_sha1, const char *name)
+{
+	struct submodule_config *config;
+	config = submodule_config_cache_lookup_name(cache, gitmodule_sha1, name);
+	if (config)
+		return config;
+
+	config = xmalloc(sizeof(*config));
+
+	strbuf_init(&config->name, 1024);
+	strbuf_addstr(&config->name, name);
+
+	strbuf_init(&config->path, 1024);
+
+	hashcpy(config->gitmodule_sha1, gitmodule_sha1);
+	config->next = NULL;
+
+	submodule_config_cache_insert(cache, config);
+
+	return config;
+}
+
+static void warn_multiple_config(struct submodule_config *config, const char *option)
+{
+	warning("%s:.gitmodules, multiple configurations found for submodule.%s.%s. "
+			"Skipping second one!", sha1_to_hex(config->gitmodule_sha1),
+			option, config->name.buf);
+}
+
+static int parse_submodule_config_into_cache(const char *var, const char *value, void *data)
+{
+	struct parse_submodule_config_parameter *me = data;
+	struct submodule_config *submodule_config;
+	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
+
+	/* We only read submodule.<name> entries */
+	if (prefixcmp(var, "submodule."))
+		return 0;
+
+	if (!name_and_item_from_var(var, &name, &item))
+		return 0;
+
+	submodule_config = lookup_or_create_by_name(me->cache, me->gitmodule_sha1, name.buf);
+
+	if (!suffixcmp(var, ".path")) {
+		if (*submodule_config->path.buf != '\0') {
+			warn_multiple_config(submodule_config, "path");
+			return 0;
+		}
+		strbuf_addstr(&submodule_config->path, value);
+		submodule_config_cache_update_path(me->cache, submodule_config);
+	}
+
+	strbuf_release(&name);
+	strbuf_release(&item);
+
+	return 0;
+}
+
+struct submodule_config *get_submodule_config_for_commit_path(struct submodule_config_cache *cache,
+		const unsigned char *commit_sha1, const char *path)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned long config_size;
+	char *config;
+	unsigned char sha1[20];
+	enum object_type type;
+	struct submodule_config *submodule_config = NULL;
+	struct parse_submodule_config_parameter parameter;
+
+
+	strbuf_addf(&rev, "%s:.gitmodules", sha1_to_hex(commit_sha1));
+	if (get_sha1(rev.buf, sha1) < 0)
+		goto free_rev;
+
+	submodule_config = submodule_config_cache_lookup_path(cache, sha1, path);
+	if (submodule_config)
+		goto free_rev;
+
+	config = read_sha1_file(sha1, &type, &config_size);
+	if (!config)
+		goto free_rev;
+
+	if (type != OBJ_BLOB) {
+		free(config);
+		goto free_rev;
+	}
+
+	/* fill the submodule config into the cache */
+	parameter.cache = cache;
+	parameter.gitmodule_sha1 = sha1;
+	git_config_from_buf(parse_submodule_config_into_cache, rev.buf,
+			config, config_size, &parameter);
+	free(config);
+
+	submodule_config = submodule_config_cache_lookup_path(cache, sha1, path);
+
+free_rev:
+	strbuf_release(&rev);
+	return submodule_config;
+}
+
 static void submodule_collect_changed_cb(struct diff_queue_struct *q,
 					 struct diff_options *options,
 					 void *data)
diff --git a/submodule.h b/submodule.h
index 7beec48..a26cc34 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,8 @@
 
 struct diff_options;
 struct argv_array;
+struct submodule_config;
+struct submodule_config_cache;
 
 enum {
 	RECURSE_SUBMODULES_ON_DEMAND = -1,
@@ -41,5 +43,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
 		struct string_list *needs_pushing);
 int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
+struct submodule_config *get_submodule_config_for_commit_path(struct submodule_config_cache *cache,
+		const unsigned char *commit_sha1, const char *path);
 
 #endif
diff --git a/test-submodule-config-cache.c b/test-submodule-config-cache.c
new file mode 100644
index 0000000..239560f
--- /dev/null
+++ b/test-submodule-config-cache.c
@@ -0,0 +1,45 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "cache.h"
+#include "submodule-config-cache.h"
+#include "submodule.h"
+
+static void die_usage(int argc, char **argv, const char *msg)
+{
+	fprintf(stderr, "%s\n", msg);
+	fprintf(stderr, "Usage: %s <commit> <submodulepath>\n", argv[0]);
+	exit(1);
+}
+
+int main(int argc, char **argv)
+{
+	struct submodule_config_cache submodule_config_cache;
+	struct submodule_config *submodule_config;
+	unsigned char commit_sha1[20];
+	const char *commit;
+	const char *path;
+
+	if (argc < 3)
+		die_usage(argc, argv, "Wrong number of arguments.");
+
+	commit = argv[1];
+	path = argv[2];
+
+	if (get_sha1(commit, commit_sha1) < 0)
+		die_usage(argc, argv, "Commit not found.");
+
+	submodule_config_cache_init(&submodule_config_cache);
+
+	submodule_config = get_submodule_config_for_commit_path(&submodule_config_cache,
+				commit_sha1, path);
+	if (!submodule_config)
+		die_usage(argc, argv, "Submodule config not found.");
+
+	printf("Submodule name: '%s' for path '%s'\n", submodule_config->name.buf, path);
+
+	submodule_config_cache_free(&submodule_config_cache);
+
+	return 0;
+}
-- 
1.8.5.1.21.gee0ea2e

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache
  2013-12-09 20:55               ` [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache Heiko Voigt
@ 2013-12-09 23:37                 ` Junio C Hamano
  2013-12-12 13:03                   ` Heiko Voigt
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2013-12-09 23:37 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Nick Townsend, René Scharfe, Jens Lehmann, git, Jeff King

Heiko Voigt <hvoigt@hvoigt.net> writes:

> This submodule configuration cache allows us to lazily read .gitmodules
> configurations by commit into a runtime cache which can then be used to
> easily lookup values from it. Currently only the values for path or name
> are stored but it can be extended for any value needed.
> ...

Thanks.

> diff --git a/submodule-config-cache.c b/submodule-config-cache.c
> new file mode 100644
> index 0000000..7253fad
> --- /dev/null
> +++ b/submodule-config-cache.c
> @@ -0,0 +1,96 @@
> +#include "cache.h"
> +#include "submodule-config-cache.h"
> +#include "strbuf.h"
> +#include "hash.h"
> +
> +void submodule_config_cache_init(struct submodule_config_cache *cache)
> +{
> +	init_hash(&cache->for_name);
> +	init_hash(&cache->for_path);
> +}
> +
> +static int free_one_submodule_config(void *ptr, void *data)
> +{
> +	struct submodule_config *entry = ptr;
> +
> +	strbuf_release(&entry->path);
> +	strbuf_release(&entry->name);
> +	free(entry);
> +
> +	return 0;
> +}
> +
> +void submodule_config_cache_free(struct submodule_config_cache *cache)
> +{
> +	/* NOTE: its important to iterate over the name hash here
> +	 * since paths might have multiple entries */

Style (multi-line comments).

This is interesting.  I wonder what the practical consequence is to
have a single submodule bound to the top-level tree more than once.
Updating from one of the working tree will make the other working
tree out of sync because the ultimate location of the submodule
directory pointed at by the two .git gitdirs can only have a single
HEAD, be it detached or on a branch, and a single index.

Not that the decision to enforce that names are unique in the
top-level .gitmodules, and follow that decision in this part of the
code to be defensive (not rely on the "one submodule can be bound
only once to a top-level tree"), but shouldn't such a configuration
to have a single submodule bound to more than one place in the
top-level tree be forbidden?

> +	for_each_hash(&cache->for_name, free_one_submodule_config, NULL);
> +	free_hash(&cache->for_path);
> +	free_hash(&cache->for_name);
> +}
> +
> +static unsigned int hash_sha1_string(const unsigned char *sha1, const char *string)
> +{
> +	int c;
> +	unsigned int hash, string_hash = 5381;
> +	memcpy(&hash, sha1, sizeof(hash));
> +
> +	/* djb2 hash */
> +	while ((c = *string++))
> +		string_hash = ((string_hash << 5) + hash) + c; /* hash * 33 + c */

Hmm, the comment and the code does not seem to match in math here...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache
  2013-12-09 23:37                 ` Junio C Hamano
@ 2013-12-12 13:03                   ` Heiko Voigt
  0 siblings, 0 replies; 20+ messages in thread
From: Heiko Voigt @ 2013-12-12 13:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nick Townsend, René Scharfe, Jens Lehmann, git, Jeff King

On Mon, Dec 09, 2013 at 03:37:50PM -0800, Junio C Hamano wrote:
> > +void submodule_config_cache_free(struct submodule_config_cache *cache)
> > +{
> > +	/* NOTE: its important to iterate over the name hash here
> > +	 * since paths might have multiple entries */
> 
> Style (multi-line comments).

Will fix.

> This is interesting.  I wonder what the practical consequence is to
> have a single submodule bound to the top-level tree more than once.
> Updating from one of the working tree will make the other working
> tree out of sync because the ultimate location of the submodule
> directory pointed at by the two .git gitdirs can only have a single
> HEAD, be it detached or on a branch, and a single index.

To clarify, when writing this comment I was not thinking about the same
submodule with multiple paths in the same tree but rather with the same
name under different paths in different commits.

> Not that the decision to enforce that names are unique in the
> top-level .gitmodules, and follow that decision in this part of the
> code to be defensive (not rely on the "one submodule can be bound
> only once to a top-level tree"), but shouldn't such a configuration
> to have a single submodule bound to more than one place in the
> top-level tree be forbidden?

Yes IMO, that should be forbidden currently. I do not think we actually
prevent the user from doing so but it can not happen by accident since
we derive the initial name from the local path. Maybe we should be more
strict about that and put more guards in place to avoid such
configurations from entering the database.

> > +	for_each_hash(&cache->for_name, free_one_submodule_config, NULL);
> > +	free_hash(&cache->for_path);
> > +	free_hash(&cache->for_name);
> > +}
> > +
> > +static unsigned int hash_sha1_string(const unsigned char *sha1, const char *string)
> > +{
> > +	int c;
> > +	unsigned int hash, string_hash = 5381;
> > +	memcpy(&hash, sha1, sizeof(hash));
> > +
> > +	/* djb2 hash */
> > +	while ((c = *string++))
> > +		string_hash = ((string_hash << 5) + hash) + c; /* hash * 33 + c */
> 
> Hmm, the comment and the code does not seem to match in math here...

Yeah sorry that was a leftover from the code I started with. In the
beginning it was a pure string hash. Will remove both comments (since
its also not a pure djb2 hash anymore).

Cheers Heiko

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule recursion in git-archive
       [not found] <0MWW00M0GODZPV00@nk11p03mm-asmtp002.mac.com>
@ 2013-11-27  5:03 ` Nick Townsend
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Townsend @ 2013-11-27  5:03 UTC (permalink / raw)
  To: git

On 26 Nov 2013, at 07:17, René Scharfe <l.s.r@web.de> wrote:

> Am 26.11.2013 01:04, schrieb Nick Townsend:
>> My first git patch - so shout out if I’ve got the etiquette wrong! Or
>> of course if I’ve missed something.
> 
> Thanks for the patches!  Please send only one per message (the second
> one as a reply to the first one, or both as replies to a cover letter),
> though -- that makes commenting on them much easier.
> 
> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
> AFAICS.
> 
>> Subject: [PATCH 1/2] submodule: add_submodule_odb() usability
>> 
>> Although add_submodule_odb() is documented as being
>> externally usable, it is declared static and also
>> has incorrect documentation.
>> 
>> This commit fixes those and makes no changes to
>> existing code using them. All tests still pass.
> 
> Sign-off missing (see Documentation/SubmittingPatches).
> 
>> ---
>> Documentation/technical/api-ref-iteration.txt | 4 ++--
>> submodule.c                                   | 2 +-
>> submodule.h                                   | 1 +
>> 3 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
>> index aa1c50f..cbee624 100644
>> --- a/Documentation/technical/api-ref-iteration.txt
>> +++ b/Documentation/technical/api-ref-iteration.txt
>> @@ -50,10 +50,10 @@ submodules object database. You can do this by a code-snippet like
>> this:
>> 
>> 	const char *path = "path/to/submodule"
>> -	if (!add_submodule_odb(path))
>> +	if (add_submodule_odb(path))
>> 		die("Error submodule '%s' not populated.", path);
>> 
>> -`add_submodule_odb()` will return an non-zero value on success. If you
>> +`add_submodule_odb()` will return a zero value on success. If you
> 
> "return zero on success" instead?

I like the brevity of your suggestion. Again, I just used what was there…

> 
>> do not do this you will get an error for each ref that it does not point
>> to a valid object.
>> 
>> diff --git a/submodule.c b/submodule.c
>> index 1905d75..1ea46be 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
>> 		die(_("staging updated .gitmodules failed"));
>> }
>> 
>> -static int add_submodule_odb(const char *path)
>> +int add_submodule_odb(const char *path)
>> {
>> 	struct strbuf objects_directory = STRBUF_INIT;
>> 	struct alternate_object_database *alt_odb;
>> diff --git a/submodule.h b/submodule.h
>> index 7beec48..3e3cdca 100644
>> --- a/submodule.h
>> +++ b/submodule.h
>> @@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
>> 		struct string_list *needs_pushing);
>> int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
>> void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
>> +int add_submodule_odb(const char *path);
>> 
>> #endif
> 
>> Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive
>> 
>> When using git-archive to produce a dump of a
>> repository, the existing code does not recurse
>> into a submodule when it encounters it in the tree
>> traversal. These changes add a command line flag
>> that permits this.
>> 
>> Note that the submodules must be updated in the
>> repository, otherwise this cannot take place.
>> 
>> The feature is disabled for remote repositories as
>> the git_work_tree fails. This is a possible future
>> enhancement.
> 
> Hmm, curious.  Why does it fail?  I guess that happens with bare
> repositories, only, right?  (Which are the most likely kind of remote
> repos to encounter, of course.)

I’m not sure why it failed - I didn’t think it should - but it did.
See discussion in other email.

> 
>> Two additional fields are added to archiver_args:
>>  * recurse  - a boolean indicator
>>  * treepath - the path part of the tree-ish
>>               eg. the 'www' in HEAD:www
>> 
>> The latter is used within the archive writer to
>> determin the correct path for the submodule .git
>> file.
>> 
>> Signed-off-by: Nick Townsend <nick.townsend@mac.com>
>> ---
>> Documentation/git-archive.txt |  9 +++++++++
>> archive.c                     | 38 ++++++++++++++++++++++++++++++++++++--
>> archive.h                     |  2 ++
>> 3 files changed, 47 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
>> index b97aaab..b4df735 100644
>> --- a/Documentation/git-archive.txt
>> +++ b/Documentation/git-archive.txt
>> @@ -11,6 +11,7 @@ SYNOPSIS
>> [verse]
>> 'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>> 	      [-o <file> | --output=<file>] [--worktree-attributes]
>> +	      [--recursive|--recurse-submodules]
> 
> I'd expect git archive --recurse to add subdirectories and their
> contents, which it does right now, and --no-recurse to only archive the
> specified objects, which is not implemented.  IAW: I wouldn't normally
> associate an option with that name with submodules.  Would
> --recurse-submodules alone suffice?
> 
> Side note: With only one of the options defined you could shorten them
> on the command line to e.g. --rec; with both you'd need to type at least
> --recursi or --recurse to disambiguate -- even though they ultimately do
> the same.

See other comments - I agree - recurse-submodules it is!

> 
>> 	      [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
>> 	      [<path>...]
>> 
>> @@ -51,6 +52,14 @@ OPTIONS
>> --prefix=<prefix>/::
>> 	Prepend <prefix>/ to each filename in the archive.
>> 
>> +--recursive::
>> +--recurse-submodules::
>> +	Archive entries in submodules. Errors occur if the submodules
>> +	have not been initialized and updated.
>> +	Run `git submodule update --init --recursive` immediately after
>> +	the clone is finished to avoid this.
>> +	This option is not available with remote repositories.
>> +
>> -o <file>::
>> --output=<file>::
>> 	Write the archive to <file> instead of stdout.
>> diff --git a/archive.c b/archive.c
>> index 346f3b2..f6313c9 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -5,6 +5,7 @@
>> #include "archive.h"
>> #include "parse-options.h"
>> #include "unpack-trees.h"
>> +#include "submodule.h"
>> 
>> static char const * const archive_usage[] = {
>> 	N_("git archive [options] <tree-ish> [<path>...]"),
>> @@ -131,13 +132,32 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>> 		args->convert = ATTR_TRUE(check[1].value);
>> 	}
>> 
>> +	if (S_ISGITLINK(mode) && args->recurse) {
>> +		const char *work_tree = get_git_work_tree();
>> +		if (!work_tree) {
>> +			  die("Can't go recursive when no work dir");
>> +		}
> 
> Style nit: No curly braces around single-line statements, please.
> 
> Perhaps mention submodules in the error message?
> 
> It would be nicer to get_git_work_tree right after the parameters have
> been parsed and before any archive contents have been written and error
> out early.
> 
>> +		static struct strbuf dotgit = STRBUF_INIT;
> 
> We avoid declarations after statements because older compilers don't
> support it.
> 
> You release the memory at the end of this block; that means there's no
> advantage in making this strbuf static.  Allocating and freeing the
> memory for the path of each submodule shouldn't cause any performance
> issues, so please just drop static from the declaration.
> 
>> +		strbuf_reset(&dotgit);
> 
> ... and then you don't need to reset anymore.
> 
>> +		strbuf_grow(&dotgit, PATH_MAX);
> 
> I'd drop that as well; the number of submodules should be low enough
> that the possibly avoided reallocations by giving this hint shouldn't be
> noticeable.

I just cut and pasted this code from another location - didn’t really think too hard
and I wasn’t familiar with the strbuf routines in the codebase. Thanks for the input.

> 
>> +		strbuf_addstr(&dotgit, work_tree);
>> +		strbuf_addch(&dotgit, '/');
>> +		if (args->treepath) {
>> +			  strbuf_addstr(&dotgit, args->treepath);
>> +			  strbuf_addch(&dotgit, '/');
>> +		}
>> +		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
>> +		if (add_submodule_odb(dotgit.buf))
>> +			  die("Can't add submodule: %s", dotgit.buf);
> 
> Hmm, I wonder if we can traverse the tree and load all submodule object
> databases before traversing it again to actually write file contents.
> That would spare the user from getting half of an archive together with
> that error message.

See comments in another email about where and how to parse submodules. If we do the
parse .gitmodules approach then we could do something like this. The as-you-go
method seemed easier and more accurate - after all you may not traverse into a
submodule at all - so any time spent adding it’s objects would be wasted.

In the same conversation I agree that it’s best to warn when a submodule can’t be
located, and continue with just a directory entry in its place (as currently)

> 
>> +		strbuf_release(&dotgit);
>> +	}
>> 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
>> 		if (args->verbose)
>> 			fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
>> 		err = write_entry(args, sha1, path.buf, path.len, mode);
>> 		if (err)
>> 			return err;
>> -		return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
>> +		return (S_ISGITLINK(mode) && !args->recurse) ? 0: READ_TREE_RECURSIVE;
> 
> This line is longer than 80 characters, which we tend to avoid.  How
> about this?
> 
> 		if (ISGITLINK(mode) && !args->recurse)
> 			return 0;
> 		return READ_TREE_RECURSIVE;
> 
>> 	}

Agreed. I don’t usually enjoy multiple returns (especially so close to the end!)
but if the style guidelines are OK with that then so am I.

>> 
>> 	if (args->verbose)
>> @@ -256,10 +276,16 @@ static void parse_treeish_arg(const char **argv,
>> 	const struct commit *commit;
>> 	unsigned char sha1[20];
>> 
>> +	const char *colon = strchr(name, ':');
>> +
>> +	/* Store the path on the ref for later (required for --recursive) */
>> +	char *treepath = NULL;
>> +	if (colon) {
>> +		treepath = strdup(colon+1);
>> +	}
> 
> Style: No curly braces, space around operators, use wrapper functions
> for simple memory error handling:
> 
> 	if (colon)
> 		treepath = xstrdup(colon + 1);

Is that the standard practice? To use the wrapper functions?  

> 
>> 	/* Remotes are only allowed to fetch actual refs */
>> 	if (remote) {
>> 		char *ref = NULL;
>> -		const char *colon = strchr(name, ':');
>> 		int refnamelen = colon ? colon - name : strlen(name);
>> 
>> 		if (!dwim_ref(name, refnamelen, sha1, &ref))
>> @@ -296,9 +322,11 @@ static void parse_treeish_arg(const char **argv,
>> 		tree = parse_tree_indirect(tree_sha1);
>> 	}
>> 	ar_args->tree = tree;
>> +	ar_args->treepath = treepath;
>> 	ar_args->commit_sha1 = commit_sha1;
>> 	ar_args->commit = commit;
>> 	ar_args->time = archive_time;
>> +
>> }
> 
> Please don't add empty lines before the end of a block.
> 
>> #define OPT__COMPR(s, v, h, p) \
>> @@ -318,6 +346,7 @@ static int parse_archive_args(int argc, const char **argv,
>> 	const char *exec = NULL;
>> 	const char *output = NULL;
>> 	int compression_level = -1;
>> +	int recurse = 0;
>> 	int verbose = 0;
>> 	int i;
>> 	int list = 0;
>> @@ -331,6 +360,8 @@ static int parse_archive_args(int argc, const char **argv,
>> 			N_("write the archive to this file")),
>> 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
>> 			N_("read .gitattributes in working directory")),
>> +		OPT_BOOL(0, "recursive", &recurse, N_("include submodules in archive")),
>> +		OPT_BOOL(0, "recurse-submodules", &recurse, N_("include submodules in archive")),
>> 		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
>> 		OPT__COMPR('0', &compression_level, N_("store only"), 0),
>> 		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
>> @@ -355,6 +386,8 @@ static int parse_archive_args(int argc, const char **argv,
>> 
>> 	argc = parse_options(argc, argv, NULL, opts, archive_usage, 0);
>> 
>> +	if (is_remote && recurse)
>> +		die("Cannot include submodules with option --remote");
>> 	if (remote)
>> 		die("Unexpected option --remote");
>> 	if (exec)
>> @@ -393,6 +426,7 @@ static int parse_archive_args(int argc, const char **argv,
>> 					format, compression_level);
>> 		}
>> 	}
>> +	args->recurse = recurse;
>> 	args->verbose = verbose;
>> 	args->base = base;
>> 	args->baselen = strlen(base);
>> diff --git a/archive.h b/archive.h
>> index 4a791e1..577238d 100644
>> --- a/archive.h
>> +++ b/archive.h
>> @@ -7,10 +7,12 @@ struct archiver_args {
>> 	const char *base;
>> 	size_t baselen;
>> 	struct tree *tree;
>> +	const char *treepath;
>> 	const unsigned char *commit_sha1;
>> 	const struct commit *commit;
>> 	time_t time;
>> 	struct pathspec pathspec;
>> +	unsigned int recurse : 1;
> 
> Name it recurse_submodules?
> 
>> 	unsigned int verbose : 1;
>> 	unsigned int worktree_attributes : 1;
>> 	unsigned int convert : 1;
>> -- 1.8.3.4 (Apple Git-47) 
> 
> A test script (t/t5005-archive-submodules.sh?) would be nice which
> exercises the new option.

I take all the style points noted, and will create a test script - I didn’t want to invest too much
in this project until I’d received a response!

Thanks for your time in reviewing - these things are a great help.

Cheers
Nick

> 
> René

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-12-12 13:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-26  0:04 [PATCH] submodule recursion in git-archive Nick Townsend
2013-11-26 15:17 ` René Scharfe
2013-11-26 18:57   ` Jens Lehmann
2013-11-26 22:18   ` Junio C Hamano
2013-11-27  0:28     ` René Scharfe
2013-11-27  3:28       ` Nick Townsend
2013-11-27 19:05       ` Junio C Hamano
2013-11-27  3:55     ` Nick Townsend
2013-11-27 19:43       ` Junio C Hamano
2013-11-29 22:38         ` Heiko Voigt
     [not found]           ` <3C71BC83-4DD0-43F8-9E36-88594CA63FC5@mac.com>
2013-12-03  0:05             ` Nick Townsend
2013-12-03 18:33             ` Heiko Voigt
2013-12-09 20:55               ` [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache Heiko Voigt
2013-12-09 23:37                 ` Junio C Hamano
2013-12-12 13:03                   ` Heiko Voigt
2013-12-03  0:00         ` [PATCH] submodule recursion in git-archive Nick Townsend
2013-12-03  0:03           ` Fwd: " Nick Townsend
2013-11-26 22:38   ` Heiko Voigt
2013-11-27  3:33     ` Nick Townsend
     [not found] <0MWW00M0GODZPV00@nk11p03mm-asmtp002.mac.com>
2013-11-27  5:03 ` Nick Townsend

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.