All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format
@ 2017-09-17  7:36 Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Ian Campbell @ 2017-09-17  7:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

This is the second version of my patches to add incremental support to
git-filter-branch. Since the last time I have:
 * addressed the review feedback (see changelog embedded in final
   patch)
 * switched to using the (newly introduced) `--allow-missing-tagger`
   option to `git mktag` to allow early Linux kernel tags to be
   rewritten
 * split out some of the preparatory changes to make the final patch
   easier to read.

I've force pushed to [1] where Travis seems happy and have set off the
process of re-rewriting the devicetree tree from scratch (a multi-day
affair) to validate (it's looking good).

Ian.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/
[1] https://github.com/ijc/git/tree/git-filter-branch

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
  2017-09-17  7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell
@ 2017-09-17  7:36 ` Ian Campbell
  2017-09-19  3:01   ` Junio C Hamano
  2017-09-17  7:36 ` [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up Ian Campbell
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Ian Campbell @ 2017-09-17  7:36 UTC (permalink / raw)
  To: gitster; +Cc: git, Ian Campbell

This can be useful e.g. in `filter-branch` when rewritting tags produced by
older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel
source tree:

        $ git cat-file tag v2.6.12-rc2
        object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
        type commit
        tag v2.6.12-rc2

        Linux v2.6.12-rc2 release
        -----BEGIN PGP SIGNATURE-----
        Version: GnuPG v1.2.4 (GNU/Linux)

        iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc
        wznDbFU45A54dZC8RZ5JxyE=
        =ESRP
        -----END PGP SIGNATURE-----

        $ git cat-file tag v2.6.12-rc2 | git mktag
        error: char76: could not find "tagger "
        fatal: invalid tag signature file
        $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger
        9e734775f7c22d2f89943ad6c745571f1930105f

To that end, pass the new option to `mktag` in `filter-branch`.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
---
 Documentation/git-mktag.txt |   9 +++-
 builtin/mktag.c             | 100 +++++++++++++++++++++++++-------------------
 git-filter-branch.sh        |   2 +-
 t/t3800-mktag.sh            |  33 ++++++++++++++-
 4 files changed, 98 insertions(+), 46 deletions(-)

diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
index fa6a75612..c720c7419 100644
--- a/Documentation/git-mktag.txt
+++ b/Documentation/git-mktag.txt
@@ -9,7 +9,7 @@ git-mktag - Creates a tag object
 SYNOPSIS
 --------
 [verse]
-'git mktag'
+'git mktag' [--allow-missing-tagger]
 
 DESCRIPTION
 -----------
@@ -34,6 +34,13 @@ exists, is separated by a blank line from the header.  The
 message part may contain a signature that Git itself doesn't
 care about, but that can be verified with gpg.
 
+OPTIONS
+-------
+--allow-missing-tagger::
+	Allow the `tagger` line in the header to be omitted. This is
+	rarely desirable but may be useful in recreating tags created
+	by older Git.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 031b750f0..0f5dae8d5 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -1,4 +1,5 @@
 #include "builtin.h"
+#include "parse-options.h"
 #include "tag.h"
 
 /*
@@ -15,6 +16,8 @@
  * the shortest possible tagger-line.
  */
 
+static int allow_missing_tagger;
+
 /*
  * We refuse to tag something we can't verify. Just because.
  */
@@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size)
 	unsigned char sha1[20];
 	const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb;
 	size_t len;
+	const unsigned long min_size = allow_missing_tagger ? 71 : 84;
 
-	if (size < 84)
+	if (size < min_size)
 		return error("wanna fool me ? you obviously got the size wrong !");
 
 	buffer[size] = 0;
@@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size)
 	/* Verify the tagger line */
 	tagger_line = tag_line;
 
-	if (memcmp(tagger_line, "tagger ", 7))
+	if (!memcmp(tagger_line, "tagger ", 7)) {
+		/*
+		 * Check for correct form for name and email
+		 * i.e. " <" followed by "> " on _this_ line
+		 * No angle brackets within the name or email address fields.
+		 * No spaces within the email address field.
+		 */
+		tagger_line += 7;
+		if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) ||
+			strpbrk(tagger_line, "<>\n") != lb+1 ||
+			strpbrk(lb+2, "><\n ") != rb)
+			return error("char%"PRIuMAX": malformed tagger field",
+				(uintmax_t) (tagger_line - buffer));
+
+		/* Check for author name, at least one character, space is acceptable */
+		if (lb == tagger_line)
+			return error("char%"PRIuMAX": missing tagger name",
+				(uintmax_t) (tagger_line - buffer));
+
+		/* timestamp, 1 or more digits followed by space */
+		tagger_line = rb + 2;
+		if (!(len = strspn(tagger_line, "0123456789")))
+			return error("char%"PRIuMAX": missing tag timestamp",
+				(uintmax_t) (tagger_line - buffer));
+		tagger_line += len;
+		if (*tagger_line != ' ')
+			return error("char%"PRIuMAX": malformed tag timestamp",
+				(uintmax_t) (tagger_line - buffer));
+		tagger_line++;
+
+		/* timezone, 5 digits [+-]hhmm, max. 1400 */
+		if (!((tagger_line[0] == '+' || tagger_line[0] == '-') &&
+		      strspn(tagger_line+1, "0123456789") == 4 &&
+		      tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400))
+			return error("char%"PRIuMAX": malformed tag timezone",
+				(uintmax_t) (tagger_line - buffer));
+		tagger_line += 6;
+	} else if (!allow_missing_tagger)
 		return error("char%"PRIuMAX": could not find \"tagger \"",
-			(uintmax_t) (tagger_line - buffer));
-
-	/*
-	 * Check for correct form for name and email
-	 * i.e. " <" followed by "> " on _this_ line
-	 * No angle brackets within the name or email address fields.
-	 * No spaces within the email address field.
-	 */
-	tagger_line += 7;
-	if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) ||
-		strpbrk(tagger_line, "<>\n") != lb+1 ||
-		strpbrk(lb+2, "><\n ") != rb)
-		return error("char%"PRIuMAX": malformed tagger field",
-			(uintmax_t) (tagger_line - buffer));
-
-	/* Check for author name, at least one character, space is acceptable */
-	if (lb == tagger_line)
-		return error("char%"PRIuMAX": missing tagger name",
-			(uintmax_t) (tagger_line - buffer));
-
-	/* timestamp, 1 or more digits followed by space */
-	tagger_line = rb + 2;
-	if (!(len = strspn(tagger_line, "0123456789")))
-		return error("char%"PRIuMAX": missing tag timestamp",
-			(uintmax_t) (tagger_line - buffer));
-	tagger_line += len;
-	if (*tagger_line != ' ')
-		return error("char%"PRIuMAX": malformed tag timestamp",
-			(uintmax_t) (tagger_line - buffer));
-	tagger_line++;
-
-	/* timezone, 5 digits [+-]hhmm, max. 1400 */
-	if (!((tagger_line[0] == '+' || tagger_line[0] == '-') &&
-	      strspn(tagger_line+1, "0123456789") == 4 &&
-	      tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400))
-		return error("char%"PRIuMAX": malformed tag timezone",
-			(uintmax_t) (tagger_line - buffer));
-	tagger_line += 6;
+			     (uintmax_t) (tagger_line - buffer));
 
 	/* Verify the blank line separating the header from the body */
 	if (*tagger_line != '\n')
@@ -148,13 +152,25 @@ static int verify_tag(char *buffer, unsigned long size)
 	return 0;
 }
 
+static char const * const mktag_usage[] = {
+	N_("git mktag [<options>]"),
+	NULL
+};
+
+static struct option mktag_opts[] = {
+	OPT_BOOL(0, "allow-missing-tagger", &allow_missing_tagger, N_("allow the tagger field to be omitted")),
+	OPT_END(),
+};
+
 int cmd_mktag(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf buf = STRBUF_INIT;
 	unsigned char result_sha1[20];
 
-	if (argc != 1)
-		usage("git mktag");
+	argc = parse_options(argc, argv, prefix, mktag_opts, mktag_usage, 0);
+
+	if (argc != 0)
+		usage_with_options(mktag_usage, mktag_opts);
 
 	if (strbuf_read(&buf, 0, 4096) < 0) {
 		die_errno("could not read from stdin");
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 3a74602ef..05645064a 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -530,7 +530,7 @@ if [ "$filter_tag_name" ]; then
 					}' \
 				    -e '/^-----BEGIN PGP SIGNATURE-----/q' \
 				    -e 'p' ) |
-				git mktag) ||
+				git mktag --allow-missing-tagger) ||
 				die "Could not create new tag object for $ref"
 			if git cat-file tag "$ref" | \
 			   sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 8eb47942e..3a77a26c8 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -340,7 +340,36 @@ check_verify_failure 'detect invalid header entry' \
 	'^error: char124: trailing garbage in tag header$'
 
 ############################################################
-# 24. create valid tag
+# 24. missing tagger ok with --allow-missing-tagger
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+
+EOF
+
+test_expect_success \
+    'missing tagger with --allow-missing-tagger' \
+    'git mktag --allow-missing-tagger <tag.sig >.git/refs/tags/mytag 2>message'
+
+############################################################
+# 25. detect invalid header entry with --allow-missing-tagger
+
+cat >tag.sig <<EOF
+object $head
+type commit
+tag mytag
+this line should not be here
+EOF
+
+test_expect_success \
+    'detect invalid header entry with --allow-missing-tagger' \
+    '( test_must_fail git mktag --allow-missing-tagger <tag.sig 2>message ) &&
+       grep "^error: char70: trailing garbage in tag header$" message'
+
+############################################################
+# 26. create valid tag
 
 cat >tag.sig <<EOF
 object $head
@@ -355,7 +384,7 @@ test_expect_success \
     'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
 
 ############################################################
-# 25. check mytag
+# 27. check mytag
 
 test_expect_success \
     'check mytag' \
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up
  2017-09-17  7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell
@ 2017-09-17  7:36 ` Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell
  3 siblings, 0 replies; 8+ messages in thread
From: Ian Campbell @ 2017-09-17  7:36 UTC (permalink / raw)
  To: gitster; +Cc: git, Ian Campbell

This is pure code motion to enable a subsequent patch to add code which needs
to happen with the reset $GIT_* but before the temporary directory has been
cleaned up.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
---
 git-filter-branch.sh | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 05645064a..e15c538f6 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -544,11 +544,6 @@ if [ "$filter_tag_name" ]; then
 	done
 fi
 
-cd "$orig_dir"
-rm -rf "$tempdir"
-
-trap - 0
-
 unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
 test -z "$ORIG_GIT_DIR" || {
 	GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
@@ -562,6 +557,11 @@ test -z "$ORIG_GIT_INDEX_FILE" || {
 	export GIT_INDEX_FILE
 }
 
+cd "$orig_dir"
+rm -rf "$tempdir"
+
+trap - 0
+
 if [ "$(is_bare_repository)" = false ]; then
 	git read-tree -u -m HEAD || exit
 fi
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_*
  2017-09-17  7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up Ian Campbell
@ 2017-09-17  7:36 ` Ian Campbell
  2017-09-17  7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell
  3 siblings, 0 replies; 8+ messages in thread
From: Ian Campbell @ 2017-09-17  7:36 UTC (permalink / raw)
  To: gitster; +Cc: git, Ian Campbell

These are modified by set_ident() but a subsequent patch would like to operate
on their original values.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
---
 git-filter-branch.sh | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index e15c538f6..c88263965 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -219,6 +219,13 @@ trap 'cd "$orig_dir"; rm -rf "$tempdir"' 0
 ORIG_GIT_DIR="$GIT_DIR"
 ORIG_GIT_WORK_TREE="$GIT_WORK_TREE"
 ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE"
+ORIG_GIT_AUTHOR_NAME="$GIT_AUTHOR_NAME"
+ORIG_GIT_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL"
+ORIG_GIT_AUTHOR_DATE="$GIT_AUTHOR_DATE"
+ORIG_GIT_COMMITTER_NAME="$GIT_COMMITTER_NAME"
+ORIG_GIT_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL"
+ORIG_GIT_COMMITTER_DATE="$GIT_COMMITTER_DATE"
+
 GIT_WORK_TREE=.
 export GIT_DIR GIT_WORK_TREE
 
@@ -545,6 +552,8 @@ if [ "$filter_tag_name" ]; then
 fi
 
 unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
+unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
+unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
 test -z "$ORIG_GIT_DIR" || {
 	GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
 }
@@ -556,6 +565,30 @@ test -z "$ORIG_GIT_INDEX_FILE" || {
 	GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" &&
 	export GIT_INDEX_FILE
 }
+test -z "$ORIG_GIT_AUTHOR_NAME" || {
+	GIT_AUTHOR_NAME="$ORIG_GIT_AUTHOR_NAME" &&
+	export GIT_AUTHOR_NAME
+}
+test -z "$ORIG_GIT_AUTHOR_EMAIL" || {
+	GIT_AUTHOR_EMAIL="$ORIG_GIT_AUTHOR_EMAIL" &&
+	export GIT_AUTHOR_EMAIL
+}
+test -z "$ORIG_GIT_AUTHOR_DATE" || {
+	GIT_AUTHOR_DATE="$ORIG_GIT_AUTHOR_DATE" &&
+	export GIT_AUTHOR_DATE
+}
+test -z "$ORIG_GIT_COMMITTER_NAME" || {
+	GIT_COMMITTER_NAME="$ORIG_GIT_COMMITTER_NAME" &&
+	export GIT_COMMITTER_NAME
+}
+test -z "$ORIG_GIT_COMMITTER_EMAIL" || {
+	GIT_COMMITTER_EMAIL="$ORIG_GIT_COMMITTER_EMAIL" &&
+	export GIT_COMMITTER_EMAIL
+}
+test -z "$ORIG_GIT_COMMITTER_DATE" || {
+	GIT_COMMITTER_DATE="$ORIG_GIT_COMMITTER_DATE" &&
+	export GIT_COMMITTER_DATE
+}
 
 cd "$orig_dir"
 rm -rf "$tempdir"
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch
  2017-09-17  7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell
                   ` (2 preceding siblings ...)
  2017-09-17  7:36 ` [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* Ian Campbell
@ 2017-09-17  7:36 ` Ian Campbell
  2017-09-17  9:43   ` Ian Campbell
  3 siblings, 1 reply; 8+ messages in thread
From: Ian Campbell @ 2017-09-17  7:36 UTC (permalink / raw)
  To: gitster; +Cc: git, Ian Campbell

With "--state-branch=<branchname>" option, the mapping from old object names
and filtered ones in ./map/ directory is stashed away in the object database,
and the one from the previous run is read to populate the ./map/ directory,
allowing for incremental updates of large trees.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
---
I have been using this as part of the device tree extraction from the Linux
kernel source since 2013, about time I sent the patch upstream!

v2:
- added several preceding cleanup patches, including:
  - new: use of mktag --allow-missing tagger.
  - split-out: preserving $GIT_*.
- use git rev-parse rather than git show-ref.
- improved error handling for Perl sub-processes.
- collapsed some shell pipelines involving piping output of git and ls into
  Perl into the Perl scripts.
- style fixes for conditionals and sub-shells.
- fixup indentation.
- added documentation.
- improved commit message.
---
 Documentation/git-filter-branch.txt |  8 +++++-
 git-filter-branch.sh                | 49 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 9e5169aa6..bebdcdec5 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	[--commit-filter <command>] [--tag-name-filter <command>]
 	[--subdirectory-filter <directory>] [--prune-empty]
 	[--original <namespace>] [-d <directory>] [-f | --force]
-	[--] [<rev-list options>...]
+	[--state-branch <branch>] [--] [<rev-list options>...]
 
 DESCRIPTION
 -----------
@@ -198,6 +198,12 @@ to other tags will be rewritten to point to the underlying commit.
 	directory or when there are already refs starting with
 	'refs/original/', unless forced.
 
+--state-branch <branch>::
+	This option will cause the mapping from old to new objects to
+	be loaded from named branch upon startup and saved as a new
+	commit to that branch upon exit, enabling incremental of large
+	trees. If '<branch>' does not exist it will be created.
+
 <rev-list options>...::
 	Arguments for 'git rev-list'.  All positive refs included by
 	these options are rewritten.  You may also specify options
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index c88263965..ab927c62d 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -86,7 +86,7 @@ USAGE="[--setup <command>] [--env-filter <command>]
 	[--parent-filter <command>] [--msg-filter <command>]
 	[--commit-filter <command>] [--tag-name-filter <command>]
 	[--subdirectory-filter <directory>] [--original <namespace>]
-	[-d <directory>] [-f | --force]
+	[-d <directory>] [-f | --force] [--state-branch <branch>]
 	[--] [<rev-list options>...]"
 
 OPTIONS_SPEC=
@@ -106,6 +106,7 @@ filter_msg=cat
 filter_commit=
 filter_tag_name=
 filter_subdir=
+state_branch=
 orig_namespace=refs/original/
 force=
 prune_empty=
@@ -181,6 +182,9 @@ do
 	--original)
 		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
 		;;
+	--state-branch)
+		state_branch="$OPTARG"
+		;;
 	*)
 		usage
 		;;
@@ -259,6 +263,26 @@ export GIT_INDEX_FILE
 # map old->new commit ids for rewriting parents
 mkdir ../map || die "Could not create map/ directory"
 
+if test -n "$state_branch"
+then
+	state_commit=$(git rev-parse --no-flags --revs-only "$state_branch")
+	if test -n "$state_commit"
+	then
+		echo "Populating map from $state_branch ($state_commit)" 1>&2
+		perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die;
+			while (<MAP>) {
+				m/(.*):(.*)/ or die;
+				open F, ">../map/$1" or die;
+				print F "$2" or die;
+				close(F) or die;
+			}
+			close(MAP) or die;' "$state_commit" \
+				|| die "Unable to load state from $state_branch:filter.map"
+	else
+		echo "Branch $state_branch does not exist. Will create" 1>&2
+	fi
+fi
+
 # we need "--" only if there are no path arguments in $@
 nonrevs=$(git rev-parse --no-revs "$@") || exit
 if test -z "$nonrevs"
@@ -590,6 +614,29 @@ test -z "$ORIG_GIT_COMMITTER_DATE" || {
 	export GIT_COMMITTER_DATE
 }
 
+if test -n "$state_branch"
+then
+	echo "Saving rewrite state to $state_branch" 1>&2
+	state_blob=$(
+		perl -e'opendir D, "../map" or die;
+			open H, "|-", "git hash-object -w --stdin" or die;
+			foreach (sort readdir(D)) {
+				next if m/^\.\.?$/;
+				open F, "<../map/$_" or die;
+				chomp($f = <F>);
+				print H "$_:$f\n" or die;
+			}
+			close(H) or die;' || die "Unable to save state")
+	state_tree=$(/bin/echo -e "100644 blob $state_blob\tfilter.map" | git mktree)
+	if test -n "$state_commit"
+	then
+		state_commit=$(/bin/echo "Sync" | git commit-tree "$state_tree" -p "$state_commit")
+	else
+		state_commit=$(/bin/echo "Sync" | git commit-tree "$state_tree" )
+	fi
+	git update-ref "$state_branch" "$state_commit"
+fi
+
 cd "$orig_dir"
 rm -rf "$tempdir"
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch
  2017-09-17  7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell
@ 2017-09-17  9:43   ` Ian Campbell
  0 siblings, 0 replies; 8+ messages in thread
From: Ian Campbell @ 2017-09-17  9:43 UTC (permalink / raw)
  To: gitster; +Cc: git

On Sun, 2017-09-17 at 08:36 +0100, Ian Campbell wrote:
> +if test -n "$state_branch"
> +then
> > +	echo "Saving rewrite state to $state_branch" 1>&2
> > +	state_blob=$(
> > +		perl -e'opendir D, "../map" or die;
> > +			open H, "|-", "git hash-object -w --stdin" or die;
> > +			foreach (sort readdir(D)) {
> > +				next if m/^\.\.?$/;
> > +				open F, "<../map/$_" or die;
> > +				chomp($f = <F>);
> > +				print H "$_:$f\n" or die;
> > +			}
> > +			close(H) or die;' || die "Unable to save state")

One things I've noticed is that for a full Linux tree history the
filter.map file is 50M+ which causes github to complain:

    remote: warning: File filter.map is 54.40 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

(you can simulate this with `git log --pretty=format:"%H:%H"
upstream/master`.) I suppose that's not a bad recommendation for any
infra, not just GH's.

The blob is compressed in the object store so there isn't _much_ point
in compressing the map (also, it only goes down to ~30MB anyway so we
aren't buying all that much time), but I'm wondering if perhaps I
should look into a more intelligent representation, perhaps hashed by
the first two characters (as .git/objects is) to divide into several
blobs and have two levels.

I'm also wondering if the .git-rewrite/map directory, which will have
70k+ (and growing) directory entries for a modern Linux tree, would
benefit from the same sort of thing. OTOH in this case the extra shell
machinations to turn abcdef123 into ab/cdef123 might overwhelm the
savings in directory lookup time (unless there is a helper already for
that. That assume that directory lookup is even a bottleneck, I've not
measured but anecdotally/gut-feeling the commits-per-second does seem
to be decreasing over the course of the filtering process.

Ian.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
  2017-09-17  7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell
@ 2017-09-19  3:01   ` Junio C Hamano
  2017-09-19  6:42     ` Ian Campbell
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-09-19  3:01 UTC (permalink / raw)
  To: Ian Campbell; +Cc: git

Ian Campbell <ijc@hellion.org.uk> writes:

> This can be useful e.g. in `filter-branch` when rewritting tags produced by
> older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel
> source tree:
>
>         $ git cat-file tag v2.6.12-rc2
>         object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
>         type commit
>         tag v2.6.12-rc2
>
>         Linux v2.6.12-rc2 release
>         -----BEGIN PGP SIGNATURE-----
>         Version: GnuPG v1.2.4 (GNU/Linux)
>
>         iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc
>         wznDbFU45A54dZC8RZ5JxyE=
>         =ESRP
>         -----END PGP SIGNATURE-----
>
>         $ git cat-file tag v2.6.12-rc2 | git mktag
>         error: char76: could not find "tagger "
>         fatal: invalid tag signature file
>         $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger
>         9e734775f7c22d2f89943ad6c745571f1930105f
>
> To that end, pass the new option to `mktag` in `filter-branch`.

Hmph.  I cannot shake this nagging feeling that this is probably a
solution that is overly narrow to a single problem that won't scale
into the future.

As there is no guarantee that "missing-tagger" will stay to be the
only kind of broken tag objects we'd produce in one version, later
we notice our mistake and then forbid in another version, with the
approach to add '--allow-missing-tagger' optoin would imply that
we'd end up adding more and more such options, and filter-branch
will need to use all these '--allow-this-breakage' options we would
ever add.  Even though I fully agree with the problem you are trying
to solve (i.e. we want to be able to replay an old history without
our tool rejecting the data we have), it was my first reaction when
I read this patch.  IOW, my first reaction was "perhaps a single
option '--allow-broken' to cover the currently known and any future
shape of malformat over tag data is more appropriate".

But then, if we look at the body of cmd_mktag(), it looks like this:

	int cmd_mktag(...)
	{
		read input into strbuf buf;
		call verify_tag on buf to sanity check;
		call write_sha1_file() the contents of buf as a tag;
		report the object name;
	}

If we drop the "verification" step from the above, that essentially
becomes an equivaent to "hash-object -t tag -w --stdin".

So I now have to wonder if it may be sufficient to use "hash-object"
in filter-branch, without doing this "allow malformed data that we
would not permit if the tag were being created today, only to help
replaying an old, already broken data" change to "git mktag".

Is there something that makes "hash-object" insufficient (like it
still does some extra checks we would want to disable and cannot
work as a replacement for your "--allow-missing-tagger")?

Thanks.

> Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
> ---
>  Documentation/git-mktag.txt |   9 +++-
>  builtin/mktag.c             | 100 +++++++++++++++++++++++++-------------------
>  git-filter-branch.sh        |   2 +-
>  t/t3800-mktag.sh            |  33 ++++++++++++++-
>  4 files changed, 98 insertions(+), 46 deletions(-)
>
> diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
> index fa6a75612..c720c7419 100644
> --- a/Documentation/git-mktag.txt
> +++ b/Documentation/git-mktag.txt
> @@ -9,7 +9,7 @@ git-mktag - Creates a tag object
>  SYNOPSIS
>  --------
>  [verse]
> -'git mktag'
> +'git mktag' [--allow-missing-tagger]
>  
>  DESCRIPTION
>  -----------
> @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header.  The
>  message part may contain a signature that Git itself doesn't
>  care about, but that can be verified with gpg.
>  
> +OPTIONS
> +-------
> +--allow-missing-tagger::
> +	Allow the `tagger` line in the header to be omitted. This is
> +	rarely desirable but may be useful in recreating tags created
> +	by older Git.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/builtin/mktag.c b/builtin/mktag.c
> index 031b750f0..0f5dae8d5 100644
> --- a/builtin/mktag.c
> +++ b/builtin/mktag.c
> @@ -1,4 +1,5 @@
>  #include "builtin.h"
> +#include "parse-options.h"
>  #include "tag.h"
>  
>  /*
> @@ -15,6 +16,8 @@
>   * the shortest possible tagger-line.
>   */
>  
> +static int allow_missing_tagger;
> +
>  /*
>   * We refuse to tag something we can't verify. Just because.
>   */
> @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size)
>  	unsigned char sha1[20];
>  	const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb;
>  	size_t len;
> +	const unsigned long min_size = allow_missing_tagger ? 71 : 84;
>  
> -	if (size < 84)
> +	if (size < min_size)
>  		return error("wanna fool me ? you obviously got the size wrong !");
>  
>  	buffer[size] = 0;
> @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size)
>  	/* Verify the tagger line */
>  	tagger_line = tag_line;
>  
> -	if (memcmp(tagger_line, "tagger ", 7))
> +	if (!memcmp(tagger_line, "tagger ", 7)) {
> +		/*
> +		 * Check for correct form for name and email
> +		 * i.e. " <" followed by "> " on _this_ line
> +		 * No angle brackets within the name or email address fields.
> +		 * No spaces within the email address field.
> +		 */
> +		tagger_line += 7;
> +		if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) ||
> +			strpbrk(tagger_line, "<>\n") != lb+1 ||
> +			strpbrk(lb+2, "><\n ") != rb)
> +			return error("char%"PRIuMAX": malformed tagger field",
> +				(uintmax_t) (tagger_line - buffer));
> +
> +		/* Check for author name, at least one character, space is acceptable */
> +		if (lb == tagger_line)
> +			return error("char%"PRIuMAX": missing tagger name",
> +				(uintmax_t) (tagger_line - buffer));
> +
> +		/* timestamp, 1 or more digits followed by space */
> +		tagger_line = rb + 2;
> +		if (!(len = strspn(tagger_line, "0123456789")))
> +			return error("char%"PRIuMAX": missing tag timestamp",
> +				(uintmax_t) (tagger_line - buffer));
> +		tagger_line += len;
> +		if (*tagger_line != ' ')
> +			return error("char%"PRIuMAX": malformed tag timestamp",
> +				(uintmax_t) (tagger_line - buffer));
> +		tagger_line++;
> +
> +		/* timezone, 5 digits [+-]hhmm, max. 1400 */
> +		if (!((tagger_line[0] == '+' || tagger_line[0] == '-') &&
> +		      strspn(tagger_line+1, "0123456789") == 4 &&
> +		      tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400))
> +			return error("char%"PRIuMAX": malformed tag timezone",
> +				(uintmax_t) (tagger_line - buffer));
> +		tagger_line += 6;
> +	} else if (!allow_missing_tagger)
>  		return error("char%"PRIuMAX": could not find \"tagger \"",
> -			(uintmax_t) (tagger_line - buffer));
> -
> -	/*
> -	 * Check for correct form for name and email
> -	 * i.e. " <" followed by "> " on _this_ line
> -	 * No angle brackets within the name or email address fields.
> -	 * No spaces within the email address field.
> -	 */
> -	tagger_line += 7;
> -	if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) ||
> -		strpbrk(tagger_line, "<>\n") != lb+1 ||
> -		strpbrk(lb+2, "><\n ") != rb)
> -		return error("char%"PRIuMAX": malformed tagger field",
> -			(uintmax_t) (tagger_line - buffer));
> -
> -	/* Check for author name, at least one character, space is acceptable */
> -	if (lb == tagger_line)
> -		return error("char%"PRIuMAX": missing tagger name",
> -			(uintmax_t) (tagger_line - buffer));
> -
> -	/* timestamp, 1 or more digits followed by space */
> -	tagger_line = rb + 2;
> -	if (!(len = strspn(tagger_line, "0123456789")))
> -		return error("char%"PRIuMAX": missing tag timestamp",
> -			(uintmax_t) (tagger_line - buffer));
> -	tagger_line += len;
> -	if (*tagger_line != ' ')
> -		return error("char%"PRIuMAX": malformed tag timestamp",
> -			(uintmax_t) (tagger_line - buffer));
> -	tagger_line++;
> -
> -	/* timezone, 5 digits [+-]hhmm, max. 1400 */
> -	if (!((tagger_line[0] == '+' || tagger_line[0] == '-') &&
> -	      strspn(tagger_line+1, "0123456789") == 4 &&
> -	      tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400))
> -		return error("char%"PRIuMAX": malformed tag timezone",
> -			(uintmax_t) (tagger_line - buffer));
> -	tagger_line += 6;
> +			     (uintmax_t) (tagger_line - buffer));
>  
>  	/* Verify the blank line separating the header from the body */
>  	if (*tagger_line != '\n')
> @@ -148,13 +152,25 @@ static int verify_tag(char *buffer, unsigned long size)
>  	return 0;
>  }
>  
> +static char const * const mktag_usage[] = {
> +	N_("git mktag [<options>]"),
> +	NULL
> +};
> +
> +static struct option mktag_opts[] = {
> +	OPT_BOOL(0, "allow-missing-tagger", &allow_missing_tagger, N_("allow the tagger field to be omitted")),
> +	OPT_END(),
> +};
> +
>  int cmd_mktag(int argc, const char **argv, const char *prefix)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	unsigned char result_sha1[20];
>  
> -	if (argc != 1)
> -		usage("git mktag");
> +	argc = parse_options(argc, argv, prefix, mktag_opts, mktag_usage, 0);
> +
> +	if (argc != 0)
> +		usage_with_options(mktag_usage, mktag_opts);
>  
>  	if (strbuf_read(&buf, 0, 4096) < 0) {
>  		die_errno("could not read from stdin");
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 3a74602ef..05645064a 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -530,7 +530,7 @@ if [ "$filter_tag_name" ]; then
>  					}' \
>  				    -e '/^-----BEGIN PGP SIGNATURE-----/q' \
>  				    -e 'p' ) |
> -				git mktag) ||
> +				git mktag --allow-missing-tagger) ||
>  				die "Could not create new tag object for $ref"
>  			if git cat-file tag "$ref" | \
>  			   sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1
> diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
> index 8eb47942e..3a77a26c8 100755
> --- a/t/t3800-mktag.sh
> +++ b/t/t3800-mktag.sh
> @@ -340,7 +340,36 @@ check_verify_failure 'detect invalid header entry' \
>  	'^error: char124: trailing garbage in tag header$'
>  
>  ############################################################
> -# 24. create valid tag
> +# 24. missing tagger ok with --allow-missing-tagger
> +
> +cat >tag.sig <<EOF
> +object $head
> +type commit
> +tag mytag
> +
> +EOF
> +
> +test_expect_success \
> +    'missing tagger with --allow-missing-tagger' \
> +    'git mktag --allow-missing-tagger <tag.sig >.git/refs/tags/mytag 2>message'
> +
> +############################################################
> +# 25. detect invalid header entry with --allow-missing-tagger
> +
> +cat >tag.sig <<EOF
> +object $head
> +type commit
> +tag mytag
> +this line should not be here
> +EOF
> +
> +test_expect_success \
> +    'detect invalid header entry with --allow-missing-tagger' \
> +    '( test_must_fail git mktag --allow-missing-tagger <tag.sig 2>message ) &&
> +       grep "^error: char70: trailing garbage in tag header$" message'
> +
> +############################################################
> +# 26. create valid tag
>  
>  cat >tag.sig <<EOF
>  object $head
> @@ -355,7 +384,7 @@ test_expect_success \
>      'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
>  
>  ############################################################
> -# 25. check mytag
> +# 27. check mytag
>  
>  test_expect_success \
>      'check mytag' \

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
  2017-09-19  3:01   ` Junio C Hamano
@ 2017-09-19  6:42     ` Ian Campbell
  0 siblings, 0 replies; 8+ messages in thread
From: Ian Campbell @ 2017-09-19  6:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 2017-09-19 at 12:01 +0900, Junio C Hamano wrote:
> 
> Hmph.  I cannot shake this nagging feeling that this is probably a
> solution that is overly narrow to a single problem that won't scale
> into the future.
> 
> [...snip good point...]
> 
> If we drop the "verification" step from the above, that essentially
> becomes an equivaent to "hash-object -t tag -w --stdin".
> 
> So I now have to wonder if it may be sufficient to use "hash-object"
> in filter-branch, without doing this "allow malformed data that we
> would not permit if the tag were being created today, only to help
> replaying an old, already broken data" change to "git mktag".
> 
> Is there something that makes "hash-object" insufficient (like it
> still does some extra checks we would want to disable and cannot
> work as a replacement for your "--allow-missing-tagger")?

I've done a couple of quick tests and it looks like it will work. I'll
run a few more checks and repost.

Ian.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-19  6:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-17  7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell
2017-09-17  7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell
2017-09-19  3:01   ` Junio C Hamano
2017-09-19  6:42     ` Ian Campbell
2017-09-17  7:36 ` [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up Ian Campbell
2017-09-17  7:36 ` [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* Ian Campbell
2017-09-17  7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell
2017-09-17  9:43   ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.