* [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format @ 2017-09-17 7:36 Ian Campbell 2017-09-17 7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell ` (3 more replies) 0 siblings, 4 replies; 8+ messages in thread From: Ian Campbell @ 2017-09-17 7:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: git This is the second version of my patches to add incremental support to git-filter-branch. Since the last time I have: * addressed the review feedback (see changelog embedded in final patch) * switched to using the (newly introduced) `--allow-missing-tagger` option to `git mktag` to allow early Linux kernel tags to be rewritten * split out some of the preparatory changes to make the final patch easier to read. I've force pushed to [1] where Travis seems happy and have set off the process of re-rewriting the devicetree tree from scratch (a multi-day affair) to validate (it's looking good). Ian. [0] https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/ [1] https://github.com/ijc/git/tree/git-filter-branch ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted 2017-09-17 7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell @ 2017-09-17 7:36 ` Ian Campbell 2017-09-19 3:01 ` Junio C Hamano 2017-09-17 7:36 ` [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up Ian Campbell ` (2 subsequent siblings) 3 siblings, 1 reply; 8+ messages in thread From: Ian Campbell @ 2017-09-17 7:36 UTC (permalink / raw) To: gitster; +Cc: git, Ian Campbell This can be useful e.g. in `filter-branch` when rewritting tags produced by older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel source tree: $ git cat-file tag v2.6.12-rc2 object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 type commit tag v2.6.12-rc2 Linux v2.6.12-rc2 release -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc wznDbFU45A54dZC8RZ5JxyE= =ESRP -----END PGP SIGNATURE----- $ git cat-file tag v2.6.12-rc2 | git mktag error: char76: could not find "tagger " fatal: invalid tag signature file $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger 9e734775f7c22d2f89943ad6c745571f1930105f To that end, pass the new option to `mktag` in `filter-branch`. Signed-off-by: Ian Campbell <ijc@hellion.org.uk> --- Documentation/git-mktag.txt | 9 +++- builtin/mktag.c | 100 +++++++++++++++++++++++++------------------- git-filter-branch.sh | 2 +- t/t3800-mktag.sh | 33 ++++++++++++++- 4 files changed, 98 insertions(+), 46 deletions(-) diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt index fa6a75612..c720c7419 100644 --- a/Documentation/git-mktag.txt +++ b/Documentation/git-mktag.txt @@ -9,7 +9,7 @@ git-mktag - Creates a tag object SYNOPSIS -------- [verse] -'git mktag' +'git mktag' [--allow-missing-tagger] DESCRIPTION ----------- @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header. The message part may contain a signature that Git itself doesn't care about, but that can be verified with gpg. +OPTIONS +------- +--allow-missing-tagger:: + Allow the `tagger` line in the header to be omitted. This is + rarely desirable but may be useful in recreating tags created + by older Git. + GIT --- Part of the linkgit:git[1] suite diff --git a/builtin/mktag.c b/builtin/mktag.c index 031b750f0..0f5dae8d5 100644 --- a/builtin/mktag.c +++ b/builtin/mktag.c @@ -1,4 +1,5 @@ #include "builtin.h" +#include "parse-options.h" #include "tag.h" /* @@ -15,6 +16,8 @@ * the shortest possible tagger-line. */ +static int allow_missing_tagger; + /* * We refuse to tag something we can't verify. Just because. */ @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size) unsigned char sha1[20]; const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb; size_t len; + const unsigned long min_size = allow_missing_tagger ? 71 : 84; - if (size < 84) + if (size < min_size) return error("wanna fool me ? you obviously got the size wrong !"); buffer[size] = 0; @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size) /* Verify the tagger line */ tagger_line = tag_line; - if (memcmp(tagger_line, "tagger ", 7)) + if (!memcmp(tagger_line, "tagger ", 7)) { + /* + * Check for correct form for name and email + * i.e. " <" followed by "> " on _this_ line + * No angle brackets within the name or email address fields. + * No spaces within the email address field. + */ + tagger_line += 7; + if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) || + strpbrk(tagger_line, "<>\n") != lb+1 || + strpbrk(lb+2, "><\n ") != rb) + return error("char%"PRIuMAX": malformed tagger field", + (uintmax_t) (tagger_line - buffer)); + + /* Check for author name, at least one character, space is acceptable */ + if (lb == tagger_line) + return error("char%"PRIuMAX": missing tagger name", + (uintmax_t) (tagger_line - buffer)); + + /* timestamp, 1 or more digits followed by space */ + tagger_line = rb + 2; + if (!(len = strspn(tagger_line, "0123456789"))) + return error("char%"PRIuMAX": missing tag timestamp", + (uintmax_t) (tagger_line - buffer)); + tagger_line += len; + if (*tagger_line != ' ') + return error("char%"PRIuMAX": malformed tag timestamp", + (uintmax_t) (tagger_line - buffer)); + tagger_line++; + + /* timezone, 5 digits [+-]hhmm, max. 1400 */ + if (!((tagger_line[0] == '+' || tagger_line[0] == '-') && + strspn(tagger_line+1, "0123456789") == 4 && + tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400)) + return error("char%"PRIuMAX": malformed tag timezone", + (uintmax_t) (tagger_line - buffer)); + tagger_line += 6; + } else if (!allow_missing_tagger) return error("char%"PRIuMAX": could not find \"tagger \"", - (uintmax_t) (tagger_line - buffer)); - - /* - * Check for correct form for name and email - * i.e. " <" followed by "> " on _this_ line - * No angle brackets within the name or email address fields. - * No spaces within the email address field. - */ - tagger_line += 7; - if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) || - strpbrk(tagger_line, "<>\n") != lb+1 || - strpbrk(lb+2, "><\n ") != rb) - return error("char%"PRIuMAX": malformed tagger field", - (uintmax_t) (tagger_line - buffer)); - - /* Check for author name, at least one character, space is acceptable */ - if (lb == tagger_line) - return error("char%"PRIuMAX": missing tagger name", - (uintmax_t) (tagger_line - buffer)); - - /* timestamp, 1 or more digits followed by space */ - tagger_line = rb + 2; - if (!(len = strspn(tagger_line, "0123456789"))) - return error("char%"PRIuMAX": missing tag timestamp", - (uintmax_t) (tagger_line - buffer)); - tagger_line += len; - if (*tagger_line != ' ') - return error("char%"PRIuMAX": malformed tag timestamp", - (uintmax_t) (tagger_line - buffer)); - tagger_line++; - - /* timezone, 5 digits [+-]hhmm, max. 1400 */ - if (!((tagger_line[0] == '+' || tagger_line[0] == '-') && - strspn(tagger_line+1, "0123456789") == 4 && - tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400)) - return error("char%"PRIuMAX": malformed tag timezone", - (uintmax_t) (tagger_line - buffer)); - tagger_line += 6; + (uintmax_t) (tagger_line - buffer)); /* Verify the blank line separating the header from the body */ if (*tagger_line != '\n') @@ -148,13 +152,25 @@ static int verify_tag(char *buffer, unsigned long size) return 0; } +static char const * const mktag_usage[] = { + N_("git mktag [<options>]"), + NULL +}; + +static struct option mktag_opts[] = { + OPT_BOOL(0, "allow-missing-tagger", &allow_missing_tagger, N_("allow the tagger field to be omitted")), + OPT_END(), +}; + int cmd_mktag(int argc, const char **argv, const char *prefix) { struct strbuf buf = STRBUF_INIT; unsigned char result_sha1[20]; - if (argc != 1) - usage("git mktag"); + argc = parse_options(argc, argv, prefix, mktag_opts, mktag_usage, 0); + + if (argc != 0) + usage_with_options(mktag_usage, mktag_opts); if (strbuf_read(&buf, 0, 4096) < 0) { die_errno("could not read from stdin"); diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 3a74602ef..05645064a 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -530,7 +530,7 @@ if [ "$filter_tag_name" ]; then }' \ -e '/^-----BEGIN PGP SIGNATURE-----/q' \ -e 'p' ) | - git mktag) || + git mktag --allow-missing-tagger) || die "Could not create new tag object for $ref" if git cat-file tag "$ref" | \ sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1 diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh index 8eb47942e..3a77a26c8 100755 --- a/t/t3800-mktag.sh +++ b/t/t3800-mktag.sh @@ -340,7 +340,36 @@ check_verify_failure 'detect invalid header entry' \ '^error: char124: trailing garbage in tag header$' ############################################################ -# 24. create valid tag +# 24. missing tagger ok with --allow-missing-tagger + +cat >tag.sig <<EOF +object $head +type commit +tag mytag + +EOF + +test_expect_success \ + 'missing tagger with --allow-missing-tagger' \ + 'git mktag --allow-missing-tagger <tag.sig >.git/refs/tags/mytag 2>message' + +############################################################ +# 25. detect invalid header entry with --allow-missing-tagger + +cat >tag.sig <<EOF +object $head +type commit +tag mytag +this line should not be here +EOF + +test_expect_success \ + 'detect invalid header entry with --allow-missing-tagger' \ + '( test_must_fail git mktag --allow-missing-tagger <tag.sig 2>message ) && + grep "^error: char70: trailing garbage in tag header$" message' + +############################################################ +# 26. create valid tag cat >tag.sig <<EOF object $head @@ -355,7 +384,7 @@ test_expect_success \ 'git mktag <tag.sig >.git/refs/tags/mytag 2>message' ############################################################ -# 25. check mytag +# 27. check mytag test_expect_success \ 'check mytag' \ -- 2.11.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted 2017-09-17 7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell @ 2017-09-19 3:01 ` Junio C Hamano 2017-09-19 6:42 ` Ian Campbell 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2017-09-19 3:01 UTC (permalink / raw) To: Ian Campbell; +Cc: git Ian Campbell <ijc@hellion.org.uk> writes: > This can be useful e.g. in `filter-branch` when rewritting tags produced by > older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel > source tree: > > $ git cat-file tag v2.6.12-rc2 > object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 > type commit > tag v2.6.12-rc2 > > Linux v2.6.12-rc2 release > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.4 (GNU/Linux) > > iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc > wznDbFU45A54dZC8RZ5JxyE= > =ESRP > -----END PGP SIGNATURE----- > > $ git cat-file tag v2.6.12-rc2 | git mktag > error: char76: could not find "tagger " > fatal: invalid tag signature file > $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger > 9e734775f7c22d2f89943ad6c745571f1930105f > > To that end, pass the new option to `mktag` in `filter-branch`. Hmph. I cannot shake this nagging feeling that this is probably a solution that is overly narrow to a single problem that won't scale into the future. As there is no guarantee that "missing-tagger" will stay to be the only kind of broken tag objects we'd produce in one version, later we notice our mistake and then forbid in another version, with the approach to add '--allow-missing-tagger' optoin would imply that we'd end up adding more and more such options, and filter-branch will need to use all these '--allow-this-breakage' options we would ever add. Even though I fully agree with the problem you are trying to solve (i.e. we want to be able to replay an old history without our tool rejecting the data we have), it was my first reaction when I read this patch. IOW, my first reaction was "perhaps a single option '--allow-broken' to cover the currently known and any future shape of malformat over tag data is more appropriate". But then, if we look at the body of cmd_mktag(), it looks like this: int cmd_mktag(...) { read input into strbuf buf; call verify_tag on buf to sanity check; call write_sha1_file() the contents of buf as a tag; report the object name; } If we drop the "verification" step from the above, that essentially becomes an equivaent to "hash-object -t tag -w --stdin". So I now have to wonder if it may be sufficient to use "hash-object" in filter-branch, without doing this "allow malformed data that we would not permit if the tag were being created today, only to help replaying an old, already broken data" change to "git mktag". Is there something that makes "hash-object" insufficient (like it still does some extra checks we would want to disable and cannot work as a replacement for your "--allow-missing-tagger")? Thanks. > Signed-off-by: Ian Campbell <ijc@hellion.org.uk> > --- > Documentation/git-mktag.txt | 9 +++- > builtin/mktag.c | 100 +++++++++++++++++++++++++------------------- > git-filter-branch.sh | 2 +- > t/t3800-mktag.sh | 33 ++++++++++++++- > 4 files changed, 98 insertions(+), 46 deletions(-) > > diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt > index fa6a75612..c720c7419 100644 > --- a/Documentation/git-mktag.txt > +++ b/Documentation/git-mktag.txt > @@ -9,7 +9,7 @@ git-mktag - Creates a tag object > SYNOPSIS > -------- > [verse] > -'git mktag' > +'git mktag' [--allow-missing-tagger] > > DESCRIPTION > ----------- > @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header. The > message part may contain a signature that Git itself doesn't > care about, but that can be verified with gpg. > > +OPTIONS > +------- > +--allow-missing-tagger:: > + Allow the `tagger` line in the header to be omitted. This is > + rarely desirable but may be useful in recreating tags created > + by older Git. > + > GIT > --- > Part of the linkgit:git[1] suite > diff --git a/builtin/mktag.c b/builtin/mktag.c > index 031b750f0..0f5dae8d5 100644 > --- a/builtin/mktag.c > +++ b/builtin/mktag.c > @@ -1,4 +1,5 @@ > #include "builtin.h" > +#include "parse-options.h" > #include "tag.h" > > /* > @@ -15,6 +16,8 @@ > * the shortest possible tagger-line. > */ > > +static int allow_missing_tagger; > + > /* > * We refuse to tag something we can't verify. Just because. > */ > @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size) > unsigned char sha1[20]; > const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb; > size_t len; > + const unsigned long min_size = allow_missing_tagger ? 71 : 84; > > - if (size < 84) > + if (size < min_size) > return error("wanna fool me ? you obviously got the size wrong !"); > > buffer[size] = 0; > @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size) > /* Verify the tagger line */ > tagger_line = tag_line; > > - if (memcmp(tagger_line, "tagger ", 7)) > + if (!memcmp(tagger_line, "tagger ", 7)) { > + /* > + * Check for correct form for name and email > + * i.e. " <" followed by "> " on _this_ line > + * No angle brackets within the name or email address fields. > + * No spaces within the email address field. > + */ > + tagger_line += 7; > + if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) || > + strpbrk(tagger_line, "<>\n") != lb+1 || > + strpbrk(lb+2, "><\n ") != rb) > + return error("char%"PRIuMAX": malformed tagger field", > + (uintmax_t) (tagger_line - buffer)); > + > + /* Check for author name, at least one character, space is acceptable */ > + if (lb == tagger_line) > + return error("char%"PRIuMAX": missing tagger name", > + (uintmax_t) (tagger_line - buffer)); > + > + /* timestamp, 1 or more digits followed by space */ > + tagger_line = rb + 2; > + if (!(len = strspn(tagger_line, "0123456789"))) > + return error("char%"PRIuMAX": missing tag timestamp", > + (uintmax_t) (tagger_line - buffer)); > + tagger_line += len; > + if (*tagger_line != ' ') > + return error("char%"PRIuMAX": malformed tag timestamp", > + (uintmax_t) (tagger_line - buffer)); > + tagger_line++; > + > + /* timezone, 5 digits [+-]hhmm, max. 1400 */ > + if (!((tagger_line[0] == '+' || tagger_line[0] == '-') && > + strspn(tagger_line+1, "0123456789") == 4 && > + tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400)) > + return error("char%"PRIuMAX": malformed tag timezone", > + (uintmax_t) (tagger_line - buffer)); > + tagger_line += 6; > + } else if (!allow_missing_tagger) > return error("char%"PRIuMAX": could not find \"tagger \"", > - (uintmax_t) (tagger_line - buffer)); > - > - /* > - * Check for correct form for name and email > - * i.e. " <" followed by "> " on _this_ line > - * No angle brackets within the name or email address fields. > - * No spaces within the email address field. > - */ > - tagger_line += 7; > - if (!(lb = strstr(tagger_line, " <")) || !(rb = strstr(lb+2, "> ")) || > - strpbrk(tagger_line, "<>\n") != lb+1 || > - strpbrk(lb+2, "><\n ") != rb) > - return error("char%"PRIuMAX": malformed tagger field", > - (uintmax_t) (tagger_line - buffer)); > - > - /* Check for author name, at least one character, space is acceptable */ > - if (lb == tagger_line) > - return error("char%"PRIuMAX": missing tagger name", > - (uintmax_t) (tagger_line - buffer)); > - > - /* timestamp, 1 or more digits followed by space */ > - tagger_line = rb + 2; > - if (!(len = strspn(tagger_line, "0123456789"))) > - return error("char%"PRIuMAX": missing tag timestamp", > - (uintmax_t) (tagger_line - buffer)); > - tagger_line += len; > - if (*tagger_line != ' ') > - return error("char%"PRIuMAX": malformed tag timestamp", > - (uintmax_t) (tagger_line - buffer)); > - tagger_line++; > - > - /* timezone, 5 digits [+-]hhmm, max. 1400 */ > - if (!((tagger_line[0] == '+' || tagger_line[0] == '-') && > - strspn(tagger_line+1, "0123456789") == 4 && > - tagger_line[5] == '\n' && atoi(tagger_line+1) <= 1400)) > - return error("char%"PRIuMAX": malformed tag timezone", > - (uintmax_t) (tagger_line - buffer)); > - tagger_line += 6; > + (uintmax_t) (tagger_line - buffer)); > > /* Verify the blank line separating the header from the body */ > if (*tagger_line != '\n') > @@ -148,13 +152,25 @@ static int verify_tag(char *buffer, unsigned long size) > return 0; > } > > +static char const * const mktag_usage[] = { > + N_("git mktag [<options>]"), > + NULL > +}; > + > +static struct option mktag_opts[] = { > + OPT_BOOL(0, "allow-missing-tagger", &allow_missing_tagger, N_("allow the tagger field to be omitted")), > + OPT_END(), > +}; > + > int cmd_mktag(int argc, const char **argv, const char *prefix) > { > struct strbuf buf = STRBUF_INIT; > unsigned char result_sha1[20]; > > - if (argc != 1) > - usage("git mktag"); > + argc = parse_options(argc, argv, prefix, mktag_opts, mktag_usage, 0); > + > + if (argc != 0) > + usage_with_options(mktag_usage, mktag_opts); > > if (strbuf_read(&buf, 0, 4096) < 0) { > die_errno("could not read from stdin"); > diff --git a/git-filter-branch.sh b/git-filter-branch.sh > index 3a74602ef..05645064a 100755 > --- a/git-filter-branch.sh > +++ b/git-filter-branch.sh > @@ -530,7 +530,7 @@ if [ "$filter_tag_name" ]; then > }' \ > -e '/^-----BEGIN PGP SIGNATURE-----/q' \ > -e 'p' ) | > - git mktag) || > + git mktag --allow-missing-tagger) || > die "Could not create new tag object for $ref" > if git cat-file tag "$ref" | \ > sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1 > diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh > index 8eb47942e..3a77a26c8 100755 > --- a/t/t3800-mktag.sh > +++ b/t/t3800-mktag.sh > @@ -340,7 +340,36 @@ check_verify_failure 'detect invalid header entry' \ > '^error: char124: trailing garbage in tag header$' > > ############################################################ > -# 24. create valid tag > +# 24. missing tagger ok with --allow-missing-tagger > + > +cat >tag.sig <<EOF > +object $head > +type commit > +tag mytag > + > +EOF > + > +test_expect_success \ > + 'missing tagger with --allow-missing-tagger' \ > + 'git mktag --allow-missing-tagger <tag.sig >.git/refs/tags/mytag 2>message' > + > +############################################################ > +# 25. detect invalid header entry with --allow-missing-tagger > + > +cat >tag.sig <<EOF > +object $head > +type commit > +tag mytag > +this line should not be here > +EOF > + > +test_expect_success \ > + 'detect invalid header entry with --allow-missing-tagger' \ > + '( test_must_fail git mktag --allow-missing-tagger <tag.sig 2>message ) && > + grep "^error: char70: trailing garbage in tag header$" message' > + > +############################################################ > +# 26. create valid tag > > cat >tag.sig <<EOF > object $head > @@ -355,7 +384,7 @@ test_expect_success \ > 'git mktag <tag.sig >.git/refs/tags/mytag 2>message' > > ############################################################ > -# 25. check mytag > +# 27. check mytag > > test_expect_success \ > 'check mytag' \ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted 2017-09-19 3:01 ` Junio C Hamano @ 2017-09-19 6:42 ` Ian Campbell 0 siblings, 0 replies; 8+ messages in thread From: Ian Campbell @ 2017-09-19 6:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, 2017-09-19 at 12:01 +0900, Junio C Hamano wrote: > > Hmph. I cannot shake this nagging feeling that this is probably a > solution that is overly narrow to a single problem that won't scale > into the future. > > [...snip good point...] > > If we drop the "verification" step from the above, that essentially > becomes an equivaent to "hash-object -t tag -w --stdin". > > So I now have to wonder if it may be sufficient to use "hash-object" > in filter-branch, without doing this "allow malformed data that we > would not permit if the tag were being created today, only to help > replaying an old, already broken data" change to "git mktag". > > Is there something that makes "hash-object" insufficient (like it > still does some extra checks we would want to disable and cannot > work as a replacement for your "--allow-missing-tagger")? I've done a couple of quick tests and it looks like it will work. I'll run a few more checks and repost. Ian. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up 2017-09-17 7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell 2017-09-17 7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell @ 2017-09-17 7:36 ` Ian Campbell 2017-09-17 7:36 ` [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* Ian Campbell 2017-09-17 7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell 3 siblings, 0 replies; 8+ messages in thread From: Ian Campbell @ 2017-09-17 7:36 UTC (permalink / raw) To: gitster; +Cc: git, Ian Campbell This is pure code motion to enable a subsequent patch to add code which needs to happen with the reset $GIT_* but before the temporary directory has been cleaned up. Signed-off-by: Ian Campbell <ijc@hellion.org.uk> --- git-filter-branch.sh | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 05645064a..e15c538f6 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -544,11 +544,6 @@ if [ "$filter_tag_name" ]; then done fi -cd "$orig_dir" -rm -rf "$tempdir" - -trap - 0 - unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE test -z "$ORIG_GIT_DIR" || { GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR @@ -562,6 +557,11 @@ test -z "$ORIG_GIT_INDEX_FILE" || { export GIT_INDEX_FILE } +cd "$orig_dir" +rm -rf "$tempdir" + +trap - 0 + if [ "$(is_bare_repository)" = false ]; then git read-tree -u -m HEAD || exit fi -- 2.11.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* 2017-09-17 7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell 2017-09-17 7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell 2017-09-17 7:36 ` [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up Ian Campbell @ 2017-09-17 7:36 ` Ian Campbell 2017-09-17 7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell 3 siblings, 0 replies; 8+ messages in thread From: Ian Campbell @ 2017-09-17 7:36 UTC (permalink / raw) To: gitster; +Cc: git, Ian Campbell These are modified by set_ident() but a subsequent patch would like to operate on their original values. Signed-off-by: Ian Campbell <ijc@hellion.org.uk> --- git-filter-branch.sh | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index e15c538f6..c88263965 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -219,6 +219,13 @@ trap 'cd "$orig_dir"; rm -rf "$tempdir"' 0 ORIG_GIT_DIR="$GIT_DIR" ORIG_GIT_WORK_TREE="$GIT_WORK_TREE" ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE" +ORIG_GIT_AUTHOR_NAME="$GIT_AUTHOR_NAME" +ORIG_GIT_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL" +ORIG_GIT_AUTHOR_DATE="$GIT_AUTHOR_DATE" +ORIG_GIT_COMMITTER_NAME="$GIT_COMMITTER_NAME" +ORIG_GIT_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL" +ORIG_GIT_COMMITTER_DATE="$GIT_COMMITTER_DATE" + GIT_WORK_TREE=. export GIT_DIR GIT_WORK_TREE @@ -545,6 +552,8 @@ if [ "$filter_tag_name" ]; then fi unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE +unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE +unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE test -z "$ORIG_GIT_DIR" || { GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR } @@ -556,6 +565,30 @@ test -z "$ORIG_GIT_INDEX_FILE" || { GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" && export GIT_INDEX_FILE } +test -z "$ORIG_GIT_AUTHOR_NAME" || { + GIT_AUTHOR_NAME="$ORIG_GIT_AUTHOR_NAME" && + export GIT_AUTHOR_NAME +} +test -z "$ORIG_GIT_AUTHOR_EMAIL" || { + GIT_AUTHOR_EMAIL="$ORIG_GIT_AUTHOR_EMAIL" && + export GIT_AUTHOR_EMAIL +} +test -z "$ORIG_GIT_AUTHOR_DATE" || { + GIT_AUTHOR_DATE="$ORIG_GIT_AUTHOR_DATE" && + export GIT_AUTHOR_DATE +} +test -z "$ORIG_GIT_COMMITTER_NAME" || { + GIT_COMMITTER_NAME="$ORIG_GIT_COMMITTER_NAME" && + export GIT_COMMITTER_NAME +} +test -z "$ORIG_GIT_COMMITTER_EMAIL" || { + GIT_COMMITTER_EMAIL="$ORIG_GIT_COMMITTER_EMAIL" && + export GIT_COMMITTER_EMAIL +} +test -z "$ORIG_GIT_COMMITTER_DATE" || { + GIT_COMMITTER_DATE="$ORIG_GIT_COMMITTER_DATE" && + export GIT_COMMITTER_DATE +} cd "$orig_dir" rm -rf "$tempdir" -- 2.11.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch 2017-09-17 7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell ` (2 preceding siblings ...) 2017-09-17 7:36 ` [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* Ian Campbell @ 2017-09-17 7:36 ` Ian Campbell 2017-09-17 9:43 ` Ian Campbell 3 siblings, 1 reply; 8+ messages in thread From: Ian Campbell @ 2017-09-17 7:36 UTC (permalink / raw) To: gitster; +Cc: git, Ian Campbell With "--state-branch=<branchname>" option, the mapping from old object names and filtered ones in ./map/ directory is stashed away in the object database, and the one from the previous run is read to populate the ./map/ directory, allowing for incremental updates of large trees. Signed-off-by: Ian Campbell <ijc@hellion.org.uk> --- I have been using this as part of the device tree extraction from the Linux kernel source since 2013, about time I sent the patch upstream! v2: - added several preceding cleanup patches, including: - new: use of mktag --allow-missing tagger. - split-out: preserving $GIT_*. - use git rev-parse rather than git show-ref. - improved error handling for Perl sub-processes. - collapsed some shell pipelines involving piping output of git and ls into Perl into the Perl scripts. - style fixes for conditionals and sub-shells. - fixup indentation. - added documentation. - improved commit message. --- Documentation/git-filter-branch.txt | 8 +++++- git-filter-branch.sh | 49 ++++++++++++++++++++++++++++++++++++- 2 files changed, 55 insertions(+), 2 deletions(-) diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt index 9e5169aa6..bebdcdec5 100644 --- a/Documentation/git-filter-branch.txt +++ b/Documentation/git-filter-branch.txt @@ -14,7 +14,7 @@ SYNOPSIS [--commit-filter <command>] [--tag-name-filter <command>] [--subdirectory-filter <directory>] [--prune-empty] [--original <namespace>] [-d <directory>] [-f | --force] - [--] [<rev-list options>...] + [--state-branch <branch>] [--] [<rev-list options>...] DESCRIPTION ----------- @@ -198,6 +198,12 @@ to other tags will be rewritten to point to the underlying commit. directory or when there are already refs starting with 'refs/original/', unless forced. +--state-branch <branch>:: + This option will cause the mapping from old to new objects to + be loaded from named branch upon startup and saved as a new + commit to that branch upon exit, enabling incremental of large + trees. If '<branch>' does not exist it will be created. + <rev-list options>...:: Arguments for 'git rev-list'. All positive refs included by these options are rewritten. You may also specify options diff --git a/git-filter-branch.sh b/git-filter-branch.sh index c88263965..ab927c62d 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -86,7 +86,7 @@ USAGE="[--setup <command>] [--env-filter <command>] [--parent-filter <command>] [--msg-filter <command>] [--commit-filter <command>] [--tag-name-filter <command>] [--subdirectory-filter <directory>] [--original <namespace>] - [-d <directory>] [-f | --force] + [-d <directory>] [-f | --force] [--state-branch <branch>] [--] [<rev-list options>...]" OPTIONS_SPEC= @@ -106,6 +106,7 @@ filter_msg=cat filter_commit= filter_tag_name= filter_subdir= +state_branch= orig_namespace=refs/original/ force= prune_empty= @@ -181,6 +182,9 @@ do --original) orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/ ;; + --state-branch) + state_branch="$OPTARG" + ;; *) usage ;; @@ -259,6 +263,26 @@ export GIT_INDEX_FILE # map old->new commit ids for rewriting parents mkdir ../map || die "Could not create map/ directory" +if test -n "$state_branch" +then + state_commit=$(git rev-parse --no-flags --revs-only "$state_branch") + if test -n "$state_commit" + then + echo "Populating map from $state_branch ($state_commit)" 1>&2 + perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die; + while (<MAP>) { + m/(.*):(.*)/ or die; + open F, ">../map/$1" or die; + print F "$2" or die; + close(F) or die; + } + close(MAP) or die;' "$state_commit" \ + || die "Unable to load state from $state_branch:filter.map" + else + echo "Branch $state_branch does not exist. Will create" 1>&2 + fi +fi + # we need "--" only if there are no path arguments in $@ nonrevs=$(git rev-parse --no-revs "$@") || exit if test -z "$nonrevs" @@ -590,6 +614,29 @@ test -z "$ORIG_GIT_COMMITTER_DATE" || { export GIT_COMMITTER_DATE } +if test -n "$state_branch" +then + echo "Saving rewrite state to $state_branch" 1>&2 + state_blob=$( + perl -e'opendir D, "../map" or die; + open H, "|-", "git hash-object -w --stdin" or die; + foreach (sort readdir(D)) { + next if m/^\.\.?$/; + open F, "<../map/$_" or die; + chomp($f = <F>); + print H "$_:$f\n" or die; + } + close(H) or die;' || die "Unable to save state") + state_tree=$(/bin/echo -e "100644 blob $state_blob\tfilter.map" | git mktree) + if test -n "$state_commit" + then + state_commit=$(/bin/echo "Sync" | git commit-tree "$state_tree" -p "$state_commit") + else + state_commit=$(/bin/echo "Sync" | git commit-tree "$state_tree" ) + fi + git update-ref "$state_branch" "$state_commit" +fi + cd "$orig_dir" rm -rf "$tempdir" -- 2.11.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch 2017-09-17 7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell @ 2017-09-17 9:43 ` Ian Campbell 0 siblings, 0 replies; 8+ messages in thread From: Ian Campbell @ 2017-09-17 9:43 UTC (permalink / raw) To: gitster; +Cc: git On Sun, 2017-09-17 at 08:36 +0100, Ian Campbell wrote: > +if test -n "$state_branch" > +then > > + echo "Saving rewrite state to $state_branch" 1>&2 > > + state_blob=$( > > + perl -e'opendir D, "../map" or die; > > + open H, "|-", "git hash-object -w --stdin" or die; > > + foreach (sort readdir(D)) { > > + next if m/^\.\.?$/; > > + open F, "<../map/$_" or die; > > + chomp($f = <F>); > > + print H "$_:$f\n" or die; > > + } > > + close(H) or die;' || die "Unable to save state") One things I've noticed is that for a full Linux tree history the filter.map file is 50M+ which causes github to complain: remote: warning: File filter.map is 54.40 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB (you can simulate this with `git log --pretty=format:"%H:%H" upstream/master`.) I suppose that's not a bad recommendation for any infra, not just GH's. The blob is compressed in the object store so there isn't _much_ point in compressing the map (also, it only goes down to ~30MB anyway so we aren't buying all that much time), but I'm wondering if perhaps I should look into a more intelligent representation, perhaps hashed by the first two characters (as .git/objects is) to divide into several blobs and have two levels. I'm also wondering if the .git-rewrite/map directory, which will have 70k+ (and growing) directory entries for a modern Linux tree, would benefit from the same sort of thing. OTOH in this case the extra shell machinations to turn abcdef123 into ab/cdef123 might overwhelm the savings in directory lookup time (unless there is a helper already for that. That assume that directory lookup is even a bottleneck, I've not measured but anecdotally/gut-feeling the commits-per-second does seem to be decreasing over the course of the filtering process. Ian. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-09-19 6:42 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-09-17 7:36 [PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format Ian Campbell 2017-09-17 7:36 ` [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted Ian Campbell 2017-09-19 3:01 ` Junio C Hamano 2017-09-19 6:42 ` Ian Campbell 2017-09-17 7:36 ` [PATCH v2 2/4] filter-branch: reset $GIT_* before cleaning up Ian Campbell 2017-09-17 7:36 ` [PATCH v2 3/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_* Ian Campbell 2017-09-17 7:36 ` [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch Ian Campbell 2017-09-17 9:43 ` Ian Campbell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.