git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: peff@peff.net, jrnieder@gmail.com,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v2] fast-import: add new --date-format=raw-permissive format
Date: Thu, 28 May 2020 20:40:37 +0000	[thread overview]
Message-ID: <pull.795.v2.git.git.1590698437607.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.795.git.git.1590693313099.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

There are multiple repositories in the wild with random, invalid
timezones.  Most notably is a commit from rails.git with a timezone of
"+051800"[1].  A few searches will find other repos with that same
invalid timezone as well.  Further, Peff reports that GitHub relaxed
their fsck checks in August 2011 to accept any timezone value[2], and
there have been multiple reports to filter-repo about fast-import
crashing while trying to import their existing repositories since they
had timezone values such as "-7349423" and "-43455309"[3].

The existing check on timezone values inside fast-import may prove
useful for people who are crafting fast-import input by hand or with a
new script.  For them, the check may help them avoid accidentally
recording invalid dates.  (Note that this check is rather simplistic and
there are still several forms of invalid dates that fast-import does not
check for: dates in the future, timezone values with minutes that are
not divisible by 15, and timezone values with minutes that are 60 or
greater.)  While this simple check may have some value for those users,
other users or tools will want to import existing repositories as-is.
Provide a --date-format=raw-permissive format that will not error out on
these otherwise invalid timezones so that such existing repositories can
be imported.

[1] https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734
[2] https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
[3] https://github.com/newren/git-filter-repo/issues/88

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    fast-import: accept invalid timezones so we can import existing repos
    
    Changes since v1:
    
     * Instead of just allowing such timezones outright, did it behind a
       --date-format=raw-permissive as suggested by Jonathan

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-795%2Fnewren%2Floosen-fast-import-timezone-parsing-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-795/newren/loosen-fast-import-timezone-parsing-v2
Pull-Request: https://github.com/git/git/pull/795

Range-diff vs v1:

 1:  d3a7dbc3892 ! 1:  9580aacdb21 fast-import: accept invalid timezones so we can import existing repos
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    fast-import: accept invalid timezones so we can import existing repos
     +    fast-import: add new --date-format=raw-permissive format
      
          There are multiple repositories in the wild with random, invalid
          timezones.  Most notably is a commit from rails.git with a timezone of
     -    "+051800"[1].  However, a few searches found other repos with that same
     -    invalid timezone.  Further, Peff reports that GitHub relaxed their fsck
     -    checks in August 2011 to accept any timezone value[2], and there have
     -    been multiple reports to filter-repo about fast-import crashing while
     -    trying to import their existing repositories since they had timezone
     -    values such as "-7349423" and "-43455309"[3].
     +    "+051800"[1].  A few searches will find other repos with that same
     +    invalid timezone as well.  Further, Peff reports that GitHub relaxed
     +    their fsck checks in August 2011 to accept any timezone value[2], and
     +    there have been multiple reports to filter-repo about fast-import
     +    crashing while trying to import their existing repositories since they
     +    had timezone values such as "-7349423" and "-43455309"[3].
      
     -    It is not clear what created these invalid timezones, but since git has
     -    permitted their use and worked with these repositories for years at this
     -    point, it seems pointless to make fast-import be the only thing that
     -    disallows them.  Relax the parsing to allow these timezones when using
     -    raw import format; when using --date-format=rfc2822 (which is not the
     -    default), we can continue being more strict about timezones.
     +    The existing check on timezone values inside fast-import may prove
     +    useful for people who are crafting fast-import input by hand or with a
     +    new script.  For them, the check may help them avoid accidentally
     +    recording invalid dates.  (Note that this check is rather simplistic and
     +    there are still several forms of invalid dates that fast-import does not
     +    check for: dates in the future, timezone values with minutes that are
     +    not divisible by 15, and timezone values with minutes that are 60 or
     +    greater.)  While this simple check may have some value for those users,
     +    other users or tools will want to import existing repositories as-is.
     +    Provide a --date-format=raw-permissive format that will not error out on
     +    these otherwise invalid timezones so that such existing repositories can
     +    be imported.
      
          [1] https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734
          [2] https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## fast-import.c ##
     -@@ fast-import.c: static int validate_raw_date(const char *src, struct strbuf *result)
     +@@ fast-import.c: struct hash_list {
     + 
     + typedef enum {
     + 	WHENSPEC_RAW = 1,
     ++	WHENSPEC_RAW_PERMISSIVE,
     + 	WHENSPEC_RFC2822,
     + 	WHENSPEC_NOW
     + } whenspec_type;
     +@@ fast-import.c: static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax_t *len_res)
     + 	return 1;
     + }
     + 
     +-static int validate_raw_date(const char *src, struct strbuf *result)
     ++static int validate_raw_date(const char *src, struct strbuf *result, int strict)
       {
       	const char *orig_src = src;
       	char *endp;
     --	unsigned long num;
     + 	unsigned long num;
     ++	int out_of_range_timezone;
       
       	errno = 0;
       
     --	num = strtoul(src, &endp, 10);
     -+	strtoul(src, &endp, 10);
     - 	/* NEEDSWORK: perhaps check for reasonable values? */
     - 	if (errno || endp == src || *endp != ' ')
     - 		return -1;
      @@ fast-import.c: static int validate_raw_date(const char *src, struct strbuf *result)
     - 	if (*src != '-' && *src != '+')
       		return -1;
       
     --	num = strtoul(src + 1, &endp, 10);
     + 	num = strtoul(src + 1, &endp, 10);
      -	if (errno || endp == src + 1 || *endp || 1400 < num)
     -+	strtoul(src + 1, &endp, 10);
     -+	if (errno || endp == src + 1 || *endp)
     ++	out_of_range_timezone = strict && (1400 < num);
     ++	if (errno || endp == src + 1 || *endp || out_of_range_timezone)
       		return -1;
       
       	strbuf_addstr(result, orig_src);
     +@@ fast-import.c: static char *parse_ident(const char *buf)
     + 
     + 	switch (whenspec) {
     + 	case WHENSPEC_RAW:
     +-		if (validate_raw_date(ltgt, &ident) < 0)
     ++		if (validate_raw_date(ltgt, &ident, 1) < 0)
     ++			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
     ++		break;
     ++	case WHENSPEC_RAW_PERMISSIVE:
     ++		if (validate_raw_date(ltgt, &ident, 0) < 0)
     + 			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
     + 		break;
     + 	case WHENSPEC_RFC2822:
     +@@ fast-import.c: static void option_date_format(const char *fmt)
     + {
     + 	if (!strcmp(fmt, "raw"))
     + 		whenspec = WHENSPEC_RAW;
     ++	else if (!strcmp(fmt, "raw-permissive"))
     ++		whenspec = WHENSPEC_RAW_PERMISSIVE;
     + 	else if (!strcmp(fmt, "rfc2822"))
     + 		whenspec = WHENSPEC_RFC2822;
     + 	else if (!strcmp(fmt, "now"))
      
       ## t/t9300-fast-import.sh ##
      @@ t/t9300-fast-import.sh: test_expect_success 'B: accept empty committer' '
       	test -z "$out"
       '
       
     -+test_expect_success 'B: accept invalid timezone' '
     ++test_expect_success 'B: reject invalid timezone' '
     ++	cat >input <<-INPUT_END &&
     ++	commit refs/heads/invalid-timezone
     ++	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
     ++	data <<COMMIT
     ++	empty commit
     ++	COMMIT
     ++	INPUT_END
     ++
     ++	test_when_finished "git update-ref -d refs/heads/invalid-timezone" &&
     ++	test_must_fail git fast-import <input
     ++'
     ++
     ++test_expect_success 'B: accept invalid timezone with raw-permissive' '
      +	cat >input <<-INPUT_END &&
      +	commit refs/heads/invalid-timezone
      +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
     @@ t/t9300-fast-import.sh: test_expect_success 'B: accept empty committer' '
      +	test_when_finished "git update-ref -d refs/heads/invalid-timezone
      +		git gc
      +		git prune" &&
     -+	git fast-import <input &&
     ++	git fast-import --date-format=raw-permissive <input &&
      +	git cat-file -p invalid-timezone >out &&
      +	grep "1234567890 [+]051800" out
      +'


 fast-import.c          | 15 ++++++++++++---
 t/t9300-fast-import.sh | 30 ++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index c98970274c4..74d7298bc5a 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -139,6 +139,7 @@ struct hash_list {
 
 typedef enum {
 	WHENSPEC_RAW = 1,
+	WHENSPEC_RAW_PERMISSIVE,
 	WHENSPEC_RFC2822,
 	WHENSPEC_NOW
 } whenspec_type;
@@ -1911,11 +1912,12 @@ static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax_t *len_res)
 	return 1;
 }
 
-static int validate_raw_date(const char *src, struct strbuf *result)
+static int validate_raw_date(const char *src, struct strbuf *result, int strict)
 {
 	const char *orig_src = src;
 	char *endp;
 	unsigned long num;
+	int out_of_range_timezone;
 
 	errno = 0;
 
@@ -1929,7 +1931,8 @@ static int validate_raw_date(const char *src, struct strbuf *result)
 		return -1;
 
 	num = strtoul(src + 1, &endp, 10);
-	if (errno || endp == src + 1 || *endp || 1400 < num)
+	out_of_range_timezone = strict && (1400 < num);
+	if (errno || endp == src + 1 || *endp || out_of_range_timezone)
 		return -1;
 
 	strbuf_addstr(result, orig_src);
@@ -1963,7 +1966,11 @@ static char *parse_ident(const char *buf)
 
 	switch (whenspec) {
 	case WHENSPEC_RAW:
-		if (validate_raw_date(ltgt, &ident) < 0)
+		if (validate_raw_date(ltgt, &ident, 1) < 0)
+			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
+		break;
+	case WHENSPEC_RAW_PERMISSIVE:
+		if (validate_raw_date(ltgt, &ident, 0) < 0)
 			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
 		break;
 	case WHENSPEC_RFC2822:
@@ -3258,6 +3265,8 @@ static void option_date_format(const char *fmt)
 {
 	if (!strcmp(fmt, "raw"))
 		whenspec = WHENSPEC_RAW;
+	else if (!strcmp(fmt, "raw-permissive"))
+		whenspec = WHENSPEC_RAW_PERMISSIVE;
 	else if (!strcmp(fmt, "rfc2822"))
 		whenspec = WHENSPEC_RFC2822;
 	else if (!strcmp(fmt, "now"))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 768257b29e0..14e9baa22db 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -410,6 +410,36 @@ test_expect_success 'B: accept empty committer' '
 	test -z "$out"
 '
 
+test_expect_success 'B: reject invalid timezone' '
+	cat >input <<-INPUT_END &&
+	commit refs/heads/invalid-timezone
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
+	data <<COMMIT
+	empty commit
+	COMMIT
+	INPUT_END
+
+	test_when_finished "git update-ref -d refs/heads/invalid-timezone" &&
+	test_must_fail git fast-import <input
+'
+
+test_expect_success 'B: accept invalid timezone with raw-permissive' '
+	cat >input <<-INPUT_END &&
+	commit refs/heads/invalid-timezone
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
+	data <<COMMIT
+	empty commit
+	COMMIT
+	INPUT_END
+
+	test_when_finished "git update-ref -d refs/heads/invalid-timezone
+		git gc
+		git prune" &&
+	git fast-import --date-format=raw-permissive <input &&
+	git cat-file -p invalid-timezone >out &&
+	grep "1234567890 [+]051800" out
+'
+
 test_expect_success 'B: accept and fixup committer with no name' '
 	cat >input <<-INPUT_END &&
 	commit refs/heads/empty-committer-2

base-commit: 2d5e9f31ac46017895ce6a183467037d29ceb9d3
-- 
gitgitgadget

  parent reply	other threads:[~2020-05-28 20:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28 19:15 [PATCH] fast-import: accept invalid timezones so we can import existing repos Elijah Newren via GitGitGadget
2020-05-28 19:26 ` Jonathan Nieder
2020-05-28 20:40 ` Elijah Newren via GitGitGadget [this message]
2020-05-28 23:08   ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Junio C Hamano
2020-05-29  0:20   ` Jonathan Nieder
2020-05-29  6:13   ` Jeff King
2020-05-29 17:19     ` Junio C Hamano
2020-05-30 20:25   ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13     ` Jeff King
2021-02-03 11:57     ` Why does fast-import need to check the validity of idents? + Other ident adventures =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason
2021-02-03 19:20       ` Junio C Hamano
2021-02-05 15:25         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.795.v2.git.git.1590698437607.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).