git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: peff@peff.net, Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH] fast-import: accept invalid timezones so we can import existing repos
Date: Thu, 28 May 2020 19:15:12 +0000	[thread overview]
Message-ID: <pull.795.git.git.1590693313099.gitgitgadget@gmail.com> (raw)

From: Elijah Newren <newren@gmail.com>

There are multiple repositories in the wild with random, invalid
timezones.  Most notably is a commit from rails.git with a timezone of
"+051800"[1].  However, a few searches found other repos with that same
invalid timezone.  Further, Peff reports that GitHub relaxed their fsck
checks in August 2011 to accept any timezone value[2], and there have
been multiple reports to filter-repo about fast-import crashing while
trying to import their existing repositories since they had timezone
values such as "-7349423" and "-43455309"[3].

It is not clear what created these invalid timezones, but since git has
permitted their use and worked with these repositories for years at this
point, it seems pointless to make fast-import be the only thing that
disallows them.  Relax the parsing to allow these timezones when using
raw import format; when using --date-format=rfc2822 (which is not the
default), we can continue being more strict about timezones.

[1] https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734
[2] https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
[3] https://github.com/newren/git-filter-repo/issues/88

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    fast-import: accept invalid timezones so we can import existing repos
    
    Note: this is not a fix for a regression, so you can ignore it for 2.27;
    it can sit in pu.
    
    Peff leaned towards normalizing these timezones in fast-export[1], but
    (A) it's not clear what "valid" timezone we should randomly pick and
    regardless of what we pick, it seems it'll be wrong for most cases, (B)
    it would provide yet another way that "git fast-export --all | git
    fast-import" would not preserve the original history, as users sometimes
    expect[2], and (C) that'd prevent users from passing a special callback
    to filter-repo to fix up these values[3].
    
    Since I'm not a fan of picking a random value to reassign these to (in
    either fast-export or fast-import), I went the route of relaxing the
    requirements in fast-import, similar to what Peff reports GitHub did
    about 9 years ago in their incoming fsck checks.
    
    [1] 
    https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
    [2] 
    https://lore.kernel.org/git/CABPp-BFLJ48BZ97Y9mr4i3q7HMqjq18cXMgSYdxqD1cMzH8Spg@mail.gmail.com/
    [3] Example: 
    https://github.com/newren/git-filter-repo/issues/88#issuecomment-629706776

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-795%2Fnewren%2Floosen-fast-import-timezone-parsing-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-795/newren/loosen-fast-import-timezone-parsing-v1
Pull-Request: https://github.com/git/git/pull/795

 fast-import.c          |  7 +++----
 t/t9300-fast-import.sh | 17 +++++++++++++++++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index c98970274c4..4a3c193007d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1915,11 +1915,10 @@ static int validate_raw_date(const char *src, struct strbuf *result)
 {
 	const char *orig_src = src;
 	char *endp;
-	unsigned long num;
 
 	errno = 0;
 
-	num = strtoul(src, &endp, 10);
+	strtoul(src, &endp, 10);
 	/* NEEDSWORK: perhaps check for reasonable values? */
 	if (errno || endp == src || *endp != ' ')
 		return -1;
@@ -1928,8 +1927,8 @@ static int validate_raw_date(const char *src, struct strbuf *result)
 	if (*src != '-' && *src != '+')
 		return -1;
 
-	num = strtoul(src + 1, &endp, 10);
-	if (errno || endp == src + 1 || *endp || 1400 < num)
+	strtoul(src + 1, &endp, 10);
+	if (errno || endp == src + 1 || *endp)
 		return -1;
 
 	strbuf_addstr(result, orig_src);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 768257b29e0..0e798e68476 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -410,6 +410,23 @@ test_expect_success 'B: accept empty committer' '
 	test -z "$out"
 '
 
+test_expect_success 'B: accept invalid timezone' '
+	cat >input <<-INPUT_END &&
+	commit refs/heads/invalid-timezone
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
+	data <<COMMIT
+	empty commit
+	COMMIT
+	INPUT_END
+
+	test_when_finished "git update-ref -d refs/heads/invalid-timezone
+		git gc
+		git prune" &&
+	git fast-import <input &&
+	git cat-file -p invalid-timezone >out &&
+	grep "1234567890 [+]051800" out
+'
+
 test_expect_success 'B: accept and fixup committer with no name' '
 	cat >input <<-INPUT_END &&
 	commit refs/heads/empty-committer-2

base-commit: 2d5e9f31ac46017895ce6a183467037d29ceb9d3
-- 
gitgitgadget

             reply	other threads:[~2020-05-28 19:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28 19:15 Elijah Newren via GitGitGadget [this message]
2020-05-28 19:26 ` [PATCH] fast-import: accept invalid timezones so we can import existing repos Jonathan Nieder
2020-05-28 20:40 ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-28 23:08   ` Junio C Hamano
2020-05-29  0:20   ` Jonathan Nieder
2020-05-29  6:13   ` Jeff King
2020-05-29 17:19     ` Junio C Hamano
2020-05-30 20:25   ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13     ` Jeff King
2021-02-03 11:57     ` Why does fast-import need to check the validity of idents? + Other ident adventures =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason
2021-02-03 19:20       ` Junio C Hamano
2021-02-05 15:25         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.795.git.git.1590693313099.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).