From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
Elijah Newren <newren@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Why does fast-import need to check the validity of idents? + Other ident adventures
Date: Wed, 03 Feb 2021 12:57:08 +0100
Message-ID: <87bld8ov9q.fsf@evledraar.gmail.com>
In-Reply-To: <pull.795.v3.git.git.1590870357549.gitgitgadget@gmail.com>
[Originally sent 5 days ago, but seems to have been a victim of the
vger.kernel.org problems at the time, re-sending]
On Sat, May 30 2020, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
The full e-mail (snipped here) is in the archive:
https://lore.kernel.org/git/pull.795.v3.git.git.1590870357549.gitgitgadget@gmail.com/
> There are multiple repositories in the wild with random, invalid
> timezones. Most notably is a commit from rails.git with a timezone of
> "+051800"[1]. A few searches will find other repos with that same
> invalid timezone as well. Further, Peff reports that GitHub relaxed
> their fsck checks in August 2011 to accept any timezone value[2], and
> there have been multiple reports to filter-repo about fast-import
> crashing while trying to import their existing repositories since they
> had timezone values such as "-7349423" and "-43455309"[3].
I've been looking at some of our duplicate logic here after my mktag
series where we now use fsck validation. It had a hardcoded "1400"
offset value, which I see fast-import.c still has.
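For illustration, here is roughly what a strict offset check with such a hardcoded cap looks like. This is a minimal sketch and not a copy of the code in fast-import.c or fsck.c; the helper name tz_offset_ok() and the MAX_TZ_HHMM constant are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical cap, mirroring the hardcoded "1400" (+14:00 written as HHMM). */
#define MAX_TZ_HHMM 1400

/*
 * Return 1 if tz is a sign followed only by decimal digits, else 0.
 * In strict mode the numeric value must also not exceed MAX_TZ_HHMM.
 * Hypothetical helper for illustration, not git's implementation.
 */
int tz_offset_ok(const char *tz, int strict)
{
	char *end;
	unsigned long num;

	assert(tz != NULL);
	if (*tz != '+' && *tz != '-')
		return 0;
	/* strtoul() would skip whitespace and accept a second sign. */
	if (tz[1] < '0' || tz[1] > '9')
		return 0;

	errno = 0;
	num = strtoul(tz + 1, &end, 10);
	if (errno || *end)
		return 0;
	if (strict && num > MAX_TZ_HHMM)
		return 0;
	return 1;
}
```

With strict set, the "+051800" from the rails.git commit is rejected; with strict cleared it is accepted, since it is still a sign followed by digits. Note that a plain numeric cap like this does not check that the minutes part is below 60.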
Then in mailmap.c we have parse_name_and_email(), then there's
split_ident_line() in ident.c, and of course fsck_ident(). There's also
record_person_from_buf() in fmt-merge-msg.c, and copy_name() and
copy_email() in ref-filter.c. Maybe handle_from() in mailinfo.c also
counts. Anyway, aside from the last, these are all parsers for
"author/committer" lines in commits, one way or another.
But I was wondering about fast-import.c in particular. I think Elijah's
patch here is obviously a good incremental improvement. But stepping
back a bit: who cares about sort-of-fsck validation in fast-import.c
anyway?
Shouldn't it just pretty much be importing data as-is, and then we could
document "if you don't trust it, run fsck afterwards"?
Or, if it's a use-case people actually care about, then I might see
about unifying some of these parser functions as part of a series I'm
preparing.
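To make concrete what these parsers have in common: each one takes a line of the form "Name <email> timestamp timezone" and finds the boundaries of its parts. The following is only a sketch of that shared core, under the assumption that no validation is wanted at this layer. The struct ident_parts type and split_ident_sketch() are hypothetical names for this example; this is not the existing split_ident_line() from ident.c and not the interface of any planned series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the caller's line, no copies. */
struct ident_parts {
	const char *name;
	size_t name_len;
	const char *mail;
	size_t mail_len;
	const char *date; /* text after "> ", e.g. "1612353428 +0100"; may be empty */
};

/*
 * Split "Name <email> timestamp tz" into its parts.
 * Returns 0 on success, -1 if no "<...>" pair is found.
 * The date part is deliberately not validated here.
 */
int split_ident_sketch(const char *line, struct ident_parts *out)
{
	const char *lt, *gt;

	assert(line != NULL && out != NULL);
	lt = strchr(line, '<');
	if (!lt)
		return -1;
	gt = strchr(lt + 1, '>');
	if (!gt)
		return -1;

	/* Name: everything before '<', minus trailing spaces. */
	out->name = line;
	out->name_len = (size_t)(lt - line);
	while (out->name_len > 0 && line[out->name_len - 1] == ' ')
		out->name_len--;

	/* Email: the text between '<' and '>'. */
	out->mail = lt + 1;
	out->mail_len = (size_t)(gt - (lt + 1));

	/* Date: whatever follows '>', skipping leading spaces. */
	out->date = gt + 1;
	while (*out->date == ' ')
		out->date++;
	return 0;
}
```

A caller that wants fsck-like strictness could then check the date part separately, while a caller that wants to import data as-is could skip that step.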
2020-05-28 19:15 [PATCH] fast-import: accept invalid timezones so we can import existing repos Elijah Newren via GitGitGadget
2020-05-28 19:26 ` Jonathan Nieder
2020-05-28 20:40 ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-28 23:08 ` Junio C Hamano
2020-05-29 0:20 ` Jonathan Nieder
2020-05-29 6:13 ` Jeff King
2020-05-29 17:19 ` Junio C Hamano
2020-05-30 20:25 ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13 ` Jeff King
2021-02-03 11:57 ` Ævar Arnfjörð Bjarmason [this message]
2021-02-03 19:20 ` Why does fast-import need to check the validity of idents? + Other ident adventures Junio C Hamano
2021-02-05 15:25 ` Ævar Arnfjörð Bjarmason