git.vger.kernel.org archive mirror
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
	Elijah Newren <newren@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Why does fast-import need to check the validity of idents? + Other ident adventures
Date: Wed, 03 Feb 2021 12:57:08 +0100
Message-ID: <87bld8ov9q.fsf@evledraar.gmail.com>
In-Reply-To: <pull.795.v3.git.git.1590870357549.gitgitgadget@gmail.com>


[Originally sent 5 days ago, but it seems to have fallen victim to the
vger.kernel.org problems at the time; re-sending]

On Sat, May 30 2020, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>

The full e-mail (snipped here) is in the archive:
https://lore.kernel.org/git/pull.795.v3.git.git.1590870357549.gitgitgadget@gmail.com/

> There are multiple repositories in the wild with random, invalid
> timezones.  Most notably is a commit from rails.git with a timezone of
> "+051800"[1].  A few searches will find other repos with that same
> invalid timezone as well.  Further, Peff reports that GitHub relaxed
> their fsck checks in August 2011 to accept any timezone value[2], and
> there have been multiple reports to filter-repo about fast-import
> crashing while trying to import their existing repositories since they
> had timezone values such as "-7349423" and "-43455309"[3].

I've been looking at some of our duplicated logic here after my mktag
series, which made mktag use fsck validation. The old mktag code had a
hardcoded "1400" limit on the timezone offset, which I see fast-import.c
still has.

Then in mailmap.c we have parse_name_and_email(), there's
split_ident_line() in ident.c, and of course fsck_ident(). There's also
record_person_from_buf() in fmt-merge-msg.c, and copy_name() and
copy_email() in ref-filter.c. Maybe handle_from() in mailinfo.c also
counts. Anyway, aside from that last one, these are all parsers for
"author/committer" lines in commits one way or another.

But I was wondering about fast-import.c in particular. I think Elijah's
patch here is obviously a good incremental improvement. But stepping
back a bit: who cares about sort-of-fsck validation in fast-import.c
anyway?

Shouldn't it just pretty much be importing data as-is, and then we could
document "if you don't trust it, run fsck afterwards"?

Or, if it's a use-case people actually care about, then I might see
about unifying some of these parser functions as part of a series I'm
preparing.


Thread overview: 12+ messages
2020-05-28 19:15 [PATCH] fast-import: accept invalid timezones so we can import existing repos Elijah Newren via GitGitGadget
2020-05-28 19:26 ` Jonathan Nieder
2020-05-28 20:40 ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-28 23:08   ` Junio C Hamano
2020-05-29  0:20   ` Jonathan Nieder
2020-05-29  6:13   ` Jeff King
2020-05-29 17:19     ` Junio C Hamano
2020-05-30 20:25   ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13     ` Jeff King
2021-02-03 11:57     ` Ævar Arnfjörð Bjarmason [this message]
2021-02-03 19:20       ` Why does fast-import need to check the validity of idents? + Other ident adventures Junio C Hamano
2021-02-05 15:25         ` Ævar Arnfjörð Bjarmason
