git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason <avarab@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
	Elijah Newren <newren@gmail.com>
Subject: Re: Why does fast-import need to check the validity of idents? + Other ident adventures
Date: Wed, 03 Feb 2021 11:20:27 -0800	[thread overview]
Message-ID: <xmqq7dnpc610.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <87bld8ov9q.fsf@evledraar.gmail.com> (utf's message of "Wed, 03 Feb 2021 12:57:08 +0100")

"=?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?=" Bjarmason <avarab@gmail.com>
writes:

> But I was wondering about fast-import.c in particular. I think Elijah's
> patch here is obviously good an incremental improvement. But stepping
> back a bit: who cares about sort-of-fsck validation in fast-import.c
> anyway?

Those who want to notice and verify the procedure they used to
produce the import data from the original before it is too late?

I.e. data gets imported to Git, victory declared and then old SCM
turned gets off---and only then the resulting imported repository is
found not to pass fsck.

> Shouldn't it just pretty much be importing data as-is, and then we could
> document "if you don't trust it, run fsck afterwards"?

If it is a small import, the distinction does not matter, but for a
huge import, the procedure to produce the data is likely to be
mechanical, so even after processing just a very small portion of
early part of the datastream, systematic errors would be noticed
before fast-import wastes importing too much garbage that need to be
discarded after running such fsck.  So in that sense, I suspect that
there is value in the early validation.

> Or, if it's a use-case people actually care about, then I might see
> about unifying some of these parser functions as part of a series I'm
> preparing.

I think allowing people to loosen particular checks for fast-import
(or elsewhere for that matter) is a good idea, and you can do so
more easily once the existing checking is migrated to your new
scheme that shares code with the fsck machinery.

  reply	other threads:[~2021-02-03 19:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28 19:15 [PATCH] fast-import: accept invalid timezones so we can import existing repos Elijah Newren via GitGitGadget
2020-05-28 19:26 ` Jonathan Nieder
2020-05-28 20:40 ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-28 23:08   ` Junio C Hamano
2020-05-29  0:20   ` Jonathan Nieder
2020-05-29  6:13   ` Jeff King
2020-05-29 17:19     ` Junio C Hamano
2020-05-30 20:25   ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13     ` Jeff King
2021-02-03 11:57     ` Why does fast-import need to check the validity of idents? + Other ident adventures =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason
2021-02-03 19:20       ` Junio C Hamano [this message]
2021-02-05 15:25         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7dnpc610.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).