git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
	Elijah Newren <newren@gmail.com>
Subject: Re: Why does fast-import need to check the validity of idents? + Other ident adventures
Date: Fri, 05 Feb 2021 16:25:23 +0100	[thread overview]
Message-ID: <87k0rmcza4.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <xmqq7dnpc610.fsf@gitster.c.googlers.com>


On Wed, Feb 03 2021, Junio C Hamano wrote:

> "=?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?=" Bjarmason <avarab@gmail.com>
> writes:
>
>> But I was wondering about fast-import.c in particular. I think Elijah's
>> patch here is obviously good an incremental improvement. But stepping
>> back a bit: who cares about sort-of-fsck validation in fast-import.c
>> anyway?
>
> Those who want to notice and verify the procedure they used to
> produce the import data from the original before it is too late?
>
> I.e. data gets imported to Git, victory declared and then old SCM
> turned gets off---and only then the resulting imported repository is
> found not to pass fsck.
>
>> Shouldn't it just pretty much be importing data as-is, and then we could
>> document "if you don't trust it, run fsck afterwards"?
>
> If it is a small import, the distinction does not matter, but for a
> huge import, the procedure to produce the data is likely to be
> mechanical, so even after processing just a very small portion of
> early part of the datastream, systematic errors would be noticed
> before fast-import wastes importing too much garbage that need to be
> discarded after running such fsck.  So in that sense, I suspect that
> there is value in the early validation.

What I was fishing for here is that perhaps since fast-import was
originally written this use-case of in-place conversion of primary data
on a server might have become too obscure to care about, i.e. as opposed
to doing a conversion locally and then "git push"-ing it to something
that does transfer.fsckObjects.

>> Or, if it's a use-case people actually care about, then I might see
>> about unifying some of these parser functions as part of a series I'm
>> preparing.
>
> I think allowing people to loosen particular checks for fast-import
> (or elsewhere for that matter) is a good idea, and you can do so
> more easily once the existing checking is migrated to your new
> scheme that shares code with the fsck machinery.

...allright, depending on how much of a hassle that is I might just add
tests for the differences and leave this particular problem to someone
else :)

      reply	other threads:[~2021-02-05 22:29 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28 19:15 [PATCH] fast-import: accept invalid timezones so we can import existing repos Elijah Newren via GitGitGadget
2020-05-28 19:26 ` Jonathan Nieder
2020-05-28 20:40 ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-28 23:08   ` Junio C Hamano
2020-05-29  0:20   ` Jonathan Nieder
2020-05-29  6:13   ` Jeff King
2020-05-29 17:19     ` Junio C Hamano
2020-05-30 20:25   ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13     ` Jeff King
2021-02-03 11:57     ` Why does fast-import need to check the validity of idents? + Other ident adventures =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason
2021-02-03 19:20       ` Junio C Hamano
2021-02-05 15:25         ` Ævar Arnfjörð Bjarmason [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0rmcza4.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).