git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com,
	martin.agren@gmail.com, peff@peff.net
Subject: Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
Date: Fri, 1 May 2020 16:38:48 -0600	[thread overview]
Message-ID: <20200501223848.GH41612@syl.local> (raw)
In-Reply-To: <20200424105957.GB5925@szeder.dev>

On Fri, Apr 24, 2020 at 12:59:57PM +0200, SZEDER Gábor wrote:
> On Wed, Apr 22, 2020 at 05:39:30PM -0600, Taylor Blau wrote:
> > > > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > > > index 46f7f7c573..91e8027b86 100644
> > > > --- a/Documentation/git-commit-graph.txt
> > > > +++ b/Documentation/git-commit-graph.txt
> > > > @@ -82,6 +82,11 @@ tip with the previous tip.
> > > >  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
> > > >  be the current time. After writing the split commit-graph, delete all
> > > >  unused commit-graph whose modified times are older than `datetime`.
> > > > ++
> > > > +The `--[no-]check-oids` option decides whether or not OIDs are required
> > > > +to be commits. By default, `--check-oids` is implied, generating an
> > > > +error on non-commit objects. If `--no-check-oids` is given, non-commits
> > > > +are silently discarded.
> > >
> > > What happens with OIDs of tags, in particular with OIDs of tags that
> > > can be peeled down to commit objects?  According to (my (too
> > > pedantic?) interpretation of) the above description they will trigger
> > > an error with '--check-oids' or will be ignored with
> > > '--no-check-oids'.  The implementation, however, accepts those oids
> > > and peels them down to commit objects; I think this is the right
> > > behaviour.
> >
> > That's right, and certainly merits a mention in the documentation. I've
> > added that...
> >
> > > What happens with OIDs that name non-existing objects?
> >
> > ...these are silently discarded. I think that you could make a
> > compelling argument in either direction on this one, but I'm slightly
> > swayed towards "discard these, too", since '--no-check-oids' is
> > literally saying "don't check these".
>
> I don't want to argue either way, but I'd argue for making a conscious
> decision that is justified in the commit message and documented in the
> docs.

Me either, I very much welcome your consistently thoughtful replies
(even if my extreme delay in responding to this one would suggest
otherwise... ;)).

> So, the option is '--stdin-commits' or '--input=stdin-commits', but
> it's not only about commits.  Now, allowing OIDs of tags pointing to
> commits and peeling them makes obvious sense, because we want commits
> reachable from those tags included in the commit-graph file.  Allowing
> OIDs of tags pointing to non-commits and silently ignoring them makes
> sense (to better support your 'git f-e-r ... | git c-g write ...' use
> case), though it's not that obvious (after all I managed to overlook
> it back then, that's why we are now here discussing these
> '--check-oids' patches).
>
> But I'm not sure about silently ignoring OIDs pointing to non-existent
> objects, because those might be a sign of some issues in whatever is
> generating the list of objects to be fed to 'git c-g write'.  E.g. there
> could be a race between 'git for-each-ref' listing OIDs and some other
> processes pruning them.  Does this worth worrying about?  Dunno...
> but at least let's think it through, and record in the commit message
> why we made that decision, whatever that decision might be.

Yeah, I think that the most reasonable behavior is definitely that we
should complain about non-existent objects over 'git commit-graph write
--stdin-commits' no matter if '--[no-]check-oids' is given or not.

But, let's step back for a minute. What are we actually hoping to
accomplish with '--check-oids'? I wrote this patch because I wanted a
way to support 'git for-each-ref' piping into 'git commit-graph write'
without having to juggle which tags peel down to commits and which
don't.

Now, I figured that it would be unreasonable to change the default
behavior of 'git commit-graph write --stdin-commits' (which is to
complain and error out in this circumstance), so I added
'--no-check-oids' as a way to avoid that behavior for callers that want
that.

But, are there ever any callers that *wouldn't* want this behavior? As
far as I can tell, probably not. We're only going to be permitting
*more* inputs to 'git commit-graph write', and I seriously doubt that
anybody is depending on the above behavior. (Of course, if that's not
the case, I'd love for somebody to speak up here and we can continue
the course on this patch).

So, I propose the following:

  * We drop the idea of '--[no-]{check,verify}-oids', and always
    silently ignore non-commit inputs, retaining the existing behavior
    of always complaining about things that aren't valid hex OIDs, such
    as "HEAD".

  * We always error out on missing or corrupt commit OIDs, including
    valid OIDs that don't resolve to any object, or resolve to a tag
    that can't be fully peeled.

Does that seem reasonable?

> > I guess that pushes us into the territory of whether or not "check" is
> > the right verb. "verify"?
>
> Oh, I didn't think about this, but now that you mention it we have
> '--verify' in 'git rev-parse', 'git tag' and elsewhere, and we have
> 'verify-commit', 'verify-path' and 'verify-tag' commands.  So
> '--verify-oids' might be more consistent.  I kind of like the 'oids'
> suffix in the option name, though I don't know what else we might want
> to verify in this command in the future...
>
> > "scrutinize" :)?
>
> Huhh, erm, no ;)
>
> > If you're otherwise satisfied with this series, here's the updated
> > patch.
>
> I haven't yet looked closely at the rest of the series...  The
> documentation update in the updated patch below looks good to me,
> thanks.
>
> > -- >8 --
> >
> > Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'
> >
> > When operating on a stream of commit OIDs on stdin, 'git commit-graph
> > write' checks that each OID refers to an object that is indeed a commit.
> > This is convenient to make sure that the given input is well-formed, but
> > can sometimes be undesirable.
> >
> > For example, server operators may wish to feed the full commit object
> > IDs pointed to by refs that were updated during a push to 'git
> > commit-graph write --input=stdin-commits', and silently discard any
> > input that doesn't point at a commit. This can be done by combing the
> > output of 'git for-each-ref' with '--format %(*objecttype)', but this
> > requires opening up a potentially large number of objects.  Instead, it
> > is more convenient to feed the updated object IDs to the commit-graph
> > machinery, and let it throw out whatever remains.  to commits.
>
> Either the bulk of a sentence is missing, or there is a stray(?) "to
> commits." at the end of this line.
>
> > Introduce '--[no-]check-oids' to make such a behavior possible. With
> > '--check-oids' (the default behavior to retain backwards compatibility),
> > 'git commit-graph write' will barf on a non-commit line in its input.
> > With '--no-check-oids', such lines will be silently ignored, making the
> > above possible by specifying this option.
> >
> > No matter which is supplied, 'git commit-graph write' retains the
> > behavior from the previous commit of rejecting non-OID inputs like
> > "HEAD" and "refs/heads/foo" as before.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  Documentation/git-commit-graph.txt |  6 ++++++
> >  builtin/commit-graph.c             | 11 ++++++++---
> >  t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
> >  3 files changed, 42 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > index 46f7f7c573..6bdbe42766 100644
> > --- a/Documentation/git-commit-graph.txt
> > +++ b/Documentation/git-commit-graph.txt
> > @@ -82,6 +82,12 @@ tip with the previous tip.
> >  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
> >  be the current time. After writing the split commit-graph, delete all
> >  unused commit-graph whose modified times are older than `datetime`.
> > ++
> > +The `--[no-]check-oids` option decides whether or not OIDs are required
> > +to be commits. By default, `--check-oids` is implied, generating an
> > +error on non-commit objects. If `--no-check-oids` is given, non-commits
> > +and non-existent objects are silently discarded. In either case, tags
> > +are peeled down to the object they reference.

Thanks,
Taylor

  reply	other threads:[~2020-05-01 22:38 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
2020-04-14  4:04 ` [PATCH 1/7] t/helper/test-read-graph.c: support commit-graph chains Taylor Blau
2020-04-14  4:04 ` [PATCH 2/7] builtin/commit-graph.c: support for '--split[=<strategy>]' Taylor Blau
2020-04-14  4:04 ` [PATCH 3/7] builtin/commit-graph.c: introduce split strategy 'no-merge' Taylor Blau
2020-04-14  4:04 ` [PATCH 4/7] builtin/commit-graph.c: introduce split strategy 'replace' Taylor Blau
2020-04-14  4:04 ` [PATCH 5/7] oidset: introduce 'oidset_size' Taylor Blau
2020-04-14  4:04 ` [PATCH 6/7] commit-graph.h: replace 'commit_hex' with 'commits' Taylor Blau
2020-04-14  4:04 ` [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids' Taylor Blau
2020-04-15  4:29   ` Taylor Blau
2020-04-15  4:31     ` Taylor Blau
2020-04-22 10:55       ` SZEDER Gábor
2020-04-22 23:39         ` Taylor Blau
2020-04-24 10:59           ` SZEDER Gábor
2020-05-01 22:38             ` Taylor Blau [this message]
2020-05-03  9:40               ` Jeff King
2020-05-03 16:55                 ` Junio C Hamano
2020-05-04 14:59                   ` Jeff King
2020-05-04 16:29                     ` Junio C Hamano
2020-05-04 22:16                       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200501223848.GH41612@syl.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=martin.agren@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).