All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
Date: Fri, 12 Jul 2013 05:22:12 -0400	[thread overview]
Message-ID: <20130712092212.GA4859@sigill.intra.peff.net> (raw)
In-Reply-To: <51DFC2B2.3080300@alum.mit.edu>

On Fri, Jul 12, 2013 at 10:47:46AM +0200, Michael Haggerty wrote:

> > The solution feels a little hacky, but I'm not sure there is a better
> > one short of reverting the warning entirely.
> > 
> > We could also tie it to warn_ambiguous_refs (or add another config
> > variable), but I don't think that makes sense. It is not about "do I
> > care about ambiguities in this repository", but rather "I am going to
> > do a really large number of sha1 resolutions, and I do not want to pay
> > the price in this particular code path for the warning").
> > 
> > There may be other sites that resolve a large number of refs and run
> > into this, but I couldn't think of any. Doing for_each_ref would not
> > have the same problem, as we already know they are refs there.
> 
> ISTM that callers of --batch-check would usually know whether they are
> feeding it SHA-1s or arbitrary object specifiers.  (And in most cases
> when there are a large number of them, they are probably all SHA-1s).

Thanks for bringing this up; I had meant to outline this option in the
cover letter, but forgot when posting.

If were designing cat-file from scratch, I'd consider having --batch
take just 40-hex sha1s in the first place. Since we can't do that now
for backwards compatibility, having an option is the best we can do
there.

But having to use such an option to avoid a 10x slowdown just strikes me
as ugly for two reasons:

  1. It's part of the user-visible interface, and it seems like an extra
     "wtf?" moment when somebody wonders why their script is painfully
     slow, and why they need a magic option to fix it. We're cluttering
     the interface forever.

  2. It's a sign that the refname check was put in for a specific
     performance profile (resolving one or a few refs), but didn't
     consider resolving a large number. I'm wondering what other
     performance regressions it's possible to trigger.

> So instead of framing this change as "skip object refname ambiguity
> check" (i.e., sacrifice some correctness for performance), perhaps it
> would be less hacky to offer the following new feature:

I wouldn't frame it as "correctness for performance" at all. The check
does not impact behavior at all, and is purely informational (to catch
quite a rare situation, too). I'd argue that the check simply isn't that
interesting in this case in the first place.

> To cat-file we could add an option like "--sha1-only" or "--literal" or
> "--no-dwim" (... better names are failing me) which would skip *all*
> dwimming of 40-character strings.  It would also assume that any shorter
> strings are abbreviated SHA-1s and fail if they are not.  This would be
> a nice feature by itself ("these are object names, dammit, and don't try
> to tell me differently!") and would have the additional small advantage
> of speeding up lookups of abbreviated SHA-1s, which (regardless of your
> patch) otherwise go through the whole DWIM process.

I can see in theory that somebody might want that, but I am having a
hard time thinking of a practical use.

> I understand that such an option alone would leave some old scripts
> suffering the performance regression caused by the check, but in most
> cases it would be easy to fix them.

I'm less worried about old scripts, and more worried about carrying
around a confusing option forever, especially when I expect most
invocations to use it (perhaps my experience is biased, but I use
cat-file --batch quite a lot in scripting, and it has almost always been
on the output of rev-list).

IOW, it seems like a poor default, and we are choosing it only because
of backwards compatibility. I guess another option is to switch the
default with the usual deprecation dance.

Yet another option is to consider what the check is doing, and
accomplish the same thing in a different way. The real pain is that we
are individually trying to resolve each object by hitting the filesystem
(and doing lots of silly checks on the refname format, when we know it
must be valid).

We don't actually care in this case if the ref list is up to date (we
are not trying to update or read a ref, but only know if it exists, and
raciness is OK). IOW, could we replace the dwim_ref call for the warning
with something that directly queries the ref cache?

-Peff

  reply	other threads:[~2013-07-12  9:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-12  6:15 [PATCH 0/7] cat-file --batch-check performance improvements Jeff King
2013-07-12  6:20 ` [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode Jeff King
2013-07-12  8:47   ` Michael Haggerty
2013-07-12  9:22     ` Jeff King [this message]
2013-07-12 10:30       ` Michael Haggerty
2013-07-15  4:23         ` Jeff King
2013-07-15  3:45       ` Junio C Hamano
2013-07-15  4:17         ` Jeff King
2013-07-12  6:21 ` [PATCH 2/7] sha1_object_info_extended: rename "status" to "type" Jeff King
2013-07-12  6:30 ` [PATCH 3/7] sha1_loose_object_info: make type lookup optional Jeff King
2013-07-12  6:31 ` [PATCH 4/7] packed_object_info: hoist delta type resolution to helper Jeff King
2013-07-12  6:32 ` [PATCH 5/7] packed_object_info: make type lookup optional Jeff King
2013-07-12  6:34 ` [PATCH 6/7] sha1_object_info_extended: make type calculation optional Jeff King
2013-07-12  6:37 ` [PATCH 7/7] sha1_object_info_extended: pass object_info to helpers Jeff King
2013-07-12 17:23 ` [PATCH 0/7] cat-file --batch-check performance improvements Junio C Hamano
2013-07-12 20:12   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130712092212.GA4859@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.