From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
Date: Fri, 12 Jul 2013 12:30:07 +0200	[thread overview]
Message-ID: <51DFDAAF.60808@alum.mit.edu> (raw)
In-Reply-To: <20130712092212.GA4859@sigill.intra.peff.net>

On 07/12/2013 11:22 AM, Jeff King wrote:
> Yet another option is to consider what the check is doing, and
> accomplish the same thing in a different way. The real pain is that we
> are individually trying to resolve each object by hitting the filesystem
> (and doing lots of silly checks on the refname format, when we know it
> must be valid).
> 
> We don't actually care in this case if the ref list is up to date (we
> are not trying to update or read a ref, but only want to know if it exists, and
> raciness is OK). IOW, could we replace the dwim_ref call for the warning
> with something that directly queries the ref cache?

I think it would be quite practical to add an API along the lines of

    struct ref_snapshot *get_ref_snapshot(const char *prefix);
    void release_ref_snapshot(struct ref_snapshot *);
    int lookup_ref(struct ref_snapshot *, const char *refname,
                   unsigned char *sha1, int *flags);

where prefix is the part of the refs tree that you want included in the
snapshot (e.g., "refs/heads") and ref_snapshot is probably opaque
outside of the refs module.
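To make the proposed interface concrete, here is a minimal sketch, assuming the snapshot is just a sorted array searched with bsearch(). It is not how the refs module would actually store things. The struct layouts, the get_ref_snapshot_from() variant that takes an in-memory ref list instead of reading loose and packed refs from disk, and the 0/-1 return convention of lookup_ref() are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one ref as read from the loose or packed store. */
struct ref_entry {
	const char *refname; /* not copied: must outlive the snapshot */
	unsigned char sha1[20];
	int flags;
};

/* Would be opaque outside the refs module; spelled out here for the sketch. */
struct ref_snapshot {
	struct ref_entry *entries; /* sorted by refname for binary search */
	size_t nr;
};

static int cmp_ref_entry(const void *a, const void *b)
{
	const struct ref_entry *x = a, *y = b;
	return strcmp(x->refname, y->refname);
}

/*
 * Hypothetical variant of get_ref_snapshot(): `all` stands in for the refs
 * the real code would read from disk (loose refs first, then packed refs).
 * Only refs whose names start with `prefix` are kept; "" keeps everything.
 */
struct ref_snapshot *get_ref_snapshot_from(const struct ref_entry *all,
					   size_t all_nr, const char *prefix)
{
	struct ref_snapshot *snap;
	size_t i, plen = strlen(prefix);

	assert(all || all_nr == 0);
	snap = malloc(sizeof(*snap));
	if (!snap)
		return NULL;
	snap->entries = malloc((all_nr ? all_nr : 1) * sizeof(*snap->entries));
	if (!snap->entries) {
		free(snap);
		return NULL;
	}
	snap->nr = 0;
	for (i = 0; i < all_nr; i++)
		if (!strncmp(all[i].refname, prefix, plen))
			snap->entries[snap->nr++] = all[i];
	qsort(snap->entries, snap->nr, sizeof(*snap->entries), cmp_ref_entry);
	return snap;
}

void release_ref_snapshot(struct ref_snapshot *snap)
{
	if (!snap)
		return;
	free(snap->entries);
	free(snap);
}

/* Returns 0 and fills sha1/flags if refname is in the snapshot, else -1. */
int lookup_ref(struct ref_snapshot *snap, const char *refname,
	       unsigned char *sha1, int *flags)
{
	struct ref_entry key;
	const struct ref_entry *found;

	key.refname = refname; /* only refname is compared */
	found = bsearch(&key, snap->entries, snap->nr,
			sizeof(*snap->entries), cmp_ref_entry);
	if (!found)
		return -1;
	memcpy(sha1, found->sha1, sizeof(found->sha1));
	if (flags)
		*flags = found->flags;
	return 0;
}
```

Each lookup_ref() call is then a binary search in memory with no filesystem access. With prefix "" every ref is copied up front, which is the cost discussed in point 1.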

Symbolic refs, which are currently not stored in the ref_cache, would
have to be added to it, because otherwise we would have to hit the
filesystem for those lookups anyway.
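If symbolic refs live in the snapshot, a lookup like "HEAD" can be resolved without touching the filesystem by following symref targets inside the snapshot. The sketch below assumes a plain array of entries, a hypothetical depth limit SNAPSHOT_MAX_SYMREF_DEPTH to stop cycles, and a hypothetical REF_ISSYMREF_FLAG bit; none of these names come from the actual refs code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SNAPSHOT_MAX_SYMREF_DEPTH 5 /* hypothetical limit on symref chains */
#define REF_ISSYMREF_FLAG 0x01      /* hypothetical flag bit */

/* Hypothetical snapshot entry that can also record a symbolic ref. */
struct snap_entry {
	const char *refname;
	const char *symref_target; /* NULL for a regular ref */
	unsigned char sha1[20];    /* meaningful only for regular refs */
};

static const struct snap_entry *find_entry(const struct snap_entry *e,
					   size_t nr, const char *name)
{
	size_t i;
	for (i = 0; i < nr; i++)
		if (!strcmp(e[i].refname, name))
			return &e[i];
	return NULL;
}

/* Resolve refname entirely inside the snapshot; 0 on success, -1 otherwise. */
int lookup_ref_in_entries(const struct snap_entry *e, size_t nr,
			  const char *refname, unsigned char *sha1, int *flags)
{
	int depth;

	assert(e || nr == 0);
	if (flags)
		*flags = 0;
	for (depth = 0; depth <= SNAPSHOT_MAX_SYMREF_DEPTH; depth++) {
		const struct snap_entry *ent = find_entry(e, nr, refname);
		if (!ent)
			return -1; /* missing ref or dangling symref */
		if (!ent->symref_target) {
			memcpy(sha1, ent->sha1, sizeof(ent->sha1));
			return 0;
		}
		if (flags)
			*flags |= REF_ISSYMREF_FLAG;
		refname = ent->symref_target; /* follow one level */
	}
	return -1; /* chain too long, or a cycle */
}
```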

I think this would be worth doing for many reasons, including that it
would be another useful step in the direction of ref transactions.

But with particular respect to "git cat-file", I see problems:

1. get_ref_snapshot() would have to read all loose and packed refs
within the specified subtree, because loose refs have to be read before
packed refs.  So the call would be expensive if there are a lot of loose
refs.  And DWIM wouldn't know in advance where the references might be,
so it would have to set prefix="".  If many refs are looked up, the
up-front cost would presumably pay off.  But if only a couple of lookups are done
and there are a lot of loose refs, then using a cache would probably
slow things down.

The slowdown could be ameliorated by adding some more intelligence, for
example only populating the loose refs cache after a certain number of
lookups have already been done.
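The threshold idea could look roughly like this: the first few lookups are answered one ref at a time, and only once lookups turn out to be frequent is the full snapshot built. This is a sketch under stated assumptions; SNAPSHOT_THRESHOLD, the struct and all helper functions are hypothetical, and the counters stand in for real I/O so the switch-over is observable.

```c
#include <assert.h>

/* Hypothetical tuning knob: answer this many lookups directly first. */
#define SNAPSHOT_THRESHOLD 3

/* Counters stand in for real I/O so the strategy switch is observable. */
struct lazy_ref_lookup {
	unsigned nr_lookups;
	int snapshot_ready;
	unsigned direct_reads;
	unsigned snapshot_builds;
	unsigned snapshot_hits;
};

/* Hypothetical: resolve one ref from its loose file or packed-refs. */
static int read_single_ref(struct lazy_ref_lookup *l, const char *refname)
{
	(void)refname;
	l->direct_reads++;
	return 0; /* pretend the ref was found */
}

/* Hypothetical: read every loose and packed ref into memory (expensive). */
static void build_snapshot(struct lazy_ref_lookup *l)
{
	l->snapshot_builds++;
	l->snapshot_ready = 1;
}

/* Hypothetical: answer from the in-memory snapshot (cheap). */
static int lookup_in_snapshot(struct lazy_ref_lookup *l, const char *refname)
{
	(void)refname;
	l->snapshot_hits++;
	return 0; /* pretend the ref was found */
}

int lazy_lookup_ref(struct lazy_ref_lookup *l, const char *refname)
{
	assert(l && refname);
	/* Only pay for the full snapshot once lookups have proven frequent. */
	if (!l->snapshot_ready && ++l->nr_lookups > SNAPSHOT_THRESHOLD)
		build_snapshot(l);
	if (l->snapshot_ready)
		return lookup_in_snapshot(l, refname);
	return read_single_ref(l, refname);
}
```

A process that looks up only a couple of names never builds the snapshot, while a long batch run pays for it once and then answers from memory.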

2. A "git cat-file --batch" process can be long-lived.  What guarantees
would users expect regarding its lookup results?  Currently, its ref
lookups reflect the state of the repo at the moment the commit
identifier is written into the pipe.  Using a cache like this would mean
that ref lookups would always reflect the snapshot taken at the start of
the "git cat-file" run, regardless of whether the script using it might
have added or modified some references since then.  I think this would
have to be considered a regression.
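The semantic difference in point 2 can be shown with a toy model: a snapshot taken when "git cat-file --batch" starts does not see a ref that the driving script creates afterwards, while the current per-request lookup does. Everything here (struct toy_refs and the toy_* and *_lookup functions) is a hypothetical illustration, not git code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_REFS 16

/* Hypothetical toy "repository": just the set of ref names that exist. */
struct toy_refs {
	const char *names[TOY_MAX_REFS];
	size_t nr;
};

static int toy_has_ref(const struct toy_refs *r, const char *name)
{
	size_t i;
	for (i = 0; i < r->nr; i++)
		if (!strcmp(r->names[i], name))
			return 1;
	return 0;
}

/* The script driving cat-file creates a ref (e.g. via update-ref). */
void toy_add_ref(struct toy_refs *r, const char *name)
{
	assert(r->nr < TOY_MAX_REFS);
	r->names[r->nr++] = name;
}

/* A snapshot is a copy frozen at the moment it is taken. */
struct toy_refs toy_take_snapshot(const struct toy_refs *repo)
{
	return *repo;
}

/* Current behavior: each request consults the repository as it is now. */
int live_lookup(const struct toy_refs *repo, const char *name)
{
	return toy_has_ref(repo, name);
}

/* Cached behavior: each request consults the startup snapshot. */
int snapshot_lookup(const struct toy_refs *snap, const char *name)
{
	return toy_has_ref(snap, name);
}
```

After toy_add_ref() runs on the repository, live_lookup() finds the new ref but snapshot_lookup() on the earlier copy does not, which is the behavior change described above.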

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

Thread overview: 16+ messages
2013-07-12  6:15 [PATCH 0/7] cat-file --batch-check performance improvements Jeff King
2013-07-12  6:20 ` [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode Jeff King
2013-07-12  8:47   ` Michael Haggerty
2013-07-12  9:22     ` Jeff King
2013-07-12 10:30       ` Michael Haggerty [this message]
2013-07-15  4:23         ` Jeff King
2013-07-15  3:45       ` Junio C Hamano
2013-07-15  4:17         ` Jeff King
2013-07-12  6:21 ` [PATCH 2/7] sha1_object_info_extended: rename "status" to "type" Jeff King
2013-07-12  6:30 ` [PATCH 3/7] sha1_loose_object_info: make type lookup optional Jeff King
2013-07-12  6:31 ` [PATCH 4/7] packed_object_info: hoist delta type resolution to helper Jeff King
2013-07-12  6:32 ` [PATCH 5/7] packed_object_info: make type lookup optional Jeff King
2013-07-12  6:34 ` [PATCH 6/7] sha1_object_info_extended: make type calculation optional Jeff King
2013-07-12  6:37 ` [PATCH 7/7] sha1_object_info_extended: pass object_info to helpers Jeff King
2013-07-12 17:23 ` [PATCH 0/7] cat-file --batch-check performance improvements Junio C Hamano
2013-07-12 20:12   ` Jeff King
