git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Tag peeling peculiarities
@ 2013-03-13 14:59 Michael Haggerty
  2013-03-13 15:34 ` Michael Haggerty
  2013-03-13 17:29 ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Haggerty @ 2013-03-13 14:59 UTC (permalink / raw)
  To: git discussion list; +Cc: Junio C Hamano

I have been working on the pack-refs code [1] and noticed what looks
like a problem with the handling of peeled refs in the packed-refs file
and in the reference cache.  In particular, the peeled versions of tags
outside of refs/tags are *not* stored in packed-refs, but after the
packed-refs file is read it is assumed that such tags cannot be peeled.

It is clear that annotated tags want to live under refs/tags, but there
are some ways to create them in other places (see below).  It is not
clear to me whether the prohibition of tags outside of refs/tags should
be made more airtight or whether the peeling of tags outside of
refs/tags should be fixed.

Example:

~/tmp$ git init foo
Initialized empty Git repository in /home/mhagger/tmp/foo/.git/
~/tmp$ cd foo
~/tmp/foo$ git config user.name 'Lou User'
~/tmp/foo$ git config user.email 'luser@exaple.com'
~/tmp/foo$
~/tmp/foo$ git commit --allow-empty -m "Initial commit"
[master (root-commit) 7e80ddd] Initial commit
~/tmp/foo$ git tag -m footag footag
~/tmp/foo$
~/tmp/foo$ # This is prohibited:
~/tmp/foo$ git update-ref refs/heads/foobranch $(git rev-parse footag)
error: Trying to write non-commit object
d9cdc84dd156ff83799f5226794711fbb2c8273a to branch refs/heads/foobranch
fatal: Cannot update the ref 'refs/heads/foobranch'.
~/tmp/foo$
~/tmp/foo$ # But this is allowed:
~/tmp/foo$ git update-ref refs/remotes/foo/bar $(git rev-parse footag)
~/tmp/foo$
~/tmp/foo$ # So is this:
~/tmp/foo$ git update-ref refs/yak/foobranch $(git rev-parse footag)
~/tmp/foo$
~/tmp/foo$ # Before packing, all tags are available in peel versions:
~/tmp/foo$ git show-ref -d
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/heads/master
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/remotes/foo/bar
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/remotes/foo/bar^{}
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/tags/footag
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/tags/footag^{}
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/yak/foobranch
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/yak/foobranch^{}
~/tmp/foo$
~/tmp/foo$ git pack-refs --all
~/tmp/foo$
~/tmp/foo$ # After packing, tags outside of refs/tags are not peeled any
more:
~/tmp/foo$ git show-ref -d
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/heads/master
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/remotes/foo/bar
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/tags/footag
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/tags/footag^{}
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/yak/foobranch
~/tmp/foo$
~/tmp/foo$ # Peeling the tags via "tag^0" works even after packing:
~/tmp/foo$ git rev-parse refs/yak/foobranch^0
7e80ddd68f0225a0ea221f7cddbacf050be5a265
~/tmp/foo$
~/tmp/foo$ # Here is another way to create a tag outside of refs/tags:
~/tmp/foo$ cd ..
~/tmp$ git clone foo foo-clone
Cloning into 'foo-clone'...
done.
~/tmp$ cd foo-clone
~/tmp/foo-clone$ git config --add remote.origin.fetch
'+refs/tags/*:refs/remotes/origin/tags/*'
~/tmp/foo-clone$ git fetch
From /home/mhagger/tmp/foo
 * [new tag]         footag     -> origin/tags/footag
~/tmp/foo-clone$
~/tmp/foo-clone$ # Again, the tag outside of refs/tags are not peeled
correctly after packing:
~/tmp/foo-clone$ git pack-refs --all
~/tmp/foo-clone$ git show-ref -d
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/heads/master
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/remotes/origin/HEAD
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/remotes/origin/master
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/remotes/origin/tags/footag
d9cdc84dd156ff83799f5226794711fbb2c8273a refs/tags/footag
7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/tags/footag^{}

Michael

[1] I am trying to fix the problem that peeled refs are lost whenever a
packed reference is deleted.

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty
@ 2013-03-13 15:34 ` Michael Haggerty
  2013-03-13 17:29 ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2013-03-13 15:34 UTC (permalink / raw)
  To: git discussion list; +Cc: Junio C Hamano

On 03/13/2013 03:59 PM, Michael Haggerty wrote:
> I have been working on the pack-refs code [1] and noticed what looks
> like a problem with the handling of peeled refs in the packed-refs file
> and in the reference cache.  In particular, the peeled versions of tags
> outside of refs/tags are *not* stored in packed-refs, but after the
> packed-refs file is read it is assumed that such tags cannot be peeled.
> 
> It is clear that annotated tags want to live under refs/tags, but there
> are some ways to create them in other places (see below).  It is not
> clear to me whether the prohibition of tags outside of refs/tags should
> be made more airtight or whether the peeling of tags outside of
> refs/tags should be fixed.
> 
> Example:
> [...]

I should have mentioned that I already understand the programmatic
*cause* of the behavior that I described in my last email:

* in pack-refs.c:handle_one_ref(), tags that are not in refs/tags are
  explicitly excluded from being peeled.

* in refs.c:read_packed_refs(), if the packed-refs file starts with

      "# pack-refs with: peeled "

  then the REF_KNOWS_PEELED bit is set on *every* reference read from
  the file into the packed refs cache, whether or not it is under
  refs/tags.

* in refs.c:peel_ref(), if a refs cache entry has its REF_KNOWS_PEELED
  bit set but its peeled field is empty, then it is assumed that the
  reference is unpeelable.

What I am *not* clear about is which of these steps is incorrect, and
also whether this problem will have any significant ill effects.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty
  2013-03-13 15:34 ` Michael Haggerty
@ 2013-03-13 17:29 ` Junio C Hamano
  2013-03-13 21:58   ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2013-03-13 17:29 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git discussion list

Michael Haggerty <mhagger@alum.mit.edu> writes:

> It is not
> clear to me whether the prohibition of tags outside of refs/tags should
> be made more airtight or whether the peeling of tags outside of
> refs/tags should be fixed.

Retroactively forbidding presense/creation of tags outside the
designated refs/tags hierarchy will not fly.  I think we should peel
them when we are reading from "# pack-refs with: peeled" version.

Theoretically, we could introduce "# pack-refs with: fully-peeled"
that records peeled versions for _all_ annotated tags even in
unusual places, but that would be introducing problems to existing
versions of the software to cater to use cases that is not blessed
officially, so I doubt it has much value.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-13 17:29 ` Junio C Hamano
@ 2013-03-13 21:58   ` Jeff King
  2013-03-14  4:41     ` Michael Haggerty
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2013-03-13 21:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, git discussion list

On Wed, Mar 13, 2013 at 10:29:54AM -0700, Junio C Hamano wrote:

> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
> > It is not
> > clear to me whether the prohibition of tags outside of refs/tags should
> > be made more airtight or whether the peeling of tags outside of
> > refs/tags should be fixed.
> 
> Retroactively forbidding presense/creation of tags outside the
> designated refs/tags hierarchy will not fly.  I think we should peel
> them when we are reading from "# pack-refs with: peeled" version.
> 
> Theoretically, we could introduce "# pack-refs with: fully-peeled"
> that records peeled versions for _all_ annotated tags even in
> unusual places, but that would be introducing problems to existing
> versions of the software to cater to use cases that is not blessed
> officially, so I doubt it has much value.

I agree. The REF_KNOWS_PEELED thing is an optimization, so it's OK to
simply omit the optimization in the corner case, since we don't expect
it to happen often. So what happens now is a bug, but it is a bug in the
reader that mis-applies the optimization, and we can just fix the
reader.

I do not think there is any point in peeling while we are reading the
pack-refs file; it is no more expensive to do so later, and in most
cases we will not even bother peeling. We should simply omit the
REF_KNOWS_PEELED bit so that later calls to peel_ref do not follow the
optimization code path. Something like this:

diff --git a/refs.c b/refs.c
index 175b9fc..ee498c8 100644
--- a/refs.c
+++ b/refs.c
@@ -824,7 +824,10 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 
 		refname = parse_ref_line(refline, sha1);
 		if (refname) {
-			last = create_ref_entry(refname, sha1, flag, 1);
+			int this_flag = flag;
+			if (prefixcmp(refname, "refs/tags/"))
+				this_flag &= ~REF_KNOWS_PEELED;
+			last = create_ref_entry(refname, sha1, this_flag, 1);
 			add_ref(dir, last);
 			continue;
 		}

I think with this patch, though, that upload-pack would end up having to
read the object type of everything under refs/heads to decide whether it
needs to be peeled (and in most cases, it does not, making the initial
ref advertisement potentially much more expensive). How do we want to
handle that? Should we teach upload-pack not to bother advertising peels
outside of refs/tags? That would break people who expect tag
auto-following to work for refs outside of refs/tags, but I am not sure
that is a sane thing to expect.

-Peff

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-13 21:58   ` Jeff King
@ 2013-03-14  4:41     ` Michael Haggerty
  2013-03-14  5:24       ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Haggerty @ 2013-03-14  4:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git discussion list

On 03/13/2013 10:58 PM, Jeff King wrote:
> On Wed, Mar 13, 2013 at 10:29:54AM -0700, Junio C Hamano wrote:
> 
>> Michael Haggerty <mhagger@alum.mit.edu> writes:
>>
>>> It is not
>>> clear to me whether the prohibition of tags outside of refs/tags should
>>> be made more airtight or whether the peeling of tags outside of
>>> refs/tags should be fixed.
>>
>> Retroactively forbidding presense/creation of tags outside the
>> designated refs/tags hierarchy will not fly.  I think we should peel
>> them when we are reading from "# pack-refs with: peeled" version.
>>
>> Theoretically, we could introduce "# pack-refs with: fully-peeled"
>> that records peeled versions for _all_ annotated tags even in
>> unusual places, but that would be introducing problems to existing
>> versions of the software to cater to use cases that is not blessed
>> officially, so I doubt it has much value.

I think that instead of changing "peeled" to "fully-peeled", it would be
better to add "fully-peeled" as an additional keyword, like

    # pack-refs with: peeled fully-peeled

Old readers would still see the "peeled" keyword and ignore the
fully-peeled keyword, and would be able to read the file correctly.  See
below for more discussion.

> I agree. The REF_KNOWS_PEELED thing is an optimization, so it's OK to
> simply omit the optimization in the corner case, since we don't expect
> it to happen often. So what happens now is a bug, but it is a bug in the
> reader that mis-applies the optimization, and we can just fix the
> reader.
> 
> I do not think there is any point in peeling while we are reading the
> pack-refs file; it is no more expensive to do so later, and in most
> cases we will not even bother peeling. We should simply omit the
> REF_KNOWS_PEELED bit so that later calls to peel_ref do not follow the
> optimization code path. Something like this:
> 
> diff --git a/refs.c b/refs.c
> index 175b9fc..ee498c8 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -824,7 +824,10 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
>  
>  		refname = parse_ref_line(refline, sha1);
>  		if (refname) {
> -			last = create_ref_entry(refname, sha1, flag, 1);
> +			int this_flag = flag;
> +			if (prefixcmp(refname, "refs/tags/"))
> +				this_flag &= ~REF_KNOWS_PEELED;
> +			last = create_ref_entry(refname, sha1, this_flag, 1);
>  			add_ref(dir, last);
>  			continue;
>  		}

It would also be possible to set the REF_KNOWS_PEELED for any entries
for which a peeled reference happens to be present in the packed-refs
file, though the code would never be triggered if the current writer is
not changed.

> I think with this patch, though, that upload-pack would end up having to
> read the object type of everything under refs/heads to decide whether it
> needs to be peeled (and in most cases, it does not, making the initial
> ref advertisement potentially much more expensive). How do we want to
> handle that? Should we teach upload-pack not to bother advertising peels
> outside of refs/tags? That would break people who expect tag
> auto-following to work for refs outside of refs/tags, but I am not sure
> that is a sane thing to expect.

Here is analysis of our options as I see them:

1. Accept that tags outside of refs/tags are not reliably advertised in
   their peeled form.  Document this deficiency and either:

   a. Don't even bother trying to peel refs outside of refs/tags (the
      status quo).

   b. Change the pack-refs writer to write all peeled refs, but leave
      the reader unchanged.  This is a low-risk option that would cause
      old and new clients to do the right thing when reading a full
      packed-refs file, but an old or new client reading a non-full
      packed-refs file would not realize that it is non-full and would
      fail to advertise all peeled refs.  Minor disadvantage: pack-refs
      becomes slower.

2. Insist that tags outside of refs/tags are reliably advertised.  I
   see three ways to do it:

   a. Without changing the packed-refs contents.  This would require
      upload-pack to read *all* references outside of refs/tags.  (This
      is what Peff's patch does.)

   b. Write all peeled refs to packed-refs without changing the
      packed-refs header.  This would hardly help, as upload-pack
      would still have to read all non-tag references outside of
      refs/tags to be sure that none are tags.

   c. Add a new keyword to the top of the packed-refs file as
      described above.  Then

      * Old writer, new reader: the reader would know that some
        peeled refs might be missing.  upload-pack would have to
        resolve refs outside of refs/tags, but could optionally write
        a new-format packed-refs file to avoid repeating the work.

      * New writer, new reader: the reader would know that all refs
        are peeled properly and would not have to read any objects.

      * Old writer, old reader: status quo; peeled refs are advertised
        incompletely.

      * New writer, old reader: This is the interesting case.  The
        current code in Git would read the header and see "peeled" but
        ignore "fully-peeled".  But the line-reading code would
        nevertheless happily read and record peeled references no
        matter what namespace they live in.  It would also advertise
        them correctly.  One would have to check historical versions
        and other clients to see whether they would also behave well.

   d. Add some new notation, like a "^" on a line by itself, to mean
      "I tried to peel this reference but it was not an annotated tag".
      Old readers ignore such lines but new readers could take it
      as an indication to set the REF_KNOWS_PEELED bit for that entry.
      (It is not clear to me whether it would be permissible to make a
      change like this without changing the header line.)

I think the best option would be 1b.  Even though there would never be a
guarantee that all peeled references are recorded and advertised
correctly, the behavior would asymptotically approach correctness as old
Git versions are retired, and the situations where it fails are probably
rare and no worse than the status quo.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14  4:41     ` Michael Haggerty
@ 2013-03-14  5:24       ` Jeff King
  2013-03-14  5:32         ` Jeff King
  2013-03-14 11:28         ` Michael Haggerty
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2013-03-14  5:24 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

On Thu, Mar 14, 2013 at 05:41:58AM +0100, Michael Haggerty wrote:

> Here is analysis of our options as I see them:
> 
> 1. Accept that tags outside of refs/tags are not reliably advertised in
>    their peeled form.  Document this deficiency and either:
> 
>    a. Don't even bother trying to peel refs outside of refs/tags (the
>       status quo).

When you say "not reliably advertised" you mean from upload-pack, right?
What about other callers? From my reading of peel_ref, anybody who calls
it to get a packed ref may actually get a wrong answer. So this is not
just about tag auto-following over fetch, but about other places, too
(however, a quick grep does not make it look like this other places are
all that commonly used).

Another fun fact: upload-pack did not use peel_ref until recently
(435c833, in v1.8.1). So while it is tempting to say "well, this was
always broken, and nobody cared", it was not really; it is a fairly
recent regression in 435c833.

>    b. Change the pack-refs writer to write all peeled refs, but leave
>       the reader unchanged.  This is a low-risk option that would cause
>       old and new clients to do the right thing when reading a full
>       packed-refs file, but an old or new client reading a non-full
>       packed-refs file would not realize that it is non-full and would
>       fail to advertise all peeled refs.  Minor disadvantage: pack-refs
>       becomes slower.

This seems sane. The missing thing is a flag to tell the reader that it
was written by a newer version; I see you dealt with that case below.

I don't think pack-refs being a little bit slower matters. Checking the
types to peel is not even that much work; it's just that it adds up when
you do it for every no-op fetch's ref advertisement. But we can assume
that writing happens a lot less than reading; that is the point of
storing the peeled information in the first place. If that assumption is
not correct in some scenario, then those people should probably not be
packing their refs at all, so I think we can discount them from this
discussion.

> 2. Insist that tags outside of refs/tags are reliably advertised.  I
>    see three ways to do it:
> 
>    a. Without changing the packed-refs contents.  This would require
>       upload-pack to read *all* references outside of refs/tags.  (This
>       is what Peff's patch does.)

FWIW, this is what upload-pack used to do before I switched it to use
peel_ref. The savings are measurable, though they are not
ground-breaking. Still, I think your "fully-packed" proposal above lets
us keep the improvement without too much effort.

>    b. Write all peeled refs to packed-refs without changing the
>       packed-refs header.  This would hardly help, as upload-pack
>       would still have to read all non-tag references outside of
>       refs/tags to be sure that none are tags.

Right, this seems silly; the new reader does extra work for
compatibility with an older writer, but the normal case is to use the
same version. The obvious optimization is to add a flag that says "hey,
I was written by a new writer". And that is your "fully-packed"
proposal, covered below.

>    c. Add a new keyword to the top of the packed-refs file as
>       described above.  Then
> 
>       * Old writer, new reader: the reader would know that some
>         peeled refs might be missing.  upload-pack would have to
>         resolve refs outside of refs/tags, but could optionally write
>         a new-format packed-refs file to avoid repeating the work.

I think that is OK. For the most part, this is a temporary situation
that happens when you are moving from an older to a newer version of
git. If you are switching back and forth between versions, then we are
correct, but you don't get the benefit of the micro-optimization, which
seems fair.

>       * New writer, new reader: the reader would know that all refs
>         are peeled properly and would not have to read any objects.

This is the common case I think we should be optimizing for. And
obviously the outcome here is good. It's also fine without adding the
"fully-packed" flag, though.

>       * Old writer, old reader: status quo; peeled refs are advertised
>         incompletely.

Right. We can't fix those without a time machine, though.

>       * New writer, old reader: This is the interesting case.  The
>         current code in Git would read the header and see "peeled" but
>         ignore "fully-peeled".  But the line-reading code would
>         nevertheless happily read and record peeled references no
>         matter what namespace they live in.  It would also advertise
>         them correctly.  One would have to check historical versions
>         and other clients to see whether they would also behave well.

Right. So we have pretty sane backwards-compatible behavior. I think if
other implementations are not happy seeing a new tag, they are wrong.
The whole point of the "#"-tags is for future expansion. Looking over
libgit2, it seems to ignore the "#"-line completely, so I think it would
behave OK (and it should be taught about fully-peeled, too, but that is
not our immediate problem).

>    d. Add some new notation, like a "^" on a line by itself, to mean
>       "I tried to peel this reference but it was not an annotated tag".
>       Old readers ignore such lines but new readers could take it
>       as an indication to set the REF_KNOWS_PEELED bit for that entry.
>       (It is not clear to me whether it would be permissible to make a
>       change like this without changing the header line.)

I don't see the point. The writer wants to provide REF_KNOWS_PEELED for
every entry in its list so that the reader (which is run more often)
does not have to put forth any effort. So this accomplishes the same
thing as a "anything without a peeled ref did not need peeling" flag,
but takes more space.

> I think the best option would be 1b.  Even though there would never be a
> guarantee that all peeled references are recorded and advertised
> correctly, the behavior would asymptotically approach correctness as old
> Git versions are retired, and the situations where it fails are probably
> rare and no worse than the status quo.

Thanks for laying out the options. I think 1b or 2c are the only sane
paths forward. With either option, a newer writer produces something
that all implementations, old and new, should get right, and that is
primarily what we care about.

So the only question is how much work we want to put into making sure
the new reader handles the old writer correctly. Doing 2c is obviously
more rigorous, and it is not that much work to add the fully-packed
flag, but I kind of wonder if anybody even cares. We can just say "it's
a bug fix; run `git pack-refs` again if you care" and call it a day
(i.e., 1b).

I could go either way.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14  5:24       ` Jeff King
@ 2013-03-14  5:32         ` Jeff King
  2013-03-14 15:14           ` Junio C Hamano
  2013-03-14 11:28         ` Michael Haggerty
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff King @ 2013-03-14  5:32 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

On Thu, Mar 14, 2013 at 01:24:48AM -0400, Jeff King wrote:

> So the only question is how much work we want to put into making sure
> the new reader handles the old writer correctly. Doing 2c is obviously
> more rigorous, and it is not that much work to add the fully-packed
> flag, but I kind of wonder if anybody even cares. We can just say "it's
> a bug fix; run `git pack-refs` again if you care" and call it a day
> (i.e., 1b).

Urgh, for some reason I kept writing "fully-packed" but obviously I
meant "fully-peeled". Hopefully you figured it out.

Just as a quick sketch of how much work is in involved in 2c, I think
the complete solution would look like this (but note I haven't tested
this at all):

diff --git a/pack-refs.c b/pack-refs.c
index f09a054..261a6a6 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -27,6 +27,7 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
 			  int flags, void *cb_data)
 {
 	struct pack_refs_cb_data *cb = cb_data;
+	struct object *o;
 	int is_tag_ref;
 
 	/* Do not pack the symbolic refs */
@@ -39,14 +40,12 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
 		return 0;
 
 	fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path);
-	if (is_tag_ref) {
-		struct object *o = parse_object(sha1);
-		if (o->type == OBJ_TAG) {
-			o = deref_tag(o, path, 0);
-			if (o)
-				fprintf(cb->refs_file, "^%s\n",
-					sha1_to_hex(o->sha1));
-		}
+	o = parse_object(sha1);
+	if (o->type == OBJ_TAG) {
+		o = deref_tag(o, path, 0);
+		if (o)
+			fprintf(cb->refs_file, "^%s\n",
+				sha1_to_hex(o->sha1));
 	}
 
 	if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) {
@@ -128,7 +127,7 @@ int pack_refs(unsigned int flags)
 		die_errno("unable to create ref-pack file structure");
 
 	/* perhaps other traits later as well */
-	fprintf(cbdata.refs_file, "# pack-refs with: peeled \n");
+	fprintf(cbdata.refs_file, "# pack-refs with: peeled fully-peeled \n");
 
 	for_each_ref(handle_one_ref, &cbdata);
 	if (ferror(cbdata.refs_file))
diff --git a/refs.c b/refs.c
index 175b9fc..770abf4 100644
--- a/refs.c
+++ b/refs.c
@@ -808,6 +808,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	struct ref_entry *last = NULL;
 	char refline[PATH_MAX];
 	int flag = REF_ISPACKED;
+	int fully_peeled = 0;
 
 	while (fgets(refline, sizeof(refline), f)) {
 		unsigned char sha1[20];
@@ -818,13 +819,18 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 			const char *traits = refline + sizeof(header) - 1;
 			if (strstr(traits, " peeled "))
 				flag |= REF_KNOWS_PEELED;
+			if (strstr(traits, " fully-peeled "))
+				fully_peeled = 1;
 			/* perhaps other traits later as well */
 			continue;
 		}
 
 		refname = parse_ref_line(refline, sha1);
 		if (refname) {
-			last = create_ref_entry(refname, sha1, flag, 1);
+			int this_flag = flag;
+			if (!fully_peeled && prefixcmp(refname, "refs/tags/"))
+				this_flag &= ~REF_KNOWS_PEELED;
+			last = create_ref_entry(refname, sha1, this_flag, 1);
 			add_ref(dir, last);
 			continue;
 		}

So it's really not that much code. The bigger question is whether we
want to have to carry the "fully-peeled" tag forever, and how other
implementations would treat it.

-Peff

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14  5:24       ` Jeff King
  2013-03-14  5:32         ` Jeff King
@ 2013-03-14 11:28         ` Michael Haggerty
  2013-03-14 13:40           ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Haggerty @ 2013-03-14 11:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git discussion list

On 03/14/2013 06:24 AM, Jeff King wrote:
> On Thu, Mar 14, 2013 at 05:41:58AM +0100, Michael Haggerty wrote:
> 
>> Here is analysis of our options as I see them:
>>
>> 1. Accept that tags outside of refs/tags are not reliably advertised in
>>    their peeled form.  Document this deficiency and either:
>>
>>    a. Don't even bother trying to peel refs outside of refs/tags (the
>>       status quo).
> 
> When you say "not reliably advertised" you mean from upload-pack, right?
> What about other callers? From my reading of peel_ref, anybody who calls
> it to get a packed ref may actually get a wrong answer. So this is not
> just about tag auto-following over fetch, but about other places, too
> (however, a quick grep does not make it look like this other places are
> all that commonly used).

Yes, this is a good point.  I didn't audit all of the callers to see
whether any of them need 100% reliability, but I guess I should:

upload-pack.c:763: in send_ref(): This is the caller that we have been
talking about.

builtin/pack-objects.c:559: in mark_tagged(): This function is only
called for references under refs/tags.

builtin/pack-objects.c:2035: in add_ref_tag(): peel_ref() is called only
for refnames under refs/tags.

builtin/describe.c:147: in get_name(): peel_ref() is called only for
refnames under refs/tags.

builtin/show-ref.c:81: in show_ref(): this is broken too, as I showed in
my original email.  This breakage was also introduced in 435c833.

Perhaps if peel_ref() *were* 100% reliable, we might be able to use it
to avoid object lookups in some other places.

> Another fun fact: upload-pack did not use peel_ref until recently
> (435c833, in v1.8.1). So while it is tempting to say "well, this was
> always broken, and nobody cared", it was not really; it is a fairly
> recent regression in 435c833.

I didn't realize that; thanks for pointing it out.  Although peel_ref()
itself has been broken since it was introduced, at least in the sense
that it gives wrong answers for tags outside of refs/tags, prior to
435c833 it appears to only have been used for refs/tags.

>> I think the best option would be 1b.  Even though there would never be a
>> guarantee that all peeled references are recorded and advertised
>> correctly, the behavior would asymptotically approach correctness as old
>> Git versions are retired, and the situations where it fails are probably
>> rare and no worse than the status quo.
> 
> Thanks for laying out the options. I think 1b or 2c are the only sane
> paths forward. With either option, a newer writer produces something
> that all implementations, old and new, should get right, and that is
> primarily what we care about.
> 
> So the only question is how much work we want to put into making sure
> the new reader handles the old writer correctly. Doing 2c is obviously
> more rigorous, and it is not that much work to add the fully-packed
> flag, but I kind of wonder if anybody even cares. We can just say "it's
> a bug fix; run `git pack-refs` again if you care" and call it a day
> (i.e., 1b).
> 
> I could go either way.

On 03/14/2013 06:32 AM, Jeff King wrote:
> [...]
> So it's really not that much code. The bigger question is whether we
> want to have to carry the "fully-peeled" tag forever, and how other
> implementations would treat it.

I agree that 2c is also an attractive option.  Its biggest advantage is
that it make peel_ref() reliable and therefore potentially usable for
other purposes.  And probably the effort of carrying forward the
"fully-peeled" tag is no more work than the alternative of carrying
forward the knowledge that peel_ref() is unreliable.

Your patch looks about right aside from its lack of comments :-).  I was
going to implement approximately the same thing in a patch series that I
am working on, but if you prefer then go ahead and submit your patch and
I will rebase my branch on top of it.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14 11:28         ` Michael Haggerty
@ 2013-03-14 13:40           ` Jeff King
  2013-03-15  5:12             ` Michael Haggerty
  2013-03-16  8:48             ` Michael Haggerty
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2013-03-14 13:40 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote:

> Perhaps if peel_ref() *were* 100% reliable, we might be able to use it
> to avoid object lookups in some other places.

In theory, some of the many uses of deref_tag could be adopted. However,
we do not always have the refname handy at that point (and I believe
peel_ref's optimization only kicks in during for_each_ref traversals
anyway).

It may still be a win to check the packed-refs file before peeling a
random sha1, as looking up there should be cheaper than actually loading
the object. But right now, the way the optimization is used is always
O(1) to just check the last ref loaded. With your recent ref
refactoring, I think we should be able to do lookups in O(lg n).

> > Another fun fact: upload-pack did not use peel_ref until recently
> > (435c833, in v1.8.1). So while it is tempting to say "well, this was
> > always broken, and nobody cared", it was not really; it is a fairly
> > recent regression in 435c833.
> 
> I didn't realize that; thanks for pointing it out.  Although peel_ref()
> itself has been broken since it was introduced, at least in the sense
> that it gives wrong answers for tags outside of refs/tags, prior to
> 435c833 it appears to only have been used for refs/tags.

Hmph. I coincidentally ran across another problem with 435c833 today.
Try this:

  git init repo &&
  cd repo &&
  git commit --allow-empty -m one &&
  git checkout -b other &&
  git commit --allow-empty -m two &&
  git checkout master &&
  git commit --allow-empty -m three &&
  git pack-refs --all &&
  git.compile clone --no-local --depth=1 --no-single-branch . foo

I get:

  Cloning into 'foo'...
  fatal: (null) is unknown object
  remote: Total 0 (delta 0), reused 0 (delta 0)
  fatal: recursion detected in die handler
  remote: aborting due to possible repository corruption on the remote side.
  fatal: error in sideband demultiplexer

This is not due to the same problem you are describing with peel_refs,
but simply due to the fact that we do not load and parse objects anymore
(which is the point of the optimization). The client feeds these back to
us in the "want" list, and we then feed these objects to the revision
walker, which expects them to be parsed and have their "type" field
actually filled in.

We never noticed before because:

  1. It only happens with --depth, because otherwise we do not do the
     revision walk in-process.

  2. Modern git, when given --depth, will default to --single-branch,
     and so ask only about HEAD, which we do parse. However, older
     versions of git (pre v1.7.10) will ask for much more, and trigger
     the bug.

     I haven't tried yet, but I suspect you may also be able to trigger
     it by asking for "clone --depth=1 -b other".

That's the overtly visible symptom. We also check the type in
ok_to_give_up, so I suspect that can produce subtly wrong results in
some situations. The solution is to actually parse the objects in
question, but I need to figure out when is the optimal time. Obviously
any time we read a "want" line would be enough, but we may be able to
get by with less. OTOH, it probably doesn't matter that much; the point
of the optimization was to avoid touching objects for the ref
advertisement. Once we have a "want", the client really is going to
fetch objects, and accessing them will probably be lost in the noise.

But that's somewhat off-topic for this discussion. I'll look into it
further and try to make a patch later today or tomorrow.

> Your patch looks about right aside from its lack of comments :-).  I was
> going to implement approximately the same thing in a patch series that I
> am working on, but if you prefer then go ahead and submit your patch and
> I will rebase my branch on top of it.

I just meant it as a quick sketch, since you said you were working in
the area. I'm not sure what you are working on. If we don't consider it
an urgent bugfix, I'm just as happy for you to incorporate the idea into
what you are doing.

But if we want to do it as a maint-track fix, I can clean up my patch.
Junio?

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14  5:32         ` Jeff King
@ 2013-03-14 15:14           ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2013-03-14 15:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, git discussion list

Jeff King <peff@peff.net> writes:

> On Thu, Mar 14, 2013 at 01:24:48AM -0400, Jeff King wrote:
>
>> So the only question is how much work we want to put into making sure
>> the new reader handles the old writer correctly. Doing 2c is obviously
>> more rigorous, and it is not that much work to add the fully-packed
>> flag, but I kind of wonder if anybody even cares. We can just say "it's
>> a bug fix; run `git pack-refs` again if you care" and call it a day
>> (i.e., 1b).
>
> Urgh, for some reason I kept writing "fully-packed" but obviously I
> meant "fully-peeled". Hopefully you figured it out.
>
> Just as a quick sketch of how much work is in involved in 2c, I think
> the complete solution would look like this (but note I haven't tested
> this at all):

Looks reasonable from a cursory look, and I think we cannot avoid
going with the route to add "fully-peeled" token, because we do have
to express "we know this ref outside refs/tags/ area is not an
annotated tag" by marking them as REF_KNOWS_PEELED for the peel_ref()
optimization to work.



> diff --git a/pack-refs.c b/pack-refs.c
> index f09a054..261a6a6 100644
> --- a/pack-refs.c
> +++ b/pack-refs.c
> @@ -27,6 +27,7 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
>  			  int flags, void *cb_data)
>  {
>  	struct pack_refs_cb_data *cb = cb_data;
> +	struct object *o;
>  	int is_tag_ref;
>  
>  	/* Do not pack the symbolic refs */
> @@ -39,14 +40,12 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
>  		return 0;
>  
>  	fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path);
> -	if (is_tag_ref) {
> -		struct object *o = parse_object(sha1);
> -		if (o->type == OBJ_TAG) {
> -			o = deref_tag(o, path, 0);
> -			if (o)
> -				fprintf(cb->refs_file, "^%s\n",
> -					sha1_to_hex(o->sha1));
> -		}
> +	o = parse_object(sha1);
> +	if (o->type == OBJ_TAG) {
> +		o = deref_tag(o, path, 0);
> +		if (o)
> +			fprintf(cb->refs_file, "^%s\n",
> +				sha1_to_hex(o->sha1));
>  	}
>  
>  	if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) {
> @@ -128,7 +127,7 @@ int pack_refs(unsigned int flags)
>  		die_errno("unable to create ref-pack file structure");
>  
>  	/* perhaps other traits later as well */
> -	fprintf(cbdata.refs_file, "# pack-refs with: peeled \n");
> +	fprintf(cbdata.refs_file, "# pack-refs with: peeled fully-peeled \n");
>  
>  	for_each_ref(handle_one_ref, &cbdata);
>  	if (ferror(cbdata.refs_file))
> diff --git a/refs.c b/refs.c
> index 175b9fc..770abf4 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -808,6 +808,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
>  	struct ref_entry *last = NULL;
>  	char refline[PATH_MAX];
>  	int flag = REF_ISPACKED;
> +	int fully_peeled = 0;
>  
>  	while (fgets(refline, sizeof(refline), f)) {
>  		unsigned char sha1[20];
> @@ -818,13 +819,18 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
>  			const char *traits = refline + sizeof(header) - 1;
>  			if (strstr(traits, " peeled "))
>  				flag |= REF_KNOWS_PEELED;
> +			if (strstr(traits, " fully-peeled "))
> +				fully_peeled = 1;
>  			/* perhaps other traits later as well */
>  			continue;
>  		}
>  
>  		refname = parse_ref_line(refline, sha1);
>  		if (refname) {
> -			last = create_ref_entry(refname, sha1, flag, 1);
> +			int this_flag = flag;
> +			if (!fully_peeled && prefixcmp(refname, "refs/tags/"))
> +				this_flag &= ~REF_KNOWS_PEELED;
> +			last = create_ref_entry(refname, sha1, this_flag, 1);
>  			add_ref(dir, last);
>  			continue;
>  		}
>
> So it's really not that much code. The bigger question is whether we
> want to have to carry the "fully-peeled" tag forever, and how other
> implementations would treat it.
>
> -Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14 13:40           ` Jeff King
@ 2013-03-15  5:12             ` Michael Haggerty
  2013-03-15 16:28               ` Junio C Hamano
  2013-03-16  8:48             ` Michael Haggerty
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Haggerty @ 2013-03-15  5:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git discussion list

On 03/14/2013 02:40 PM, Jeff King wrote:
> On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote:
> 
>> Perhaps if peel_ref() *were* 100% reliable, we might be able to use it
>> to avoid object lookups in some other places.
> 
> In theory, some of the many uses of deref_tag could be adopted. However,
> we do not always have the refname handy at that point (and I believe
> peel_ref's optimization only kicks in during for_each_ref traversals
> anyway).
> 
> It may still be a win to check the packed-refs file before peeling a
> random sha1, as looking up there should be cheaper than actually loading
> the object. But right now, the way the optimization is used is always
> O(1) to just check the last ref loaded. With your recent ref
> refactoring, I think we should be able to do lookups in O(lg n).

There are more conceptual problems in the handling of peeled references
and current_ref.  I want to document them here for future reference and
to get other people thinking about whether other parts of the code might
be making wrong assumptions about this stuff.


What is stored in ref_value.peeled?  Is it the peeled version of
ref_value.sha1, or is it the peeled version of the associated refname?
Because they are not necessarily the same thing: an entry in the packed
ref_cache *might* be overridden by a loose reference with the same
refname but a different SHA1 which peels to a different value.

Obviously we cannot always guarantee that ref_value.peeled is the peeled
version of the refname, because it is read from a packed-refs file that
might be old.  So the only sane policy is that ref_value.peeled should
be the peeled version of ref_value.sha1 regardless of whether the
refname now points elsewhere.

It follows that the only time we can be sure that ref_value.peeled is
the correct peeled version of refname is if we know that there is no
loose reference overriding it.

This leads to some subtleties when reading and writing ref_value.peeled.


When reading ref_value.peeled:

When we iterate over references using do_for_each_ref(), then we only
present a packed ref if there is no loose ref with the same refname.  So
in that case it is OK to point current_ref at the packed ref entry, and
if somebody calls peel_ref() for the current reference then (modulo race
conditions?) current_ref->u.value.peeled is indeed the correct peeled
value for that refname.

But if we iterate over *only* the packed refs using
do_for_each_ref_in_dir() (as, for example, in repack_without_ref()),
then the iteration doesn't even look at the loose refs, so we don't know
whether the packed ref_entry being iterated over is authoritative.  But
we nevertheless set current_ref to that entry.  So if somebody would
call peel_ref() on the current refname during such an iteration, they
could get the peeled version of the packed ref rather than the correct
peeled version for the refname.

Currently, nobody calls peel_ref() during such an iteration, so I don't
think it causes any bugs.  But it is an unmarked landmine.


When setting ref_value.peeled:

When determining the value to write to ref_value.peeled (which I believe
only happens in pack-refs), we should record the peeled version of
ref_entry.sha1, and *not* the peeled version of refname.  This is indeed
what pack-refs does.  But if somebody were to get the clever and
reasonable-sounding idea of using peel_ref() in the implementation of
pack-refs.c:handle_one_ref(), then it would be easy to accidentally get
a peeled value for the current refname rather than for the SHA1 being
recorded in the file.


I am trying to change repack_without_refs() to write the peeled refs to
the packed-refs file (currently the peeled refs are lost if a packed ref
is deleted).  For this I would like to reuse the pre-peeled values from
the old version of the packed-refs file to avoid having to resolve them
all again.  So all of the issues mentioned above are coming together for me.

I think my solution will be to add a static function in refs.c that
passes pointers to the ref_entries to its callback, so that the function
has direct access to the peeled member rather than having to access it
via the information-losing peel_ref().

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-15  5:12             ` Michael Haggerty
@ 2013-03-15 16:28               ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2013-03-15 16:28 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, git discussion list

Michael Haggerty <mhagger@alum.mit.edu> writes:

> What is stored in ref_value.peeled?  Is it the peeled version of
> ref_value.sha1, or is it the peeled version of the associated refname?
> Because they are not necessarily the same thing: an entry in the packed
> ref_cache *might* be overridden by a loose reference with the same
> refname but a different SHA1 which peels to a different value.

The way it should work is that we look for loose ones first and then
only if we do not find a loose one we use packed one, when asked for
a ref by name.  The .sha1 and .peeled fields for a single ref_entry
struct must be consistent with each other, even though you might
have got a ref_value by reading the packed-refs file and another
ref_value by reading loose one and have both in core.  When you have
both packed and loose, the former should not be used at all, and it
certainly should not be pointed by current_ref.

Stepping back a bit, I suspect that it may be worth evaluating to
see if it still makes sense to keep the current_ref optimization
that was introduced in 0ae91be0e1fa (Optimize peel_ref for the
current ref of a for_each_ref callback, 2008-02-24).  Back then, it
was fairly expensive to find a cached_ref entry by name, because it
was implemented as two linked lists (one for loose, one for packed)
and we would have had to traverse them to answer the question.

Since then, e9c4c11165e4 (refs: Use binary search to lookup refs
faster, 2011-09-29) introduced ref_array to make it more efficient
to find a ref_entry by name, and your more recent series that
includes bc5fd6d3c29f (refs.c: reorder definitions more logically,
2012-04-10) further touched that codepath for enumeration in a
subtree.

I am not sure if that last change helped or hurt the performance of
a single ref lookup, but we can certainly say that the performance
characteristics of read_ref_full() is very different in today's code
and the current_ref optimization may no longer be of much value.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-14 13:40           ` Jeff King
  2013-03-15  5:12             ` Michael Haggerty
@ 2013-03-16  8:48             ` Michael Haggerty
  2013-03-16  9:34               ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Haggerty @ 2013-03-16  8:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git discussion list

On 03/14/2013 02:40 PM, Jeff King wrote:
> Hmph. I coincidentally ran across another problem with 435c833 today.
> Try this:
> [...]
> 
> But that's somewhat off-topic for this discussion. I'll look into it
> further and try to make a patch later today or tomorrow.
> 
> On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote:
>> Your patch looks about right aside from its lack of comments :-).  I was
>> going to implement approximately the same thing in a patch series that I
>> am working on, but if you prefer then go ahead and submit your patch and
>> I will rebase my branch on top of it.
> 
> I just meant it as a quick sketch, since you said you were working in
> the area. I'm not sure what you are working on. If we don't consider it
> an urgent bugfix, I'm just as happy for you to incorporate the idea into
> what you are doing.
> 
> But if we want to do it as a maint-track fix, I can clean up my patch.
> Junio?

My patch series is nearly done.  I will need another day or two to
review and make it submission-ready, but I wanted to give you an idea of
what I'm up to and I could also use your feedback on some points.

The (preliminary) patch series is available on GitHub:

    https://github.com/mhagger/git/commits/pack-refs-unification

Highlights:

* Move pack-refs code into refs.c

* Implement fully-peeled packed-refs files, as discussed upthread: peel
  references outside of refs/tags, and keep rigorous track of
  which references have REF_KNOWS_PEELED.

* Change how references are iterated over internally (without changing
  the exported API):

  * Provide direct access to the ref_entries

  * Don't set current_ref if not iterating over all refs

* Change pack-refs to use the peeled information from ref_entries if it
  is available, rather than looking up the references again.

* repack_without_ref() writes peeled refs to the packed-refs file.
  Use the old values if available; otherwise look them up.  (The old
  version of repack_without_refs() wrote a packed-refs file without
  any peeled values even if they were present in the old packed-refs
  file.)  Remove the deleted reference from the in-RAM packed ref cache
  in-place instead of discarding the whole cache.

* Add some tests and lots of comments.

Here are my questions for you:

1. There are multiple functions for peeling refs:
   peel_ref() (and peel_object(), which was extracted from it);
   peel_entry() (derived from pack-refs.c:pack_one_ref()).  It would be
   nice to combine these into the One True Function.  But given the
   problem that you mentioned above (which is rooted in parts of the
   code that I don't know) I don't know what that version should do.

2. repack_without_ref() now writes peeled refs, peeling them if
   necessary.  It does this *without* referring to the loose refs
   to avoid having to load all of the loose references, which can be
   time-consuming.  But this means that it might try to lookup SHA1s
   that are not the current value of the corresponding reference any
   more, and might even have been garbage collected.  Is the code that
   I use to do this, in peel_entry(), safe to call for nonexistent
   SHA1s (I would like it to return 0 if the SHA1 is invalid)?:

	o = parse_object(entry->u.value.sha1);
	if (o->type == OBJ_TAG) {
		o = deref_tag(o, entry->name, 0);
		if (o) {
			hashcpy(entry->u.value.peeled, o->sha1);
			entry->flag |= REF_KNOWS_PEELED;
			return 1;
		}
	}
	return 0;

   I've tried to test this experimentally and it seems to work, but I
   would like to have some theoretical support :-)

3. This same change to repack_with_ref() means that it could become
   significantly slower to delete a packed ref if the packed-ref file
   is not fully-peeled.  On the plus side, once this is done once, the
   packed-refs files will be rewritten in fully-peeled form.  But if
   two versions of Git are being used in a repository, this cost could
   be incurred repeatedly.  Does anybody object?

4. Should I change the locking policy as discussed in this thread?:

       http://article.gmane.org/gmane.comp.version-control.git/212131

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-16  8:48             ` Michael Haggerty
@ 2013-03-16  9:34               ` Jeff King
  2013-03-16 13:38                 ` Michael Haggerty
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2013-03-16  9:34 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote:

> My patch series is nearly done.  I will need another day or two to
> review and make it submission-ready, but I wanted to give you an idea of
> what I'm up to and I could also use your feedback on some points.

I was just sending out my cleaned up patches to do do fully-peeled, too.
I think the duplication is OK, though, as your topic would be for
"master" and mine can go to "maint".

> * Implement fully-peeled packed-refs files, as discussed upthread: peel
>   references outside of refs/tags, and keep rigorous track of
>   which references have REF_KNOWS_PEELED.

Looks pretty similar to mine. We even added similar tests.

I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel
line" optimization. I didn't bother, because this will never be
triggered by a git-written file (either we do not write the entry, or we
set fully-peeled). I wonder if any other implementation does peel every
ref, though.

> * Change pack-refs to use the peeled information from ref_entries if it
>   is available, rather than looking up the references again.

I don't know that it matters from a performance standpoint (we don't
really care how long pack-refs takes, as long as it is within reason,
because it is run as part of a gc).  But it would mean that any errors
in the file are propagated from one run of pack-refs to the next. I
don't know if we want to spend the extra time to be more robust.

> * repack_without_ref() writes peeled refs to the packed-refs file.
>   Use the old values if available; otherwise look them up.

Whereas here we probably _do_ want the performance optimization, because
we do care about the performance of a ref deletion.

> 1. There are multiple functions for peeling refs:
>    peel_ref() (and peel_object(), which was extracted from it);
>    peel_entry() (derived from pack-refs.c:pack_one_ref()).  It would be
>    nice to combine these into the One True Function.  But given the
>    problem that you mentioned above (which is rooted in parts of the
>    code that I don't know) I don't know what that version should do.

I'm not sure I understand the question. Just skimming your patches, it
looks like peel_entry could just call peel_object?

> 2. repack_without_ref() now writes peeled refs, peeling them if
>    necessary.  It does this *without* referring to the loose refs
>    to avoid having to load all of the loose references, which can be
>    time-consuming.  But this means that it might try to lookup SHA1s
>    that are not the current value of the corresponding reference any
>    more, and might even have been garbage collected.

Yuck. I really wonder if repack_without_ref should literally just be
writing out the exact same file, minus the lines for the deleted ref.
Is there a reason we need to do any processing at all? I guess the
answer is that you want to avoid re-parsing the current file, and just
write it out from memory. But don't we need to refresh the memory cache
from disk under the lock anyway? That was the pack-refs race that I
fixed recently.

> Is the code that
>    I use to do this, in peel_entry(), safe to call for nonexistent
>    SHA1s (I would like it to return 0 if the SHA1 is invalid)?:
> 
> 	o = parse_object(entry->u.value.sha1);
> 	if (o->type == OBJ_TAG) {
> 		o = deref_tag(o, entry->name, 0);
> 		if (o) {
> 			hashcpy(entry->u.value.peeled, o->sha1);
> 			entry->flag |= REF_KNOWS_PEELED;
> 			return 1;
> 		}
> 	}
> 	return 0;

I think this approach is safe, as parse_object will silently return NULL
for a missing object. You do need to check for "o != NULL" in your
conditional, though.

> 3. This same change to repack_with_ref() means that it could become
>    significantly slower to delete a packed ref if the packed-ref file
>    is not fully-peeled.  On the plus side, once this is done once, the
>    packed-refs files will be rewritten in fully-peeled form.  But if
>    two versions of Git are being used in a repository, this cost could
>    be incurred repeatedly.  Does anybody object?

I think it's OK in concept. But I still am wondering if it would be
simpler still to just pass the file through while locked.

> 4. Should I change the locking policy as discussed in this thread?:
> 
>        http://article.gmane.org/gmane.comp.version-control.git/212131

I think it's overall a sane direction. It does increase lock contention
slightly when two processes are deleting at the same time, but it would
fix the weird corner cases I described (mostly deleted refs reappearing
due to races). And the lock contention is already there in a
fully-packed repo anyway. I.e., right now we read the packed-refs file
and lock it if our to-delete ref is in there; with the proposed change,
we would lock before even reading it. So the increased contention is
only when two deleters race each other, _and_ one of them is not
deleting a packed ref.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-16  9:34               ` Jeff King
@ 2013-03-16 13:38                 ` Michael Haggerty
  2013-03-18  3:17                   ` Michael Haggerty
  2013-03-22 17:42                   ` Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Haggerty @ 2013-03-16 13:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git discussion list

On 03/16/2013 10:34 AM, Jeff King wrote:
> On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote:
> 
>> My patch series is nearly done.  I will need another day or two to
>> review and make it submission-ready, but I wanted to give you an idea of
>> what I'm up to and I could also use your feedback on some points.
> 
> I was just sending out my cleaned up patches to do do fully-peeled, too.
> I think the duplication is OK, though, as your topic would be for
> "master" and mine can go to "maint".

Yes, though I'll have to rebase mine on top of yours.

>> * Implement fully-peeled packed-refs files, as discussed upthread: peel
>>   references outside of refs/tags, and keep rigorous track of
>>   which references have REF_KNOWS_PEELED.
> 
> Looks pretty similar to mine. We even added similar tests.
> 
> I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel
> line" optimization. I didn't bother, because this will never be
> triggered by a git-written file (either we do not write the entry, or we
> set fully-peeled). I wonder if any other implementation does peel every
> ref, though.

We read the peeled value for refs outside of refs/tags even if
fully-peeled is not set, so it seemed somehow inconsistent not to set
REF_KNOWS_PEELED.  But I agree that Git itself should never write such
entries, except of course for the interval between your first patch and
your second patch ;-)

>> * Change pack-refs to use the peeled information from ref_entries if it
>>   is available, rather than looking up the references again.
> 
> I don't know that it matters from a performance standpoint (we don't
> really care how long pack-refs takes, as long as it is within reason,
> because it is run as part of a gc).  But it would mean that any errors
> in the file are propagated from one run of pack-refs to the next. I
> don't know if we want to spend the extra time to be more robust.

I thought about this argument too.  Either way is OK with me.  BTW would
it make sense to build a consistency check of peeled references into fsck?

>> * repack_without_ref() writes peeled refs to the packed-refs file.
>>   Use the old values if available; otherwise look them up.
> 
> Whereas here we probably _do_ want the performance optimization, because
> we do care about the performance of a ref deletion.

Agreed.

>> 1. There are multiple functions for peeling refs:
>>    peel_ref() (and peel_object(), which was extracted from it);
>>    peel_entry() (derived from pack-refs.c:pack_one_ref()).  It would be
>>    nice to combine these into the One True Function.  But given the
>>    problem that you mentioned above (which is rooted in parts of the
>>    code that I don't know) I don't know what that version should do.
> 
> I'm not sure I understand the question. Just skimming your patches, it
> looks like peel_entry could just call peel_object?

I believe that the version in peel_object() is the one that you
optimized in your now-famous 435c833 patches, which your earlier email
said was connected to a null pointer dereference.  I'm not at all
familiar with the API being used there, so I don't know whether the two
versions are interchangeable, or whether you need to fix your
optimization, or whether your optimization will need to be reverted
because of the problem you discovered.

>> 2. repack_without_ref() now writes peeled refs, peeling them if
>>    necessary.  It does this *without* referring to the loose refs
>>    to avoid having to load all of the loose references, which can be
>>    time-consuming.  But this means that it might try to lookup SHA1s
>>    that are not the current value of the corresponding reference any
>>    more, and might even have been garbage collected.
> 
> Yuck. I really wonder if repack_without_ref should literally just be
> writing out the exact same file, minus the lines for the deleted ref.
> Is there a reason we need to do any processing at all? I guess the
> answer is that you want to avoid re-parsing the current file, and just
> write it out from memory. But don't we need to refresh the memory cache
> from disk under the lock anyway? That was the pack-refs race that I
> fixed recently.

It would certainly be thinkable to rewrite the file textually without
parsing it again.  But I did it this way for two reasons:

1. It would be better to have one packed-refs file parser and writer
rather than two.  (I'm working towards unifying the two writers, and
will continue once I understand their differences.)

2. Given how peeled references were added, it seems dangerous to read,
modify, and write data that might be in a future format that we don't
entirely understand.  For example, suppose a new feature is added to
mark Junio's favorite tags:

    # pack-refs with: peeled fully-peeled junios-favorites \n
    ce432cac30f98b291be609a0fc974042a2156f55 refs/heads/master
    83b3dd7151e7eb0e5ac62ee03aca7e6bccaa8d07 refs/tags/evil-dogs
    ^e1d04f8aad59397cbd55e72bf8a1bd75606f5350
    7ed863a85a6ce2c4ac4476848310b8f917ab41f9 refs/tags/lolcats
    ^990f856a62a24bfd56bac1f5e4581381369e4ede
    ^^^junios-favorite
    b0517166ae2ad92f3b17638cbdee0f04b8170d99 refs/tags/nonsense
    ^4baff50551545e2b6825973ec37bcaf03edb95fe

This would be backwards-compatible with the current code (granted, one
would lose the favorite information if the file is rewritten with an
older version of the code).  But if we delete the lolcats tag textually,
we would cause the favorite annotation to be moved to a different tag
and thereby corrupt the data.

>> Is the code that
>>    I use to do this, in peel_entry(), safe to call for nonexistent
>>    SHA1s (I would like it to return 0 if the SHA1 is invalid)?:
>>
>> 	o = parse_object(entry->u.value.sha1);
>> 	if (o->type == OBJ_TAG) {
>> 		o = deref_tag(o, entry->name, 0);
>> 		if (o) {
>> 			hashcpy(entry->u.value.peeled, o->sha1);
>> 			entry->flag |= REF_KNOWS_PEELED;
>> 			return 1;
>> 		}
>> 	}
>> 	return 0;
> 
> I think this approach is safe, as parse_object will silently return NULL
> for a missing object. You do need to check for "o != NULL" in your
> conditional, though.

Thanks; will fix.

>> 3. This same change to repack_with_ref() means that it could become
>>    significantly slower to delete a packed ref if the packed-ref file
>>    is not fully-peeled.  On the plus side, once this is done once, the
>>    packed-refs files will be rewritten in fully-peeled form.  But if
>>    two versions of Git are being used in a repository, this cost could
>>    be incurred repeatedly.  Does anybody object?
> 
> I think it's OK in concept. But I still am wondering if it would be
> simpler still to just pass the file through while locked.

See above.

>> 4. Should I change the locking policy as discussed in this thread?:
>>
>>        http://article.gmane.org/gmane.comp.version-control.git/212131
> 
> I think it's overall a sane direction. It does increase lock contention
> slightly when two processes are deleting at the same time, but it would
> fix the weird corner cases I described (mostly deleted refs reappearing
> due to races). And the lock contention is already there in a
> fully-packed repo anyway. I.e., right now we read the packed-refs file
> and lock it if our to-delete ref is in there; with the proposed change,
> we would lock before even reading it. So the increased contention is
> only when two deleters race each other, _and_ one of them is not
> deleting a packed ref.

OK, I'll work on that too.

Thanks for your feedback!
Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-16 13:38                 ` Michael Haggerty
@ 2013-03-18  3:17                   ` Michael Haggerty
  2013-03-22 17:42                   ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2013-03-18  3:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git discussion list

On 03/16/2013 02:38 PM, Michael Haggerty wrote:
> On 03/16/2013 10:34 AM, Jeff King wrote:
>> On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote:
>>
>>> My patch series is nearly done.  I will need another day or two to
>>> review and make it submission-ready, but I wanted to give you an idea of
>>> what I'm up to and I could also use your feedback on some points.

I will wait for the dust to settle on Peff's "peel-ref optimization
fixes" patches, then rebase my series on top of his before submitting.
The rest of my series is not so urgent so I don't think the delay will
be a problem.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Tag peeling peculiarities
  2013-03-16 13:38                 ` Michael Haggerty
  2013-03-18  3:17                   ` Michael Haggerty
@ 2013-03-22 17:42                   ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2013-03-22 17:42 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

[this email is from last week, and I think most of was responded to in
 other parts of the thread, but there were a few loose ends]

On Sat, Mar 16, 2013 at 02:38:12PM +0100, Michael Haggerty wrote:

> >> * Change pack-refs to use the peeled information from ref_entries if it
> >>   is available, rather than looking up the references again.
> > 
> > I don't know that it matters from a performance standpoint (we don't
> > really care how long pack-refs takes, as long as it is within reason,
> > because it is run as part of a gc).  But it would mean that any errors
> > in the file are propagated from one run of pack-refs to the next. I
> > don't know if we want to spend the extra time to be more robust.
> 
> I thought about this argument too.  Either way is OK with me.  BTW would
> it make sense to build a consistency check of peeled references into fsck?

Yeah, I do think an fsck check makes sense. It should not be expensive,
and if there is a realistic corruption/health check for a repo, it makes
sense to me for it to be part of fsck. I do not recall many incidents of
packed-refs corruption in the history of the list, but this fairly
recent one comes to mind:

  http://thread.gmane.org/gmane.comp.version-control.git/217065

On the other hand, if it is just as cheap to rewrite the file as it is
to do the health checks, I wonder if the advice should simply be "run
pack-refs again" (and doing so should always start from scratch, not
trusting the existing version).

> > Yuck. I really wonder if repack_without_ref should literally just be
> > writing out the exact same file, minus the lines for the deleted ref.
> > Is there a reason we need to do any processing at all? I guess the
> > answer is that you want to avoid re-parsing the current file, and just
> > write it out from memory. But don't we need to refresh the memory cache
> > from disk under the lock anyway? That was the pack-refs race that I
> > fixed recently.
> 
> It would certainly be thinkable to rewrite the file textually without
> parsing it again.  But I did it this way for two reasons:
> 
> 1. It would be better to have one packed-refs file parser and writer
> rather than two.  (I'm working towards unifying the two writers, and
> will continue once I understand their differences.)

I see your point, though I also feel that with the right refactoring,
they could share the parser. And the second writer would be mostly a
pass-through. But much more compelling is reason 2...

> 2. Given how peeled references were added, it seems dangerous to read,
> modify, and write data that might be in a future format that we don't
> entirely understand.  For example, suppose a new feature is added to
> mark Junio's favorite tags:
> 
>     # pack-refs with: peeled fully-peeled junios-favorites \n
>     ce432cac30f98b291be609a0fc974042a2156f55 refs/heads/master
>     83b3dd7151e7eb0e5ac62ee03aca7e6bccaa8d07 refs/tags/evil-dogs
>     ^e1d04f8aad59397cbd55e72bf8a1bd75606f5350
>     7ed863a85a6ce2c4ac4476848310b8f917ab41f9 refs/tags/lolcats
>     ^990f856a62a24bfd56bac1f5e4581381369e4ede
>     ^^^junios-favorite
>     b0517166ae2ad92f3b17638cbdee0f04b8170d99 refs/tags/nonsense
>     ^4baff50551545e2b6825973ec37bcaf03edb95fe

Hmm. Good point. It is tempting to make a rule like "extensions that
are specific to a ref must come after the ref but before the next ref".
And then the writer could simply drop any lines between the to-delete
ref and the one that follows it.

I think that would work OK in practice, but I am worried that it would
paint us into a corner later on (after all, we do not know what
extensions will be added in the future). The only thing I can think of
that would break is something where a ref or its extension depends on
previous entries.  E.g., we start prefix-compressing the ref names to
save space. Of course that would break backwards compatibility horribly
anyway, so it's a no-go. But maybe some extension would want to refer to
a prior ref. I dunno.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-03-22 17:43 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty
2013-03-13 15:34 ` Michael Haggerty
2013-03-13 17:29 ` Junio C Hamano
2013-03-13 21:58   ` Jeff King
2013-03-14  4:41     ` Michael Haggerty
2013-03-14  5:24       ` Jeff King
2013-03-14  5:32         ` Jeff King
2013-03-14 15:14           ` Junio C Hamano
2013-03-14 11:28         ` Michael Haggerty
2013-03-14 13:40           ` Jeff King
2013-03-15  5:12             ` Michael Haggerty
2013-03-15 16:28               ` Junio C Hamano
2013-03-16  8:48             ` Michael Haggerty
2013-03-16  9:34               ` Jeff King
2013-03-16 13:38                 ` Michael Haggerty
2013-03-18  3:17                   ` Michael Haggerty
2013-03-22 17:42                   ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).