All of lore.kernel.org
 help / color / mirror / Atom feed
* Alternates and push
@ 2008-09-06 12:42 Jon Smirl
  2008-09-06 16:20 ` Theodore Tso
  0 siblings, 1 reply; 20+ messages in thread
From: Jon Smirl @ 2008-09-06 12:42 UTC (permalink / raw)
  To: Git Mailing List

At github my repo, digispeaker, has an alternate pointing to github's
local copy of Linus' tree. I ignored my tree for a month and then
pushed to it including 200MB of objects from Linus' tree. These 200MB
of objects were pushed up to the server, but these objects were
already in the alternates repository.

What's supposed to happen? Is something broken in github's setup, or
does pushing not take into account alternates?

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 12:42 Alternates and push Jon Smirl
@ 2008-09-06 16:20 ` Theodore Tso
  2008-09-06 18:06   ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Theodore Tso @ 2008-09-06 16:20 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List

On Sat, Sep 06, 2008 at 08:42:50AM -0400, Jon Smirl wrote:
> At github my repo, digispeaker, has an alternate pointing to github's
> local copy of Linus' tree. I ignored my tree for a month and then
> pushed to it including 200MB of objects from Linus' tree. These 200MB
> of objects were pushed up to the server, but these objects were
> already in the alternates repository.
> 
> What's supposed to happen? Is something broken in github's setup, or
> does pushing not take into account alternates?

Long-standing mis-feature in git's logic in deciding what to push.
It's been reported a few times, but apparently it's hard to fix, or at
least it never hsa been fixed as far as I know.

I work around it by ssh'ing into master.kernel.org and doing a "git
branch -f origin <commit-id-of-master-on-linus's-tree>".  As long as
there is one branch which is up-to-date, git will avoid pushing a huge
number of objects to master.kernel.org.  Of course, this workaround
only works if you have shell access....

						- Ted

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 16:20 ` Theodore Tso
@ 2008-09-06 18:06   ` Junio C Hamano
  2008-09-06 18:24     ` Jon Smirl
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-09-06 18:06 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jon Smirl, Git Mailing List

Theodore Tso <tytso@MIT.EDU> writes:

> Long-standing mis-feature in git's logic in deciding what to push.
> It's been reported a few times, but apparently it's hard to fix, or at
> least it never hsa been fixed as far as I know.

This comes from an early (mis)design of git.

Background.

 * A git repository and the object store it uses can be separate.  From
   the beginning, you can have a"objects/" directory (aka "object store")
   that is shared by more than one repositories.  There is no Porcelain
   level support to set up two repositories that physically share the same
   object store, but the result of "git init; rm -rf .git/objects; ln -s
   $other/.git/objects .git/objects" was supposed to work (and it still
   largely works, until you gc) in the original design.

   The alternate object store does not even have to be a git repository,
   which makes things worse.  You can have everybody pointing at
   /var/cache/objects, and /var/cache does not have to be a git repository
   (i.e. no var/cache/refs).

 * The existing alternates mechanism is not about alternate repositories.
   It is about alternate object stores.  That is why each line of this
   file points at "objects" directory elsewhere, not the ".git" directory
   that is typically at one level above that "objects" directory.

   The fact your repository's object store points at the object store that
   happens to be inside Linus's repository does not imply that Linus's
   object store is associated with refs in Linus's repository in any way
   (that's the early _mis_design part).

 * An existing ref in a git repository is meant to be a guarantee that all
   objects the object referenced by the ref is found somewhere in the
   object store(s) the repository uses.  Object transfers in git
   (i.e. fetch and push) use this guarantee to tell what a repository has
   to the other side.

   What happens in your case is that github end knows that the repository
   you are pushing into have up to the refs you have there.  Alternate may
   point at object store that holds objects from Linus's repository, but
   there is no information as to what the latest commits you do not see in
   your refs namespace (namely, "what's Linus's latest" is not something
   you can learn from your repository that has alternates).

A possible fix would involve:

 - Deprecate objects/info/alternates file, and GIT_OBJECT_DIRECTORY and
   GIT_ALTERNATE_OBJECT_DIRECTORIES environment variables;

 - Introduce info/alternates that points at alternate _repositories_ (as
   opposed to objects/info/alternates that points at alternate object
   stores);

 - Teach fetch and push to include refs from alternate _repositories_ into
   what local side considers complete.

The above won't break existing setups, but it won't help them either.  All
the borrowing repositoies need to be converted if we go that route.

We could instead redefine the semantics of the existing alternates
mechanism.  This technically *breaks* backward compatibility, but I
suspect it won't hurt many existing installations:

 - Declare that a freestanding object store is illegal.  In other words,
   if a directory "$D/objects" is (1) used as $GIT_OBJECT_DIRECTORY's
   value, (2) pointed by some repository's "alternates" file, or (3)
   listed in $GIT_ALTERNATE_OBJECT_DIRECTORIES's value, this change makes
   it illegal for "$D" not being a proper git repository.

   This will not break your example of your repository's object store
   borrowing from the object store inside Linus's repository.

 - When you have "$D/objects" in alternates, start relying on "$D/refs"
   being correct (i.e. repository $D is not corrupt).  This technically
   makes the system slightly less robust, as we are depending on _other
   people's_ good behaviour even more when you use alternates, but you are
   already depending on them having good objects in $D/objects anyway, so
   it is not a big deal.

 - Now that we declared that everything reachable from "$D/refs" do not
   have to be transferred from elsewhere when a push sends things into us
   (or a fetch gets things from elsewhere into us) when you have
   "$D/objects" in your alternates.  In your "borrowing from Linus"
   example, Linus's latest will be reachable from somewhere in "$D/refs",
   when you are borrowing from him by having "$D/objects" in your
   alternates. 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 18:06   ` Junio C Hamano
@ 2008-09-06 18:24     ` Jon Smirl
  2008-09-06 19:21       ` Shawn O. Pearce
  2008-09-07 18:49     ` Jan Hudec
  2008-09-07 23:41     ` Theodore Tso
  2 siblings, 1 reply; 20+ messages in thread
From: Jon Smirl @ 2008-09-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Git Mailing List

On 9/6/08, Junio C Hamano <gitster@pobox.com> wrote:
> Theodore Tso <tytso@MIT.EDU> writes:
>
>  > Long-standing mis-feature in git's logic in deciding what to push.
>  > It's been reported a few times, but apparently it's hard to fix, or at
>  > least it never hsa been fixed as far as I know.
>
>
> This comes from an early (mis)design of git.

What happens if I set a remote in my github digispeaker repo pointing
to github's linus repo and the set a chron script to fetch it once a
day? The alternates link is still in place.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 18:24     ` Jon Smirl
@ 2008-09-06 19:21       ` Shawn O. Pearce
  2008-09-09  8:35         ` Petr Baudis
  0 siblings, 1 reply; 20+ messages in thread
From: Shawn O. Pearce @ 2008-09-06 19:21 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Junio C Hamano, Theodore Tso, Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> On 9/6/08, Junio C Hamano <gitster@pobox.com> wrote:
> > Theodore Tso <tytso@MIT.EDU> writes:
> >
> >  > Long-standing mis-feature in git's logic in deciding what to push.
> >  > It's been reported a few times, but apparently it's hard to fix, or at
> >  > least it never hsa been fixed as far as I know.
> >
> >
> > This comes from an early (mis)design of git.
> 
> What happens if I set a remote in my github digispeaker repo pointing
> to github's linus repo and the set a chron script to fetch it once a
> day? The alternates link is still in place.

github should do what repo.or.cz does:

	ln -s .../linus.git/refs digispeaker.git/refs/forkee

That way the refs available in Linus' tree are also available
in your tree as Git will transparently follow the symlink.

However you have to be careful to make sure `git pack-refs` isn't
run with `--all --prune` as it will delete the refs from linus.git
when executed in digispeaker.git.  Fun times when I did that to my
own repository one day.  ;-)

Though we probably should fix Git to be somewhat smarter.  But
until that stable binary is available, the symlink trick above
is a good work around.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 18:06   ` Junio C Hamano
  2008-09-06 18:24     ` Jon Smirl
@ 2008-09-07 18:49     ` Jan Hudec
  2008-09-07 18:56       ` Junio C Hamano
  2008-09-07 23:41     ` Theodore Tso
  2 siblings, 1 reply; 20+ messages in thread
From: Jan Hudec @ 2008-09-07 18:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Jon Smirl, Git Mailing List

On Sat, Sep 06, 2008 at 11:06:49 -0700, Junio C Hamano wrote:
> Theodore Tso <tytso@MIT.EDU> writes:
> 
> > Long-standing mis-feature in git's logic in deciding what to push.
> > It's been reported a few times, but apparently it's hard to fix, or at
> > least it never hsa been fixed as far as I know.
> 
> This comes from an early (mis)design of git.
> [...]
>  * The existing alternates mechanism is not about alternate repositories.
>    It is about alternate object stores.  That is why each line of this
>    file points at "objects" directory elsewhere, not the ".git" directory
>    that is typically at one level above that "objects" directory.
> 
>    The fact your repository's object store points at the object store that
>    happens to be inside Linus's repository does not imply that Linus's
>    object store is associated with refs in Linus's repository in any way
>    (that's the early _mis_design part).

Why is this a *mis*design? Couldn't push be fixed by redesigning it's
protocol along the lines of:
 - clients sends a list of sha1s it wants to push, from the tip down
 - server stops it when it sees an object it has -- this check can be done
   against the object store without having a ref for it.

Regards,
Jan

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-07 18:49     ` Jan Hudec
@ 2008-09-07 18:56       ` Junio C Hamano
  2008-09-07 19:17         ` Jan Hudec
  2008-09-07 19:18         ` Junio C Hamano
  0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-09-07 18:56 UTC (permalink / raw)
  To: Jan Hudec; +Cc: Theodore Tso, Jon Smirl, Git Mailing List

Jan Hudec <bulb@ucw.cz> writes:

> On Sat, Sep 06, 2008 at 11:06:49 -0700, Junio C Hamano wrote:
>> Theodore Tso <tytso@MIT.EDU> writes:
>> 
>> > Long-standing mis-feature in git's logic in deciding what to push.
>> > It's been reported a few times, but apparently it's hard to fix, or at
>> > least it never hsa been fixed as far as I know.
>> 
>> This comes from an early (mis)design of git.
>> [...]
>>  * The existing alternates mechanism is not about alternate repositories.
>>    It is about alternate object stores.  That is why each line of this
>>    file points at "objects" directory elsewhere, not the ".git" directory
>>    that is typically at one level above that "objects" directory.
>> 
>>    The fact your repository's object store points at the object store that
>>    happens to be inside Linus's repository does not imply that Linus's
>>    object store is associated with refs in Linus's repository in any way
>>    (that's the early _mis_design part).
>
> Why is this a *mis*design? Couldn't push be fixed by redesigning it's
> protocol along the lines of:
>  - clients sends a list of sha1s it wants to push, from the tip down
>  - server stops it when it sees an object it has -- this check can be done
>    against the object store without having a ref for it.

Because your second step is *BROKEN*.

Think of a case where an earlier commit walker started fetching into that
"server" end, which got newer commits and their associated objects first
and then older ones, and then got killed before reaching to the objects it
already had.  In such a case, the commit walker will *not* update the refs
on the server end (and for a very good reason).

After that, the server end would have:

 * refs that point at some older commits, all objects from whom are
   guaranteed to be in the repository (that's the "ref" guarantee);

 * newer commits and their objects, but if you follow them you will hit
   some objects that are *NOT* in the repository.

Now imagine starting your broken procedure to serve clients from such a
repository.  Your second step would cause the server to attempt to pack
the difference from the latter classes of incomplete objects and makes the
pack generation fail.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-07 18:56       ` Junio C Hamano
@ 2008-09-07 19:17         ` Jan Hudec
  2008-09-07 19:18         ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Hudec @ 2008-09-07 19:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Jon Smirl, Git Mailing List

On Sun, Sep 07, 2008 at 11:56:42 -0700, Junio C Hamano wrote:
> Jan Hudec <bulb@ucw.cz> writes:
> 
> > On Sat, Sep 06, 2008 at 11:06:49 -0700, Junio C Hamano wrote:
> >> Theodore Tso <tytso@MIT.EDU> writes:
> >> 
> >> > Long-standing mis-feature in git's logic in deciding what to push.
> >> > It's been reported a few times, but apparently it's hard to fix, or at
> >> > least it never hsa been fixed as far as I know.
> >> 
> >> This comes from an early (mis)design of git.
> >> [...]
> >>  * The existing alternates mechanism is not about alternate repositories.
> >>    It is about alternate object stores.  That is why each line of this
> >>    file points at "objects" directory elsewhere, not the ".git" directory
> >>    that is typically at one level above that "objects" directory.
> >> 
> >>    The fact your repository's object store points at the object store that
> >>    happens to be inside Linus's repository does not imply that Linus's
> >>    object store is associated with refs in Linus's repository in any way
> >>    (that's the early _mis_design part).
> >
> > Why is this a *mis*design? Couldn't push be fixed by redesigning it's
> > protocol along the lines of:
> >  - clients sends a list of sha1s it wants to push, from the tip down
> >  - server stops it when it sees an object it has -- this check can be done
> >    against the object store without having a ref for it.
> 
> Because your second step is *BROKEN*.
> 
> Think of a case where an earlier commit walker started fetching into that
> "server" end, which got newer commits and their associated objects first
> and then older ones, and then got killed before reaching to the objects it

Is there really any transport that stores newer objects first? And can't be
fixed to only store the objects when they are good. My proposal definitely
does not need to do that (the list goes from newest to older, but the data in
third step can come from oldest to newer).

> already had.  In such a case, the commit walker will *not* update the refs
> on the server end (and for a very good reason).
> 
> After that, the server end would have:
> 
>  * refs that point at some older commits, all objects from whom are
>    guaranteed to be in the repository (that's the "ref" guarantee);
> 
>  * newer commits and their objects, but if you follow them you will hit
>    some objects that are *NOT* in the repository.
> 
> Now imagine starting your broken procedure to serve clients from such a
> repository.  Your second step would cause the server to attempt to pack
> the difference from the latter classes of incomplete objects and makes the
> pack generation fail.

Yes, it would add a requirement that objects are only placed in the object
store after all objects they reference, but I can't imagine good reason this
couldn't be done.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-07 18:56       ` Junio C Hamano
  2008-09-07 19:17         ` Jan Hudec
@ 2008-09-07 19:18         ` Junio C Hamano
  2008-09-08 17:56           ` Jan Hudec
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-09-07 19:18 UTC (permalink / raw)
  To: Jan Hudec; +Cc: Theodore Tso, Jon Smirl, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Jan Hudec <bulb@ucw.cz> writes:
> ...
>> Why is this a *mis*design? Couldn't push be fixed by redesigning it's
>> protocol along the lines of:
>>  - clients sends a list of sha1s it wants to push, from the tip down
>>  - server stops it when it sees an object it has -- this check can be done
>>    against the object store without having a ref for it.
>
> Because your second step is *BROKEN*.
>
> Think of a case where an earlier commit walker started fetching into that
> "server" end, which got newer commits and their associated objects first
> and then older ones, and then got killed before reaching to the objects it
> already had.  In such a case, the commit walker will *not* update the refs
> on the server end (and for a very good reason).
>
> After that, the server end would have:
>
>  * refs that point at some older commits, all objects from whom are

s/from whom/reachable from which/;

>    guaranteed to be in the repository (that's the "ref" guarantee);
>
>  * newer commits and their objects, but if you follow them you will hit
>    some objects that are *NOT* in the repository.

To visualize, the server object store and refs would be like this:

    ---o---o---A...x...x...x...x...o---o---X
               ^ ref

Commits 'x' are all missing because the commit walker fetched commit X,
inspected its tree and got the necessary tree and blob objects, went back
to get X's parent, did the same, then its parent, attempted to do the same
but got killed before connecting the history fully to A.

If you accepted history on top of X before guaranteeing that you have
everything reachable from X already in this round of push will give you this:


    ---o---o---A...x...x...x...x...o---o---X---o---o---Y
               ^ ref =========== (wrong) ============> ^ ref

and if you update the ref to point at Y, then you cannot satisfy requests
from other people who want the history that leads to Y, because somewhere
between A and X there are commit that you do not even have to begin with.

So you may even be able accept objects between X..Y, but you cannot update
the ref from A to Y after accepting such a push, which is pointless.

You could try a variant of it to unbreak your trick, though.  When you see
an object that you have, say 'X' above, you traverse down from there until
reaching some ref (in this case, A) and make sure that you have everything
in between (not just commits but also associated trees and blobs that are
needed).  This is quite similar to what is happening when the commit
walker says "walk deadbeef..." in its progress output.  So it _could_ be
done, but it would be somewhat expensive.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 18:06   ` Junio C Hamano
  2008-09-06 18:24     ` Jon Smirl
  2008-09-07 18:49     ` Jan Hudec
@ 2008-09-07 23:41     ` Theodore Tso
  2008-09-08  0:02       ` Junio C Hamano
  2008-09-08  5:07       ` Junio C Hamano
  2 siblings, 2 replies; 20+ messages in thread
From: Theodore Tso @ 2008-09-07 23:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jon Smirl, Git Mailing List

On Sat, Sep 06, 2008 at 11:06:49AM -0700, Junio C Hamano wrote:
> We could instead redefine the semantics of the existing alternates
> mechanism.  This technically *breaks* backward compatibility, but I
> suspect it won't hurt many existing installations:
> 
>  - Declare that a freestanding object store is illegal.  In other words,
>    if a directory "$D/objects" is (1) used as $GIT_OBJECT_DIRECTORY's
>    value, (2) pointed by some repository's "alternates" file, or (3)
>    listed in $GIT_ALTERNATE_OBJECT_DIRECTORIES's value, this change makes
>    it illegal for "$D" not being a proper git repository.
> 
>    This will not break your example of your repository's object store
>    borrowing from the object store inside Linus's repository.
> 
>  - When you have "$D/objects" in alternates, start relying on "$D/refs"
>    being correct (i.e. repository $D is not corrupt).  This technically
>    makes the system slightly less robust, as we are depending on _other
>    people's_ good behaviour even more when you use alternates, but you are
>    already depending on them having good objects in $D/objects anyway, so
>    it is not a big deal.

One way that wouldn't break backwards compatibility would be to use
$D/refs if it exists, but if it isn't, fall back to existing behavior
(which is to say, only use the refs in the repository, not in the
borrowed repository/object store).  Is there a reason why this would
be problematic?

							- Ted

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-07 23:41     ` Theodore Tso
@ 2008-09-08  0:02       ` Junio C Hamano
  2008-09-08  0:41         ` Theodore Tso
  2008-09-08  5:07       ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-09-08  0:02 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jon Smirl, Git Mailing List

Theodore Tso <tytso@MIT.EDU> writes:

> On Sat, Sep 06, 2008 at 11:06:49AM -0700, Junio C Hamano wrote:
>> We could instead redefine the semantics of the existing alternates
>> mechanism.  This technically *breaks* backward compatibility, but I
>> suspect it won't hurt many existing installations:
>> 
>>  - Declare that a freestanding object store is illegal.  In other words,
>>    if a directory "$D/objects" is (1) used as $GIT_OBJECT_DIRECTORY's
>>    value, (2) pointed by some repository's "alternates" file, or (3)
>>    listed in $GIT_ALTERNATE_OBJECT_DIRECTORIES's value, this change makes
>>    it illegal for "$D" not being a proper git repository.
>> 
>>    This will not break your example of your repository's object store
>>    borrowing from the object store inside Linus's repository.
>> 
>>  - When you have "$D/objects" in alternates, start relying on "$D/refs"
>>    being correct (i.e. repository $D is not corrupt).  This technically
>>    makes the system slightly less robust, as we are depending on _other
>>    people's_ good behaviour even more when you use alternates, but you are
>>    already depending on them having good objects in $D/objects anyway, so
>>    it is not a big deal.
>
> One way that wouldn't break backwards compatibility would be to use
> $D/refs if it exists, but if it isn't, fall back to existing behavior
> (which is to say, only use the refs in the repository, not in the
> borrowed repository/object store).  Is there a reason why this would
> be problematic?

I think you just reiterated what I said in "we could instead", and I think
we are in agreement.

I already stated the reason this could be problematic and also I said I do
not think it is a big deal in the above.

I think the question to ask at this point is not "is there a reason why
this would be problematic", but "is it really not a big deal as Junio
claims?" and "aren't there _other_ reasons than the above that makes it
problematic?".

The arguments to make are "Junio is worrying too much; depending on the
other repository's ref is no worse than depending on the objects the other
repository uses, and here is a proof that it is not just 'not a big deal'
but 'no problem at all'", "I've polled the userbase and there is no
existing configuration that will be broken by this change", and "I have
this configuration that will be broken by above change, don't do it".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-08  0:02       ` Junio C Hamano
@ 2008-09-08  0:41         ` Theodore Tso
  2008-09-08  2:53           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Theodore Tso @ 2008-09-08  0:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jon Smirl, Git Mailing List

On Sun, Sep 07, 2008 at 05:02:05PM -0700, Junio C Hamano wrote:
> 
> I think you just reiterated what I said in "we could instead", and I think
> we are in agreement.
> 

Sorry, I misunderstood what you were saying.  I thought you were
proposing that if $D/refs didn't exist, then if alternates pointed at
$D/objects, git would reject using it.  But that makes no sense, which
is why I didn't understand why you proposed it.  (Turns out I
misunderstood you.  :-)

> The arguments to make are "Junio is worrying too much; depending on the
> other repository's ref is no worse than depending on the objects the other
> repository uses, and here is a proof that it is not just 'not a big deal'
> but 'no problem at all'", "I've polled the userbase and there is no
> existing configuration that will be broken by this change", and "I have
> this configuration that will be broken by above change, don't do it".

So the only configuration I can think of that would be broken by this
is where $D/refs exists, but is insane.  (i.e., such that git fsck for
$D would result in errors).  That seems pretty unlikely...

   	 	   	     	  	       - Ted

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-08  0:41         ` Theodore Tso
@ 2008-09-08  2:53           ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-09-08  2:53 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jon Smirl, Git Mailing List

Theodore Tso <tytso@MIT.EDU> writes:

>> The arguments to make are "Junio is worrying too much; depending on the
>> other repository's ref is no worse than depending on the objects the other
>> repository uses, and here is a proof that it is not just 'not a big deal'
>> but 'no problem at all'", "I've polled the userbase and there is no
>> existing configuration that will be broken by this change", and "I have
>> this configuration that will be broken by above change, don't do it".
>
> So the only configuration I can think of that would be broken by this
> is where $D/refs exists, but is insane.  (i.e., such that git fsck for
> $D would result in errors).  That seems pretty unlikely...

I was more worried about "once $D/ started its life as an actively used
and maintained repository, but it itself has been abandoned and its refs
are not maintained anymore; only $D/objects are still in use", but even in
that case, it won't be "$D/refs exists but insane" case, so we should be
Ok.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-07 23:41     ` Theodore Tso
  2008-09-08  0:02       ` Junio C Hamano
@ 2008-09-08  5:07       ` Junio C Hamano
  2008-09-08  6:42         ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-09-08  5:07 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jon Smirl, Git Mailing List

Theodore Tso <tytso@MIT.EDU> writes:

> One way that wouldn't break backwards compatibility would be to use
> $D/refs if it exists, but if it isn't, fall back to existing behavior
> (which is to say, only use the refs in the repository, not in the
> borrowed repository/object store).  Is there a reason why this would
> be problematic?

This does not answer "is this safe enough change?" question, but the code
to implement this should look like this (Area expert Daniel CC'ed).

 Makefile                                 |    2 +-
 receive-pack.c => builtin-receive-pack.c |   60 ++++++++++++++++++++++++++++-
 builtin.h                                |    1 +
 cache.h                                  |    2 +
 git.c                                    |    1 +
 sha1_file.c                              |   10 +++++
 6 files changed, 72 insertions(+), 4 deletions(-)

diff --git c/Makefile w/Makefile
index dfed7ba..badf27a 100644
--- c/Makefile
+++ w/Makefile
@@ -294,7 +294,6 @@ PROGRAMS += git-mktag$X
 PROGRAMS += git-mktree$X
 PROGRAMS += git-pack-redundant$X
 PROGRAMS += git-patch-id$X
-PROGRAMS += git-receive-pack$X
 PROGRAMS += git-send-pack$X
 PROGRAMS += git-show-index$X
 PROGRAMS += git-unpack-file$X
@@ -543,6 +542,7 @@ BUILTIN_OBJS += builtin-prune-packed.o
 BUILTIN_OBJS += builtin-prune.o
 BUILTIN_OBJS += builtin-push.o
 BUILTIN_OBJS += builtin-read-tree.o
+BUILTIN_OBJS += builtin-receive-pack.o
 BUILTIN_OBJS += builtin-reflog.o
 BUILTIN_OBJS += builtin-remote.o
 BUILTIN_OBJS += builtin-rerere.o
diff --git c/receive-pack.c w/builtin-receive-pack.c
similarity index 88%
rename from receive-pack.c
rename to builtin-receive-pack.c
index b81678a..c9bb722 100644
--- c/receive-pack.c
+++ w/builtin-receive-pack.c
@@ -6,6 +6,8 @@
 #include "exec_cmd.h"
 #include "commit.h"
 #include "object.h"
+#include "remote.h"
+#include "transport.h"
 
 static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
 
@@ -462,14 +464,64 @@ static int delete_only(struct command *cmd)
 	return 1;
 }
 
-int main(int argc, char **argv)
+/* Need to move this to path.c or somewhere, and remove the same from builtin-clone.c */
+static int is_directory(const char *path)
+{
+	struct stat buf;
+
+	return !stat(path, &buf) && S_ISDIR(buf.st_mode);
+}
+
+static int add_refs_from_alternate(struct alternate_object_database *e, void *unused)
+{
+	char *other = xstrdup(make_absolute_path(e->base));
+	size_t len = strlen(other);
+	struct remote *remote;
+	struct transport *transport;
+	const struct ref *extra;
+	struct strbuf buf = STRBUF_INIT;
+
+	while (other[len-1] == '/')
+		other[--len] = '\0';
+	if (len < 8 || memcmp(other + len - 8, "/objects", 8))
+		return 0;
+	/* Is this a git repository with refs? */
+	memcpy(other + len - 8, "/refs", 6);
+	if (!is_directory(other))
+		return 0;
+	other[len - 8] = '\0';
+	remote = remote_get(other);
+	transport = transport_get(remote, other);
+	for (extra = transport_get_remote_refs(transport);
+	     extra;
+	     extra = extra->next) {
+		/*
+		 * Make sure this is named uniquely but with invalid name, so
+		 * that we can catch the other end trying to do anything funky
+		 * with it.
+		 */
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "refs/borrowed-ref/%s", extra->name);
+		add_extra_ref(buf.buf, extra->old_sha1, 0);
+	}
+	transport_disconnect(transport);
+	free(other);
+	return 0;
+}
+
+static void add_alternate_refs(void)
+{
+	foreach_alt_odb(add_refs_from_alternate, NULL);
+}
+
+int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 {
 	int i;
 	char *dir = NULL;
 
 	argv++;
 	for (i = 1; i < argc; i++) {
-		char *arg = *argv++;
+		const char *arg = *argv++;
 
 		if (*arg == '-') {
 			/* Do flag handling here */
@@ -477,7 +529,7 @@ int main(int argc, char **argv)
 		}
 		if (dir)
 			usage(receive_pack_usage);
-		dir = arg;
+		dir = xstrdup(arg);
 	}
 	if (!dir)
 		usage(receive_pack_usage);
@@ -497,7 +549,9 @@ int main(int argc, char **argv)
 	else if (0 <= receive_unpack_limit)
 		unpack_limit = receive_unpack_limit;
 
+	add_alternate_refs();
 	write_head_info();
+	clear_extra_refs();
 
 	/* EOF */
 	packet_flush(1);
diff --git c/builtin.h w/builtin.h
index f3502d3..ade9a12 100644
--- c/builtin.h
+++ w/builtin.h
@@ -78,6 +78,7 @@ extern int cmd_prune(int argc, const char **argv, const char *prefix);
 extern int cmd_prune_packed(int argc, const char **argv, const char *prefix);
 extern int cmd_push(int argc, const char **argv, const char *prefix);
 extern int cmd_read_tree(int argc, const char **argv, const char *prefix);
+extern int cmd_receive_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
diff --git c/cache.h w/cache.h
index de8c2b6..ddea8d1 100644
--- c/cache.h
+++ w/cache.h
@@ -640,6 +640,8 @@ extern struct alternate_object_database {
 } *alt_odb_list;
 extern void prepare_alt_odb(void);
 extern void add_to_alternates_file(const char *reference);
+typedef int alt_odb_fn(struct alternate_object_database *, void *);
+extern void foreach_alt_odb(alt_odb_fn, void*);
 
 struct pack_window {
 	struct pack_window *next;
diff --git c/git.c w/git.c
index fdb0f71..bf1ac1b 100644
--- c/git.c
+++ w/git.c
@@ -328,6 +328,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "prune-packed", cmd_prune_packed, RUN_SETUP },
 		{ "push", cmd_push, RUN_SETUP },
 		{ "read-tree", cmd_read_tree, RUN_SETUP },
+		{ "receive-pack", cmd_receive_pack },
 		{ "reflog", cmd_reflog, RUN_SETUP },
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "repo-config", cmd_config },
diff --git c/sha1_file.c w/sha1_file.c
index 9ee1ed1..6bd0ef0 100644
--- c/sha1_file.c
+++ w/sha1_file.c
@@ -394,6 +394,16 @@ void add_to_alternates_file(const char *reference)
 		link_alt_odb_entries(alt, alt + strlen(alt), '\n', NULL, 0);
 }
 
+void foreach_alt_odb(alt_odb_fn fn, void *cb)
+{
+	struct alternate_object_database *ent;
+
+	prepare_alt_odb();
+	for (ent = alt_odb_list; ent; ent = ent->next)
+		if (fn(ent, cb))
+			return;
+}
+
 void prepare_alt_odb(void)
 {
 	const char *alt;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-08  5:07       ` Junio C Hamano
@ 2008-09-08  6:42         ` Junio C Hamano
  2008-09-08  7:24           ` Daniel Barkalow
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-09-08  6:42 UTC (permalink / raw)
  To: Shawn O. Pearce, Daniel Barkalow
  Cc: Theodore Tso, Jon Smirl, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Theodore Tso <tytso@MIT.EDU> writes:
>
>> One way that wouldn't break backwards compatibility would be to use
>> $D/refs if it exists, but if it isn't, fall back to existing behavior
>> (which is to say, only use the refs in the repository, not in the
>> borrowed repository/object store).  Is there a reason why this would
>> be problematic?
>
> This does not answer "is this safe enough change?" question, but the code
> to implement this should look like this (Area expert Daniel CC'ed).

Oops, I forgot to Cc Daniel.  Shawn also CC'ed because he will have the
same issue with the smart CGI based http-push he is building.

> +static int add_refs_from_alternate(struct alternate_object_database *e, void *unused)
> +{
> +	char *other = xstrdup(make_absolute_path(e->base));
> +...
> +	remote = remote_get(other);
> +	transport = transport_get(remote, other);
> +	for (extra = transport_get_remote_refs(transport);
> +	     extra;
> +	     extra = extra->next) {
> +		/*
> +		 * Make sure this is named uniquely but with invalid name, so
> +		 * that we can catch the other end trying to do anything funky
> +		 * with it.
> +		 */
> +		strbuf_reset(&buf);
> +		strbuf_addf(&buf, "refs/borrowed-ref/%s", extra->name);
> +		add_extra_ref(buf.buf, extra->old_sha1, 0);
> +	}
> +	transport_disconnect(transport);
> +	free(other);
> +	return 0;
> +}

I need to clarify the comment in the above function.

When you are running "git push", the protocol employed is the send-pack
protocol, which begins by the receiving end advertising what refs it has
and where they point at.  The above code is in the receiving end and adds
extra refs from alternate repositories to this advertisement.

The advertisement serves two different purposes.  One is the immediate
issue we are tackling --- to tell the sending end what objects are not
needed in the pack it sends.  Objects that are reachable from the objects
in this advertisement do not have to be sent.  When the sending end wants
to send its 'master', for example, it uses this information to compute the
set of objects by doing (roughly):

	git rev-list --objects master --not $sha1_1 $sha1_2 ...

where $sha1_N are the objects we advertise here.  The more we assure the
sender we have, the less it has to send to us.

But there is another purpose the sender uses this information for.  It is
used to determine which refs to push (when "git push" is run without any
explicit refspecs, it does "matching refs" -- it learns the set of refs
the receiving end has from this advertisement, and updates the refs of the
same name the sending end also has), and which branches on the receiving
end should be removed (when "git push --mirror" is run, it will send NULL
updates to refs the receiving end has but the sender doesn't).

We do want to show the extra object names to the sender so that it can
exclude more objects from the transfer, but for the latter purpose, we
really do not want these phoney refs to be understood as refs we have by
the sending end.  In the original version of my patch, I actually had
"refs/borrowed*refs%s/%s" as the format string (notice the asterisk) to
make it an invalid refname, and (notice the extra %s) to make them unique
by including the full-pathname to the repository as part of this string,
so that we can pretend we have different 'master' from two alternate
repositories.

The sending end, however, has a safety to silently ignore malformed refs
it learns from the receiving end over the wire.  So the current sender
won't work with asterisk in there '*', nor full-pathname, because
typically it contains "/.git/" in the string, which would make it an
invalid refname to be ignored.

Also, the sending end remembers the objects we have per refname, so
sending two records with the same refname pointing at two different
objects will not work well either (the current code is loose and does not
check duplicates, but that is not a feature by design but is an accident).
If the receiving repository borrows from two alternate repositories, both
of which have 'master' branch, we would be sending two records, each of
which claims that it is "refs/borrowed-ref/refs/heads/master" but pointing
at a different commit.

That is the "unique and invalid" comment above is about (iow, the above
hunk that does not send "unique and invalid" refs cannot be the final form
of this enhancement).

What this means is that we also need to update the sending side in order
to enable this enhancement.  connect.c::get_remote_heads() is the relevant
code and we would probably need to add an option to keep all refs (and
invalid refs) in the returned list.  Both remote.c::match_refs() and
builtin-send-pack.c::do_send_pack() need to be taught to ignore the
invalid refs in the list, while builtin-send-pack.c::pack_objects() should
utilize all the refs (and invalid ones) when excluding the objects the
receiving end claims to have.  While at it, we probably should declare
that sending duplicate invalid refs is not an error (and we can use
something like "*borrowed*" as the phoney refname --- we do not need the
uniqueness for them, nor we need to tell the other end what ref we are
borrowing from whom).

We can start sending invalid refs (just replace - with * in the patch I
sent) from the receiving end without waiting for the sender end program to
get updated.  The sending end will ignore it and nothing (other than the
extra startup overhead the additional code has) bad should happen.  When
the sender is updated to keep the invalid refs, it will start to take
notice and your push will suddenly get smaller.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-08  6:42         ` Junio C Hamano
@ 2008-09-08  7:24           ` Daniel Barkalow
  2008-09-08 14:56             ` Shawn O. Pearce
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Barkalow @ 2008-09-08  7:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Theodore Tso, Jon Smirl, Git Mailing List

On Sun, 7 Sep 2008, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Theodore Tso <tytso@MIT.EDU> writes:
> >
> >> One way that wouldn't break backwards compatibility would be to use
> >> $D/refs if it exists, but if it isn't, fall back to existing behavior
> >> (which is to say, only use the refs in the repository, not in the
> >> borrowed repository/object store).  Is there a reason why this would
> >> be problematic?
> >
> > This does not answer "is this safe enough change?" question, but the code
> > to implement this should look like this (Area expert Daniel CC'ed).
> 
> Oops, I forgot to Cc Daniel.  Shawn also CC'ed because he will have the
> same issue with the smart CGI based http-push he is building.

The use of transport in general looks good to me; I'm not really an area 
expert for the native protocol.

> > +static int add_refs_from_alternate(struct alternate_object_database *e, void *unused)
> > +{
> > +	char *other = xstrdup(make_absolute_path(e->base));
> > +...
> > +	remote = remote_get(other);
> > +	transport = transport_get(remote, other);
> > +	for (extra = transport_get_remote_refs(transport);
> > +	     extra;
> > +	     extra = extra->next) {
> > +		/*
> > +		 * Make sure this is named uniquely but with invalid name, so
> > +		 * that we can catch the other end trying to do anything funky
> > +		 * with it.
> > +		 */
> > +		strbuf_reset(&buf);
> > +		strbuf_addf(&buf, "refs/borrowed-ref/%s", extra->name);
> > +		add_extra_ref(buf.buf, extra->old_sha1, 0);
> > +	}
> > +	transport_disconnect(transport);
> > +	free(other);
> > +	return 0;
> > +}
> 
> I need to clarify the comment in the above function.
> 
> When you are running "git push", the protocol employed is the send-pack
> protocol, which begins by the receiving end advertising what refs it has
> and where they point at.  The above code is in the receiving end and adds
> extra refs from alternate repositories to this advertisement.
> 
> The advertisement serves two different purposes.  One is the immediate
> issue we are tackling --- to tell the sending end what objects are not
> needed in the pack it sends.  Objects that are reachable from the objects
> in this advertisement do not have to be sent.  When the sending end wants
> to send its 'master', for example, it uses this information to compute the
> set of objects by doing (roughly):
> 
> 	git rev-list --objects master --not $sha1_1 $sha1_2 ...
> 
> where $sha1_N are the objects we advertise here.  The more we assure the
> sender we have, the less it has to send to us.
> 
> But there is another purpose the sender uses this information for.  It is
> used to determine which refs to push (when "git push" is run without any
> explicit refspecs, it does "matching refs" -- it learns the set of refs
> the receiving end has from this advertisement, and updates the refs of the
> same name the sending end also has), and which branches on the receiving
> end should be removed (when "git push --mirror" is run, it will send NULL
> updates to refs the receiving end has but the sender doesn't).
> 
> We do want to show the extra object names to the sender so that it can
> exclude more objects from the transfer, but for the latter purpose, we
> really do not want these phoney refs to be understood as refs we have by
> the sending end.  In the original version of my patch, I actually had
> "refs/borrowed*refs%s/%s" as the format string (notice the asterisk) to
> make it an invalid refname, and (notice the extra %s) to make them unique
> by including the full-pathname to the repository as part of this string,
> so that we can pretend we have different 'master' from two alternate
> repositories.
> 
> The sending end, however, has a safety to silently ignore malformed refs
> it learns from the receiving end over the wire.  So the current sender
> won't work with asterisk in there '*', nor full-pathname, because
> typically it contains "/.git/" in the string, which would make it an
> invalid refname to be ignored.
>
> Also, the sending end remembers the objects we have per refname, so
> sending two records with the same refname pointing at two different
> objects will not work well either (the current code is loose and does not
> check duplicates, but that is not a feature by design but is an accident).
> If the receiving repository borrows from two alternate repositories, both
> of which have 'master' branch, we would be sending two records, each of
> which claims that it is "refs/borrowed-ref/refs/heads/master" but pointing
> at a different commit.
> 
> That is the "unique and invalid" comment above is about (iow, the above
> hunk that does not send "unique and invalid" refs cannot be the final form
> of this enhancement).
> 
> What this means is that we also need to update the sending side in order
> to enable this enhancement.  connect.c::get_remote_heads() is the relevant
> code and we would probably need to add an option to keep all refs (and
> invalid refs) in the returned list.  Both remote.c::match_refs() and
> builtin-send-pack.c::do_send_pack() need to be taught to ignore the
> invalid refs in the list, while builtin-send-pack.c::pack_objects() should
> utilize all the refs (and invalid ones) when excluding the objects the
> receiving end claims to have.  While at it, we probably should declare
> that sending duplicate invalid refs is not an error (and we can use
> something like "*borrowed*" as the phoney refname --- we do not need the
> uniqueness for them, nor we need to tell the other end what ref we are
> borrowing from whom).

I think we should use a really invalid ref name, like "^" or something 
like that, so that it's clearly not an actually available ref, but just a 
way for the remote to mention that it has the object, and the remote 
doesn't have to try to make it unique.

> We can start sending invalid refs (just replace - with * in the patch I
> sent) from the receiving end without waiting for the sender end program to
> get updated.  The sending end will ignore it and nothing (other than the
> extra startup overhead the additional code has) bad should happen.  When
> the sender is updated to keep the invalid refs, it will start to take
> notice and your push will suddenly get smaller.

Agreed.
	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-08  7:24           ` Daniel Barkalow
@ 2008-09-08 14:56             ` Shawn O. Pearce
  0 siblings, 0 replies; 20+ messages in thread
From: Shawn O. Pearce @ 2008-09-08 14:56 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Theodore Tso, Jon Smirl, Git Mailing List

Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Sun, 7 Sep 2008, Junio C Hamano wrote:
> > 
> > Oops, I forgot to Cc Daniel.  Shawn also CC'ed because he will have the
> > same issue with the smart CGI based http-push he is building.

Not just the smart CGI based http-push thingy.  JGit is impacted
anytime we start talking about network transport.  ;-)
 
> > The sending end, however, has a safety to silently ignore malformed refs
> > it learns from the receiving end over the wire.  So the current sender
> > won't work with asterisk in there '*', nor full-pathname, because
> > typically it contains "/.git/" in the string, which would make it an
> > invalid refname to be ignored.
> > 
> > Also, the sending end remembers the objects we have per refname, so
> > sending two records with the same refname pointing at two different
> > objects will not work well either (the current code is loose and does not
> > check duplicates, but that is not a feature by design but is an accident).

Cute. Learn something new every day.

JGit doesn't do the check_ref_format(), but it does do duplicate
detection.  If the remote side sends us duplicate advertisements
JGit freaks out and aborts the transfer.  This is largely just a
function of JGit hashing the advertised refs for O(1) access later
on during processing.  We avoid a lot of ugly O(N^2) loops that way.

> I think we should use a really invalid ref name, like "^" or something 
> like that, so that it's clearly not an actually available ref, but just a 
> way for the remote to mention that it has the object, and the remote 
> doesn't have to try to make it unique.

I agree.  How about just calling all of these ".have"?

According to check_ref_format() it fails, so older clients will
just ignore it.  Attempts by bad clients to push or delete ".have"
should also fail automatically in receive-pack's own test at the
start of the update() function.

Using a constant name for all instances hides what the underlying
alternates really have, in case their branches aren't public, but
their databases are.  It also makes it easier for hashing clients
like JGit to collect these object IDs into a table of known objects,
but ignoring the name component.

Sadly no matter what we do here JGit's duplicate detection is in
direct opposite behavior with git.git's ignoring check_ref_format()
failures.  So we'll have to patch JGit no matter what we do.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-07 19:18         ` Junio C Hamano
@ 2008-09-08 17:56           ` Jan Hudec
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Hudec @ 2008-09-08 17:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Jon Smirl, Git Mailing List

On Sun, Sep 07, 2008 at 12:18:02 -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Jan Hudec <bulb@ucw.cz> writes:
> > ...
> >> Why is this a *mis*design? Couldn't push be fixed by redesigning it's
> >> protocol along the lines of:
> >>  - clients sends a list of sha1s it wants to push, from the tip down
> >>  - server stops it when it sees an object it has -- this check can be done
> >>    against the object store without having a ref for it.
> >
> > Because your second step is *BROKEN*.
> >
> > Think of a case where an earlier commit walker started fetching into that
> > "server" end, which got newer commits and their associated objects first
> > and then older ones, and then got killed before reaching to the objects it
> > already had.  In such a case, the commit walker will *not* update the refs
> > on the server end (and for a very good reason).
> >
> > After that, the server end would have:
> >
> >  * refs that point at some older commits, all objects from whom are
> 
> s/from whom/reachable from which/;
> 
> >    guaranteed to be in the repository (that's the "ref" guarantee);
> >
> >  * newer commits and their objects, but if you follow them you will hit
> >    some objects that are *NOT* in the repository.
> 
> To visualize, the server object store and refs would be like this:
> 
>     ---o---o---A...x...x...x...x...o---o---X
>                ^ ref
> 
> Commits 'x' are all missing because the commit walker fetched commit X,
> inspected its tree and got the necessary tree and blob objects, went back
> to get X's parent, did the same, then its parent, attempted to do the same
> but got killed before connecting the history fully to A.

The problem was I didn't realize how this could happen. Now when you said
/walker/, it's obvious which way of adding objects does it. I'd however
argue that that, rather than having the object store independent in the first
place, is misdesign.

> If you accepted history on top of X before guaranteeing that you have
> everything reachable from X already in this round of push will give you this:
> 
> 
>     ---o---o---A...x...x...x...x...o---o---X---o---o---Y
>                ^ ref =========== (wrong) ============> ^ ref
> 
> and if you update the ref to point at Y, then you cannot satisfy requests
> from other people who want the history that leads to Y, because somewhere
> between A and X there are commit that you do not even have to begin with.
> 
> So you may even be able accept objects between X..Y, but you cannot update
> the ref from A to Y after accepting such a push, which is pointless.
> 
> You could try a variant of it to unbreak your trick, though.  When you see
> an object that you have, say 'X' above, you traverse down from there until
> reaching some ref (in this case, A) and make sure that you have everything
> in between (not just commits but also associated trees and blobs that are
> needed).  This is quite similar to what is happening when the commit
> walker says "walk deadbeef..." in its progress output.  So it _could_ be
> done, but it would be somewhat expensive.

No. I would vote for unbreaking it at the walker instead. Instead of putting
the downloaded packs directly into the object store, it could put them in
some staging area and only move them in place when all dependencies are
downloaded. Still makes the solution comparably complex to the other ones.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-06 19:21       ` Shawn O. Pearce
@ 2008-09-09  8:35         ` Petr Baudis
  2008-09-09 14:57           ` Shawn O. Pearce
  0 siblings, 1 reply; 20+ messages in thread
From: Petr Baudis @ 2008-09-09  8:35 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Jon Smirl, Junio C Hamano, Theodore Tso, Git Mailing List

On Sat, Sep 06, 2008 at 12:21:06PM -0700, Shawn O. Pearce wrote:
> github should do what repo.or.cz does:
> 
> 	ln -s .../linus.git/refs digispeaker.git/refs/forkee
> 
> That way the refs available in Linus' tree are also available
> in your tree as Git will transparently follow the symlink.
> 
> However you have to be careful to make sure `git pack-refs` isn't
> run with `--all --prune` as it will delete the refs from linus.git
> when executed in digispeaker.git.  Fun times when I did that to my
> own repository one day.  ;-)
> 
> Though we probably should fix Git to be somewhat smarter.  But
> until that stable binary is available, the symlink trick above
> is a good work around.

But it should be used only in controlled environment. If you happen to
have permissions to write to the forkee, you can wipe out its refs with
git fetch --mirror (and if you don't happen to have permissions, it will
just fail, so currently you cannot use this on repo.or.cz forks,
unfortunately). If you don't make sure refs are never packed (I do on
repo.or.cz, historically because of dumb transports - do they support
packed refs by now?), this won't work either. Maybe there are other
considerations too.

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Alternates and push
  2008-09-09  8:35         ` Petr Baudis
@ 2008-09-09 14:57           ` Shawn O. Pearce
  0 siblings, 0 replies; 20+ messages in thread
From: Shawn O. Pearce @ 2008-09-09 14:57 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jon Smirl, Junio C Hamano, Theodore Tso, Git Mailing List

Petr Baudis <pasky@suse.cz> wrote:
> On Sat, Sep 06, 2008 at 12:21:06PM -0700, Shawn O. Pearce wrote:
> > github should do what repo.or.cz does:
> > 
> > 	ln -s .../linus.git/refs digispeaker.git/refs/forkee
> 
> But it should be used only in controlled environment. If you happen to
> have permissions to write to the forkee, you can wipe out its refs with
> git fetch --mirror (and if you don't happen to have permissions, it will
> just fail, so currently you cannot use this on repo.or.cz forks,
> unfortunately). If you don't make sure refs are never packed (I do on
> repo.or.cz, historically because of dumb transports - do they support
> packed refs by now?), this won't work either. Maybe there are other
> considerations too.

All very good points.

The dumb transports do support packed-refs in more modern versions.
I forget when they learned it though.  Being nice to them by not
packing refs is still good if you can.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-09-09 14:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-06 12:42 Alternates and push Jon Smirl
2008-09-06 16:20 ` Theodore Tso
2008-09-06 18:06   ` Junio C Hamano
2008-09-06 18:24     ` Jon Smirl
2008-09-06 19:21       ` Shawn O. Pearce
2008-09-09  8:35         ` Petr Baudis
2008-09-09 14:57           ` Shawn O. Pearce
2008-09-07 18:49     ` Jan Hudec
2008-09-07 18:56       ` Junio C Hamano
2008-09-07 19:17         ` Jan Hudec
2008-09-07 19:18         ` Junio C Hamano
2008-09-08 17:56           ` Jan Hudec
2008-09-07 23:41     ` Theodore Tso
2008-09-08  0:02       ` Junio C Hamano
2008-09-08  0:41         ` Theodore Tso
2008-09-08  2:53           ` Junio C Hamano
2008-09-08  5:07       ` Junio C Hamano
2008-09-08  6:42         ` Junio C Hamano
2008-09-08  7:24           ` Daniel Barkalow
2008-09-08 14:56             ` Shawn O. Pearce

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.