* clone breaks replace
@ 2011-01-06 21:00 Phillip Susi
  2011-01-06 21:33 ` Jonathan Nieder
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-06 21:00 UTC (permalink / raw)
  To: git

I've been experimenting with git replace to remove ancient history, and
I have found that cloning a repository breaks replace.  I read about
this process at http://progit.org/2010/03/17/replace.html.  I managed to
correctly add a replace commit that truncates the history and contains
instructions where you can find it, and running git log only goes back
to the replacement commit, unless you add --no-replace-objects, which
causes it to show the original full history.
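
For reference, a minimal sketch of that setup.  It assumes a git recent
enough to provide "git replace --graft" (not available when this thread
was written); the throwaway repository, identity, and commit contents
are made up for illustration.

```shell
set -eu
# Hypothetical identity so the commits can be created anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for n in 1 2 3; do
    echo "$n" >file
    git add file
    git commit -q -m "commit $n"
done

# Replace the middle commit with a copy that has no parent.
git replace --graft HEAD^

# History as seen through the replacement, and the original history.
visible=$(git rev-list --count HEAD)
full=$(git --no-replace-objects rev-list --count HEAD)
echo "with replacement: $visible commits; without: $full commits"
```

The original commits are all still in the object database; only the
view of them changes.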

The problem is that when I clone the repository, I expect the clone to
contain only history up to the replacement record, and not the old
history before that.  Instead, the clone contains only the full original
history, and the replacement ref is not imported at all.  A git replace
in the new clone shows nothing.

Shouldn't clone copy .git/refs/replace?

* Re: clone breaks replace
  2011-01-06 21:00 clone breaks replace Phillip Susi
@ 2011-01-06 21:33 ` Jonathan Nieder
  2011-01-06 21:59   ` Junio C Hamano
  2011-01-07 19:43   ` Phillip Susi
  0 siblings, 2 replies; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-06 21:33 UTC (permalink / raw)
  To: Phillip Susi; +Cc: git, Christian Couder

Phillip Susi wrote:

> I managed to
> correctly add a replace commit that truncates the history and contains
> instructions where you can find it, and running git log only goes back
> to the replacement commit, unless you add --no-replace-objects, which
> causes it to show the original full history.

Before I get to your real question: this seems a bit backwards.  Let
me say a few words about why.

In the days before replacement refs (and today, too), each commit
name described not only the state of a tree at a moment but the
history that led up to it.  In fact you can see this somewhat directly:
given two distinct commits A and B if you try

	$ git cat-file commit A >a.commit
	$ git cat-file commit B >b.commit
	$ diff -u a.commit b.commit

then you will see precisely what can make them different:

 - the author's name and email and the date of authorship
 - the committer's name and email and the date committed
 - the names of the parent commits, describing the history
 - the name of a tree, describing the content
 - the log message, including its encoding

The commit name is a hash of that information (see git-hash-object(1))
and an invariant maintained is "if a repository has access to commit A,
it has access to its parents, their parents, and so on".  This invariant
is maintained during object transfer and garbage collection and relied
on by object transfer and revision traversal.
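
A small sketch of the "name is a hash of the content" point, using a
throwaway repository (the identity and file contents are made up):
feeding the raw commit back through git hash-object reproduces the
commit name.

```shell
set -eu
# Hypothetical identity so the commit can be created anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello >file
git add file
git commit -q -m "first commit"

commit=$(git rev-parse HEAD)

# The raw commit object: tree, (parents), author, committer, message.
git cat-file commit "$commit"

# Hash exactly those bytes as a commit object; nothing is written.
rehashed=$(git cat-file commit "$commit" | git hash-object -t commit --stdin)
echo "commit name: $commit"
echo "rehashed:    $rehashed"
```

Changing any of the fields listed above, including a parent line, would
produce different bytes and therefore a different name.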

The beauty of replacement refs is that they can be easily added or
removed without breaking this invariant.  And a replacement ref is an
actual reference into history, so garbage collection does not remove
those commits and the repository keeps enough information to traverse
both the modified and unmodified history.

Therefore if you want clients to be able to choose between a minimal
history and a larger one to save bandwidth, it has to work like this

 - to get the minimal history, fetch _without_ any replacement refs
 - to get the full history, fetch the replacement refs on top of that.

because an additional reference can only increase the number of
objects to be downloaded.
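
A sketch of that arrangement, assuming a git with "git replace --graft"
and made-up repository paths and commit contents.  The published branch
carries only the short history; the older commits are reachable only
through refs/replace/, so a client gets them only by asking.

```shell
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

top=$(mktemp -d)
git init -q "$top/upstream"
cd "$top/upstream"

# Two "historical" commits that the modern history will not reference.
old_branch=$(git symbolic-ref --short HEAD)
for n in 1 2; do
    echo "old $n" >file
    git add file
    git commit -q -m "old $n"
done
old_tip=$(git rev-parse HEAD)

# The modern history starts from a fresh root commit.
git checkout -q --orphan modern
git commit -q -m "modern root"
root=$(git rev-parse HEAD)
echo new >file
git add file
git commit -q -m "modern 2"

# Graft the old history underneath the modern root, then drop the old
# branch so that only refs/replace/ keeps the old commits reachable.
git replace --graft "$root" "$old_tip"
git branch -q -D "$old_branch"

# A plain clone gets the minimal history.
git clone -q "file://$top/upstream" "$top/work"
cd "$top/work"
minimal=$(git rev-list --count HEAD)

# Fetching the replacement refs on top of that brings in the rest.
git fetch -q origin 'refs/replace/*:refs/replace/*'
full=$(git rev-list --count HEAD)
echo "minimal: $minimal commits; with replacement refs: $full commits"
```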

> The problem is that when I clone the repository, I expect the clone to
> contain only history up to the replacement record, and not the old
> history before that.  Instead, the clone contains only the full original
> history, and the replacement ref is not imported at all.  A git replace
> in the new clone shows nothing.
>
> Shouldn't clone copy .git/refs/replace?

With that in mind, I suspect the best way to achieve what you are
looking for is the following:

 1. Make a big, ugly history (branch "big").  Presumably this part's
    already done.

 2. Find the part you want to get rid of and make appropriate
    replacement refs so "gitk big" shows what you want it to.

 3. Use "git filter-branch" to make that history a reality (branch
    "simpler").  Remove the replacement refs.

 4. Use "git replace" to graft back on the pieces you cauterized.
    Publish the result.

 5. Perhaps also run and publish "git replace big simpler", so
    contributors of branches based against the old 'big' can merge
    your latest changes from 'simpler'.  Encourage contributors to
    use 'git rebase' or 'git filter-branch' to rebase their
    contributions against the new, simpler history.
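
Steps 2 to 4 can be sketched roughly as follows.  This assumes a git
with "git replace --graft"; the three-commit history stands in for
"big", and FILTER_BRANCH_SQUELCH_WARNING only silences the warning that
newer versions of filter-branch print.

```shell
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
for n in 1 2 3; do
    echo "$n" >file
    git add file
    git commit -q -m "commit $n"
done
big=$(git rev-parse HEAD)

# Step 2: stage the surgery with a replacement ref (cut off commit 1).
cut=$(git rev-parse HEAD^)
removed=$(git rev-parse HEAD~2)
git replace --graft "$cut"

# Step 3: make it real on a new branch, then remove the replacement ref.
git checkout -q -b simpler
git filter-branch -f -- simpler >/dev/null 2>&1
git replace -d "$cut" >/dev/null
simpler=$(git rev-parse simpler)
real_length=$(git --no-replace-objects rev-list --count simpler)

# Step 4: graft the removed piece back on as an optional view.
new_root=$(git rev-list --max-parents=0 simpler)
git replace --graft "$new_root" "$removed"
grafted_length=$(git rev-list --count simpler)

echo "simpler really has $real_length commits, $grafted_length with the graft"
```

After this, "simpler" is a genuinely shorter history with new commit
names, and the replacement ref only adds the old part back for those who
fetch it.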

Does that make sense?

Jonathan

* Re: clone breaks replace
  2011-01-06 21:33 ` Jonathan Nieder
@ 2011-01-06 21:59   ` Junio C Hamano
  2011-01-07 19:43   ` Phillip Susi
  1 sibling, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2011-01-06 21:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Phillip Susi, git, Christian Couder

Jonathan Nieder <jrnieder@gmail.com> writes:

> Therefore if you want clients to be able to choose between a minimal
> history and a larger one to save bandwidth, it has to work like this
>
>  - to get the minimal history, fetch _without_ any replacement refs
>  - to get the full history, fetch the replacement refs on top of that.
>
> because an additional reference can only increase the number of
> objects to be downloaded.

Very nicely and clearly put.  Can we have this somewhere in the docs?

* Re: clone breaks replace
  2011-01-06 21:33 ` Jonathan Nieder
  2011-01-06 21:59   ` Junio C Hamano
@ 2011-01-07 19:43   ` Phillip Susi
  2011-01-07 20:51     ` Jonathan Nieder
  1 sibling, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-07 19:43 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Christian Couder

On 1/6/2011 4:33 PM, Jonathan Nieder wrote:
> Therefore if you want clients to be able to choose between a minimal
> history and a larger one to save bandwidth, it has to work like this
> 
>  - to get the minimal history, fetch _without_ any replacement refs
>  - to get the full history, fetch the replacement refs on top of that.
> 
> because an additional reference can only increase the number of
> objects to be downloaded.

This seems backwards.  The original commit links to its parent and
therefore, the full history trail going back.  The reason you add the
replacement record is to get rid of that parent link, thus truncating
the history.  Therefore, if you fetch the original record that still has
the reference to its parent, and not the replacement record, you end up
with the full history.  Ergo, to get only the truncated history, you
must fetch the replacement record, and pay attention to it to stop
fetching commits older than the truncation point.

>  3. Use "git filter-branch" to make that history a reality (branch
>     "simpler").  Remove the replacement refs.

Isn't the whole purpose of using replace to avoid having to use
filter-branch, which throws out all of the existing commit records, and
creates an entirely new commit chain that is slightly modified?

>  4. Use "git replace" to graft back on the pieces you cauterized.
>     Publish the result.

If you are going to use filter-branch, then what do you need to replace?
And publishing the result of a replace seems to have no effect, since
other people do not get the replace ref when they clone.

>  5. Perhaps also run and publish "git replace big simpler", so
>     contributors of branches based against the old 'big' can merge
>     your latest changes from 'simpler'.  Encourage contributors to
>     use 'git rebase' or 'git filter-branch' to rebase their
>     contributions against the new, simpler history.

Again, the entire point of replace seems to be to AVOID having to go
through the hassle of having to rebase or filter-branch.  Isn't that
exactly how you would accomplish this before replace was added?

* Re: clone breaks replace
  2011-01-07 19:43   ` Phillip Susi
@ 2011-01-07 20:51     ` Jonathan Nieder
  2011-01-07 21:15       ` Stephen Bash
                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-07 20:51 UTC (permalink / raw)
  To: Phillip Susi; +Cc: git, Christian Couder

Phillip Susi wrote:

> Isn't the whole purpose of using replace to avoid having to use
> filter-branch, which throws out all of the existing commit records, and
> creates an entirely new commit chain that is slightly modified?

No.  What documentation suggested that?  Maybe it can be fixed.

The original purpose of grafts (the ideological ancestor of
replacement refs) was to serve a very particular use case.  Sit down
by the fire, if you will, and...

Git had just come into existence and pack files did not exist yet.  A
full import of the Linux kernel history was possible but the result
was enormous and not something ready to be imposed on all Linux
contributors.  So what can one do?

 $ git show -s v2.6.12-rc2^0
 commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
 Author: Linus Torvalds <torvalds@ppc970.osdl.org>
 Date:   Sat Apr 16 15:20:36 2005 -0700

     Linux-2.6.12-rc2

     Initial git repository build. I'm not bothering with the full history,
     even though we have it. We can create a separate "historical" git
     archive of that later if we want to, and in the meantime it's about
     3.2GB when imported into git - space that would just make the early
     git days unnecessarily complicated, when we don't have a lot of good
     infrastructure for it.

     Let it rip!

Fast forward three months, and there is discussion[1] about what to do
with the historical git archive.  A clever idea: teach git to _pretend_
that the historical archive is the parent to v2.6.12-rc2, so
"git log --grep", "gitk", and so on work as they ought to.

So grafts were born.  One of the nicest advantages of grafts is that
they make it easy to do complex history surgery: make some grafts ---
cut here, paste there --- and then run "git filter-branch" to make it
permanent.

But grafts have a serious problem.

Transport machinery needs to ignore grafts --- otherwise, the two ends
of a connection could have different ideas of the history preceding a
commit, resulting in confusion and breakage.  A fix to that was
finally grafted on a few years later (see also [2]).

 $ GIT_NOTES_REF=refs/remotes/charon/notes/full \
   git log --grep=graft --grep=repack --all-match --no-merges
 [...]
     git repack: keep commits hidden by a graft
 [...]
     Archived-At: <http://thread.gmane.org/gmane.comp.version-control.git/123874>

There is also the problem that grafts are too "raw": it is very easy
to make a graft pointing to a nonexistent object, say.  And meanwhile
git has no native support for transferring grafts over the wire.

In that context there emerged the nicer (imho) refs/replace mechanism:

 - reachability checking and transport machinery can treat them like
   all other references --- no need for low-level tools to pay
   attention to the artificial history;
 - easy to script around with "git replace" and "git for-each-ref"
 - can choose to fetch or not fetch with the usual
   "git fetch repo refs/replace/*:refs/replace/*" syntax
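
As a sketch of the scripting point: each replacement is an ordinary ref
named refs/replace/<original>, pointing at the replacement object, so
"git for-each-ref" can enumerate them.  The repository below is made up,
and "git replace --graft" is assumed to be available.

```shell
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for n in 1 2 3; do
    echo "$n" >file
    git add file
    git commit -q -m "commit $n"
done

cut=$(git rev-parse HEAD^)
git replace --graft "$cut"

# One line per replacement: "<refname> <replacement object>".
listing=$(git for-each-ref --format='%(refname) %(objectname)' refs/replace/)

echo "$listing" | while read -r ref replacement; do
    original=${ref#refs/replace/}
    echo "$original is replaced by $replacement"
done
```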

Common applications:

 - locally staging history changes that will later be made permanent
   with "git filter-branch";
 - grafting on additional (historical) history;
 - replacing ancient broken commits with fixed ones, for use by "git
   bisect".

Hope that helps,
Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/6470/focus=6484
found with "git log --grep=graft --reverse"
[2] http://thread.gmane.org/gmane.comp.version-control.git/37744/focus=37908

* Re: clone breaks replace
  2011-01-07 20:51     ` Jonathan Nieder
@ 2011-01-07 21:15       ` Stephen Bash
  2011-01-07 21:34       ` Jonathan Nieder
  2011-01-07 21:44       ` Phillip Susi
  2 siblings, 0 replies; 34+ messages in thread
From: Stephen Bash @ 2011-01-07 21:15 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Christian Couder, Phillip Susi

----- Original Message -----
> From: "Jonathan Nieder" <jrnieder@gmail.com>
> To: "Phillip Susi" <psusi@cfl.rr.com>
> Cc: git@vger.kernel.org, "Christian Couder" <chriscool@tuxfamily.org>
> Sent: Friday, January 7, 2011 3:51:03 PM
> Subject: Re: clone breaks replace
> Phillip Susi wrote:
> 
> > Isn't the whole purpose of using replace to avoid having to use
> > filter-branch, which throws out all of the existing commit records,
> > and creates an entirely new commit chain that is slightly modified?
> 
> No. What documentation suggested that? Maybe it can be fixed.

I'll chime in here as another person who read the ProGit blog entry on
git-replace [1] and came to the same conclusion that Phillip (and, I'm
guessing, others) did.  OTOH, when I attempted to read the actual
git-replace manpage, I got completely lost, so I retained my (apparently
incorrect) understanding from ProGit.

Thanks,
Stephen

[1] http://progit.org/2010/03/17/replace.html

* Re: clone breaks replace
  2011-01-07 20:51     ` Jonathan Nieder
  2011-01-07 21:15       ` Stephen Bash
@ 2011-01-07 21:34       ` Jonathan Nieder
  2011-01-07 21:44       ` Phillip Susi
  2 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-07 21:34 UTC (permalink / raw)
  To: Phillip Susi; +Cc: git, Christian Couder, Stephen Bash

Jonathan Nieder wrote:

> Transport machinery needs to ignore grafts --- otherwise, the two ends
> of a connection could have different ideas of the history preceding a
> commit, resulting in confusion and breakage.  A fix to that was
> finally grafted on a few years later (see also [2]).

Sorry, I walked away mid-paragraph and left out a crucial piece when I
returned.  Because transport machinery ignores grafts, garbage
collection must make sure not to remove pieces of the non-artificial
history.  It is the garbage collection that Dscho fixed with
v1.6.4-rc3~7^2.
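
A rough way to see the same property today, using a replacement ref as
a stand-in for a graft (an assumption of this sketch, along with the
availability of "git replace --graft" and the made-up repository): the
commit hidden by the replacement is still there after an aggressive gc.

```shell
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for n in 1 2 3; do
    echo "$n" >file
    git add file
    git commit -q -m "commit $n"
done

oldest=$(git rev-parse HEAD~2)

# Hide the oldest commit behind a parentless replacement of commit 2.
git replace --graft HEAD^

# Drop reflog entries so they cannot be what keeps the commit alive.
git reflog expire --expire=now --all
git gc -q --prune=now

if git --no-replace-objects cat-file -e "$oldest"; then
    survived=yes
else
    survived=no
fi
real_length=$(git --no-replace-objects rev-list --count HEAD)
echo "hidden commit survived gc: $survived"
```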

Sorry for the nonsense.

* Re: clone breaks replace
  2011-01-07 20:51     ` Jonathan Nieder
  2011-01-07 21:15       ` Stephen Bash
  2011-01-07 21:34       ` Jonathan Nieder
@ 2011-01-07 21:44       ` Phillip Susi
  2011-01-07 21:49         ` Jonathan Nieder
  2 siblings, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-07 21:44 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Christian Couder

On 1/7/2011 3:51 PM, Jonathan Nieder wrote:
> Phillip Susi wrote:
> 
>> Isn't the whole purpose of using replace to avoid having to use
>> filter-branch, which throws out all of the existing commit records, and
>> creates an entirely new commit chain that is slightly modified?
> 
> No.  What documentation suggested that?  Maybe it can be fixed.

It's just what made sense to me.  If you can modify the history with
filter-branch, then you don't need replace refs.  The downside to
filter-branch is that it breaks people tracking your repository, since
the history they had been tracking is thrown out and replaced with a
completely new commit chain that looks similar, but as far as git is
concerned, is unrelated to the original.  Replace refs seem to have been
created to allow you to accomplish the goal of modifying an old commit
record, but without having to rewrite that and all subsequent commits,
causing breakage.

>  - can choose to fetch or not fetch with the usual
>    "git fetch repo refs/replace/*:refs/replace/*" syntax

It seems like this should be the default behavior.  Or perhaps
refs/replace should be forked into one meant to be private, and one
meant to be public, and fetched by default.  Or maybe it should be
fetched by default, but not pushed, so you have to explicitly push
replacements to the public mirror that you intend for public
consumption.  Having the replace only apply locally and still needing to
filter-branch to make the change visible to the public seems to render
the replace somewhat pointless.

Take the kernel history as an example, only imagine that Linus did not
originally make that first commit leaving out the prior history, but
wants to go back and fix it now.  He can do it with a replace, but then
if he runs filter-branch as you suggest to make the change 'real', then
everyone tracking his tree will fail the next time they try to pull.
You could get the same result without replace, so why bother?

If the replace was fetched by default, the people already tracking would
get it the next time they pull and would not have a problem.  If they
wanted to see the old history, then they would already have it in the
repository and just need to add --no-replace-objects to see it, or run
git log on the original commit id that the replace record should refer
you to (in the comments).  Those cloning the repository for the first
time would get it, and avoid fetching all of the old history since they
would be using the replace record in place of the original commit.

* Re: clone breaks replace
  2011-01-07 21:44       ` Phillip Susi
@ 2011-01-07 21:49         ` Jonathan Nieder
  2011-01-07 22:09           ` Phillip Susi
  2011-01-07 22:09           ` Jeff King
  0 siblings, 2 replies; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-07 21:49 UTC (permalink / raw)
  To: Phillip Susi; +Cc: git, Christian Couder, Stephen Bash

Phillip Susi wrote:

> Take the kernel history as an example, only imagine that Linus did not
> originally make that first commit leaving out the prior history, but
> wants to go back and fix it now.  He can do it with a replace, but then
> if he runs filter-branch as you suggest to make the change 'real', then
> everyone tracking his tree will fail the next time they try to pull.
> You could get the same result without replace, so why bother?
>
> If the replace was fetched by default, the people already tracking would
> get it the next time they pull and would not have a problem.

Interesting.  I hadn't thought about this detail before.

> Those cloning the repository for the first
> time would get it, and avoid fetching all of the old history since they
> would be using the replace record in place of the original commit.

No, it doesn't work that way.  Imagine for a moment that each commit
object actually contains all of its ancestors.  That isn't precisely
right but in a way it is close.

To change the ancestry of a commit, you really do need to change its
name.  If you disagree, feel free to try it and I'd be glad to help
where I can with the coding if the design is sane.  Deal?

Maybe it would be nice if git replace worked that way, but that would
be fundamentally a _different_ feature.

* Re: clone breaks replace
  2011-01-07 21:49         ` Jonathan Nieder
@ 2011-01-07 22:09           ` Phillip Susi
  2011-01-07 22:09           ` Jeff King
  1 sibling, 0 replies; 34+ messages in thread
From: Phillip Susi @ 2011-01-07 22:09 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Christian Couder, Stephen Bash

On 1/7/2011 4:49 PM, Jonathan Nieder wrote:
> No, it doesn't work that way.  Imagine for a moment that each commit
> object actually contains all of its ancestors.  That isn't precisely
> right but in a way it is close.
> 
> To change the ancestry of a commit, you really do need to change its
> name.  If you disagree, feel free to try it and I'd be glad to help
> where I can with the coding if the design is sane.  Deal?

That's why a replace record seems to be the perfect solution.  The
original record still references the old history, but you ignore it in
favor of the replacement, which does not.  Thus you have a choice; you
ignore the replacement and use the original with the full history
attached, or you respect the replacement and the history is truncated.

As long as git-upload-pack respects the replacement, then new checkouts
will ignore the old history.  You could then create a new historical
branch that points to the parent commit of the replaced one, and tell
people to fetch that branch to get the old history, or pass
--no-replace-objects over the wire to git-upload-pack.

* Re: clone breaks replace
  2011-01-07 21:49         ` Jonathan Nieder
  2011-01-07 22:09           ` Phillip Susi
@ 2011-01-07 22:09           ` Jeff King
  2011-01-07 22:58             ` Junio C Hamano
  2011-01-08  0:43             ` Phillip Susi
  1 sibling, 2 replies; 34+ messages in thread
From: Jeff King @ 2011-01-07 22:09 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Phillip Susi, git, Christian Couder, Stephen Bash

On Fri, Jan 07, 2011 at 03:49:07PM -0600, Jonathan Nieder wrote:

> Phillip Susi wrote:
> 
> > Take the kernel history as an example, only imagine that Linus did not
> > originally make that first commit leaving out the prior history, but
> > wants to go back and fix it now.  He can do it with a replace, but then
> > if he runs filter-branch as you suggest to make the change 'real', then
> > everyone tracking his tree will fail the next time they try to pull.
> > You could get the same result without replace, so why bother?
> >
> > If the replace was fetched by default, the people already tracking would
> > get it the next time they pull and would not have a problem.
> 
> Interesting.  I hadn't thought about this detail before.

I think there are two separate issues here:

  1. Should transport protocols respect replacements (i.e., if you
     truncate history with a replacement object and I fetch from you,
     should I get the full history or the truncated one)?

  2. Should clone fetch refs from refs/replace (either by default, or
     with an option)?

Based on previous discussions, I think the answer to the first is no.
The resulting repo violates a fundamental assumption of git. Yes,
because of the replacement object, many things will still work. But many
parts of git intentionally do not respect replacement, and they will be
broken.

Instead, I think of replacements as a specific view into history, not a
fundamental history-changing operation itself. Which means you can never
save bandwidth or space by truncating history with replacements. You can
only give somebody the full history, and share with them your view. If
you want to truncate, you must rewrite history[1].

Which leads to the second question. It is basically a matter of saying
"do you want to fetch the view that upstream has"? I can definitely see
that being useful, and meriting an option. However, it may or may not be
worth turning on by default, as upstream's view may be confusing.

-Peff

[1] Actually, what we are talking about is basically a shallow clone.
    Which does do exactly this truncation, but does not use the replace
    mechanism. So it _is_ possible, but lots of things need to be
    tweaked to understand the shallow-ness. Perhaps in the long run
    making git understand replacement-truncated repos with missing
    objects would be a good thing, and shallow clones can be implemented
    simply as a special case of that. It would probably make the code a
    bit cleaner.
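
For comparison, a sketch of the shallow clone mechanism mentioned in
the footnote, with a made-up upstream repository.  It assumes a git new
enough to support "git rev-parse --is-shallow-repository".

```shell
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

top=$(mktemp -d)
git init -q "$top/upstream"
cd "$top/upstream"
for n in 1 2 3; do
    echo "$n" >file
    git add file
    git commit -q -m "commit $n"
done

# Truncate at transfer time: only the newest commit is downloaded.
git clone -q --depth 1 "file://$top/upstream" "$top/shallow"
cd "$top/shallow"
shallow_count=$(git rev-list --count HEAD)
is_shallow=$(git rev-parse --is-shallow-repository)

# The truncation can be undone later by asking for the rest.
git fetch -q --unshallow
deepened_count=$(git rev-list --count HEAD)
echo "shallow: $shallow_count commit(s); after --unshallow: $deepened_count"
```

Here the cut-off point is recorded in the client's own repository
rather than in a replacement object.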

* Re: clone breaks replace
  2011-01-07 22:09           ` Jeff King
@ 2011-01-07 22:58             ` Junio C Hamano
  2011-01-11  5:36               ` Jeff King
  2011-01-08  0:43             ` Phillip Susi
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2011-01-07 22:58 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Phillip Susi, git, Christian Couder, Stephen Bash

Jeff King <peff@peff.net> writes:

>   2. Should clone fetch refs from refs/replace (either by default, or
>      with an option)?
> ...
> Which leads to the second question. It is basically a matter of saying
> "do you want to fetch the view that upstream has"? I can definitely see
> that being useful, and meriting an option. However, it may or may not be
> worth turning on by default, as upstream's view may be confusing.

I think that should be stated a bit differently.  "Do you want to fetch
the view that the upstream offers as an option, and if you want, which
ones (meaning: there could be more than one replacement graft to give
different views)?"

And as an optional view, I would say it is perfectly Ok to fetch whichever
view you want as a separate step after the initial clone.
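
One way that separate step could look, as a sketch: the client opts in
by adding a refspec for refs/replace/ to its remote configuration, after
which ordinary fetches pick up the upstream view.  The repositories are
made up, and "git replace --graft" is assumed to be available upstream.

```shell
set -eu
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

top=$(mktemp -d)
git init -q "$top/upstream"
cd "$top/upstream"
for n in 1 2 3; do
    echo "$n" >file
    git add file
    git commit -q -m "commit $n"
done
# Upstream offers a truncated view of its own history.
git replace --graft HEAD^

# The initial clone carries no replacement refs.
git clone -q "file://$top/upstream" "$top/work"
cd "$top/work"
before=$(git rev-list --count HEAD)

# Opt in to upstream's view for this and all later fetches.
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
git fetch -q origin
after=$(git rev-list --count HEAD)
replacements=$(git replace -l | wc -l | tr -d ' ')
echo "before: $before commits; after opting in: $after commits"
```

Note that the clone still holds all three commits; the replacement only
changes what is shown.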

* Re: clone breaks replace
  2011-01-07 22:09           ` Jeff King
  2011-01-07 22:58             ` Junio C Hamano
@ 2011-01-08  0:43             ` Phillip Susi
  2011-01-11  5:47               ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-08  0:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git, Christian Couder, Stephen Bash

On 01/07/2011 05:09 PM, Jeff King wrote:
> I think there are two separate issues here:
>
>    1. Should transport protocols respect replacements (i.e., if you
>       truncate history with a replacement object and I fetch from you,
>       should I get the full history or the truncated one)?
>
>    2. Should clone fetch refs from refs/replace (either by default, or
>       with an option)?
>
> Based on previous discussions, I think the answer to the first is no.
> The resulting repo violates a fundamental assumption of git. Yes,
> because of the replacement object, many things will still work. But many
> parts of git intentionally do not respect replacement, and they will be
> broken.

What parts do not respect replacement?  More importantly, what parts 
will be broken?  The man page seems to indicate that about the only 
thing that does not by default is reachability testing, which to me 
means fsck and prune.  It seems to be the purpose of replace to 
/prevent/ breakage and be respected by default, unless doing so would 
cause harm, which is why fsck and prune do not.

> Instead, I think of replacements as a specific view into history, not a
> fundamental history-changing operation itself. Which means you can never
> save bandwidth or space by truncating history with replacements. You can
> only give somebody the full history, and share with them your view. If
> you want to truncate, you must rewrite history[1].

Right, but if you only care about that view, then there is no need to 
waste bandwidth fetching the original one.  It goes without saying that 
people pulling from the repository mainly care about the view upstream 
chooses to publish.  Upstream can choose to rewrite, which will cause 
breakage and is a sort of sneaky way to hide the original history, or 
they can use replace, which avoids the breakage and gives the client the 
choice of which view to use.

* Re: clone breaks replace
  2011-01-07 22:58             ` Junio C Hamano
@ 2011-01-11  5:36               ` Jeff King
  2011-01-11 17:40                 ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-01-11  5:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Phillip Susi, git, Christian Couder, Stephen Bash

On Fri, Jan 07, 2011 at 02:58:34PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >   2. Should clone fetch refs from refs/replace (either by default, or
> >      with an option)?
> > ...
> > Which leads to the second question. It is basically a matter of saying
> > "do you want to fetch the view that upstream has"? I can definitely see
> > that being useful, and meriting an option. However, it may or may not be
> > worth turning on by default, as upstream's view may be confusing.
> 
> I think that should be stated a bit differently.  "Do you want to fetch
> the view that the upstream offers as an option, and if you want, which
> ones (meaning: there could be more than one replacement graft to give
> different views)?"

Sure, I think that is a sane way for the user to think about it, but do
we actually support multiple views? I thought replacement objects were
all or nothing.

-Peff

* Re: clone breaks replace
  2011-01-08  0:43             ` Phillip Susi
@ 2011-01-11  5:47               ` Jeff King
  2011-01-11  6:52                 ` Jonathan Nieder
  2011-01-11 15:24                 ` Phillip Susi
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2011-01-11  5:47 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Jonathan Nieder, git, Christian Couder, Stephen Bash

On Fri, Jan 07, 2011 at 07:43:40PM -0500, Phillip Susi wrote:

> >Based on previous discussions, I think the answer to the first is no.
> >The resulting repo violates a fundamental assumption of git. Yes,
> >because of the replacement object, many things will still work. But many
> >parts of git intentionally do not respect replacement, and they will be
> >broken.
> 
> What parts do not respect replacement?  More importantly, what parts
> will be broken?  The man page seems to indicate that about the only
> thing that does not by default is reachability testing, which to me
> means fsck and prune.  It seems to be the purpose of replace to
> /prevent/ breakage and be respected by default, unless doing so would
> cause harm, which is why fsck and prune do not.

Off the top of my head, I don't know. I suspect it would take somebody
writing a patch to create such an incomplete repository (or making one
manually) and seeing how badly things broke. Maybe nothing would, and I
am being overly conservative. It just makes me nervous to start
violating what has always been a fundamental assumption about the object
database (though as I pointed out, we did start violating it with
shallow clones, so maybe it is not so bad).

> >Instead, I think of replacements as a specific view into history, not a
> >fundamental history-changing operation itself. Which means you can never
> >save bandwidth or space by truncating history with replacements. You can
> >only give somebody the full history, and share with them your view. If
> >you want to truncate, you must rewrite history[1].
> 
> Right, but if you only care about that view, then there is no need to
> waste bandwidth fetching the original one.  It goes without saying
> that people pulling from the repository mainly care about the view
> upstream chooses to publish.  Upstream can choose to rewrite, which
> will cause breakage and is a sort of sneaky way to hide the original
> history, or they can use replace, which avoids the breakage and gives
> the client the choice of which view to use.

Once you have fetched with that view, how locked into that view are you?
Certainly you can never push to or be the fetch remote for another
repository that does not want to respect that view, because you simply
don't have the objects to complete the history for them.

But what about deepening your own repo? In your proposal, I contact the
server and ask for the replacement refs along with the branch refs. For
the history of the branches, it gives me the truncated version with the
replacement objects, right? Now how do I go back later and say "I'm
interested in getting the rest of history, give me the real one"?

I guess you can get the parent pointer from the real, "non-replaced"
object and ask for it. But you can't ask for a specific commit, so for
every such truncation, the parent needs to publish an extra ref (but
_not_ make it one of the ones fetched by default, or it would nullify
your original shallow fetch), and we need to contact them and find that
ref.

So I guess it's do-able, but there are a few interesting corners. I
think somebody would need to whip up a proof of concept patch to explore
those corners.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11  5:47               ` Jeff King
@ 2011-01-11  6:52                 ` Jonathan Nieder
  2011-01-11 15:37                   ` Phillip Susi
  2011-01-11 15:24                 ` Phillip Susi
  1 sibling, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-11  6:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Phillip Susi, git, Christian Couder, Stephen Bash

Jeff King wrote:
> On Fri, Jan 07, 2011 at 07:43:40PM -0500, Phillip Susi wrote:

>> What parts do not respect replacement?  More importantly, what parts
>> will be broken?
[...]
> Off the top of my head, I don't know. I suspect it would take somebody
> writing a patch to create such an incomplete repository (or making one
> manually) and seeing how badly things broke.

I have two worries:

 - first, how easily can the replacement be undone? (as you mention
   below)
 - second, what happens if the two ends of transport have different
   replacements?

That second worry is the more major in my opinion.  Shallow clones are
a different story --- they do not fundamentally change the history and
they have special support in git protocol.  It is possible to punt on
both by saying that (1) replacements _cannot_ be undone --- a second
replacement is needed --- and (2) the receiving end of a connection is
not allowed to have any replacements for objects in common that the
sending end does not have, but then does that buy you anything
significant over a filter-branch?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11  5:47               ` Jeff King
  2011-01-11  6:52                 ` Jonathan Nieder
@ 2011-01-11 15:24                 ` Phillip Susi
  2011-01-11 17:39                   ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-11 15:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git, Christian Couder, Stephen Bash

On 1/11/2011 12:47 AM, Jeff King wrote:
> Once you have fetched with that view, how locked into that view are you?
> Certainly you can never push to or be the fetch remote for another
> repository that does not want to respect that view, because you simply
> don't have the objects to complete the history for them.

If you want to fetch the original history, then it is as simple as git
--no-replace-objects fetch.  Unless of course, the upstream repository
actually removed the original history ( or you are pulling from someone
else who only pulled the truncated history ), possibly transplanting it
to a historical repository that they should refer you to in the message
of the replace commit.  Then you just fetch from there instead, and
voila!  You have the complete original history.

> I guess you can get the parent pointer from the real, "non-replaced"
> object and ask for it. But you can't ask for a specific commit, so for
> every such truncation, the parent needs to publish an extra ref (but
> _not_ make it one of the ones fetched by default, or it would nullify
> your original shallow fetch), and we need to contact them and find that
> ref.

Yes, either a new branch or separate historical repository could be
published to pull the original history from, or git would need to pass
the --no-replace-objects flag to git-upload-pack on the server, causing
it to ignore the replace and send the original history.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11  6:52                 ` Jonathan Nieder
@ 2011-01-11 15:37                   ` Phillip Susi
  2011-01-11 18:22                     ` Jonathan Nieder
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-11 15:37 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, git, Christian Couder, Stephen Bash

On 1/11/2011 1:52 AM, Jonathan Nieder wrote:
> I have two worries:
> 
>  - first, how easily can the replacement be undone? (as you mention
>    below)

git replace -d id, or git --no-replace-objects.  It also might be nice
to add a new switch to git replace to disable a replace without deleting
it, so that it can later be enabled again.
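
The two ways of undoing mentioned above can be tried in a scratch
repository.  This is only a rough sketch: the temporary directory, the
two-commit history, and the parentless stand-in built with
git commit-tree are assumptions made for the illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name Demo
echo one > file; git add file; git commit -q -m one
echo two > file; git commit -q -am two
tip=$(git rev-parse HEAD)

# Assumed stand-in: same tree as the tip commit, but no parent.
stub=$(git commit-tree -m "two (history truncated)" "$tip^{tree}")
git replace "$tip" "$stub"

hidden=$(git rev-list --count HEAD)                         # replacement in effect
bypassed=$(git --no-replace-objects rev-list --count HEAD)  # ignored for this one command
git replace -d "$tip" >/dev/null                            # replacement removed for good
restored=$(git rev-list --count HEAD)
echo "hidden=$hidden bypassed=$bypassed restored=$restored"
```

Both routes only work because the original commits are still in the
local object database.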

>  - second, what happens if the two ends of transport have different
>    replacements?

Then you have a conflict, just like if the two ends have different tags
with the same name.

> That second worry is the more major in my opinion.  Shallow clones are
> a different story --- they do not fundamentally change the history and
> they have special support in git protocol.  It is possible to punt on
> both by saying that (1) replacements _cannot_ be undone --- a second
> replacement is needed --- and (2) the receiving end of a connection is
> not allowed to have any replacements for objects in common that the
> sending end does not have, but then does that buy you anything
> significant over a filter-branch?

One of the major advantages of replacements is that they can easily be
undone, so defeating that would be silly.  Just like with conflicting
tags, if the receiving end has conflicting replacements, they will be
kept instead of the remote version and a warning issued.  If you want
the remote version, delete your local one and fetch again.

What it buys you over filter-branch is:

1)  Those tracking your repo don't have breakage when they next fetch
because the chain of commits they were tracking has been destroyed and
replaced by a completely different one.

2)  It is obvious when a replace has been done, and the original is
still available.  This is good for auditing and traceability.  Paper
trails are good.

3)  Inserting a replace record takes a lot less cpu and IO than
filter-branch rewriting the entire chain.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 15:24                 ` Phillip Susi
@ 2011-01-11 17:39                   ` Jeff King
  2011-01-11 19:48                     ` Johannes Sixt
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-01-11 17:39 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Jonathan Nieder, git, Christian Couder, Stephen Bash

On Tue, Jan 11, 2011 at 10:24:01AM -0500, Phillip Susi wrote:

> Yes, either a new branch or separate historical repository could be
> published to pull the original history from, or git would need to pass
> the --no-replace-objects flag to git-upload-pack on the server, causing
> it to ignore the replace and send the original history.

AFAIK, git can't pass --no-replace-objects to the server over git:// (or
smart http). You would need a protocol extension.

And here's another corner case I thought of:

Suppose you have some server S1 with this history:

  A--B--C--D

and a replace object truncating history to look like:

  B'--C--D

You clone from S1 and have only commits B', C, and D (or maybe even B,
depending on the implementation). But definitely not A, nor its
associated tree and blobs.

Now you want to fetch from another server S2, which built some commits
on the original history:

  A--B--C--D--E--F

You and S2 negotiate that you both have D, which implies that you have
all of the ancestors of D. S2 therefore sends you a thin pack containing
E and F, which may contain deltas against objects found in D or its
ancestors. Some of which may be only in A, which means you do not have
them.

Aside from fetching the entire real history, the only solution is that
you somehow have to communicate to S2 exactly which objects you have,
presumably by telling them which replacements you have used to arrive at
the object set you have. Which in the general case would mean actually
shipping them your replacement refs and objects (simply handling the
special case of commit truncation isn't sufficient; you could have
replaced any object with any other one).
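
For concreteness, the S1 side of this can be reproduced locally.  A
rough sketch, assuming a throwaway repository, one-file commits named A
through D, and a B' made with git commit-tree from B's tree (only the
shape of the history comes from the example above):

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name Demo

# Build the real history A--B--C--D.
for name in A B C D; do
  echo "$name" > file
  git add file
  git commit -q -m "$name"
done
tip=$(git rev-parse HEAD)      # D
b=$(git rev-parse HEAD~2)      # B

# B' carries B's tree but has no parent, so the view starts there.
b_prime=$(git commit-tree -m "B' (older history lives elsewhere)" "$b^{tree}")
git replace "$b" "$b_prime"

replaced_count=$(git rev-list --count "$tip")                    # B', C, D
real_count=$(git --no-replace-objects rev-list --count "$tip")   # A, B, C, D
echo "with replace: $replaced_count, without: $real_count"
```

Here both counts come out of one object database that still holds A.
In the scenario above the clone would not have A at all, and S2 has no
way of knowing that from a "have D" line.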

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11  5:36               ` Jeff King
@ 2011-01-11 17:40                 ` Junio C Hamano
  2011-01-11 17:50                   ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2011-01-11 17:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Phillip Susi, git, Christian Couder, Stephen Bash

Jeff King <peff@peff.net> writes:

> Sure, I think that is a sane way for the user to think about it, but do
> we actually support multiple views? I thought replacement objects were
> all or nothing.

It is not implausible for a long running large project to restart their
history from a physical root commit every year, stitching the year-long
segments together at their ends with replacements, to make a default clone
to get a year's worth of the most recent history while allowing people to
get more by asking, no?

Of course, if you trust shallow-clones, you do not have to do that kind of
history surgery ;-).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 17:40                 ` Junio C Hamano
@ 2011-01-11 17:50                   ` Jeff King
  2011-01-11 17:56                     ` Jonathan Nieder
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-01-11 17:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Phillip Susi, git, Christian Couder, Stephen Bash

On Tue, Jan 11, 2011 at 09:40:17AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Sure, I think that is a sane way for the user to think about it, but do
> > we actually support multiple views? I thought replacement objects were
> > all or nothing.
> 
> It is not implausible for a long running large project to restart their
> history from a physical root commit every year, stitching the year-long
> segments together at their ends with replacements, to make a default clone
> to get a year's worth of the most recent history while allowing people to
> get more by asking, no?

Oh, absolutely I think it is reasonable. I just meant that we do not
have a convenient way of saying "fetch these replace objects, but only
use this particular subset". I think you are stuck with something manual
like:

  # grab "view" from upstream and name it; let's imagine it links 2010
  # history into 2009
  git fetch origin refs/replace/$sha1:refs/views/2009/$sha1

  # now we feel like using them
  git for-each-ref --format='%(refname)' refs/views/2009 |
    while read ref; do
      git update-ref "refs/replace/${ref#refs/views/2009/}" "$ref"
    done

Which is a little overkill for the simple example you gave, but would
also handle something as complex as a view like "pretend the foo/
subtree never existed" or even "pretend the foo/ subtree existed all
along".

Not that I'm sure such things are actually sane to do, performance-wise.
The replace system is fast, but it was designed for a handful of
objects, not hundreds or thousands.

Anyway. My point is that we don't have the porcelain to do something
like managing views or enabling/disabling them in a sane manner.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 17:50                   ` Jeff King
@ 2011-01-11 17:56                     ` Jonathan Nieder
  2011-01-11 18:03                       ` Jeff King
  2011-01-11 19:32                       ` Christian Couder
  0 siblings, 2 replies; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-11 17:56 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Phillip Susi, git, Christian Couder, Stephen Bash

Jeff King wrote:

> I think you are stuck with something manual
> like:
> 
>   # grab "view" from upstream and name it; let's imagine it links 2010
>   # history into 2009
>   git fetch origin refs/replace/$sha1:refs/views/2009/$sha1
> 
>   # now we feel like using them
>   git for-each-ref --format='%(refname)' refs/views/2009 |
>     while read ref; do
>       git update-ref "refs/replace/${ref#refs/views/2009/}" "$ref"
>     done
> 
> Which is a little overkill for the simple example you gave, but would
> also handle something as complex as a view like "pretend the foo/
> subtree never existed" or even "pretend the foo/ subtree existed all
> along".
> 
> Not that I'm sure such things are actually sane to do, performance-wise.
> The replace system is fast, but it was designed for a handful of
> objects, not hundreds or thousands.
> 
> Anyway. My point is that we don't have the porcelain to do something
> like managing views or enabling/disabling them in a sane manner.

Maybe something like

	git fetch origin refs/views/2009/*:refs/replace/*

except that this does not provide a nice way to remove the replace
refs when done.
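
One manual way to clean up, as a rough sketch: it assumes the view is
also kept locally under its own hierarchy (refs/views/2009 here), and
disable_view is a made-up helper name, not an existing git command.

```shell
# Hypothetical helper: delete every replace ref that mirrors a ref
# stored under the given view hierarchy.  The view refs are left alone,
# so the view can be switched on again later.
disable_view() {
  view=$1   # e.g. refs/views/2009 (assumed layout)
  git for-each-ref --format='%(refname)' "$view" |
    while read -r ref; do
      git update-ref -d "refs/replace/${ref#"$view"/}"
    done
}
```

It would be run as "disable_view refs/views/2009" from inside the
repository.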

A potential usability enhancement might be to allow additional
replacement hierarchies to be requested on a per command basis, like

	GIT_REPLACE_REFS=refs/remotes/origin/views/2009 gitk --all

along the lines of GIT_NOTES_REF.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 17:56                     ` Jonathan Nieder
@ 2011-01-11 18:03                       ` Jeff King
  2011-01-11 19:32                       ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-01-11 18:03 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, Phillip Susi, git, Christian Couder, Stephen Bash

On Tue, Jan 11, 2011 at 11:56:21AM -0600, Jonathan Nieder wrote:

> Maybe something like
> 
> 	git fetch origin refs/views/2009/*:refs/replace/*

Heh, yeah, that is much simpler than what I did. :)

> A potential usability enhancement might be to allow additional
> replacement hierarchies to be requested on a per command basis, like
> 
> 	GIT_REPLACE_REFS=refs/remotes/origin/views/2009 gitk --all
> 
> along the lines of GIT_NOTES_REF.

Yes, that is a much better solution, IMHO.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 15:37                   ` Phillip Susi
@ 2011-01-11 18:22                     ` Jonathan Nieder
  2011-01-11 18:42                       ` Phillip Susi
  0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-11 18:22 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Jeff King, git, Christian Couder, Stephen Bash

Hi,

Thoughts on use cases.  Jeff already explained the main protocol
problem to be solved very well (thanks!).

Phillip Susi wrote:

> 1)  Those tracking your repo don't have breakage when they next fetch
> because the chain of commits they were tracking has been destroyed and
> replaced by a completely different one.

This does not require transport respecting replacements.  Just start
a new line of history and teach "git pull" to pull replacement refs
first when requested in the refspec.

It could work like this:

	alice$ git branch historical
	alice$ git checkout --orphan newline
	alice$ git commit -m 'start new line of history'
	alice$ git branch newroot
	alice$ ... hack hack hack ...
	alice$ git replace newroot historical
	alice$ git push world refs/replace/* +HEAD:master

	bob$ git remote show origin
	  URL: git://git.alice.example.com/project.git
	  Ref specifier: refs/replace/*:refs/replace/* refs/heads/*:refs/remotes/origin/*
	  HEAD branch: master
	  Remote branch:
	    master tracked
	  Local branch configured for 'git pull':
	    master merges with remote master
	bob$ git pull
	remote: Counting objects: 18, done.
	remote: Compressing objects: 100% (11/11), done.
	remote: Total 11 (delta 8), reused 0 (delta 0)
	Unpacking objects: 100% (11/11), done.
	From git://git.alice.example.com/project.git
	 * [new replacement]      87a8c7yc65c87c98c87c6a87c8a     -> replace/87a8c7yc65c87c98c87c6a87c8a
	   a78c9df..8c98df9  master     -> origin/master

> 2)  It is obvious when a replace has been done, and the original is
> still available.  This is good for auditing and traceability.  Paper
> trails are good.

With the method you are suggesting, others do _not_ always have the
original still available.  After I fetch from you with
--respect-hard-replacements, then while I am on an airplane I will
have this hard replacement ref staring at me that I cannot remove.

If the original goes missing or gets corrupted on the few machines
that had it, the hard replacement ref is permanent.

> 3)  Inserting a replace record takes a lot less cpu and IO than
> filter-branch rewriting the entire chain.

If the modified history is much shorter than the original (as in the
use case you described), would building it really take so much CPU and
I/O?  Moreover, is the extra CPU time to keep checking all the
replacements on the client side worth saving that one-time CPU time
expenditure on the server?

If (and only if) so then I see how that could be an advantage.

Sorry for the longwinded message.  Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 18:22                     ` Jonathan Nieder
@ 2011-01-11 18:42                       ` Phillip Susi
  0 siblings, 0 replies; 34+ messages in thread
From: Phillip Susi @ 2011-01-11 18:42 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, git, Christian Couder, Stephen Bash

On 1/11/2011 1:22 PM, Jonathan Nieder wrote:
>> 1)  Those tracking your repo don't have breakage when they next fetch
>> because the chain of commits they were tracking has been destroyed and
>> replaced by a completely different one.
> 
> This does not require transport respecting replacements.  Just start
> a new line of history and teach "git pull" to pull replacement refs
> first when requested in the refspec.

That's what I've been saying.  My statement that you quote above is
stating why git replace is better than git filter-branch.

>> 2)  It is obvious when a replace has been done, and the original is
>> still available.  This is good for auditing and traceability.  Paper
>> trails are good.
> 
> With the method you are suggesting, others do _not_ always have the
> original still available.  After I fetch from you with
> --respect-hard-replacements, then while I am on an airplane I will
> have this hard replacement ref staring at me that I cannot remove.

They may not have it in their local repository, but it is clear that
there IS an original history, and the replace record comment should tell
them from where they can fetch it, and those tracking the repository
before the replace was added already have it.

Using filter-branch on the other hand, is a sort of dirty hack that
violates the integrity constraints normally in place, and can leave you
with a history that has no indication that there ever was more.

> If the original goes missing or gets corrupted on the few machines
> that had it, the hard replacement ref is permanent.

I think it goes without saying that if you lose part of the repository,
and there are no other copies, then you have lost part of the repository.

> If the modified history is much shorter than the original (as in the
> use case you described), would building it really take so much CPU and
> I/O?  Moreover, is the extra CPU time to keep checking all the
> replacements on the client side worth saving that one-time CPU time
> expenditure on the server?

It would take more than just inserting the replace record.  I'm not sure
what you mean by "keep checking all the replacements on the client side".

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 17:56                     ` Jonathan Nieder
  2011-01-11 18:03                       ` Jeff King
@ 2011-01-11 19:32                       ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: Christian Couder @ 2011-01-11 19:32 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jeff King, Junio C Hamano, Phillip Susi, git, Stephen Bash

Hi,

On Tuesday 11 January 2011 18:56:21 Jonathan Nieder wrote:
> 
> A potential usability enhancement might be to allow additional
> replacement hierarchies to be requested on a per command basis, like
> 
> 	GIT_REPLACE_REFS=refs/remotes/origin/views/2009 gitk --all
> 
> along the lines of GIT_NOTES_REF.

Yes, it should not be much work to implement GIT_REPLACE_REFS like the above, 
but I think it should accept a list of ref directories, for example:

GIT_REPLACE_REFS=".:bisect:refs/remotes/origin/views/2009"

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 17:39                   ` Jeff King
@ 2011-01-11 19:48                     ` Johannes Sixt
  2011-01-11 19:51                       ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Sixt @ 2011-01-11 19:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Phillip Susi, Jonathan Nieder, git, Christian Couder, Stephen Bash

On Dienstag, 11. Januar 2011, Jeff King wrote:
> On Tue, Jan 11, 2011 at 10:24:01AM -0500, Phillip Susi wrote:
> > Yes, either a new branch or separate historical repository could be
> > published to pull the original history from, or git would need to pass
> > the --no-replace-objects flag to git-upload-pack on the server, causing
> > it to ignore the replace and send the original history.
>
> AFAIK, git can't pass --no-replace-objects to the server over git:// (or
> smart http). You would need a protocol extension.

Why would you have to? git-upload-pack never looks at replacement objects.

> And here's another corner case I thought of:
>
> Suppose you have some server S1 with this history:
>
>   A--B--C--D
>
> and a replace object truncating history to look like:
>
>   B'--C--D
>
> You clone from S1 and have only commits B', C, and D (or maybe even B,
> depending on the implementation). But definitely not A, nor its
> associated tree and blobs.

Why so? Cloning transfers the database using git-upload-pack, 
git-pack-objects, git-index-pack, and git-unpack-objects. All of them have 
object replacements disabled. (And AFAICS, there is no possibility to 
*enable* it.)

Therefore, after cloning you get

 A--B--C--D

and perhaps also the replacement object B'.

Hint: git grep read_replace_refs

-- Hannes

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 19:48                     ` Johannes Sixt
@ 2011-01-11 19:51                       ` Jeff King
  2011-01-11 20:00                         ` Johannes Sixt
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-01-11 19:51 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Phillip Susi, Jonathan Nieder, git, Christian Couder, Stephen Bash

On Tue, Jan 11, 2011 at 08:48:57PM +0100, Johannes Sixt wrote:

> On Dienstag, 11. Januar 2011, Jeff King wrote:
> > On Tue, Jan 11, 2011 at 10:24:01AM -0500, Phillip Susi wrote:
> > > Yes, either a new branch or separate historical repository could be
> > > published to pull the original history from, or git would need to pass
> > > the --no-replace-objects flag to git-upload-pack on the server, causing
> > > it to ignore the replace and send the original history.
> >
> > AFAIK, git can't pass --no-replace-objects to the server over git:// (or
> > smart http). You would need a protocol extension.
> 
> Why would you have to? git-upload-pack never looks at replacement objects.

I think you missed the first part of this discussion. Phillip is
proposing that it should, and I am arguing against it.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 19:51                       ` Jeff King
@ 2011-01-11 20:00                         ` Johannes Sixt
  2011-01-11 20:22                           ` Phillip Susi
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Sixt @ 2011-01-11 20:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Phillip Susi, Jonathan Nieder, git, Christian Couder, Stephen Bash

On Dienstag, 11. Januar 2011, Jeff King wrote:
> I think you missed the first part of this discussion. Phillip is
> proposing that it should, and I am arguing against it.

You're right, sorry for the noise. Now I understand this three-word-subject.

-- Hannes

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 20:00                         ` Johannes Sixt
@ 2011-01-11 20:22                           ` Phillip Susi
  2011-01-11 20:50                             ` Jonathan Nieder
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-11 20:22 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Jonathan Nieder, git, Christian Couder, Stephen Bash

On 1/11/2011 3:00 PM, Johannes Sixt wrote:
> On Dienstag, 11. Januar 2011, Jeff King wrote:
>> I think you missed the first part of this discussion. Phillip is
>> proposing that it should, and I am arguing against it.
> 
> You're right, sorry for the noise. Now I understand this three-word-subject.

What it really comes down to is that you can use replace locally to
modify your history and it works great.  As soon as someone clones from
you though, they don't get the replace and so they end up with a
different history than you see.

I suggested that git-upload-pack should respect replace records by
default, so that people cloning your repository will get the same
replaced history instead of the original.

It seems that the recommended use of replace is to locally append
history back on, after it has been removed upstream with git
filter-branch.  Using filter-branch is bad, so it makes more sense to me
to do the remove with git replace, and then if you want to add it back,
you just have to disable the replace ( and maybe fetch additional objects ).

The one problem that has come up is that when you fetch and tell the
server you have a commit after the replace, it assumes that you also
have the commits prior to the replace and may delta against objects you
do not have.  Fixing that would require informing the server of any
replacements you have, and it being able to use that information to
avoid deltas against objects hidden by the replace.

Does that sound like a pretty good summary to everyone?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 20:22                           ` Phillip Susi
@ 2011-01-11 20:50                             ` Jonathan Nieder
  2011-01-12  0:59                               ` Phillip Susi
  0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-11 20:50 UTC (permalink / raw)
  To: Phillip Susi
  Cc: Johannes Sixt, Jeff King, git, Christian Couder, Stephen Bash

Phillip Susi wrote:

> It seems that the recommended use of replace is to locally append
> history back on, after it has been removed upstream with git
> filter-branch.  Using filter-branch is bad, so it makes more sense to me
> to do the remove with git replace, and then if you want to add it back,
> you just have to disable the replace ( and maybe fetch additional objects ).
>
> The one problem that has come up is that when you fetch and tell the
> server you have a commit after the replace, it assumes that you also
> have the commits prior to the replace and may delta against objects you
> do not have.  Fixing that would require informing the server of any
> replacements you have, and it being able to use that information to
> avoid deltas against objects hidden by the replace.
>
> Does that sound like a pretty good summary to everyone?

Yes, except for "Using filter-branch is bad".  Using filter-branch is
not bad.  Also there are many recommended uses of replace: for example,
to swap out a commit that doesn't build for one that does when using
"git bisect", or to stage history changes before making them permanent with
filter-branch.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: clone breaks replace
  2011-01-11 20:50                             ` Jonathan Nieder
@ 2011-01-12  0:59                               ` Phillip Susi
  2011-01-14 20:53                                 ` small downloads and immutable history (Re: clone breaks replace) Jonathan Nieder
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Susi @ 2011-01-12  0:59 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Johannes Sixt, Jeff King, git, Christian Couder, Stephen Bash

On 01/11/2011 03:50 PM, Jonathan Nieder wrote:
> Yes, except for "Using filter-branch is bad".  Using filter-branch is
> not bad.

It is bad because it breaks people tracking your branch, and violates 
the immutability of history.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* small downloads and immutable history (Re: clone breaks replace)
  2011-01-12  0:59                               ` Phillip Susi
@ 2011-01-14 20:53                                 ` Jonathan Nieder
  2011-01-15  5:27                                   ` Phillip Susi
  0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-14 20:53 UTC (permalink / raw)
  To: Phillip Susi
  Cc: Johannes Sixt, Jeff King, git, Christian Couder, Stephen Bash

Phillip Susi wrote:
> On 01/11/2011 03:50 PM, Jonathan Nieder wrote:

>> Yes, except for "Using filter-branch is bad".  Using filter-branch is
>> not bad.
>
> It is bad because it breaks people tracking your branch, and
> violates the immutability of history.

Ah, I forgot the use case.  If you are using this to at long last get
past the limitations (e.g., inability to push) of "fetch --depth",
then yes, rewriting existing history is bad.

So what's left is some way to make the "have" part of transport
negotiation make sense in this context.  I'll be happy if it happens.

Thanks for clarifying.
Jonathan

[note: if you occasionally use

 git commit; # new commit
 git tag tmp
 git checkout --orphan newroot
 git commit -m 'new root'; # the orphan branch needs a commit before it has a name
 git replace newroot tmp
 git tag -d tmp

so the history without replacement refs is short, no rewriting of
history has to take place.  Some testing and tweaking might be
required to make "git pull" continue to fast-forward.]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: small downloads and immutable history (Re: clone breaks replace)
  2011-01-14 20:53                                 ` small downloads and immutable history (Re: clone breaks replace) Jonathan Nieder
@ 2011-01-15  5:27                                   ` Phillip Susi
  0 siblings, 0 replies; 34+ messages in thread
From: Phillip Susi @ 2011-01-15  5:27 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Johannes Sixt, Jeff King, git, Christian Couder, Stephen Bash

On 01/14/2011 03:53 PM, Jonathan Nieder wrote:
> Ah, I forgot the use case.  If you are using this to at long last get
> past the limitations (e.g., inability to push) of "fetch --depth",
> then yes, rewriting existing history is bad.

I'm not really talking about using --depth, but more of the project 
deciding to truncate the history in the central repository.

> So what's left is some way to make the "have" part of transport
> negotiation make sense in this context.  I'll be happy if it happens.

Good point.  Whether local history is short because of --depth or 
replace records, the same problem arises; the negotiation needs to be 
able to exclude older objects that are not present locally, rather than 
assuming that the client has the entire history if it has any at all. 
It seems like this should just require sending the server an end point
in addition to a start point.  In other words, not just send ID of the
most recent commit, but also the oldest that it has on hand, so that the
server can be sure that it does not deltify against objects prior to
that commit.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-01-15  5:27 UTC | newest]

Thread overview: 34+ messages
2011-01-06 21:00 clone breaks replace Phillip Susi
2011-01-06 21:33 ` Jonathan Nieder
2011-01-06 21:59   ` Junio C Hamano
2011-01-07 19:43   ` Phillip Susi
2011-01-07 20:51     ` Jonathan Nieder
2011-01-07 21:15       ` Stephen Bash
2011-01-07 21:34       ` Jonathan Nieder
2011-01-07 21:44       ` Phillip Susi
2011-01-07 21:49         ` Jonathan Nieder
2011-01-07 22:09           ` Phillip Susi
2011-01-07 22:09           ` Jeff King
2011-01-07 22:58             ` Junio C Hamano
2011-01-11  5:36               ` Jeff King
2011-01-11 17:40                 ` Junio C Hamano
2011-01-11 17:50                   ` Jeff King
2011-01-11 17:56                     ` Jonathan Nieder
2011-01-11 18:03                       ` Jeff King
2011-01-11 19:32                       ` Christian Couder
2011-01-08  0:43             ` Phillip Susi
2011-01-11  5:47               ` Jeff King
2011-01-11  6:52                 ` Jonathan Nieder
2011-01-11 15:37                   ` Phillip Susi
2011-01-11 18:22                     ` Jonathan Nieder
2011-01-11 18:42                       ` Phillip Susi
2011-01-11 15:24                 ` Phillip Susi
2011-01-11 17:39                   ` Jeff King
2011-01-11 19:48                     ` Johannes Sixt
2011-01-11 19:51                       ` Jeff King
2011-01-11 20:00                         ` Johannes Sixt
2011-01-11 20:22                           ` Phillip Susi
2011-01-11 20:50                             ` Jonathan Nieder
2011-01-12  0:59                               ` Phillip Susi
2011-01-14 20:53                                 ` small downloads and immutable history (Re: clone breaks replace) Jonathan Nieder
2011-01-15  5:27                                   ` Phillip Susi
