* the 100 mb push
@ 2009-12-15 19:23 Joey Hess
  2009-12-15 19:42 ` Shawn O. Pearce
  2009-12-15 20:00 ` Junio C Hamano
  0 siblings, 2 replies; 3+ messages in thread
From: Joey Hess @ 2009-12-15 19:23 UTC
  To: git

Is it normal for git push to sometimes transfer much more data
than seems necessary? Here is a case where that happens:

joey@gnu:~/src/p.t>git branch
* master
  pristine-tar
  testsuite
joey@gnu:~/src/p.t>git remote show origin
* remote origin
  Fetch URL: ssh://joey@git.kitenet.net/srv/git/pristine-tar.test
  Push  URL: ssh://joey@git.kitenet.net/srv/git/pristine-tar.test
  HEAD branch: master
  Remote branches:
    master       tracked
    pristine-tar tracked
    testsuite    tracked
  Local branches configured for 'git pull':
    master       merges with remote master
    pristine-tar merges with remote pristine-tar
    testsuite    merges with remote testsuite
  Local refs configured for 'git push':
    master       pushes to master       (fast forwardable)
    pristine-tar pushes to pristine-tar (up to date)
    testsuite    pushes to testsuite    (local out of date)

Here, master is a typical small project branch. It has a 1 line change
made locally.

Meanwhile, the testsuite branch is a 100+ mb monster, containing a lot
of big binaries. In it, a small change has been made in the origin
repo. In the local repo, a *lot* of *big* files have been deleted from
the same branch, about 20 mb of files were removed all told. But the diff
for this change should be quite small.

So, testsuite needs to be merged before it can be pushed, but git push
doesn't tell me that. Instead, it goes off and does this for 2+ hours:

joey@gnu:~/src/p.t>git push
Counting objects: 241, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (206/206), done.
Writing objects:  15% (36/237), 2.16 MiB | 15 KiB/s 
^C

It seems to be uploading the entire repo over the wire, and this is a
typical asymmetric network connection, so that goes slow. (Took me a
while to realize it was not just auto-gcing the repo locally.)

Once I realized what was going on, it was easy to merge it as shown
below, and then the push transferred an appropriately small amount of data.
So, my question is, assuming this is not a straight up bug in git, would
it make sense to avoid this gotcha in some way?

joey@gnu:~/src/p.t2>git checkout testsuite
Switched to branch 'testsuite'
Your branch is ahead of 'origin/testsuite' by 1 commit.
joey@gnu:~/src/p.t2>git pull
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ssh://git.kitenet.net/srv/git/pristine-tar.test
   3c16948..fce7ec1  testsuite  -> origin/testsuite
Merge made by recursive.
 Makefile |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)
joey@gnu:~/src/p.t2>git push
Counting objects: 13, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 889 bytes, done.
Total 8 (delta 5), reused 0 (delta 0)
To ssh://joey@git.kitenet.net/srv/git/pristine-tar.test
   aab45a1..cc93945  master -> master
   fce7ec1..d82f225  testsuite -> testsuite

git version 1.6.5.3

-- 
see shy jo

* Re: the 100 mb push
  2009-12-15 19:23 the 100 mb push Joey Hess
@ 2009-12-15 19:42 ` Shawn O. Pearce
  2009-12-15 20:00 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Shawn O. Pearce @ 2009-12-15 19:42 UTC
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> wrote:
> Is it normal for git push to sometimes transfer much more data
> than seems necessary? Here is a case where that happens:

Yes.
 
> Meanwhile, the testsuite branch is a 100+ mb monster, containing a lot
> of big binaries. In it, a small change has been made in the origin
> repo. In the local repo, a *lot* of *big* files have been deleted from
> the same branch, about 20 mb of files were removed all told. But the diff
> for this change should be quite small.
> 
> So, testsuite needs to be merged before it can be pushed, but git push
> doesn't tell me that. Instead, it goes off and does this for 2+ hours:

The problem here is, unlike fetch, push does not do a common
ancestor negotiation.  The sending side (your push client)
just assumes the remote side has *only* what the remote side
advertised.

Since the remote side advertised a commit on the testsuite branch
that your client doesn't have, your client was forced to assume
there was no common ancestor and sent the entire thing.

This usually doesn't show up that badly because the delta tends to
be smaller (no huge binary files), tends to be a strict fast forward
(so your client contains what the remote advertised), and tags may
help to limit the upload size by being at fixed points in the history
(so at worst you upload since the last tag).

Junio wrote a patch series for git push over a year ago to make
it do common ancestor negotiation like git fetch does, but it had
a deadlock problem and the patch series got dropped.  Not enough
people were interested to help Junio carry it through to being
ready for inclusion.
 
-- 
Shawn.

* Re: the 100 mb push
  2009-12-15 19:23 the 100 mb push Joey Hess
  2009-12-15 19:42 ` Shawn O. Pearce
@ 2009-12-15 20:00 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2009-12-15 20:00 UTC
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> So, my question is, assuming this is not a straight up bug in git, would
> it make sense to avoid this gotcha in some way?

The "push" support was originally written for people who push into their
own repositories for publishing (i.e. almost always fast-forwarding) and
lacked the elaborate common ancestor discovery negotiation the "fetch"
side had.

Suppose you have a rewound or forked history, like this:


(their side)

                *---Y pu
               / 
          *---X
         /
    *---*---Z master


(your side)

          *---X---*---A pu
         /
    *---*---Z---*---B master

  - You were in sync when the 'pu' was at X with them; somebody pushed a
    few commits on top of it (forked case); or

  - You were in sync when the 'pu' was at Y with them (you pushed it there
    last time yourself), but you rebuilt 'pu' since then (rewound case).

If you run "git push there master +pu", it learns that the tips of
'master' and 'pu' are at Z and Y respectively at their end.  Because the
protocol did not negotiate the common ancestor, it would try to send:

    rev-list A B ^Z ^Y

but using only the information available at your end locally.

Because you either have never heard of (in a forked case) or no longer
know (in a rewound case) what 'Y' is, in order to update 'pu', you end up
sending commits 'Z..A', duplicating the 'Z..X' part, because "^Y" cannot
participate in the ancestry computation.  Commits 'Z..B' will be sent to
update 'master'.
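
To make the arithmetic concrete, here is a rough Python sketch of that
computation on the graph above. It is a toy model, not git's code: the
labels r0, r1, x0, a0 and b0 for the unnamed '*' commits, and the helper
names reachable() and commits_to_send(), are made up for illustration.

```python
# "Your side" of the picture: each commit maps to its list of parents.
# r0, r1, x0, a0 and b0 are hypothetical labels for the unnamed '*' commits.
LOCAL = {
    "r0": [], "r1": ["r0"], "Z": ["r1"],
    "b0": ["Z"], "B": ["b0"],
    "x0": ["r1"], "X": ["x0"],
    "a0": ["X"], "A": ["a0"],
}


def reachable(graph, tips):
    """Return every commit reachable from the given tips.

    Tips that are missing from the local graph (such as 'Y') contribute
    nothing, which models "^Y cannot participate".
    """
    seen = set()
    stack = [tip for tip in tips if tip in graph]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def commits_to_send(graph, wanted, excluded):
    """Toy version of `rev-list <wanted> ^<excluded>` using local data only."""
    return reachable(graph, wanted) - reachable(graph, excluded)


# What the sender computes today: Y is unknown locally, so it is ignored.
print(sorted(commits_to_send(LOCAL, ["A", "B"], ["Z", "Y"])))
# ['A', 'B', 'X', 'a0', 'b0', 'x0']

# What it could compute if X were known to be common to both sides.
print(sorted(commits_to_send(LOCAL, ["A", "B"], ["Z", "X"])))
# ['A', 'B', 'a0', 'b0']
```

With Y unknown, the sender includes x0 and X even though the other side
already has them. If X could serve as the exclusion point, only a0, A, b0
and B would go over the wire.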

And this aspect of the "git push" protocol hasn't changed much since it was
written.

In contrast, the protocol used for "fetch" tries to discover 'X' in such a
case by a little exchange, like this:

 downloader: Your tip is at 'Y'?  I've never heard of it.  Please
             tell me about its parents.

 uploader:   It is this "Y^", and its parent is 'X', and ...

 downloader: Ok, I know what 'X' is. I heard enough to proceed. Thank you.
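
The exchange above can be imitated with a short Python sketch. It follows
the simplified dialogue only (the uploader describing parents until the
downloader recognises one), not the actual wire protocol. The graph labels
and the find_common() helper are hypothetical.

```python
from collections import deque

# Uploader's history from the "their side" picture: commit -> parents.
# r0, r1, x0 and y0 are made-up labels for the unnamed '*' commits.
THEIRS = {
    "r0": [], "r1": ["r0"], "Z": ["r1"],
    "x0": ["r1"], "X": ["x0"],
    "y0": ["X"], "Y": ["y0"],
}

# Commits the downloader already has ("your side" of the picture).
MINE = {"r0", "r1", "Z", "b0", "B", "x0", "X", "a0", "A"}


def find_common(uploader_graph, downloader_has, tip):
    """Walk back from tip, breadth first, until the downloader knows a commit.

    Returns that commit, or None if the two histories share nothing.
    """
    queue = deque([tip])
    seen = set()
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in downloader_has:
            return commit  # "Ok, I know what this is."
        queue.extend(uploader_graph[commit])  # "Tell me about its parents."
    return None


print(find_common(THEIRS, MINE, "Y"))  # X
```

Once X is found, everything reachable from it can be left out of the
transfer, which is the step "push" skips.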
