linux-kernel.vger.kernel.org archive mirror
* Re: Mercurial 0.4e vs git network pull
@ 2005-05-15 11:22 Adam J. Richter
  2005-05-15 12:40 ` Petr Baudis
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Adam J. Richter @ 2005-05-15 11:22 UTC (permalink / raw)
  To: mpm, pasky; +Cc: git, jgarzik, linux-kernel, mercurial, torvalds

On Sun, 15 May 2005 10:54:05 +0200, Petr Baudis wrote:
>Dear diary, on Thu, May 12, 2005 at 10:57:35PM CEST, I got a letter
>where Matt Mackall <mpm@selenic.com> told me that...
>> Does this need an HTTP request (and round trip) per object? It appears
>> to. That's 2200 requests/round trips for my 800 patch benchmark.

>Yes it does. On the other side, it needs no server-side CGI. But I guess
>it should be pretty easy to write some kind of server-side CGI streamer,
>and it would then easily take just a single HTTP request (telling the
>server the commit ID and receiving back all the objects).

	I don't understand what was wrong with Jeff Garzik's previous
suggestion of using http/1.1 pipelining to coalesce the round trips.
If you're worried about queuing too many http/1.1 requests, the client
could adopt a policy of not having more than a certain number of
requests outstanding or perhaps even making a new http connection
after a certain number of requests to avoid starving other clients
when the number of clients doing one of these transfers exceeds the
number of threads that the http server uses.

	Being able to do without a server side CGI script might
encourage deployment a bit more, both for security reasons and
effort of deployment.

	In any case, using httpd or ftp makes it easier to deploy
servers in cases where it might be harder to modify firewall rules,
so I am glad to see that, even if it is through a CGI script.

                    __     ______________
Adam J. Richter        \ /
adam@yggdrasil.com      | g g d r a s i l

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mercurial 0.4e vs git network pull
  2005-05-15 11:22 Mercurial 0.4e vs git network pull Adam J. Richter
@ 2005-05-15 12:40 ` Petr Baudis
  2005-05-16 22:22   ` Tristan Wibberley
  2005-05-15 17:39 ` Matt Mackall
  2005-05-16  9:29 ` Matthias Urlichs
  2 siblings, 1 reply; 26+ messages in thread
From: Petr Baudis @ 2005-05-15 12:40 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: mpm, git, jgarzik, linux-kernel, mercurial, torvalds

Dear diary, on Sun, May 15, 2005 at 01:22:19PM CEST, I got a letter
where "Adam J. Richter" <adam@yggdrasil.com> told me that...
> On Sun, 15 May 2005 10:54:05 +0200, Petr Baudis wrote:
> >Dear diary, on Thu, May 12, 2005 at 10:57:35PM CEST, I got a letter
> >where Matt Mackall <mpm@selenic.com> told me that...
> >> Does this need an HTTP request (and round trip) per object? It appears
> >> to. That's 2200 requests/round trips for my 800 patch benchmark.
> 
> >Yes it does. On the other side, it needs no server-side CGI. But I guess
> >it should be pretty easy to write some kind of server-side CGI streamer,
> >and it would then easily take just a single HTTP request (telling the
> >server the commit ID and receiving back all the objects).
> 
> 	I don't understand what was wrong with Jeff Garzik's previous
> suggestion of using http/1.1 pipelining to coalesce the round trips.
> If you're worried about queuing too many http/1.1 requests, the client
> could adopt a policy of not having more than a certain number of
> requests outstanding or perhaps even making a new http connection
> after a certain number of requests to avoid starving other clients
> when the number of clients doing one of these transfers exceeds the
> number of threads that the http server uses.

The problem is that to fetch a revision tree, you have to

	send request for commit A
	receive commit A
	look at commit A for list of its parents
	send request for the parents
	receive the parents
	look inside for list of its parents
	...

(and same for the trees).
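
The dependency chain above can be sketched as a small simulation. This is only an illustration under stated assumptions: the HISTORY table, fetch() and walk() are hypothetical names, not git's actual code, and fetch() stands in for one HTTP GET. Requests for objects that are already known can go out together, but each generation of parents is only discovered after the previous one has arrived.

```python
# Hypothetical in-memory "server": commit id -> list of parent ids.
# Illustrative only; this is not git's object format or API.
HISTORY = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def fetch(object_id):
    """Stand-in for one HTTP GET; returns the parents named in the commit."""
    return HISTORY[object_id]


def walk(head):
    """Fetch every ancestor of head and count the dependent round trips."""
    round_trips = 0
    seen = set()
    frontier = [head]
    while frontier:
        # Every id in the frontier is already known, so these requests
        # could be pipelined together: one round trip per generation.
        round_trips += 1
        next_frontier = []
        for oid in frontier:
            if oid in seen:
                continue
            seen.add(oid)
            for parent in fetch(oid):
                if parent not in seen and parent not in next_frontier:
                    next_frontier.append(parent)
        frontier = next_frontier
    return seen, round_trips


commits, trips = walk("A")
print(len(commits), "commits fetched in", trips, "dependent round trips")
```

On a purely linear history the number of dependent round trips equals the number of commits, which is why pipelining alone does not help for the commit chain.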

> 	Being able to do without a server side CGI script might
> encourage deployment a bit more, both for security reasons and
> effort of deployment.

You could still use it without the server side CGI script as it is now,
just without the speedups.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Mercurial 0.4e vs git network pull
  2005-05-15 11:22 Mercurial 0.4e vs git network pull Adam J. Richter
  2005-05-15 12:40 ` Petr Baudis
@ 2005-05-15 17:39 ` Matt Mackall
  2005-05-15 18:23   ` Jeff Garzik
  2005-05-16  9:29 ` Matthias Urlichs
  2 siblings, 1 reply; 26+ messages in thread
From: Matt Mackall @ 2005-05-15 17:39 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: pasky, git, jgarzik, linux-kernel, mercurial, torvalds

On Sun, May 15, 2005 at 04:22:19AM -0700, Adam J. Richter wrote:
> On Sun, 15 May 2005 10:54:05 +0200, Petr Baudis wrote:
> >Dear diary, on Thu, May 12, 2005 at 10:57:35PM CEST, I got a letter
> >where Matt Mackall <mpm@selenic.com> told me that...
> >> Does this need an HTTP request (and round trip) per object? It appears
> >> to. That's 2200 requests/round trips for my 800 patch benchmark.
> 
> >Yes it does. On the other side, it needs no server-side CGI. But I guess
> >it should be pretty easy to write some kind of server-side CGI streamer,
> >and it would then easily take just a single HTTP request (telling the
> >server the commit ID and receiving back all the objects).
> 
> 	I don't understand what was wrong with Jeff Garzik's previous
> suggestion of using http/1.1 pipelining to coalesce the round trips.

You can't do pipelining if you can't look ahead far enough to fill the pipe.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-15 17:39 ` Matt Mackall
@ 2005-05-15 18:23   ` Jeff Garzik
  2005-05-16  1:12     ` Matt Mackall
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2005-05-15 18:23 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Adam J. Richter, pasky, git, linux-kernel, mercurial, torvalds

Matt Mackall wrote:
> On Sun, May 15, 2005 at 04:22:19AM -0700, Adam J. Richter wrote:
> 
>>On Sun, 15 May 2005 10:54:05 +0200, Petr Baudis wrote:
>>
>>>Dear diary, on Thu, May 12, 2005 at 10:57:35PM CEST, I got a letter
>>>where Matt Mackall <mpm@selenic.com> told me that...
>>>
>>>>Does this need an HTTP request (and round trip) per object? It appears
>>>>to. That's 2200 requests/round trips for my 800 patch benchmark.
>>
>>>Yes it does. On the other side, it needs no server-side CGI. But I guess
>>>it should be pretty easy to write some kind of server-side CGI streamer,
>>>and it would then easily take just a single HTTP request (telling the
>>>server the commit ID and receiving back all the objects).
>>
>>	I don't understand what was wrong with Jeff Garzik's previous
>>suggestion of using http/1.1 pipelining to coalesce the round trips.
> 
> 
> You can't do pipelining if you can't look ahead far enough to fill the pipe.

Even if you cannot fill a pipeline, HTTP/1.1 is sufficiently useful 
simply by removing the per-request connection overhead.

	Jeff




* Re: Mercurial 0.4e vs git network pull
  2005-05-15 18:23   ` Jeff Garzik
@ 2005-05-16  1:12     ` Matt Mackall
  0 siblings, 0 replies; 26+ messages in thread
From: Matt Mackall @ 2005-05-16  1:12 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Adam J. Richter, pasky, git, linux-kernel, mercurial, torvalds

On Sun, May 15, 2005 at 02:23:29PM -0400, Jeff Garzik wrote:
> Matt Mackall wrote:
> >On Sun, May 15, 2005 at 04:22:19AM -0700, Adam J. Richter wrote:
> >
> >>On Sun, 15 May 2005 10:54:05 +0200, Petr Baudis wrote:
> >>
> >>>Dear diary, on Thu, May 12, 2005 at 10:57:35PM CEST, I got a letter
> >>>where Matt Mackall <mpm@selenic.com> told me that...
> >>>
> >>>>Does this need an HTTP request (and round trip) per object? It appears
> >>>>to. That's 2200 requests/round trips for my 800 patch benchmark.
> >>
> >>>Yes it does. On the other side, it needs no server-side CGI. But I guess
> >>>it should be pretty easy to write some kind of server-side CGI streamer,
> >>>and it would then easily take just a single HTTP request (telling the
> >>>server the commit ID and receiving back all the objects).
> >>
> >>	I don't understand what was wrong with Jeff Garzik's previous
> >>suggestion of using http/1.1 pipelining to coalesce the round trips.
> >
> >
> >You can't do pipelining if you can't look ahead far enough to fill the 
> >pipe.
> 
> Even if you cannot fill a pipeline, HTTP/1.1 is sufficiently useful 
> simply by removing the per-request connection overhead.

Sure. It cuts round trips by a factor of 2. But that's just about all
it does.
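
A back-of-the-envelope version of the "factor of 2" figure, as a sketch only. The assumptions are not from the thread: a non-persistent connection is taken to cost one round trip for the TCP handshake plus one for the request/response, requests are strictly serial, and the 100 ms round-trip time is an illustrative value. The 2200 request count is the benchmark figure quoted earlier in the thread.

```python
# Assumptions (illustrative): handshake = 1 round trip, request/response = 1
# round trip, and every request waits for the previous response.
REQUESTS = 2200   # objects in the 800-patch benchmark quoted in this thread
RTT_MS = 100      # assumed round-trip time in milliseconds


def round_trips(requests, persistent):
    """Serial round trips needed to fetch `requests` objects one by one."""
    if persistent:
        return 1 + requests   # one handshake, then one trip per request
    return 2 * requests       # handshake plus request for every object


for persistent in (False, True):
    trips = round_trips(REQUESTS, persistent)
    label = "persistent" if persistent else "non-persistent"
    print(label, "connection:", trips, "round trips,", trips * RTT_MS / 1000, "s")
```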

Mercurial already does:
  - approximately O(log(new changesets)) requests/data to find new changesets
  - one request to get an entire changegroup (set of all new
    changesets), which comes back all nicely pipelined and sorted by file
  - delta transfer

In "dumb http" mode, i.e. what's been there since about day three, it
can do:
  - one request (size proportional to total number of changesets) to
    find new changesets
  - approximately two requests per changed file to pull all deltas
    (vs request per file revision)
  - delta transfer

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-15 11:22 Mercurial 0.4e vs git network pull Adam J. Richter
  2005-05-15 12:40 ` Petr Baudis
  2005-05-15 17:39 ` Matt Mackall
@ 2005-05-16  9:29 ` Matthias Urlichs
  2 siblings, 0 replies; 26+ messages in thread
From: Matthias Urlichs @ 2005-05-16  9:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: git

Hi, Adam J. Richter wrote:

> 	Being able to do without a server side CGI script might
> encourage deployment a bit more, both for security reasons and
> effort of deployment.

A simple server-side CGI would be a "send me all changeset SHA-1s,
starting at HEAD until you reach FOO" operation (FOO being the SHA1 of
the previous head you've pulled before). This operation is simple enough
that people should have no problem installing such a CGI.

You could then stream-pull the actual contents over HTTP/1.1 without
further CGI interaction.
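
A minimal sketch of what such a "list everything from HEAD back to FOO" operation might compute. The parents table, the commit names and commits_since() are hypothetical and only for illustration; a real CGI would read commit objects from the repository instead of a dictionary.

```python
def ancestors(parents, start):
    """Return start and everything reachable from it via parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def commits_since(parents, head, previous_head):
    """List commits reachable from head but not from previous_head."""
    known = ancestors(parents, previous_head)
    new, seen, stack = [], set(), [head]
    while stack:
        commit = stack.pop()
        if commit in seen or commit in known:
            continue  # already listed, or the client has it
        seen.add(commit)
        new.append(commit)
        stack.extend(parents.get(commit, []))
    return new


# Hypothetical linear history: c4 is HEAD, c2 is the head pulled last time.
parents = {"c4": ["c3"], "c3": ["c2"], "c2": ["c1"], "c1": []}
print(commits_since(parents, "c4", "c2"))
```

With the full list in hand, the client knows every object name up front and can keep an HTTP/1.1 pipeline full.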

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de




* Re: Mercurial 0.4e vs git network pull
  2005-05-15 12:40 ` Petr Baudis
@ 2005-05-16 22:22   ` Tristan Wibberley
  0 siblings, 0 replies; 26+ messages in thread
From: Tristan Wibberley @ 2005-05-16 22:22 UTC (permalink / raw)
  To: linux-kernel

On Sun, 2005-05-15 at 14:40 +0200, Petr Baudis wrote:
> Dear diary, on Sun, May 15, 2005 at 01:22:19PM CEST, I got a letter
> where "Adam J. Richter" <adam@yggdrasil.com> told me that...
> > 
> > 	I don't understand what was wrong with Jeff Garzik's previous
> > suggestion of using http/1.1 pipelining to coalesce the round trips.
> > If you're worried about queuing too many http/1.1 requests, the client
> > could adopt a policy of not having more than a certain number of
> > requests outstanding or perhaps even making a new http connection
> > after a certain number of requests to avoid starving other clients
> > when the number of clients doing one of these transfers exceeds the
> > number of threads that the http server uses.
> 
> The problem is that to fetch a revision tree, you have to
> 
> 	send request for commit A
> 	receive commit A
> 	look at commit A for list of its parents
> 	send request for the parents
> 	receive the parents
> 	look inside for list of its parents
> 	...

What about IMAP? You could ask for just the parents for several messages
(via a message header), then start asking for message bodies (with the
juicy stuff in). You could also ask for a list of the new commits then
ask for each of the bodies (several at a time). Not as good as a "Just
give me all new data", but an *awful* lot more efficient than HTTP. And
very flexible. You just need to map changesets to IMAP messages (if such
a mapping can actually make sense :)

Prolly a bit more work though.

--
Tristan Wibberley

The opinions expressed in this message are my own opinions and not those
of my employer.




* Re: Mercurial 0.4e vs git network pull
  2005-05-15  8:50         ` Petr Baudis
@ 2005-05-15 15:12           ` Christian Kujau
  0 siblings, 0 replies; 26+ messages in thread
From: Christian Kujau @ 2005-05-15 15:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Petr Baudis

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Petr Baudis wrote:
>>>remote HEAD you want to fetch, and the URL; see
>>>Documentation/git-http-pull.txt).
>>
[..]
> 
> It's in the git-pb and cogito trees. Linus is on holiday. :-)
> 

ah, thanks. (that's why "cg-update" returns so quickly ;-))

- --
BOFH excuse #266:

All of the packets are empty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCh2bG+A7rjkF8z0wRAqWTAKCO2SW1Ax5+HPrMa6pTQCj/PaQ5mQCfUcKe
f9oyyKbdVTdxpEKGgbSKNIM=
=0YP6
-----END PGP SIGNATURE-----


* Re: Mercurial 0.4e vs git network pull
  2005-05-15 11:52 Adam J. Richter
@ 2005-05-15 14:23 ` Petr Baudis
  0 siblings, 0 replies; 26+ messages in thread
From: Petr Baudis @ 2005-05-15 14:23 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: git, jgarzik, linux-kernel, mercurial, mpm, torvalds

Dear diary, on Sun, May 15, 2005 at 01:52:50PM CEST, I got a letter
where "Adam J. Richter" <adam@yggdrasil.com> told me that...
> On Sun, 15 May 2005 14:40:42 +0200, Petr Baudis wrote:
> >Dear diary, on Sun, May 15, 2005 at 01:22:19PM CEST, I got a letter
> >where "Adam J. Richter" <adam@yggdrasil.com> told me that...
> [...]
> >> 	I don't understand what was wrong with Jeff Garzik's previous
> >> suggestion of using http/1.1 pipelining to coalesce the round trips.
> >> If you're worried about queuing too many http/1.1 requests, the client
> >> could adopt a policy of not having more than a certain number of
> >> requests outstanding or perhaps even making a new http connection
> >> after a certain number of requests to avoid starving other clients
> >> when the number of clients doing one of these transfers exceeds the
> >> number of threads that the http server uses.
> 
> >The problem is that to fetch a revision tree, you have to
> 
> >	send request for commit A
> >	receive commit A
> >	look at commit A for list of its parents
> >	send request for the parents
> >	receive the parents
> >	look inside for list of its parents
> >	...
> 
> >(and same for the trees).
> 
> 	Don't you usually have a list of many files for which you
> want to retrieve this information?  I'd imagine that would usually
> suffice to fill the pipeline.

That might be true for the trees, but not for the commit lists. Most
commits have a single parent; merges are the exception, and merges with
more than two parents are extremely rare.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Mercurial 0.4e vs git network pull
@ 2005-05-15 11:52 Adam J. Richter
  2005-05-15 14:23 ` Petr Baudis
  0 siblings, 1 reply; 26+ messages in thread
From: Adam J. Richter @ 2005-05-15 11:52 UTC (permalink / raw)
  To: pasky; +Cc: git, jgarzik, linux-kernel, mercurial, mpm, torvalds

On Sun, 15 May 2005 14:40:42 +0200, Petr Baudis wrote:
>Dear diary, on Sun, May 15, 2005 at 01:22:19PM CEST, I got a letter
>where "Adam J. Richter" <adam@yggdrasil.com> told me that...
[...]
>> 	I don't understand what was wrong with Jeff Garzik's previous
>> suggestion of using http/1.1 pipelining to coalesce the round trips.
>> If you're worried about queuing too many http/1.1 requests, the client
>> could adopt a policy of not having more than a certain number of
>> requests outstanding or perhaps even making a new http connection
>> after a certain number of requests to avoid starving other clients
>> when the number of clients doing one of these transfers exceeds the
>> number of threads that the http server uses.

>The problem is that to fetch a revision tree, you have to

>	send request for commit A
>	receive commit A
>	look at commit A for list of its parents
>	send request for the parents
>	receive the parents
>	look inside for list of its parents
>	...

>(and same for the trees).

	Don't you usually have a list of many files for which you
want to retrieve this information?  I'd imagine that would usually
suffice to fill the pipeline.

                    __     ______________
Adam J. Richter        \ /
adam@yggdrasil.com      | g g d r a s i l


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 20:57       ` Matt Mackall
  2005-05-12 21:24         ` Daniel Barkalow
@ 2005-05-15  8:54         ` Petr Baudis
  1 sibling, 0 replies; 26+ messages in thread
From: Petr Baudis @ 2005-05-15  8:54 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel, git, mercurial, Linus Torvalds

Dear diary, on Thu, May 12, 2005 at 10:57:35PM CEST, I got a letter
where Matt Mackall <mpm@selenic.com> told me that...
> Does this need an HTTP request (and round trip) per object? It appears
> to. That's 2200 requests/round trips for my 800 patch benchmark.

Yes it does. On the other hand, it needs no server-side CGI. But I guess
it should be pretty easy to write some kind of server-side CGI streamer,
and it would then easily take just a single HTTP request (telling the
server the commit ID and receiving back all the objects).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Mercurial 0.4e vs git network pull
  2005-05-15  0:40       ` Christian Kujau
@ 2005-05-15  8:50         ` Petr Baudis
  2005-05-15 15:12           ` Christian Kujau
  0 siblings, 1 reply; 26+ messages in thread
From: Petr Baudis @ 2005-05-15  8:50 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

Dear diary, on Sun, May 15, 2005 at 02:40:14AM CEST, I got a letter
where Christian Kujau <evil@g-house.de> told me that...
> Petr Baudis wrote:
> > remote HEAD you want to fetch, and the URL; see
> > Documentation/git-http-pull.txt).
> 
> where did you get this file from?
> 
> % ls Documentation/git-http-pull.txt
> ls: Documentation/git-http-pull.txt: No such file or directory
> 
> % find . -iname "*git*"   <-- returns nothing...

It's in the git-pb and cogito trees. Linus is on holiday. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 18:23 ` Petr Baudis
  2005-05-12 20:11   ` Matt Mackall
@ 2005-05-15  6:22   ` Ingo Molnar
  1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2005-05-15  6:22 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Matt Mackall, linux-kernel, git, mercurial, Linus Torvalds


* Petr Baudis <pasky@ucw.cz> wrote:

> > Mercurial is also much smarter than rsync at determining what
> > outstanding changesets exist. Here's an empty pull as a demonstration:
> > 
> >  $ time hg merge hg://selenic.com/linux-hg/
> >  retrieving changegroup
> > 
> >  real    0m0.363s
> >  user    0m0.083s
> >  sys     0m0.007s
> > 
> > That's a single http request and a one line response.
> 
> So, what about comparing it with something comparable, say git pull 
> over HTTP? :-)

Matt, did you get around to doing such a comparison?

	Ingo


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 20:14     ` Petr Baudis
  2005-05-12 20:57       ` Matt Mackall
@ 2005-05-15  0:40       ` Christian Kujau
  2005-05-15  8:50         ` Petr Baudis
  1 sibling, 1 reply; 26+ messages in thread
From: Christian Kujau @ 2005-05-15  0:40 UTC (permalink / raw)
  To: Petr Baudis; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Petr Baudis wrote:
> remote HEAD you want to fetch, and the URL; see
> Documentation/git-http-pull.txt).

where did you get this file from?

% ls Documentation/git-http-pull.txt
ls: Documentation/git-http-pull.txt: No such file or directory

% find . -iname "*git*"   <-- returns nothing...

thanks,
Christian.
- --
BOFH excuse #28:

CPU radiator broken
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFChppu+A7rjkF8z0wRAvpAAKCUYcIWny+/+XcTqZYiAfLtu2Cy0ACfTPwM
5bSWMmrUdVpihsZodRSd0/o=
=jHtB
-----END PGP SIGNATURE-----


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 21:24         ` Daniel Barkalow
  2005-05-12 22:29           ` Matt Mackall
@ 2005-05-13  5:44           ` Petr Baudis
  1 sibling, 0 replies; 26+ messages in thread
From: Petr Baudis @ 2005-05-13  5:44 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Matt Mackall, linux-kernel, git, mercurial, Linus Torvalds

Dear diary, on Thu, May 12, 2005 at 11:24:27PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> In the present mainline, you first have to find the head commit you
> want. I have a patch which does this for you over the same
> connection. Starting from that point, it tracks reachability on the
> receiving end, and requests anything it doesn't have.

Could we get the patch, please? :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Mercurial 0.4e vs git network pull
  2005-05-13  2:23                 ` Daniel Barkalow
@ 2005-05-13  2:44                   ` Matt Mackall
  0 siblings, 0 replies; 26+ messages in thread
From: Matt Mackall @ 2005-05-13  2:44 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, linux-kernel, git, mercurial, Linus Torvalds

On Thu, May 12, 2005 at 10:23:01PM -0400, Daniel Barkalow wrote:
> On Thu, 12 May 2005, Matt Mackall wrote:
> 
> > On Thu, May 12, 2005 at 08:33:56PM -0400, Daniel Barkalow wrote:
> >
> > > Yes, although that also includes pulling the commits, and may be
> > > interleaved with pulling the trees and objects to cover the
> > > latency. (I.e., one round trip gets the new head hash; the second gets
> > > that commit; on the third the tree and the parent(s) can be requested at
> > > once; on the fourth the contents of the tree and the grandparents, at
> > > which point the bandwidth will probably be the limiting factor for the
> > > rest of the operation.)
> > 
> > What if a changeset is smaller than the bandwidth-delay product of
> > your link? As an extreme example, Mercurial is currently at a point
> > where its -entire repo- changegroup (set of all changesets) can be in
> > flight on the wire on a typical link.
> 
> If this is common for the repository in question, then it will be forced
> to wait for the parent to come in, true. If you have a number of merges,
> however, you start using more total bandwidth relative to latency while
> tracking them in parallel.

No, you're missing my point. If you can request all the files in a
changeset in less than a round-trip time, you have a pipeline stall.
Let's say a changeset is 10k and round trip time is 100ms. That means
you'll stall on any pipe with more than 100k/s. You won't know what
changeset to request next as it'll still be in flight.
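
The arithmetic in the paragraph above, written out as a sketch. The function names are hypothetical, "10k" is read as 10,000 bytes, and the model is deliberately crude: each changeset can only be requested once the previous one has arrived, so the link idles whenever a changeset takes less than one round trip to transfer.

```python
def stall_threshold(changeset_bytes, rtt_ms):
    """Bandwidth in bytes/s above which the link goes idle between changesets."""
    return changeset_bytes * 1000 / rtt_ms


def utilisation(changeset_bytes, rtt_ms, bandwidth):
    """Rough fraction of the link kept busy by strictly serial fetches."""
    transfer_ms = changeset_bytes * 1000 / bandwidth
    return min(1.0, transfer_ms / rtt_ms)


CHANGESET = 10_000   # bytes, the "10k" from the example above
RTT_MS = 100         # round-trip time from the example above

print("stall above", stall_threshold(CHANGESET, RTT_MS), "bytes/s")
for bandwidth in (50_000, 100_000, 1_000_000):   # illustrative link speeds
    print(bandwidth, "bytes/s link:", utilisation(CHANGESET, RTT_MS, bandwidth))
```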
 
> > > I must be misunderstanding your numbers, because 6 HTTP responses is more
> > > than 1K, ignoring any actual content from the server, and 1K for 800
> > > commits is less than 2 bytes per commit.
> > 
> > 1k of application-level data, sorry. And my whole point is that I
> > don't send those 800 commit identifiers (which are 40 bytes each as
> > hex). I send about 30 or so. It's basically a negotiation to find the
> > earliest commits not known to the client with a minimum of round trips
> > and data exchange.
> 
> Does this rely on the history being entirely linear? I suppose that
> requesting a rev-list from the server (which could have it as a static
> file generated when a new head was pushed) could jumpstart the
> process. The client could request all of the commits it doesn't have in
> rapid succession, and then request trees as the commits started coming
> in. Of course, this would get inefficient if you were, for example,
> pulling a merge with a branch with a long history, since you'd get a ton
> of old mainline (which you already have) interleaved with occasional new
> things.

I don't depend on history being linear (I'm not reinventing CVS here)
and I don't grab a list of all revisions (the point is to be
scalable). In fact, I do something fairly clever, and something I
don't think will work with git, because, yet again, it lacks the
metadata.

> > > I'm also worried about testing on 800 linear commits, since the projects
> > > under consideration tend to have very non-linear histories. 
> > 
> > Not true at all. Dumps from Andrew to Linus via patch bombs will
> > result in runs of hundreds of linear commits on a regular basis.
> > Linear patch series are the preferred way to make changes and series
> > of 30 or 40 small patches are not at all uncommon.
> 
> It has sounded like Andrew had some interest in using git, and a number of
> other developers are using it already. If this becomes still more common,
> it may be the case that, instead of sending patch bombs, Andrew will point
> Linus at authors' original series, in which case the mainline would be
> merges of a hundred linear series of various lengths. I had the
> impression, although I never looked carefully, that this was happening on
> a smaller scale with BK, where work by BK users got included using BK,
> rather than as patches applied out of a bomb.

Andrew already uses git, in a manner much like he used BK. He does a
pull from a repo, generates a patch of that repo vs mainline, and puts
that in -mm. And never passes that stuff on to Linus.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-13  1:11               ` Matt Mackall
@ 2005-05-13  2:23                 ` Daniel Barkalow
  2005-05-13  2:44                   ` Matt Mackall
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Barkalow @ 2005-05-13  2:23 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Petr Baudis, linux-kernel, git, mercurial, Linus Torvalds

On Thu, 12 May 2005, Matt Mackall wrote:

> On Thu, May 12, 2005 at 08:33:56PM -0400, Daniel Barkalow wrote:
>
> > Yes, although that also includes pulling the commits, and may be
> > interleaved with pulling the trees and objects to cover the
> > latency. (I.e., one round trip gets the new head hash; the second gets
> > that commit; on the third the tree and the parent(s) can be requested at
> > once; on the fourth the contents of the tree and the grandparents, at
> > which point the bandwidth will probably be the limiting factor for the
> > rest of the operation.)
> 
> What if a changeset is smaller than the bandwidth-delay product of
> your link? As an extreme example, Mercurial is currently at a point
> where its -entire repo- changegroup (set of all changesets) can be in
> flight on the wire on a typical link.

If this is common for the repository in question, then it will be forced
to wait for the parent to come in, true. If you have a number of merges,
however, you start using more total bandwidth relative to latency while
tracking them in parallel.

> > I must be misunderstanding your numbers, because 6 HTTP responses is more
> > than 1K, ignoring any actual content from the server, and 1K for 800
> > commits is less than 2 bytes per commit.
> 
> 1k of application-level data, sorry. And my whole point is that I
> don't send those 800 commit identifiers (which are 40 bytes each as
> hex). I send about 30 or so. It's basically a negotiation to find the
> earliest commits not known to the client with a minimum of round trips
> and data exchange.

Does this rely on the history being entirely linear? I suppose that
requesting a rev-list from the server (which could have it as a static
file generated when a new head was pushed) could jumpstart the
process. The client could request all of the commits it doesn't have in
rapid succession, and then request trees as the commits started coming
in. Of course, this would get inefficient if you were, for example,
pulling a merge with a branch with a long history, since you'd get a ton
of old mainline (which you already have) interleaved with occasional new
things.

> > I'm also worried about testing on 800 linear commits, since the projects
> > under consideration tend to have very non-linear histories. 
> 
> Not true at all. Dumps from Andrew to Linus via patch bombs will
> result in runs of hundreds of linear commits on a regular basis.
> Linear patch series are the preferred way to make changes and series
> of 30 or 40 small patches are not at all uncommon.

It has sounded like Andrew had some interest in using git, and a number of
other developers are using it already. If this becomes still more common,
it may be the case that, instead of sending patch bombs, Andrew will point
Linus at authors' original series, in which case the mainline would be
merges of a hundred linear series of various lengths. I had the
impression, although I never looked carefully, that this was happening on
a smaller scale with BK, where work by BK users got included using BK,
rather than as patches applied out of a bomb.

It certainly makes sense as a design goal to be able to support everything
happening within the system, rather than getting exported and reimported.

	-Daniel
*This .sig left intentionally blank*



* Re: Mercurial 0.4e vs git network pull
  2005-05-13  0:33             ` Daniel Barkalow
@ 2005-05-13  1:11               ` Matt Mackall
  2005-05-13  2:23                 ` Daniel Barkalow
  0 siblings, 1 reply; 26+ messages in thread
From: Matt Mackall @ 2005-05-13  1:11 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, linux-kernel, git, mercurial, Linus Torvalds

On Thu, May 12, 2005 at 08:33:56PM -0400, Daniel Barkalow wrote:
> On Thu, 12 May 2005, Matt Mackall wrote:
> 
> > On Thu, May 12, 2005 at 05:24:27PM -0400, Daniel Barkalow wrote:
> > > On Thu, 12 May 2005, Matt Mackall wrote:
> > > 
> > > > Does this need an HTTP request (and round trip) per object? It appears
> > > > to. That's 2200 requests/round trips for my 800 patch benchmark.
> > > 
> > > It requires a request per object, but it should be possible (with
> > > somewhat more complicated code) to overlap them such that it doesn't
> > > require a serial round trip for each. Since the server is sending static
> > > files, the overhead for each should be minimal.
> > 
> > It's not minimal. The size of an HTTP request is often not much
> > different than the size of a compressed file delta.
> 
> I was thinking of server-side processing overhead, not bandwidth. It's
> true that the bandwidth could be noticeable for these small files.
> 
> > All the junk that gets bundled in an http request/response will be
> > similar in size to the stuff in the third column.
> 
> kernel.org seems to send 283-byte responses, to be completely
> precise. This could be cut down substantially if Apache were tweaked a bit
> to skip all the optional headers which are useless or wrong in this
> context. (E.g., that includes sending a content-type of "text/plain" for
> the binary data)
> 
> > Does it do this recursively? Eg, if the server has 800 new linear
> > commits, does the client have to do 800 round trips following parent
> > pointers to find all the new changesets? 
> 
> Yes, although that also includes pulling the commits, and may be
> interleaved with pulling the trees and objects to cover the
> latency. (I.e., one round trip gets the new head hash; the second gets
> that commit; on the third the tree and the parent(s) can be requested at
> once; on the fourth the contents of the tree and the grandparents, at
> which point the bandwidth will probably be the limiting factor for the
> rest of the operation.)

What if a changeset is smaller than the bandwidth-delay product of
your link? As an extreme example, Mercurial is currently at a point
where its -entire repo- changegroup (set of all changesets) can be in
flight on the wire on a typical link.
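
For a rough sense of scale, here is the bandwidth-delay arithmetic as a
small sketch. The link figures (1.5 Mbit/s, 100 ms round trip) are assumed
for illustration and are not measurements from this thread; the 206-byte
value is only a stand-in for one small compressed delta.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bytes that can be in flight on the link during one round trip."""
    # Divide by 8 (bits per byte) and by 1000 (milliseconds per second).
    return bandwidth_bits_per_s * rtt_ms // 8000


# Assumed example link: 1.5 Mbit/s downstream, 100 ms round-trip time.
in_flight = bdp_bytes(1_500_000, 100)

# Assumed size of one small compressed delta, in bytes.
delta_size = 206

print(in_flight)                # bytes in flight per round trip
print(in_flight // delta_size)  # whole deltas that fit in one round trip
```

Anything smaller than that in-flight figure is finished before the next
request can be issued, so a fetch that waits on each object in turn is
bound by latency rather than by bandwidth.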

> > In this case, Mercurial does about 6 round trips, totalling less than
> > 1K, plus one request that pulls everything.
> 
> I must be misunderstanding your numbers, because 6 HTTP responses is more
> than 1K, ignoring any actual content from the server, and 1K for 800
> commits is less than 2 bytes per commit.

1k of application-level data, sorry. And my whole point is that I
don't send those 800 commit identifiers (which are 40 bytes each as
hex). I send about 30 or so. It's basically a negotiation to find the
earliest commits not known to the client with a minimum of round trips
and data exchange.
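
A minimal sketch of that kind of negotiation, under simplifying
assumptions: history is strictly linear, the client already holds a prefix
of it, and each round trip carries a handful of sample identifiers. This
is not Mercurial's actual wire protocol; the function name and the
samples_per_round parameter are made up for illustration.

```python
def discover_first_missing(server_ids, client_has, samples_per_round=5):
    """Find the index of the earliest server commit the client lacks.

    server_ids: commit ids, oldest first (linear history assumed).
    client_has: set of ids the client holds (a prefix of server_ids).
    Returns (index, round_trips, ids_sent).
    """
    lo, hi = 0, len(server_ids)  # the answer always lies in [lo, hi]
    round_trips = ids_sent = 0
    while lo < hi:
        span = hi - lo
        k = min(samples_per_round, span)
        # Evenly spaced probe positions inside [lo, hi).
        probes = [lo + span * (j + 1) // (k + 1) for j in range(k)]
        round_trips += 1
        ids_sent += len(probes)
        for p in probes:
            if server_ids[p] in client_has:
                lo = p + 1   # everything up to p is already known
            else:
                hi = p       # p is missing; the answer is at or before p
                break
    return lo, round_trips, ids_sent


# Example: the server has 1000 commits, the client has the first 200.
server = [f"c{i:04d}" for i in range(1000)]
client = set(server[:200])
print(discover_first_missing(server, client))
```

Each round narrows the unknown range by roughly a factor of
samples_per_round + 1, so the number of identifiers exchanged grows with
the logarithm of the history length rather than with the number of new
commits.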

> I'm also worried about testing on 800 linear commits, since the projects
> under consideration tend to have very non-linear histories. 

Not true at all. Dumps from Andrew to Linus via patch bombs will
result in runs of hundreds of linear commits on a regular basis.
Linear patch series are the preferred way to make changes and series
of 30 or 40 small patches are not at all uncommon.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 22:29           ` Matt Mackall
@ 2005-05-13  0:33             ` Daniel Barkalow
  2005-05-13  1:11               ` Matt Mackall
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Barkalow @ 2005-05-13  0:33 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Petr Baudis, linux-kernel, git, mercurial, Linus Torvalds

On Thu, 12 May 2005, Matt Mackall wrote:

> On Thu, May 12, 2005 at 05:24:27PM -0400, Daniel Barkalow wrote:
> > On Thu, 12 May 2005, Matt Mackall wrote:
> > 
> > > Does this need an HTTP request (and round trip) per object? It appears
> > > to. That's 2200 requests/round trips for my 800 patch benchmark.
> > 
> > It requires a request per object, but it should be possible (with
> > somewhat more complicated code) to overlap them such that it doesn't
> > require a serial round trip for each. Since the server is sending static
> > files, the overhead for each should be minimal.
> 
> It's not minimal. The size of an HTTP request is often not much
> different than the size of a compressed file delta.

I was thinking of server-side processing overhead, not bandwidth. It's
true that the bandwidth could be noticeable for these small files.

> All the junk that gets bundled in an http request/response will be
> similar in size to the stuff in the third column.

kernel.org seems to send 283-byte responses, to be completely
precise. This could be cut down substantially if Apache were tweaked a bit
to skip all the optional headers which are useless or wrong in this
context. (E.g., that includes sending a content-type of "text/plain" for
the binary data)

> Does it do this recursively? Eg, if the server has 800 new linear
> commits, does the client have to do 800 round trips following parent
> pointers to find all the new changesets? 

Yes, although that also includes pulling the commits, and may be
interleaved with pulling the trees and objects to cover the
latency. (I.e., one round trip gets the new head hash; the second gets
that commit; on the third the tree and the parent(s) can be requested at
once; on the fourth the contents of the tree and the grandparents, at
which point the bandwidth will probably be the limiting factor for the
rest of the operation.)
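
A toy model of that overlapped walk, assuming every object is fetched as
soon as something already received refers to it. This is not the
git-http-pull code; the object names and the one-tree-one-blob-per-commit
layout are invented for the sketch.

```python
def pipelined_fetch(refs, head, have=frozenset()):
    """Count request waves and total requests for an overlapped fetch.

    refs: dict mapping object id -> list of object ids it refers to.
    head: id of the commit to start from.
    have: ids already present locally (assumed complete below them).
    """
    seen = set(have)
    wave = [] if head in seen else [head]
    seen.update(wave)
    waves = requests = 0
    while wave:
        waves += 1               # one round trip covers the whole wave
        requests += len(wave)    # but each object is still its own request
        next_wave = []
        for obj in wave:
            for child in refs[obj]:
                if child not in seen:
                    seen.add(child)
                    next_wave.append(child)
        wave = next_wave
    return waves, requests


def linear_history(n):
    """n commits in a line, each with its own tree and a single blob."""
    refs = {}
    for i in range(n):
        refs[f"b{i}"] = []
        refs[f"t{i}"] = [f"b{i}"]
        refs[f"c{i}"] = [f"t{i}"] + ([f"c{i - 1}"] if i > 0 else [])
    return refs, f"c{n - 1}"


refs, head = linear_history(800)
print(pipelined_fetch(refs, head))
```

In this model the number of waves still grows with the length of the
parent chain, since a parent's id is only learned once its child commit
has arrived; overlapping the requests hides the latency of trees and
blobs, not of the commit chain itself.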

> In this case, Mercurial does about 6 round trips, totalling less than
> 1K, plus one request that pulls everything.

I must be misunderstanding your numbers, because 6 HTTP responses is more
than 1K, ignoring any actual content from the server, and 1K for 800
commits is less than 2 bytes per commit.

I'm also worried about testing on 800 linear commits, since the projects
under consideration tend to have very non-linear histories. 

	-Daniel
*This .sig left intentionally blank*



* Re: Mercurial 0.4e vs git network pull
  2005-05-12 21:24         ` Daniel Barkalow
@ 2005-05-12 22:29           ` Matt Mackall
  2005-05-13  0:33             ` Daniel Barkalow
  2005-05-13  5:44           ` Petr Baudis
  1 sibling, 1 reply; 26+ messages in thread
From: Matt Mackall @ 2005-05-12 22:29 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, linux-kernel, git, mercurial, Linus Torvalds

On Thu, May 12, 2005 at 05:24:27PM -0400, Daniel Barkalow wrote:
> On Thu, 12 May 2005, Matt Mackall wrote:
> 
> > Does this need an HTTP request (and round trip) per object? It appears
> > to. That's 2200 requests/round trips for my 800 patch benchmark.
> 
> It requires a request per object, but it should be possible (with
> somewhat more complicated code) to overlap them such that it doesn't
> require a serial round trip for each. Since the server is sending static
> files, the overhead for each should be minimal.

It's not minimal. The size of an HTTP request is often not much
different than the size of a compressed file delta. Here's one of the
indexes from a file in an hg repo:

   rev    offset  length  base linkrev p1           p2           nodeid
     0         0    2307     0       0 0000000000.. 0000000000.. b6444347c6..
     1      2307      77     0       5 b6444347c6.. 0000000000.. 06763db6de..
     2      2384     225     0      11 06763db6de.. 0000000000.. acc8e2b2f0..
     3      2609      40     0      16 acc8e2b2f0.. 0000000000.. 461b079d98..
     4      2649     261     0      17 461b079d98.. 0000000000.. 8507ba44cc..
     5      2910     486     0      18 8507ba44cc.. 0000000000.. b68523252b..
     6      3396      98     0      21 b68523252b.. 0000000000.. b3f2586243..
     7      3494     238     0      22 b3f2586243.. 0000000000.. d73d0f8ee9..
     8      3732      39     0      23 d73d0f8ee9.. 0000000000.. caaf506196..
     9      3771     266     0      24 caaf506196.. 0000000000.. 54485fc96f..
    10      4037      81     0      29 54485fc96f.. 0000000000.. b9eae7b990..
    11      4118     310     0      31 b9eae7b990.. 0000000000.. a9926b092a..
    12      4428     545     0      33 a9926b092a.. 0000000000.. f26c600172..
    13      4973     419     0      34 f26c600172.. 0000000000.. ec4ab0acb7..
    14      5392     136     0      38 ec4ab0acb7.. 0000000000.. eb5f3f76c8..
    15      5528     161     0      39 eb5f3f76c8.. 0000000000.. 4fc5f3a3ae..
    16      5689     258     0      46 4fc5f3a3ae.. 0000000000.. 3ad83891fb..
    17      5947     171     0      49 3ad83891fb.. 0000000000.. 3983ac6cd2..
    18      6118     195     0      50 3983ac6cd2.. 0000000000.. f138865e04..
    19      6313      79     0      52 f138865e04.. 0000000000.. 3566c1f449..
    20      6392      85     0      53 3566c1f449.. 0000000000.. 0694a4e3eb..
    21      6477      91     0      54 0694a4e3eb.. 0000000000.. 5f98ae7426..
    22      6568     208     0      56 5f98ae7426.. 0000000000.. dae5cb80db..
    23      6776     286     0      62 dae5cb80db.. 0000000000.. 90ff243869..

All the junk that gets bundled in an http request/response will be
similar in size to the stuff in the third column.

Relative to the 10-20x overhead of not sending deltas, yes, it's only 10%.
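
As a quick check on that comparison, the sketch below summarises the
length column from the listing above and sets it against a per-object
HTTP overhead. The 283-byte value is the response size reported for
kernel.org elsewhere in this thread; it is used here as an assumed
overhead and covers the response only, not the request.

```python
from statistics import median

# "length" column from the index listing above, revisions 0..23.
lengths = [2307, 77, 225, 40, 261, 486, 98, 238, 39, 266, 81, 310,
           545, 419, 136, 161, 258, 171, 195, 79, 85, 91, 208, 286]

# Assumed per-response HTTP overhead in bytes (figure quoted for kernel.org).
HTTP_OVERHEAD = 283

# Revision 0 is much larger than the rest (presumably the full initial
# text rather than a delta), so it is left out of the delta statistics.
deltas = lengths[1:]
smaller = sum(1 for n in deltas if n < HTTP_OVERHEAD)

print(median(deltas))
print(f"{smaller} of {len(deltas)} deltas are smaller than the assumed overhead")
```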
 
> > How does git find the outstanding changesets?
> 
> In the present mainline, you first have to find the head commit you
> want. I have a patch which does this for you over the same
> connection. Starting from that point, it tracks reachability on the
> receiving end, and requests anything it doesn't have.

Does it do this recursively? Eg, if the server has 800 new linear
commits, does the client have to do 800 round trips following parent
pointers to find all the new changesets? In this case, Mercurial does
about 6 round trips, totalling less than 1K, plus one request
that pulls everything.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 20:57       ` Matt Mackall
@ 2005-05-12 21:24         ` Daniel Barkalow
  2005-05-12 22:29           ` Matt Mackall
  2005-05-13  5:44           ` Petr Baudis
  2005-05-15  8:54         ` Petr Baudis
  1 sibling, 2 replies; 26+ messages in thread
From: Daniel Barkalow @ 2005-05-12 21:24 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Petr Baudis, linux-kernel, git, mercurial, Linus Torvalds

On Thu, 12 May 2005, Matt Mackall wrote:

> Does this need an HTTP request (and round trip) per object? It appears
> to. That's 2200 requests/round trips for my 800 patch benchmark.

It requires a request per object, but it should be possible (with
somewhat more complicated code) to overlap them such that it doesn't
require a serial round trip for each. Since the server is sending static
files, the overhead for each should be minimal.

> How does git find the outstanding changesets?

In the present mainline, you first have to find the head commit you
want. I have a patch which does this for you over the same
connection. Starting from that point, it tracks reachability on the
receiving end, and requests anything it doesn't have.

For the case of having nothing to do, it should be a single one-line
request/response for a static file (after which the local end determines
that it has everything it needs without talking to the server).

	-Daniel
*This .sig left intentionally blank*



* Re: Mercurial 0.4e vs git network pull
  2005-05-12 20:14     ` Petr Baudis
@ 2005-05-12 20:57       ` Matt Mackall
  2005-05-12 21:24         ` Daniel Barkalow
  2005-05-15  8:54         ` Petr Baudis
  2005-05-15  0:40       ` Christian Kujau
  1 sibling, 2 replies; 26+ messages in thread
From: Matt Mackall @ 2005-05-12 20:57 UTC (permalink / raw)
  To: Petr Baudis; +Cc: linux-kernel, git, mercurial, Linus Torvalds

On Thu, May 12, 2005 at 10:14:06PM +0200, Petr Baudis wrote:
> Dear diary, on Thu, May 12, 2005 at 10:11:16PM CEST, I got a letter
> where Matt Mackall <mpm@selenic.com> told me that...
> > On Thu, May 12, 2005 at 08:23:41PM +0200, Petr Baudis wrote:
> > > Dear diary, on Thu, May 12, 2005 at 11:44:06AM CEST, I got a letter
> > > where Matt Mackall <mpm@selenic.com> told me that...
> > > > Mercurial is more than 10 times as bandwidth efficient and
> > > > considerably more I/O efficient. On the server side, rsync uses about
> > > > twice as much CPU time as the Mercurial server and has about 10 times
> > > > the I/O and pagecache footprint as well.
> > > > 
> > > > Mercurial is also much smarter than rsync at determining what
> > > > outstanding changesets exist. Here's an empty pull as a demonstration:
> > > > 
> > > >  $ time hg merge hg://selenic.com/linux-hg/
> > > >  retrieving changegroup
> > > > 
> > > >  real    0m0.363s
> > > >  user    0m0.083s
> > > >  sys     0m0.007s
> > > > 
> > > > That's a single http request and a one line response.
> > > 
> > > So, what about comparing it with something comparable, say git pull over
> > > HTTP? :-)
> > 
> > ..because I get a headache every time I try to figure out how to use git? :-P
> > 
> > Seriously, have a pointer to how this works?
> 
> Either you use cogito and just pass cg-clone an HTTP URL (to the git
> repository as in the case of rsync -
> http://www.kernel.org/pub/scm/cogito/cogito.git should work), or you
> invoke git-http-pull directly (passing it the desired commit ID of the
> remote HEAD you want to fetch, and the URL; see
> Documentation/git-http-pull.txt).

Does this need an HTTP request (and round trip) per object? It appears
to. That's 2200 requests/round trips for my 800 patch benchmark.

How does git find the outstanding changesets?

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 20:11   ` Matt Mackall
@ 2005-05-12 20:14     ` Petr Baudis
  2005-05-12 20:57       ` Matt Mackall
  2005-05-15  0:40       ` Christian Kujau
  0 siblings, 2 replies; 26+ messages in thread
From: Petr Baudis @ 2005-05-12 20:14 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel, git, mercurial, Linus Torvalds

Dear diary, on Thu, May 12, 2005 at 10:11:16PM CEST, I got a letter
where Matt Mackall <mpm@selenic.com> told me that...
> On Thu, May 12, 2005 at 08:23:41PM +0200, Petr Baudis wrote:
> > Dear diary, on Thu, May 12, 2005 at 11:44:06AM CEST, I got a letter
> > where Matt Mackall <mpm@selenic.com> told me that...
> > > Mercurial is more than 10 times as bandwidth efficient and
> > > considerably more I/O efficient. On the server side, rsync uses about
> > > twice as much CPU time as the Mercurial server and has about 10 times
> > > the I/O and pagecache footprint as well.
> > > 
> > > Mercurial is also much smarter than rsync at determining what
> > > outstanding changesets exist. Here's an empty pull as a demonstration:
> > > 
> > >  $ time hg merge hg://selenic.com/linux-hg/
> > >  retrieving changegroup
> > > 
> > >  real    0m0.363s
> > >  user    0m0.083s
> > >  sys     0m0.007s
> > > 
> > > That's a single http request and a one line response.
> > 
> > So, what about comparing it with something comparable, say git pull over
> > HTTP? :-)
> 
> ..because I get a headache every time I try to figure out how to use git? :-P
> 
> Seriously, have a pointer to how this works?

Either you use cogito and just pass cg-clone an HTTP URL (to the git
repository as in the case of rsync -
http://www.kernel.org/pub/scm/cogito/cogito.git should work), or you
invoke git-http-pull directly (passing it the desired commit ID of the
remote HEAD you want to fetch, and the URL; see
Documentation/git-http-pull.txt).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Mercurial 0.4e vs git network pull
  2005-05-12 18:23 ` Petr Baudis
@ 2005-05-12 20:11   ` Matt Mackall
  2005-05-12 20:14     ` Petr Baudis
  2005-05-15  6:22   ` Ingo Molnar
  1 sibling, 1 reply; 26+ messages in thread
From: Matt Mackall @ 2005-05-12 20:11 UTC (permalink / raw)
  To: Petr Baudis; +Cc: linux-kernel, git, mercurial, Linus Torvalds

On Thu, May 12, 2005 at 08:23:41PM +0200, Petr Baudis wrote:
> Dear diary, on Thu, May 12, 2005 at 11:44:06AM CEST, I got a letter
> where Matt Mackall <mpm@selenic.com> told me that...
> > Mercurial is more than 10 times as bandwidth efficient and
> > considerably more I/O efficient. On the server side, rsync uses about
> > twice as much CPU time as the Mercurial server and has about 10 times
> > the I/O and pagecache footprint as well.
> > 
> > Mercurial is also much smarter than rsync at determining what
> > outstanding changesets exist. Here's an empty pull as a demonstration:
> > 
> >  $ time hg merge hg://selenic.com/linux-hg/
> >  retrieving changegroup
> > 
> >  real    0m0.363s
> >  user    0m0.083s
> >  sys     0m0.007s
> > 
> > That's a single http request and a one line response.
> 
> So, what about comparing it with something comparable, say git pull over
> HTTP? :-)

..because I get a headache every time I try to figure out how to use git? :-P

Seriously, have a pointer to how this works?

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Mercurial 0.4e vs git network pull
  2005-05-12  9:44 Matt Mackall
@ 2005-05-12 18:23 ` Petr Baudis
  2005-05-12 20:11   ` Matt Mackall
  2005-05-15  6:22   ` Ingo Molnar
  0 siblings, 2 replies; 26+ messages in thread
From: Petr Baudis @ 2005-05-12 18:23 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel, git, mercurial, Linus Torvalds

Dear diary, on Thu, May 12, 2005 at 11:44:06AM CEST, I got a letter
where Matt Mackall <mpm@selenic.com> told me that...
> Mercurial is more than 10 times as bandwidth efficient and
> considerably more I/O efficient. On the server side, rsync uses about
> twice as much CPU time as the Mercurial server and has about 10 times
> the I/O and pagecache footprint as well.
> 
> Mercurial is also much smarter than rsync at determining what
> outstanding changesets exist. Here's an empty pull as a demonstration:
> 
>  $ time hg merge hg://selenic.com/linux-hg/
>  retrieving changegroup
> 
>  real    0m0.363s
>  user    0m0.083s
>  sys     0m0.007s
> 
> That's a single http request and a one line response.

So, what about comparing it with something comparable, say git pull over
HTTP? :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Mercurial 0.4e vs git network pull
@ 2005-05-12  9:44 Matt Mackall
  2005-05-12 18:23 ` Petr Baudis
  0 siblings, 1 reply; 26+ messages in thread
From: Matt Mackall @ 2005-05-12  9:44 UTC (permalink / raw)
  To: linux-kernel, git, mercurial, Linus Torvalds

Now that I'm back from vacation, there's a new Mercurial release as
well as snapshots at:

  http://selenic.com/mercurial/

A combined self-hosting repository / web interface can be found at:

  http://selenic.com/hg/

And there's now a mailing list at:

  http://selenic.com/mailman/listinfo/mercurial

The big news is that Mercurial now has a very fast network protocol.
This benchmark is pulling and merging 819 changesets (again, taken
from 2.6.12-rc2-mm3) from one repo to another over DSL using
Mercurial's new delta protocol:

 $ time hg merge hg://selenic.com/linux-hg/
 retrieving changegroup
 merging changesets
 merging manifests
 merging files

 real    0m10.276s
 user    0m3.299s
 sys     0m0.689s

For comparison, rsyncing the same set of changes between git repos from
the same server:

 $ time rsync -a rsync://10.0.0.12:2000/git/lgb/.git .
 sent 171508 bytes  received 31225542 bytes  312408.46 bytes/sec

 real    1m40.470s
 user    0m0.655s
 sys     0m1.896s

The original broken-out.tar.bz2: 2.3M
The same, uncompressed:           15M
The same, rsynced with git:       30M
The same, pulled with hg (zlib): 2.5M  <- what I used above
The same, pulled with hg (bz2):  2.1M

The server in question is a relatively busy 1GHz Athlon. The server
side of the hg protocol is stateless and is serviced by a simple CGI
script run under Apache.

Mercurial is more than 10 times as bandwidth efficient and
considerably more I/O efficient. On the server side, rsync uses about
twice as much CPU time as the Mercurial server and has about 10 times
the I/O and pagecache footprint as well.
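
The bandwidth and wall-clock ratios follow directly from the figures
above; the short sketch below only redoes that arithmetic and introduces
no new measurements. The variable names are illustrative.

```python
# Figures copied from the size table and the two timed runs above.
rsync_megabytes = 30        # "rsynced with git"
hg_megabytes = 2.5          # "pulled with hg (zlib)"
rsync_seconds = 100.470     # real 1m40.470s
hg_seconds = 10.276         # real 0m10.276s

bandwidth_ratio = rsync_megabytes / hg_megabytes
time_ratio = rsync_seconds / hg_seconds

print(f"bandwidth: {bandwidth_ratio:.1f}x, wall clock: {time_ratio:.1f}x")
```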

Mercurial is also much smarter than rsync at determining what
outstanding changesets exist. Here's an empty pull as a demonstration:

 $ time hg merge hg://selenic.com/linux-hg/
 retrieving changegroup

 real    0m0.363s
 user    0m0.083s
 sys     0m0.007s

That's a single http request and a one line response.

And now with rsync:

 $ time rsync -av rsync://10.0.0.12:2000/git/lgb/.git .
 receiving file list ... done

 sent 76 bytes  received 1280245 bytes  2560642.00 bytes/sec
 total size is 85993841  speedup is 67.17

 real    0m0.539s
 user    0m0.185s
 sys     0m0.148s

Mercurial's communication here scales O(min(changed branches, log new
changesets)) which is less than O(new changesets), while rsync scales
with O(total number of file revisions) (ouch!). The above transfer
size for an empty pull will go from 1.2M to >12M when there's similar
history in git to what's in BK.

-- 
Mathematics is the supreme nostalgia of our time.


Thread overview: 26+ messages
2005-05-15 11:22 Mercurial 0.4e vs git network pull Adam J. Richter
2005-05-15 12:40 ` Petr Baudis
2005-05-16 22:22   ` Tristan Wibberley
2005-05-15 17:39 ` Matt Mackall
2005-05-15 18:23   ` Jeff Garzik
2005-05-16  1:12     ` Matt Mackall
2005-05-16  9:29 ` Matthias Urlichs
  -- strict thread matches above, loose matches on Subject: below --
2005-05-15 11:52 Adam J. Richter
2005-05-15 14:23 ` Petr Baudis
2005-05-12  9:44 Matt Mackall
2005-05-12 18:23 ` Petr Baudis
2005-05-12 20:11   ` Matt Mackall
2005-05-12 20:14     ` Petr Baudis
2005-05-12 20:57       ` Matt Mackall
2005-05-12 21:24         ` Daniel Barkalow
2005-05-12 22:29           ` Matt Mackall
2005-05-13  0:33             ` Daniel Barkalow
2005-05-13  1:11               ` Matt Mackall
2005-05-13  2:23                 ` Daniel Barkalow
2005-05-13  2:44                   ` Matt Mackall
2005-05-13  5:44           ` Petr Baudis
2005-05-15  8:54         ` Petr Baudis
2005-05-15  0:40       ` Christian Kujau
2005-05-15  8:50         ` Petr Baudis
2005-05-15 15:12           ` Christian Kujau
2005-05-15  6:22   ` Ingo Molnar
