linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matt Mackall <mpm@selenic.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>, git@vger.kernel.org
Subject: Mercurial 0.4b vs git patchbomb benchmark
Date: Thu, 28 Apr 2005 23:01:57 -0700	[thread overview]
Message-ID: <20050429060157.GS21897@waste.org> (raw)
In-Reply-To: <Pine.LNX.4.58.0504251859550.18901@ppc970.osdl.org>

On Mon, Apr 25, 2005 at 07:08:28PM -0700, Linus Torvalds wrote:
> 
> To make an interesting benchmark, try applying the first 200 patches in 
> the current git kernel archive. Can you do them three per second? THAT is 
> the thing you should optimize for, not checking in huge changes.

Ok, I've optimized for it a bit. This is basically:

 hg import -p1 -b ../broken-out `cat ../broken-out | grep -v #`

 ( latest code is at: http://selenic.com/mercurial/ )

My benchmark is to apply all 819 patches from -mm3 to 2.6.12-rc:

hg:

real    3m22.075s
user    1m57.195s
sys     0m14.068s

819/(60+57.195 + 14.068) = 6.239 patches/second  user+sys
repository: before 167M after 173M (3.5% growth)

git:

real    2m58.568s
user    1m11.196s
sys     0m50.144s

819/(60+11.196+50.144) = 6.750 patches/second  user+sys
repository: before 102M after 154M (51% growth)

Again, pretty close, time-wise. My code is actually spending a fair
amount of time doing delta compression in Python, which accounts for
most of the extra user time. So I think I can optimize most of that
away at some point. Interestingly hg is also using substantially less
system time.

What I'd like to highlight here is that git's repo is growing more
than 10 times faster. 52 megs is twice the size of a full kernel
tarball. And that's going to be the bottleneck for network pull
operations.

The fundamental problem I see with git is that the back-end has no
concept of the relation between files. This data is only present in
change nodes so you've got to potentially traverse all the commits to
reconstruct a file's history. That's gonna be O(top-level changes)
seeks. This introduces a number of problems:

- no way to easily find previous revisions of a file
  (being able to see when a particular change was introduced is a
  pretty critical feature)
- no way to do bandwidth-efficient delta transfer
- no way to do efficient delta storage
- no way to do merges based on the file's history[1]

Mercurial can grab look up and grab revisions of a file in O(1)
time/seeks. I haven't implemented annotate yet, but it can also be
done O(1) or O(file revisions).


[1] This last one is interesting. If we've got a repository with files A
and B:

M   M1   M2

AB
 |`-------v     M2 clones M
aB       AB     file A is change in mainline
 |`---v  AB'    file B is changed in M2
 |   aB / |     M1 clones M
 |   ab/  |     M1 changes B
 |   ab'  |     M1 merges from M2, changes to B conflict
 |    |  A'B'   M2 changes A
  `---+--.|
      |  a'B'   M2 merges from mainline, changes to A conflict
      `--.|
         ???    depending on which ancestor we choose, we will have
	        to redo A hand-merge, B hand-merge, or both
                but if we look at the files independently, everything
		is fine

-- 
Mathematics is the supreme nostalgia of our time.

  parent reply	other threads:[~2005-04-29  6:02 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-26  0:41 Mercurial 0.3 vs git benchmarks Matt Mackall
2005-04-26  1:49 ` Daniel Phillips
2005-04-26  2:08 ` Linus Torvalds
2005-04-26  2:30   ` Mike Taht
2005-04-26  3:04     ` Linus Torvalds
2005-04-26  4:00       ` Linus Torvalds
2005-04-26 11:13         ` Chris Mason
2005-04-26 15:09           ` Magnus Damm
2005-04-26 15:38             ` Chris Mason
2005-04-26 16:23               ` Magnus Damm
2005-04-26 18:18                 ` Chris Mason
2005-04-26 20:56                 ` Andrew Morton
2005-04-26 21:07                   ` Linus Torvalds
2005-04-26 22:50                     ` H. Peter Anvin
2005-04-26 22:56                     ` Andrew Morton
2005-04-26 23:43                       ` H. Peter Anvin
2005-04-27 15:01                         ` Florian Weimer
2005-04-27 15:13                           ` Thomas Glanzmann
2005-04-27 18:54                             ` H. Peter Anvin
2005-04-27 19:01                               ` Thomas Glanzmann
2005-04-27 19:57                                 ` Theodore Ts'o
2005-04-27 20:06                                   ` Thomas Glanzmann
2005-04-27 20:35                                 ` H. Peter Anvin
2005-04-27 20:39                                   ` Thomas Glanzmann
2005-04-27 20:47                                   ` Florian Weimer
2005-04-27 20:55                                 ` Florian Weimer
2005-04-27 21:04                                   ` H. Peter Anvin
2005-04-27 21:06                                     ` Florian Weimer
2005-04-27 21:32                                       ` Theodore Ts'o
2005-04-27 19:55                       ` Theodore Ts'o
2005-04-27  6:34                   ` Ingo Molnar
2005-04-27 21:10                     ` Bill Davidsen
2005-04-27 21:39                       ` Linus Torvalds
2005-04-26 16:42           ` Linus Torvalds
2005-04-26 17:39             ` Chris Mason
2005-04-26 19:52               ` Chris Mason
2005-04-26 18:15         ` H. Peter Anvin
2005-04-26 20:30           ` Bill Davidsen
2005-04-26 16:11       ` Bill Davidsen
2005-04-26  4:01   ` Matt Mackall
2005-04-26  4:20     ` Linus Torvalds
2005-04-26  4:09   ` Chris Wedgwood
2005-04-26  4:22     ` Andreas Gal
2005-04-26  4:22     ` Linus Torvalds
2005-04-29  6:01   ` Matt Mackall [this message]
2005-04-29  6:40     ` Mercurial 0.4b vs git patchbomb benchmark Sean
2005-04-29  7:40       ` Matt Mackall
2005-04-29  8:40         ` Sean
2005-04-29 14:34         ` Linus Torvalds
2005-04-29 15:18           ` Morten Welinder
2005-04-29 16:52             ` Matt Mackall
2005-05-02 16:10               ` Bill Davidsen
2005-05-02 19:02                 ` Sean
2005-05-02 22:02                 ` Linus Torvalds
2005-05-02 22:30                   ` Matt Mackall
2005-05-02 22:49                     ` Linus Torvalds
2005-05-03  0:00                       ` Matt Mackall
2005-05-03  2:48                         ` Linus Torvalds
2005-05-03  3:29                           ` Matt Mackall
2005-05-03  4:18                             ` Linus Torvalds
2005-05-03  4:24                         ` Linus Torvalds
2005-05-03  4:27                           ` Matt Mackall
2005-05-03  8:45                           ` Chris Wedgwood
2005-04-29 15:44           ` Tom Lord
2005-04-29 15:58             ` Linus Torvalds
2005-04-29 17:34               ` Tom Lord
2005-04-29 17:56                 ` Linus Torvalds
2005-04-29 18:08                   ` Tom Lord
2005-04-29 18:33                     ` Sean
2005-04-29 18:54                       ` Tom Lord
2005-04-29 19:13                         ` Sean
2005-05-02 16:15                           ` Bill Davidsen
2005-04-29 16:37           ` Matt Mackall
2005-04-29 17:09             ` Linus Torvalds
2005-04-29 19:12               ` Matt Mackall
2005-04-29 19:50                 ` Linus Torvalds
2005-04-29 20:23                   ` Matt Mackall
2005-04-29 20:49                     ` Linus Torvalds
2005-04-29 21:20                       ` Matt Mackall
2005-04-29 16:46           ` Bill Davidsen
2005-04-29 20:19       ` Andrea Arcangeli
2005-04-29 22:30         ` Olivier Galibert
2005-04-29 22:47           ` Andrea Arcangeli
2005-04-29 20:30     ` Andrea Arcangeli
2005-04-29 20:39       ` Matt Mackall
2005-04-30  2:52         ` Andrea Arcangeli
2005-04-30 15:20           ` Matt Mackall
2005-04-30 16:37             ` Andrea Arcangeli
2005-05-02 15:49           ` Bill Davidsen
2005-05-02 16:14             ` Valdis.Kletnieks
2005-05-03 17:40               ` Bill Davidsen
2005-05-04  2:10                 ` Mercurial 0.4b vs git patchbomb benchmark (/usr/bin/env again) David A. Wheeler
2005-05-02 16:17             ` Mercurial 0.4b vs git patchbomb benchmark Andrea Arcangeli
2005-05-02 16:31             ` Linus Torvalds
2005-05-02 17:18               ` Daniel Jacobowitz
2005-05-02 17:32                 ` Linus Torvalds
2005-05-02 20:54                 ` Sam Ravnborg
2005-05-02 17:20               ` Ryan Anderson
2005-05-02 17:31                 ` Linus Torvalds
2005-05-02 21:17               ` Kyle Moffett
2005-05-03 17:43               ` Bill Davidsen
     [not found] <3YQn9-8qX-5@gated-at.bofh.it>
     [not found] ` <3ZLEF-56n-1@gated-at.bofh.it>
     [not found]   ` <3ZM7L-5ot-13@gated-at.bofh.it>
     [not found]     ` <3ZN3P-69A-9@gated-at.bofh.it>
     [not found]       ` <3ZNdz-6gK-9@gated-at.bofh.it>
2005-05-03  1:16         ` Bodo Eggert <harvested.in.lkml@posting.7eggert.dyndns.org>
2005-05-03  1:29           ` Matt Mackall
2005-05-03 16:22             ` Bill Davidsen
2005-05-03 17:14               ` Rene Scharfe
2005-05-04 17:51                 ` Bill Davidsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050429060157.GS21897@waste.org \
    --to=mpm@selenic.com \
    --cc=git@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).