linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Matt Mackall <mpm@selenic.com>
Cc: Bill Davidsen <davidsen@tmr.com>,
	Morten Welinder <mwelinder@gmail.com>,
	Sean <seanlkml@sympatico.ca>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	git@vger.kernel.org
Subject: Re: Mercurial 0.4b vs git patchbomb benchmark
Date: Mon, 2 May 2005 15:49:49 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.58.0505021540070.3594@ppc970.osdl.org> (raw)
In-Reply-To: <20050502223002.GP21897@waste.org>



On Mon, 2 May 2005, Matt Mackall wrote:
> > 
> >  - you can share the objects freely between different trees, never 
> >    worrying about one tree corrupting another trees object by mistake.
> 
> Not sure if this is terribly useful. It just makes it harder to pull
> the subset you're interested in.

You don't have to share things in a single subdirectory. Symlinks and 
hardlinks work fine, as do actual filesystem tricks ;)

> >  - you can drop old objects.
> 
> You can't drop old objects without dropping all the changesets that
> refer to them or otherwise being prepared to deal with the broken
> links.

Absolutely. This needs support from fsck to allow us to say "commit xxxx 
is no longer in the tree, because we pruned it".

Alternatively (and that's the much less intrusive one), you keep all the
commit objects, but drop the tree and blob objects. Again, all you need 
for this to work is just feed a list of commits to fsck, and tell it 
"we've pruned those from the tree", which tells fsck not to start looking 
for the contents of those commits.

So for example, you can trivially have something that automates this: take 
each commit that is older than <x> days, add it to the "prune list", and 
run fsck, and delete all objects that now show up as being unreachable 
(since fsck won't be looking at what those commits reference).

I could write this up in ten minutes. It's really simple.

And it's simple _exactly_ because we don't do deltas.

> > delta models very fundamentally don't support this. 
> 
> The latter can be done in a pretty straightforward manner in mercurial
> with one pass over the data. But I have a goal to make keeping the
> whole history cheap enough that no one balks at it.

With delta's, you have two choices:

 - change all the sha1 names (ie a pruned tree would no longer be 
   compatible with a non-pruned one)
 - make the delta part not show up as part of the sha1 name (which means 
   that it's unprotected).

which one would you have?

> What is a tree re-linker? Finds duplicate files and hard-links them?
> Ok, that makes some sense. But it's a win on one machine and a lose
> everywhere else.

Where would it be a loss? Esepcially since with git, it's cheap (you don't 
need to compare content to find objects to link - you can just compare 
filename listings).

> I've added an "hg verify" command to Mercurial. It doesn't attempt to
> fix anything up yet, but it can catch a couple things that git
> probably can't (like file revisions that aren't owned by any
> changeset), namely because there's more metadata around to look at.

git-fsck-cache catches exactly those kinds of things. And since it checks
pretty much every _single_ assumption in git (which is not a lot, since
git doesn't have a lot of assumptions), I guarantee you that you can't
find any more than it does (the filename ordering is the big missing
piece: I _still_ don't verify that trees are ordered. I've been mentioning
it since the beginning, but I'm lazy).

In other words, your verifier can't verify anything more. It's entirely 
possible that more things can go _wrong_, since you have more indexes, so 
your verifier will have more to check, but that's not an advantage, that's 
a downside.

		Linus

  reply	other threads:[~2005-05-02 22:48 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-26  0:41 Mercurial 0.3 vs git benchmarks Matt Mackall
2005-04-26  1:49 ` Daniel Phillips
2005-04-26  2:08 ` Linus Torvalds
2005-04-26  2:30   ` Mike Taht
2005-04-26  3:04     ` Linus Torvalds
2005-04-26  4:00       ` Linus Torvalds
2005-04-26 11:13         ` Chris Mason
2005-04-26 15:09           ` Magnus Damm
2005-04-26 15:38             ` Chris Mason
2005-04-26 16:23               ` Magnus Damm
2005-04-26 18:18                 ` Chris Mason
2005-04-26 20:56                 ` Andrew Morton
2005-04-26 21:07                   ` Linus Torvalds
2005-04-26 22:50                     ` H. Peter Anvin
2005-04-26 22:56                     ` Andrew Morton
2005-04-26 23:43                       ` H. Peter Anvin
2005-04-27 15:01                         ` Florian Weimer
2005-04-27 15:13                           ` Thomas Glanzmann
2005-04-27 18:54                             ` H. Peter Anvin
2005-04-27 19:01                               ` Thomas Glanzmann
2005-04-27 19:57                                 ` Theodore Ts'o
2005-04-27 20:06                                   ` Thomas Glanzmann
2005-04-27 20:35                                 ` H. Peter Anvin
2005-04-27 20:39                                   ` Thomas Glanzmann
2005-04-27 20:47                                   ` Florian Weimer
2005-04-27 20:55                                 ` Florian Weimer
2005-04-27 21:04                                   ` H. Peter Anvin
2005-04-27 21:06                                     ` Florian Weimer
2005-04-27 21:32                                       ` Theodore Ts'o
2005-04-27 19:55                       ` Theodore Ts'o
2005-04-27  6:34                   ` Ingo Molnar
2005-04-27 21:10                     ` Bill Davidsen
2005-04-27 21:39                       ` Linus Torvalds
2005-04-26 16:42           ` Linus Torvalds
2005-04-26 17:39             ` Chris Mason
2005-04-26 19:52               ` Chris Mason
2005-04-26 18:15         ` H. Peter Anvin
2005-04-26 20:30           ` Bill Davidsen
2005-04-26 16:11       ` Bill Davidsen
2005-04-26  4:01   ` Matt Mackall
2005-04-26  4:20     ` Linus Torvalds
2005-04-26  4:09   ` Chris Wedgwood
2005-04-26  4:22     ` Andreas Gal
2005-04-26  4:22     ` Linus Torvalds
2005-04-29  6:01   ` Mercurial 0.4b vs git patchbomb benchmark Matt Mackall
2005-04-29  6:40     ` Sean
2005-04-29  7:40       ` Matt Mackall
2005-04-29  8:40         ` Sean
2005-04-29 14:34         ` Linus Torvalds
2005-04-29 15:18           ` Morten Welinder
2005-04-29 16:52             ` Matt Mackall
2005-05-02 16:10               ` Bill Davidsen
2005-05-02 19:02                 ` Sean
2005-05-02 22:02                 ` Linus Torvalds
2005-05-02 22:30                   ` Matt Mackall
2005-05-02 22:49                     ` Linus Torvalds [this message]
2005-05-03  0:00                       ` Matt Mackall
2005-05-03  2:48                         ` Linus Torvalds
2005-05-03  3:29                           ` Matt Mackall
2005-05-03  4:18                             ` Linus Torvalds
2005-05-03  4:24                         ` Linus Torvalds
2005-05-03  4:27                           ` Matt Mackall
2005-05-03  8:45                           ` Chris Wedgwood
2005-04-29 15:44           ` Tom Lord
2005-04-29 15:58             ` Linus Torvalds
2005-04-29 17:34               ` Tom Lord
2005-04-29 17:56                 ` Linus Torvalds
2005-04-29 18:08                   ` Tom Lord
2005-04-29 18:33                     ` Sean
2005-04-29 18:54                       ` Tom Lord
2005-04-29 19:13                         ` Sean
2005-05-02 16:15                           ` Bill Davidsen
2005-04-29 16:37           ` Matt Mackall
2005-04-29 17:09             ` Linus Torvalds
2005-04-29 19:12               ` Matt Mackall
2005-04-29 19:50                 ` Linus Torvalds
2005-04-29 20:23                   ` Matt Mackall
2005-04-29 20:49                     ` Linus Torvalds
2005-04-29 21:20                       ` Matt Mackall
2005-04-29 16:46           ` Bill Davidsen
2005-04-29 20:19       ` Andrea Arcangeli
2005-04-29 22:30         ` Olivier Galibert
2005-04-29 22:47           ` Andrea Arcangeli
2005-04-29 20:30     ` Andrea Arcangeli
2005-04-29 20:39       ` Matt Mackall
2005-04-30  2:52         ` Andrea Arcangeli
2005-04-30 15:20           ` Matt Mackall
2005-04-30 16:37             ` Andrea Arcangeli
2005-05-02 15:49           ` Bill Davidsen
2005-05-02 16:14             ` Valdis.Kletnieks
2005-05-03 17:40               ` Bill Davidsen
2005-05-04  2:10                 ` Mercurial 0.4b vs git patchbomb benchmark (/usr/bin/env again) David A. Wheeler
2005-05-02 16:17             ` Mercurial 0.4b vs git patchbomb benchmark Andrea Arcangeli
2005-05-02 16:31             ` Linus Torvalds
2005-05-02 17:18               ` Daniel Jacobowitz
2005-05-02 17:32                 ` Linus Torvalds
2005-05-02 20:54                 ` Sam Ravnborg
2005-05-02 17:20               ` Ryan Anderson
2005-05-02 17:31                 ` Linus Torvalds
2005-05-02 21:17               ` Kyle Moffett
2005-05-03 17:43               ` Bill Davidsen
     [not found] <3YQn9-8qX-5@gated-at.bofh.it>
     [not found] ` <3ZLEF-56n-1@gated-at.bofh.it>
     [not found]   ` <3ZM7L-5ot-13@gated-at.bofh.it>
     [not found]     ` <3ZN3P-69A-9@gated-at.bofh.it>
     [not found]       ` <3ZNdz-6gK-9@gated-at.bofh.it>
2005-05-03  1:16         ` Bodo Eggert <harvested.in.lkml@posting.7eggert.dyndns.org>
2005-05-03  1:29           ` Matt Mackall
2005-05-03 16:22             ` Bill Davidsen
2005-05-03 17:14               ` Rene Scharfe
2005-05-04 17:51                 ` Bill Davidsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0505021540070.3594@ppc970.osdl.org \
    --to=torvalds@osdl.org \
    --cc=davidsen@tmr.com \
    --cc=git@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpm@selenic.com \
    --cc=mwelinder@gmail.com \
    --cc=seanlkml@sympatico.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).