From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S261206AbVEBWsJ (ORCPT ); Mon, 2 May 2005 18:48:09 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S261203AbVEBWsJ (ORCPT ); Mon, 2 May 2005 18:48:09 -0400
Received: from fire.osdl.org ([65.172.181.4]:38788 "EHLO smtp.osdl.org")
    by vger.kernel.org with ESMTP id S261194AbVEBWr4 (ORCPT );
    Mon, 2 May 2005 18:47:56 -0400
Date: Mon, 2 May 2005 15:49:49 -0700 (PDT)
From: Linus Torvalds
To: Matt Mackall
cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git@vger.kernel.org
Subject: Re: Mercurial 0.4b vs git patchbomb benchmark
In-Reply-To: <20050502223002.GP21897@waste.org>
Message-ID:
References: <20050429165232.GV21897@waste.org> <427650E7.2000802@tmr.com>
    <20050502223002.GP21897@waste.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2 May 2005, Matt Mackall wrote:
> >
> >  - you can share the objects freely between different trees, never
> >    worrying about one tree corrupting another trees object by mistake.
>
> Not sure if this is terribly useful. It just makes it harder to pull
> the subset you're interested in.

You don't have to share things in a single subdirectory. Symlinks and
hardlinks work fine, as do actual filesystem tricks ;)

> >  - you can drop old objects.
>
> You can't drop old objects without dropping all the changesets that
> refer to them or otherwise being prepared to deal with the broken
> links.

Absolutely. This needs support from fsck to allow us to say "commit xxxx
is no longer in the tree, because we pruned it".

Alternatively (and that's the much less intrusive one), you keep all the
commit objects, but drop the tree and blob objects.
Again, all you need for this to work is to feed a list of commits to
fsck, and tell it "we've pruned those from the tree", which tells fsck
not to start looking for the contents of those commits.

So for example, you can trivially have something that automates this:
take each commit that is older than N days, add it to the "prune list",
run fsck, and delete all objects that now show up as being unreachable
(since fsck won't be looking at what those commits reference).

I could write this up in ten minutes. It's really simple. And it's simple
_exactly_ because we don't do deltas.

> > delta models very fundamentally don't support this.
>
> The latter can be done in a pretty straightforward manner in mercurial
> with one pass over the data. But I have a goal to make keeping the
> whole history cheap enough that no one balks at it.

With deltas, you have two choices:

 - change all the sha1 names (ie a pruned tree would no longer be
   compatible with a non-pruned one)

 - make the delta part not show up as part of the sha1 name (which means
   that it's unprotected).

Which one would you have?

> What is a tree re-linker? Finds duplicate files and hard-links them?
> Ok, that makes some sense. But it's a win on one machine and a lose
> everywhere else.

Where would it be a loss? Especially since with git, it's cheap (you
don't need to compare content to find objects to link - you can just
compare filename listings).

> I've added an "hg verify" command to Mercurial. It doesn't attempt to
> fix anything up yet, but it can catch a couple things that git
> probably can't (like file revisions that aren't owned by any
> changeset), namely because there's more metadata around to look at.

git-fsck-cache catches exactly those kinds of things.
And since it checks pretty much every _single_ assumption in git (which
is not a lot, since git doesn't have a lot of assumptions), I guarantee
you that you can't find any more than it does (the filename ordering is
the big missing piece: I _still_ don't verify that trees are ordered.
I've been mentioning it since the beginning, but I'm lazy).

In other words, your verifier can't verify anything more. It's entirely
possible that more things can go _wrong_, since you have more indexes, so
your verifier will have more to check, but that's not an advantage,
that's a downside.

        Linus
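
For illustration only, the "prune list" idea described in the message can be sketched roughly as follows. This is not git code and not how git-fsck-cache is implemented: it is a toy in-memory model, and the names `Obj`, `reachable`, `prune` and `demo_store` are hypothetical. It assumes that pruned commits stay in the store and still lead to their parents, but that their trees are not walked. Choosing which commits are "too old" is left out; the caller simply passes the list.

```python
from collections import namedtuple

# Toy object model (an assumption for this sketch, not git's on-disk format).
# kind is "commit", "tree" or "blob". A commit names one tree and its parent
# commits; a tree names the blobs/trees it contains.
Obj = namedtuple("Obj", "kind tree parents entries", defaults=(None, (), ()))


def reachable(store, heads, pruned=frozenset()):
    """Return the names of all objects reachable from `heads`.

    Commits in `pruned` are kept and their parents are still followed,
    but their tree is deliberately not walked.
    """
    seen = set()
    stack = list(heads)
    while stack:
        name = stack.pop()
        if name in seen or name not in store:
            continue
        seen.add(name)
        obj = store[name]
        if obj.kind == "commit":
            stack.extend(obj.parents)
            if name not in pruned:
                stack.append(obj.tree)
        elif obj.kind == "tree":
            stack.extend(obj.entries)
    return seen


def prune(store, heads, pruned):
    """Delete objects that are unreachable once the prune list is honoured."""
    keep = reachable(store, heads, pruned)
    dropped = sorted(set(store) - keep)
    for name in dropped:
        del store[name]
    return dropped


def demo_store():
    """Two commits; blob b2 is shared by the old tree and the new tree."""
    return {
        "b1": Obj("blob"),
        "b2": Obj("blob"),
        "b3": Obj("blob"),
        "t1": Obj("tree", entries=("b1", "b2")),
        "t2": Obj("tree", entries=("b2", "b3")),
        "c1": Obj("commit", tree="t1"),
        "c2": Obj("commit", tree="t2", parents=("c1",)),
    }


if __name__ == "__main__":
    store = demo_store()
    print(prune(store, heads=["c2"], pruned={"c1"}))
    print(sorted(store))
```

In this toy run, pruning `c1` drops only its tree `t1` and the blob `b1` that nothing newer refers to. The shared blob `b2` survives because the newer tree still names it, and no object has to be rewritten or renamed, which is the property the message attributes to not using deltas.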