From: Linus Torvalds <torvalds@osdl.org>
To: Matt Mackall <mpm@selenic.com>
Cc: Bill Davidsen <davidsen@tmr.com>,
	Morten Welinder <mwelinder@gmail.com>,
	Sean <seanlkml@sympatico.ca>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	git@vger.kernel.org
Subject: Re: Mercurial 0.4b vs git patchbomb benchmark
Date: Mon, 2 May 2005 19:48:29 -0700 (PDT)
Message-ID: <Pine.LNX.4.58.0505021932270.3594@ppc970.osdl.org>
In-Reply-To: <20050503000011.GA22038@waste.org>



On Mon, 2 May 2005, Matt Mackall wrote:
> 
> Umm.. I am _not_ calculating the SHA of the delta itself. That'd be
> silly.

It's not silly.

Meta-data consistency is supremely important. If people can corrupt their 
metadata in strange and unobservable ways, that's almost as bad as 
corrupting the data itself. In fact, to some degree it's worse, since you 
make people trust the thing, but you don't actually guarantee it.

So how _do_ you guarantee consistency of a tree and the history that led 
up to it? 

And by that I don't mean any of the individual blobs - I realize that it's 
perfectly valid to just check out every single version, and have the sha1 
of that. But how do you guarantee that the sha's you check are the sha's 
that you saved in the first place, and somebody didn't replace something 
in the middle?

In other words, you need to hash the metadata too. Otherwise how do you
consistency-check the _collection_ of files?

It's absolutely not enough to just protect single-file content. That 
doesn't help one whit. It's not what an SCM is all about. You have to 
protect the state of _multiple_ files, ie the metadata has to be 
verifiable too.

If that meta-data is the index, then the index needs to be protected by a
SHA1. In git, that's why we don't just sha1 every blob, but every tree and
every commit. That's the thing that gets consistency _beyond_ a single
file.
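
A minimal sketch of that idea, assuming a simplified object layout (the
helper names and the tree/commit encodings here are illustrative, not
git's actual on-disk format): each tree hash covers the names and blob
hashes beneath it, and each commit hash covers its tree and its parent,
so one top-level sha1 pins down the whole collection and its history.

```python
import hashlib
from typing import Dict, Optional


def sha1_hex(kind: str, payload: bytes) -> str:
    """Hash a typed object: a small header followed by the payload."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def hash_blob(content: bytes) -> str:
    """Hash the content of a single file."""
    return sha1_hex("blob", content)


def hash_tree(files: Dict[str, bytes]) -> str:
    """Hash the metadata: which names map to which blob hashes.

    Hypothetical encoding: one "<blob sha1> <name>" line per file,
    sorted by name so the result does not depend on insertion order.
    """
    lines = [f"{hash_blob(data)} {name}" for name, data in sorted(files.items())]
    return sha1_hex("tree", "\n".join(lines).encode())


def hash_commit(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a commit: it names its tree and its parent commit."""
    body = f"tree {tree}\nparent {parent or ''}\n\n{message}"
    return sha1_hex("commit", body.encode())
```

With this layout, swapping a blob, renaming a file, or replacing an
earlier commit "in the middle" all change the top-level commit hash, so
checking that one value checks everything reachable from it.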

> As various people have pointed out, you can hack delta transmission
> and file revision indexing on top of git. But to do that, you'll need
> to build the same indices that Mercurial has. And you'll need to check
> their integrity.

No, absolutely not.

Building indices on top of git would be stupid. You can _cache_ deltas,
but there's a big difference between an index that actually describes how
random blobs go together, and a cache of a delta between two
well-specified end-points. And in particular, there is no "consistency" to
a delta. You don't need it.

Why? Because either the delta is correct, or it isn't. If it's correct,
the end result will be the right sha1. If it's not, the end result will be
something else. So when you do a "pull" from another repository, you can
trivially check whether the deltas you got were valid: did applying them
result in the same sha1 that the other repository had?
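
As a rough sketch of that check, assuming a made-up delta format (a list
of "copy from base" and "insert literal bytes" operations; the function
names and the hashing of raw content are assumptions for illustration,
not git's transfer protocol): the receiver applies the delta and only
accepts the result if it hashes to the sha1 the other side advertised.

```python
import hashlib
from typing import Sequence, Tuple


def apply_delta(base: bytes, ops: Sequence[Tuple]) -> bytes:
    """Rebuild the target from a base and a list of delta operations.

    Hypothetical operations:
      ("copy", start, length)  copy a byte range out of the base
      ("insert", data)         append literal bytes
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown delta op: {op[0]!r}")
    return bytes(out)


def receive_object(base: bytes, ops: Sequence[Tuple], expected_sha1: str) -> bytes:
    """Apply a delta and accept the result only if its sha1 matches."""
    result = apply_delta(base, ops)
    if hashlib.sha1(result).hexdigest() != expected_sha1:
        raise ValueError("delta did not reproduce the advertised object")
    return result
```

The delta itself carries no checksum here. A corrupt or malicious delta
simply fails to reproduce the expected hash and is rejected.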

So git really validates the _only_ thing that matters: it validates the 
state of the data. It doesn't validate anything else, but it validates 
that one thing very completely indeed.

		Linus
