From: Ingo Molnar <mingo@elte.hu>
To: "David S. Miller" <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@osdl.org>,
andrea@suse.de, mbp@sourcefrog.net, linux-kernel@vger.kernel.org,
dlang@digitalinsight.com, Paul Jackson <pj@engr.sgi.com>
Subject: Re: Kernel SCM saga..
Date: Sun, 10 Apr 2005 13:33:36 +0200
Message-ID: <20050410113336.GA8103@elte.hu>
In-Reply-To: <20050409155511.7432d5c7.davem@davemloft.net>

* David S. Miller <davem@davemloft.net> wrote:
> On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
>
> > Also, I don't want people editing repository files by hand. Sure, the
> > sha1 catches it, but still... I'd rather force the low-level ops to use
> > the proper helper routines. Which is why it's a raw zlib compressed blob,
> > not a gzipped file.
>
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.
>
> It's happened to me more than once and I did lose data.
>
> Without compression, I might be able to recover if something
> causes a block of zeros to be written to the middle of some
> repository file. With compression, you pretty much just lose.

that depends on how you compress. You are perfectly right that with
default zlib compression, where you start the compression stream and
stop it at the end of the file, recovery in case of damage is very hard
for the portion that comes _after_ the damaged section. You'd have to
reconstruct the compression state, which is akin to breaking a key.

But with zlib you can 'flush' the compression state every couple of
blocks and basically get the same recovery properties as with
uncompressed files, at some very minimal extra space cost (because when
you flush out compression state you get some extra padding bytes).
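
A minimal sketch of the periodic-flush idea, in Python rather than C for brevity. It assumes that Z_FULL_FLUSH is the flush mode wanted (the one that resets the history), a 4096-byte chunk size, and a made-up helper name, deflate_with_flushes; none of these come from the mail itself.

```python
import zlib

def deflate_with_flushes(data, chunk=4096):
    """Compress data as one zlib stream, resetting state every `chunk` input bytes.

    The chunk size and the function name are illustrative assumptions.
    """
    comp = zlib.compressobj()
    parts = []
    for start in range(0, len(data), chunk):
        parts.append(comp.compress(data[start:start + chunk]))
        # Z_FULL_FLUSH byte-aligns the output and forgets the history window,
        # so the data that follows can be decoded without what came before.
        parts.append(comp.flush(zlib.Z_FULL_FLUSH))
    parts.append(comp.flush(zlib.Z_FINISH))
    return b"".join(parts)

if __name__ == "__main__":
    sample = b"".join(b"line %d: some repository content\n" % i
                      for i in range(5000))
    plain = zlib.compress(sample)
    flushed = deflate_with_flushes(sample)
    # The flushed output is still an ordinary zlib stream.
    assert zlib.decompress(flushed) == sample
    print("input:", len(sample), "plain:", len(plain), "flushed:", len(flushed))
```

Existing readers need no changes, since the result decompresses like any other zlib stream; the flushed output is typically a little larger than the unflushed one.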

Flushing has another advantage as well: a small delta in the middle of a
larger file will still be compressed to the same output both before and
after the change area (modulo flush block size), which rsync can pick up
just fine. (This holds even if the delta increases/decreases the file
size, as long as the flush points follow the content rather than fixed
offsets.) IIRC that is one of the reasons why Debian, when compressing
.deb's, does zlib-flushes every couple of blocks, so that rsync/apt-get
can pick up partial .deb's as well.
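
The rsync-friendliness can be illustrated with a small Python sketch. It assumes Z_FULL_FLUSH at fixed 4096-byte input offsets and hypothetical helper names (flushed_pieces, changed_pieces). Because the flush points here sit at fixed offsets, the demonstration uses a same-length edit; an edit that changes the file size would shift every later chunk unless the flush points were derived from the content.

```python
import zlib

def flushed_pieces(data, chunk=4096):
    """Compress data as one zlib stream; return the bytes emitted per input chunk."""
    comp = zlib.compressobj()
    pieces = []
    for start in range(0, len(data), chunk):
        piece = comp.compress(data[start:start + chunk])
        piece += comp.flush(zlib.Z_FULL_FLUSH)   # restart point after every chunk
        pieces.append(piece)
    pieces.append(comp.flush(zlib.Z_FINISH))     # final block plus Adler-32 trailer
    return pieces

def changed_pieces(old, new):
    """Indices of per-chunk pieces that differ (the trailer piece is ignored)."""
    return [i for i, (a, b) in enumerate(zip(old[:-1], new[:-1])) if a != b]

if __name__ == "__main__":
    text = bytearray(b"".join(b"line %d: some repository content\n" % i
                              for i in range(2000)))
    before = flushed_pieces(bytes(text))
    text[9000:9010] = b"XXXXXXXXXX"   # same-length edit, inside the third chunk
    after = flushed_pieces(bytes(text))
    print("pieces that changed:", changed_pieces(before, after))
```

Only the piece covering the edited chunk should differ, so a block-matching transfer tool can reuse the rest of the compressed file. The trailing checksum changes as well, which is why the last piece is left out of the comparison.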
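
the zlib option for this is Z_FULL_FLUSH, which resets the compression
state so that decompression can restart from the flush point. (i'm using
Z_PARTIAL_FLUSH in Tux to do chunks of compression, but that one keeps
the history, so it does not give you a restart point.) The flushing cost
is max 12 bytes or so, so if it's done every 4K the padding cost is
capped at about 0.3% (12/4096), not counting the compression lost by
restarting the history.

so flushing is both rsync-friendly and recovery-friendly.

(recovery isn't as simple as with plaintext, as you have to find the next
'block' and the block length will inevitably be variable. But it should
be pretty predictable, and tools might even exist.)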
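
Finding the next 'block' could look roughly like the Python sketch below. It rests on several assumptions that are not in the mail: the stream was written with Z_FULL_FLUSH every 4096 input bytes, the position of the damage is already known, and the helper names (deflate_with_flushes, recover_tail) are made up. It looks for the 00 00 ff ff bytes that a flush leaves behind and restarts a raw inflate right after them.

```python
import zlib

FLUSH_MARKER = b"\x00\x00\xff\xff"   # empty stored block left behind by a flush

def deflate_with_flushes(data, chunk=4096):
    """One zlib stream with a full flush after every `chunk` input bytes."""
    comp = zlib.compressobj()
    parts = []
    for start in range(0, len(data), chunk):
        parts.append(comp.compress(data[start:start + chunk]))
        parts.append(comp.flush(zlib.Z_FULL_FLUSH))
    parts.append(comp.flush(zlib.Z_FINISH))
    return b"".join(parts)

def recover_tail(blob, search_from):
    """Decode everything after the first usable flush point at or past search_from."""
    pos = blob.find(FLUSH_MARKER, search_from)
    while pos != -1:
        inflater = zlib.decompressobj(-15)     # raw deflate: no zlib header here
        try:
            tail = inflater.decompress(blob[pos + len(FLUSH_MARKER):])
            if inflater.eof:                   # reached the stream's final block
                return tail
        except zlib.error:
            pass                               # chance match, or damage here too
        pos = blob.find(FLUSH_MARKER, pos + 1)
    raise ValueError("no intact flush point found")

if __name__ == "__main__":
    original = b"".join(b"line %d: some repository content\n" % i
                        for i in range(5000))
    blob = bytearray(deflate_with_flushes(original))
    hole_start = len(blob) // 2
    blob[hole_start:hole_start + 64] = bytes(64)   # the "block of zeros"
    tail = recover_tail(bytes(blob), hole_start + 64)
    assert original.endswith(tail)
    print("recovered", len(tail), "of", len(original), "bytes after the hole")
```

Raw deflate carries no per-chunk checksum, so a real tool would still need an outside check (such as the sha1 mentioned earlier in the thread) to confirm which recovered parts are good.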

	Ingo