All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: "David S. Miller" <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@osdl.org>,
	andrea@suse.de, mbp@sourcefrog.net, linux-kernel@vger.kernel.org,
	dlang@digitalinsight.com, Paul Jackson <pj@engr.sgi.com>
Subject: Re: Kernel SCM saga..
Date: Sun, 10 Apr 2005 13:33:36 +0200	[thread overview]
Message-ID: <20050410113336.GA8103@elte.hu> (raw)
In-Reply-To: <20050409155511.7432d5c7.davem@davemloft.net>


* David S. Miller <davem@davemloft.net> wrote:

> On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > Also, I don't want people editing repostitory files by hand. Sure, the 
> > sha1 catches it, but still... I'd rather force the low-level ops to use 
> > the proper helper routines. Which is why it's a raw zlib compressed blob, 
> > not a gzipped file.
> 
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.
> 
> It's happened to me more than once and I did lose data.
> 
> Without compression, I might be able to recover if something
> causes a block of zeros to be written to the middle of some
> repository file.  With compression, you pretty much just lose.

that depends on how you compress. You are perfectly right that with 
default zlib compression, where you start the compression stream and 
stop it at the end of the file, recovery in case of damage is very hard 
for the portion that comes _after_ the damaged section. You'd have to 
reconstruct the compression state which is akin to breaking a key.

But with zlib you can 'flush' the compression state every couple of 
blocks and basically get the same recovery properties, at some very 
minimal extra space cost (because when you flush out compression state 
you get some extra padding bytes).

Flushing has another advantage as well: a small delta (even if it 
increases/decreases the file size!) in the middle of a larger file will 
still be compressed to the same output both before and after the change 
area (modulo flush block size), which rsync can pick up just fine. (IIRC 
that is one of the reasons why Debian, when compressing .deb's, does 
zlib-flushes every couple of blocks, so that rsync/apt-get can pick up 
partial .deb's as well.)

the zlib option is i think Z_PARTIAL_FLUSH, i'm using it in Tux to do 
chunks of compression. The flushing cost ismax 12 bytes or so, so if 
it's done every 4K we maximize the cost to 0.2%.

so flushing is both rsync-friendly and recovery-friendly.

(recovery isnt as simple as with plaintext, as you have to find the next 
'block' and the block length will be inevitably variable. But it should 
be pretty predictable, and tools might even exist.)

	Ingo

  parent reply	other threads:[~2005-04-10 11:34 UTC|newest]

Thread overview: 206+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-06 15:42 Kernel SCM saga Linus Torvalds
2005-04-06 16:00 ` Greg KH
2005-04-07 16:40   ` Rik van Riel
2005-04-08  0:53     ` Jesse Barnes
2005-04-06 16:09 ` Daniel Phillips
2005-04-06 19:07 ` Jon Smirl
2005-04-06 19:24   ` Matan Peled
2005-04-06 19:49     ` Jon Smirl
2005-04-06 20:34       ` Hua Zhong
2005-04-07  1:31       ` Christoph Lameter
2005-04-06 19:39 ` Paul P Komkoff Jr
2005-04-07  1:40   ` Martin Pool
2005-04-07  1:47     ` Jeff Garzik
2005-04-07  2:26       ` Martin Pool
2005-04-07  2:32         ` David Lang
2005-04-07  5:38           ` Martin Pool
2005-04-07 23:27             ` Linus Torvalds
2005-04-08  5:56               ` Martin Pool
2005-04-08  6:41                 ` Linus Torvalds
2005-04-08  8:38                   ` Andrea Arcangeli
2005-04-08 23:38                     ` Daniel Phillips
2005-04-09  2:54                       ` Andrea Arcangeli
2005-04-09  0:12                     ` Linus Torvalds
2005-04-09  2:27                       ` Andrea Arcangeli
2005-04-09  2:32                         ` David Lang
2005-04-09  3:08                         ` Brian Gerst
2005-04-09  3:15                           ` Andrea Arcangeli
2005-04-09  5:45                         ` Linus Torvalds
2005-04-09 22:55                           ` David S. Miller
2005-04-09 23:13                             ` Linus Torvalds
2005-04-10  0:14                               ` Chris Wedgwood
2005-04-10  1:56                                 ` Paul Jackson
2005-04-10 12:03                                   ` Ingo Molnar
2005-04-10 17:38                                     ` Paul Jackson
2005-04-10 17:46                                       ` Ingo Molnar
2005-04-10 17:56                                         ` Paul Jackson
2005-04-10  0:22                             ` Paul Jackson
2005-04-10 11:33                             ` Ingo Molnar [this message]
2005-04-10 17:55                         ` Matthias Andree
2005-04-09 16:33                       ` Roman Zippel
2005-04-09 23:31                         ` Tupshin Harper
2005-04-10 17:24                         ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr
2005-04-10 18:19                           ` Roman Zippel
2005-04-08 16:46                   ` Kernel SCM saga Catalin Marinas
2005-04-07  8:14           ` Magnus Damm
2005-04-07  7:53       ` Zwane Mwaikambo
2005-04-07  3:35     ` Daniel Phillips
2005-04-07 15:08       ` Daniel Phillips
2005-04-07  6:36   ` bert hubert
2005-04-06 23:22 ` Jon Masters
2005-04-07  6:51 ` Paul Mackerras
2005-04-07  7:48   ` Arjan van de Ven
2005-04-07 15:10   ` Linus Torvalds
2005-04-07 17:00     ` Daniel Phillips
2005-04-07 17:38       ` Linus Torvalds
2005-04-07 17:47         ` Chris Wedgwood
2005-04-07 18:06         ` Magnus Damm
2005-04-07 18:36         ` Daniel Phillips
2005-04-08  3:35         ` Jeff Garzik
2005-04-07 19:56       ` Sam Ravnborg
2005-04-07 23:21     ` Dave Airlie
2005-04-07  7:18 ` David Woodhouse
2005-04-07  8:50   ` Andrew Morton
2005-04-07  9:20     ` Paul Mackerras
2005-04-07  9:46       ` Andrew Morton
2005-04-07 11:17         ` Paul Mackerras
2005-04-07 10:41       ` Geert Uytterhoeven
2005-04-07  9:25     ` David Woodhouse
2005-04-07  9:49       ` Andrew Morton
2005-04-07  9:55       ` Russell King
2005-04-07 10:11         ` David Woodhouse
2005-04-07  9:40     ` David Vrabel
2005-04-07  9:24   ` Sergei Organov
2005-04-07 10:30     ` Matthias Andree
2005-04-07 10:54       ` Andrew Walrond
2005-04-09 16:17       ` David Roundy
2005-04-10  9:24         ` Giuseppe Bilotta
2005-04-10 13:51           ` David Roundy
2005-04-07 15:32   ` Linus Torvalds
2005-04-07 17:09     ` Daniel Phillips
2005-04-07 17:10     ` Al Viro
2005-04-07 17:47       ` Linus Torvalds
2005-04-07 18:04         ` Jörn Engel
2005-04-07 18:27           ` Daniel Phillips
2005-04-07 20:54           ` Arjan van de Ven
2005-04-08  3:41         ` Jeff Garzik
2005-04-07 17:52       ` Bartlomiej Zolnierkiewicz
2005-04-07 17:54       ` Daniel Phillips
2005-04-07 18:13         ` Dmitry Yusupov
2005-04-07 18:29           ` Daniel Phillips
2005-04-10 22:33             ` Troy Benjegerdes
2005-04-11  0:00               ` Christian Parpart
2005-04-08 17:24         ` Jon Masters
2005-04-08 22:05           ` Daniel Phillips
2005-04-08 22:52     ` Roman Zippel
2005-04-08 23:46       ` Tupshin Harper
2005-04-09  1:00         ` Roman Zippel
2005-04-09  1:23           ` Tupshin Harper
2005-04-09 16:52       ` Eric D. Mudama
2005-04-09 17:40         ` Roman Zippel
2005-04-09 18:56           ` Ray Lee
2005-04-07  7:44 ` Jan Hudec
2005-04-08  6:14   ` Matthias Urlichs
2005-04-09  1:01   ` Marcin Dalecki
2005-04-09  8:32     ` Jan Hudec
2005-04-11  2:26     ` Miles Bader
2005-04-11  2:56       ` Marcin Dalecki
2005-04-11  6:36         ` Jan Hudec
2005-04-07 10:56 ` Andrew Walrond
2005-04-08  0:57 ` Ian Wienand
2005-04-08  4:13 ` Chris Wedgwood
2005-04-08  4:42   ` Linus Torvalds
2005-04-08  5:04     ` Chris Wedgwood
2005-04-08  5:14       ` H. Peter Anvin
2005-04-08  7:05         ` Rogan Dawes
2005-04-08  7:21           ` Daniel Phillips
2005-04-08  7:49             ` H. Peter Anvin
2005-04-08  7:14     ` Andrea Arcangeli
2005-04-08 12:02       ` Matthias Andree
2005-04-08 12:21         ` Florian Weimer
2005-04-08 14:26       ` Linus Torvalds
2005-04-08 16:15         ` Matthias-Christian Ott
2005-04-08 17:14           ` Linus Torvalds
2005-04-08 17:15             ` Chris Wedgwood
2005-04-08 17:46               ` Linus Torvalds
2005-04-08 18:05                 ` Chris Wedgwood
2005-04-08 19:03                   ` Linus Torvalds
2005-04-08 19:16                     ` Chris Wedgwood
2005-04-08 19:38                       ` Florian Weimer
2005-04-08 19:48                         ` Chris Wedgwood
2005-04-08 19:39                       ` Linus Torvalds
2005-04-08 20:11                         ` Uncached stat performace [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad
2005-04-08 20:14                           ` Chris Wedgwood
2005-04-08 20:50                       ` Kernel SCM saga Luck, Tony
2005-04-08 21:27                         ` Linus Torvalds
2005-04-09 17:14                           ` Roman Zippel
2005-04-09  7:20                     ` Willy Tarreau
2005-04-09 15:15                     ` Paul Jackson
2005-04-08 17:25             ` Matthias-Christian Ott
2005-04-08 18:14               ` Linus Torvalds
2005-04-08 18:28                 ` Jon Smirl
2005-04-08 18:58                   ` Florian Weimer
2005-04-09  1:11                   ` Marcin Dalecki
2005-04-09  1:50                     ` David Lang
2005-04-09 22:12                       ` Florian Weimer
2005-04-08 19:16                 ` Matthias-Christian Ott
2005-04-08 19:32                   ` Linus Torvalds
2005-04-08 19:44                     ` Matthias-Christian Ott
2005-04-09  1:09                 ` Marcin Dalecki
2005-04-08 17:35             ` Jeff Garzik
2005-04-08 18:47               ` Linus Torvalds
2005-04-08 18:56                 ` Chris Wedgwood
2005-04-09  7:37                   ` Willy Tarreau
2005-04-09  7:47                     ` Neil Brown
2005-04-09  8:00                       ` Willy Tarreau
2005-04-09  9:34                         ` Neil Brown
2005-04-09 15:40                 ` Paul Jackson
2005-04-09 16:16                   ` Linus Torvalds
2005-04-09 17:15                     ` Paul Jackson
2005-04-09 17:35                     ` Paul Jackson
2005-04-09  1:04             ` Marcin Dalecki
2005-04-09 15:42               ` Paul Jackson
2005-04-09 18:45                 ` Marcin Dalecki
2005-04-09  1:00           ` Marcin Dalecki
2005-04-09  1:09             ` Chris Wedgwood
2005-04-09  1:21               ` Marcin Dalecki
2005-04-08  7:17     ` ross
2005-04-08 15:50       ` Linus Torvalds
2005-04-09  2:53         ` Petr Baudis
2005-04-09  7:08           ` Randy.Dunlap
2005-04-09 18:06             ` [PATCH] " Petr Baudis
2005-04-10  1:01           ` Phillip Lougher
2005-04-10  1:42             ` Petr Baudis
2005-04-10  1:57               ` Phillip Lougher
2005-04-09 15:50         ` Paul Jackson
2005-04-09 16:26           ` Linus Torvalds
2005-04-09 17:08             ` Paul Jackson
2005-04-10  3:41             ` Paul Jackson
2005-04-10  8:39             ` David Lang
2005-04-10  9:40               ` Junio C Hamano
2005-04-10 16:46                 ` Bill Davidsen
2005-04-10 17:50                   ` Paul Jackson
2005-04-12 23:20                     ` Pavel Machek
2005-04-08  7:34     ` Marcel Lanz
2005-04-08  9:23       ` Geert Uytterhoeven
2005-04-08  8:38     ` Matt Johnston
2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
2005-04-12  9:34       ` Catalin Marinas
2005-04-13  4:04       ` Ricky Beam
2005-04-08 11:42   ` Kernel SCM saga Catalin Marinas
     [not found] <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org>
2005-04-06 21:13 ` kfogel
2005-04-06 22:39   ` Jeff Garzik
2005-04-09  1:00   ` Marcin Dalecki
2005-04-06 22:37 Wolfgang Denk
2005-04-06 23:16 ` Tom Rini
2005-04-06 23:21 ` Eugene Surovegin
2005-04-06 23:33 ` Dan Malek
2005-04-07  0:13   ` Benjamin Herrenschmidt
2005-04-08 22:27 Rajesh Venkatasubramanian
2005-04-08 23:29 ` Linus Torvalds
2005-04-09  0:29   ` Linus Torvalds
2005-04-09 16:20   ` Paul Jackson
2005-04-09  4:06 Walter Landry
2005-04-09 11:02 Samium Gromoff
2005-04-09 11:29 Samium Gromoff
2005-04-10  4:20 Albert Cahalan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050410113336.GA8103@elte.hu \
    --to=mingo@elte.hu \
    --cc=andrea@suse.de \
    --cc=davem@davemloft.net \
    --cc=dlang@digitalinsight.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbp@sourcefrog.net \
    --cc=pj@engr.sgi.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.