All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthias-Christian Ott <matthias.christian@tiscali.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Andrea Arcangeli <andrea@suse.de>, Chris Wedgwood <cw@f00f.org>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Kernel SCM saga..
Date: Fri, 08 Apr 2005 21:16:12 +0200	[thread overview]
Message-ID: <4256D87C.5090207@tiscali.de> (raw)
In-Reply-To: <Pine.LNX.4.58.0504081047200.28951@ppc970.osdl.org>

Linus Torvalds wrote:

>On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>  
>
>>Ok, but if you want to search for information in such big text files it 
>>slow, because you do linear search 
>>    
>>
>
>No I don't. I don't search for _anything_. I have my own
>content-addressable filesystem, and I guarantee you that it's faster than
>mysql, because it depends on the kernel doing the right thing (which it
>does).
>
>  
>
I'm not talking about mysql, i'm talking about fast databases like 
sqlite or db4.

>I never do a single "readdir". It's all direct data lookup, no "searching"  
>anywhere.
>
>Databases aren't magical. Quite the reverse. They easily end up being
>_slower_ than doing it by hand, simply because they have to solve a much
>more generic issue. If you design your data structures and abstractions
>right, a database is pretty much guaranteed to only incur overhead.
>
>The advantage of a database is the abstraction and management it gives 
>you. But I did my own special-case abstraction in git.
>
>Yeah, I bet "git" might suck if your OS sucks. I definitely depend on name
>caching at an OS level so that I know that opening a file is fast. In
>other words, there _is_ an indexing and caching database in there, and
>it's called the Linux VFS layer and the dentry cache.
>
>The proof is in the pudding. git is designed for _one_ thing, and one 
>thing only: tracking a series of directory states in a way that can be 
>replicated. It's very very fast at that. A database with a more flexible 
>abstraction migt be faster at other things, but the fact is, you do take a 
>hit.
>
>The problem with databases are:
>
> - they are damn hard to just replicate wildly and without control. The 
>   database backing file inherently has a lot of internal state. You may 
>   be able to "just copy it", but you have to copy the whole damn thing.
>  
>
This is _not_ true for every database (specialy plain/text databases 
with meta information).

>   In "git", the data is all there in immutable blobs that you can just 
>   rsync. In fact, you don't even need rsync: you can just look at the 
>   filenames, and anything new you copy. No need for any fancy "read the 
>   files to see that they match". They _will_ match, or you can tell 
>   immediately that a file is corrupt.
>
>   Look at this:
>
>	torvalds@ppc970:~/git> sha1sum .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678 
>	e7bfaadd5d2331123663a8f14a26604a3cdcb678  .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678
>
>   see a pattern anywhere? Imagine that you know the list of files you 
>   have, and the list of files the other side has (never mind the 
>   contents), and how _easy_ it is to synchronize. Without ever having to 
>   even read the remote files that you know you already have.
>   How do you replicate your database incrementally? I've given you enough 
>   clues to do it for "git" in probably five lines of perl.
>  
>
I replicate my database incremently by using a hash list like you (the 
client sends its hash list, the server compares the lists and acquaints 
the client behind which data (data = hash + data) the data has to added 
(this is like your solution -- you also submit the data and the location 
(you have directories too, right?)). A database is in some cases (like 
this one) like a filesystem, but it's build one top of better filesystem 
like xfs, reiser4 or ext3 which support features like LVM, Quotas or 
Journaling (Is your filesystem also build on top of existing filesystem? 
I don't think so because you're talking about vfs operatations on the 
filesystem).

> - they tend to take time to set up and prime.
>
>   In contrast, the filesystem is always there. Sure, you effectively have 
>   to "prime" that one too, but the thing is, if your OS is doing its job, 
>   you basically only need to prime it once per reboot. No need to prime 
>   it for each process you start or play games with connecting to servers 
>   etc. It's just there. Always.
>  
>
The database -- single file (sqlite or db4) -- is always there too 
because it's on the filesystem and doesn't need a server.

>So if you think of the filesystem as a database, you're all set. If you 
>design your data structure so that there is just one index, you make that 
>the name, and the kernel will do all the O(1) hashed lookups etc for you. 
>You do have to limit yourself in some ways. 
>  
>
But as mentioned you need to _open_ each file (It doesn't matter if it's 
cached (this speeds up only reading it) -- you need a _slow_ system call 
and _very slow_ hardware access anyway).
Have a look at this comparison:
If you have big chest and lots of small chests containing the same bulk 
of gold, it's more work to collect the gold from the small chests than 
from the big one (which would contain as many a cases as little chests 
exist). You can faster find your gold because you don't have to walk to 
the other chests and you don't have to open that much caps which saves 
also time.

>Oh, and you have to be willing to waste diskspace. "git" is _not_
>space-efficient. The good news is that it is cache-friendly, since it is
>also designed to never actually look at any old files that aren't part of
>the immediate history, so while it wastes diskspace, it does not waste the
>(much more precious) page cache.
>
>IOW big file-sets are always bad for performance if you need to traverse
>them to get anywhere, but if you index things so that you only read the 
>stuff you really really _need_ (which git does), big file-sets are just an 
>excuse to buy a new disk ;)
>
>			Linus
>
>  
>
I hope my idea/opinion is clear now.

Matthias-Christian

  parent reply	other threads:[~2005-04-08 19:16 UTC|newest]

Thread overview: 206+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-06 15:42 Kernel SCM saga Linus Torvalds
2005-04-06 16:00 ` Greg KH
2005-04-07 16:40   ` Rik van Riel
2005-04-08  0:53     ` Jesse Barnes
2005-04-06 16:09 ` Daniel Phillips
2005-04-06 19:07 ` Jon Smirl
2005-04-06 19:24   ` Matan Peled
2005-04-06 19:49     ` Jon Smirl
2005-04-06 20:34       ` Hua Zhong
2005-04-07  1:31       ` Christoph Lameter
2005-04-06 19:39 ` Paul P Komkoff Jr
2005-04-07  1:40   ` Martin Pool
2005-04-07  1:47     ` Jeff Garzik
2005-04-07  2:26       ` Martin Pool
2005-04-07  2:32         ` David Lang
2005-04-07  5:38           ` Martin Pool
2005-04-07 23:27             ` Linus Torvalds
2005-04-08  5:56               ` Martin Pool
2005-04-08  6:41                 ` Linus Torvalds
2005-04-08  8:38                   ` Andrea Arcangeli
2005-04-08 23:38                     ` Daniel Phillips
2005-04-09  2:54                       ` Andrea Arcangeli
2005-04-09  0:12                     ` Linus Torvalds
2005-04-09  2:27                       ` Andrea Arcangeli
2005-04-09  2:32                         ` David Lang
2005-04-09  3:08                         ` Brian Gerst
2005-04-09  3:15                           ` Andrea Arcangeli
2005-04-09  5:45                         ` Linus Torvalds
2005-04-09 22:55                           ` David S. Miller
2005-04-09 23:13                             ` Linus Torvalds
2005-04-10  0:14                               ` Chris Wedgwood
2005-04-10  1:56                                 ` Paul Jackson
2005-04-10 12:03                                   ` Ingo Molnar
2005-04-10 17:38                                     ` Paul Jackson
2005-04-10 17:46                                       ` Ingo Molnar
2005-04-10 17:56                                         ` Paul Jackson
2005-04-10  0:22                             ` Paul Jackson
2005-04-10 11:33                             ` Ingo Molnar
2005-04-10 17:55                         ` Matthias Andree
2005-04-09 16:33                       ` Roman Zippel
2005-04-09 23:31                         ` Tupshin Harper
2005-04-10 17:24                         ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr
2005-04-10 18:19                           ` Roman Zippel
2005-04-08 16:46                   ` Kernel SCM saga Catalin Marinas
2005-04-07  8:14           ` Magnus Damm
2005-04-07  7:53       ` Zwane Mwaikambo
2005-04-07  3:35     ` Daniel Phillips
2005-04-07 15:08       ` Daniel Phillips
2005-04-07  6:36   ` bert hubert
2005-04-06 23:22 ` Jon Masters
2005-04-07  6:51 ` Paul Mackerras
2005-04-07  7:48   ` Arjan van de Ven
2005-04-07 15:10   ` Linus Torvalds
2005-04-07 17:00     ` Daniel Phillips
2005-04-07 17:38       ` Linus Torvalds
2005-04-07 17:47         ` Chris Wedgwood
2005-04-07 18:06         ` Magnus Damm
2005-04-07 18:36         ` Daniel Phillips
2005-04-08  3:35         ` Jeff Garzik
2005-04-07 19:56       ` Sam Ravnborg
2005-04-07 23:21     ` Dave Airlie
2005-04-07  7:18 ` David Woodhouse
2005-04-07  8:50   ` Andrew Morton
2005-04-07  9:20     ` Paul Mackerras
2005-04-07  9:46       ` Andrew Morton
2005-04-07 11:17         ` Paul Mackerras
2005-04-07 10:41       ` Geert Uytterhoeven
2005-04-07  9:25     ` David Woodhouse
2005-04-07  9:49       ` Andrew Morton
2005-04-07  9:55       ` Russell King
2005-04-07 10:11         ` David Woodhouse
2005-04-07  9:40     ` David Vrabel
2005-04-07  9:24   ` Sergei Organov
2005-04-07 10:30     ` Matthias Andree
2005-04-07 10:54       ` Andrew Walrond
2005-04-09 16:17       ` David Roundy
2005-04-10  9:24         ` Giuseppe Bilotta
2005-04-10 13:51           ` David Roundy
2005-04-07 15:32   ` Linus Torvalds
2005-04-07 17:09     ` Daniel Phillips
2005-04-07 17:10     ` Al Viro
2005-04-07 17:47       ` Linus Torvalds
2005-04-07 18:04         ` Jörn Engel
2005-04-07 18:27           ` Daniel Phillips
2005-04-07 20:54           ` Arjan van de Ven
2005-04-08  3:41         ` Jeff Garzik
2005-04-07 17:52       ` Bartlomiej Zolnierkiewicz
2005-04-07 17:54       ` Daniel Phillips
2005-04-07 18:13         ` Dmitry Yusupov
2005-04-07 18:29           ` Daniel Phillips
2005-04-10 22:33             ` Troy Benjegerdes
2005-04-11  0:00               ` Christian Parpart
2005-04-08 17:24         ` Jon Masters
2005-04-08 22:05           ` Daniel Phillips
2005-04-08 22:52     ` Roman Zippel
2005-04-08 23:46       ` Tupshin Harper
2005-04-09  1:00         ` Roman Zippel
2005-04-09  1:23           ` Tupshin Harper
2005-04-09 16:52       ` Eric D. Mudama
2005-04-09 17:40         ` Roman Zippel
2005-04-09 18:56           ` Ray Lee
2005-04-07  7:44 ` Jan Hudec
2005-04-08  6:14   ` Matthias Urlichs
2005-04-09  1:01   ` Marcin Dalecki
2005-04-09  8:32     ` Jan Hudec
2005-04-11  2:26     ` Miles Bader
2005-04-11  2:56       ` Marcin Dalecki
2005-04-11  6:36         ` Jan Hudec
2005-04-07 10:56 ` Andrew Walrond
2005-04-08  0:57 ` Ian Wienand
2005-04-08  4:13 ` Chris Wedgwood
2005-04-08  4:42   ` Linus Torvalds
2005-04-08  5:04     ` Chris Wedgwood
2005-04-08  5:14       ` H. Peter Anvin
2005-04-08  7:05         ` Rogan Dawes
2005-04-08  7:21           ` Daniel Phillips
2005-04-08  7:49             ` H. Peter Anvin
2005-04-08  7:14     ` Andrea Arcangeli
2005-04-08 12:02       ` Matthias Andree
2005-04-08 12:21         ` Florian Weimer
2005-04-08 14:26       ` Linus Torvalds
2005-04-08 16:15         ` Matthias-Christian Ott
2005-04-08 17:14           ` Linus Torvalds
2005-04-08 17:15             ` Chris Wedgwood
2005-04-08 17:46               ` Linus Torvalds
2005-04-08 18:05                 ` Chris Wedgwood
2005-04-08 19:03                   ` Linus Torvalds
2005-04-08 19:16                     ` Chris Wedgwood
2005-04-08 19:38                       ` Florian Weimer
2005-04-08 19:48                         ` Chris Wedgwood
2005-04-08 19:39                       ` Linus Torvalds
2005-04-08 20:11                         ` Uncached stat performace [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad
2005-04-08 20:14                           ` Chris Wedgwood
2005-04-08 20:50                       ` Kernel SCM saga Luck, Tony
2005-04-08 21:27                         ` Linus Torvalds
2005-04-09 17:14                           ` Roman Zippel
2005-04-09  7:20                     ` Willy Tarreau
2005-04-09 15:15                     ` Paul Jackson
2005-04-08 17:25             ` Matthias-Christian Ott
2005-04-08 18:14               ` Linus Torvalds
2005-04-08 18:28                 ` Jon Smirl
2005-04-08 18:58                   ` Florian Weimer
2005-04-09  1:11                   ` Marcin Dalecki
2005-04-09  1:50                     ` David Lang
2005-04-09 22:12                       ` Florian Weimer
2005-04-08 19:16                 ` Matthias-Christian Ott [this message]
2005-04-08 19:32                   ` Linus Torvalds
2005-04-08 19:44                     ` Matthias-Christian Ott
2005-04-09  1:09                 ` Marcin Dalecki
2005-04-08 17:35             ` Jeff Garzik
2005-04-08 18:47               ` Linus Torvalds
2005-04-08 18:56                 ` Chris Wedgwood
2005-04-09  7:37                   ` Willy Tarreau
2005-04-09  7:47                     ` Neil Brown
2005-04-09  8:00                       ` Willy Tarreau
2005-04-09  9:34                         ` Neil Brown
2005-04-09 15:40                 ` Paul Jackson
2005-04-09 16:16                   ` Linus Torvalds
2005-04-09 17:15                     ` Paul Jackson
2005-04-09 17:35                     ` Paul Jackson
2005-04-09  1:04             ` Marcin Dalecki
2005-04-09 15:42               ` Paul Jackson
2005-04-09 18:45                 ` Marcin Dalecki
2005-04-09  1:00           ` Marcin Dalecki
2005-04-09  1:09             ` Chris Wedgwood
2005-04-09  1:21               ` Marcin Dalecki
2005-04-08  7:17     ` ross
2005-04-08 15:50       ` Linus Torvalds
2005-04-09  2:53         ` Petr Baudis
2005-04-09  7:08           ` Randy.Dunlap
2005-04-09 18:06             ` [PATCH] " Petr Baudis
2005-04-10  1:01           ` Phillip Lougher
2005-04-10  1:42             ` Petr Baudis
2005-04-10  1:57               ` Phillip Lougher
2005-04-09 15:50         ` Paul Jackson
2005-04-09 16:26           ` Linus Torvalds
2005-04-09 17:08             ` Paul Jackson
2005-04-10  3:41             ` Paul Jackson
2005-04-10  8:39             ` David Lang
2005-04-10  9:40               ` Junio C Hamano
2005-04-10 16:46                 ` Bill Davidsen
2005-04-10 17:50                   ` Paul Jackson
2005-04-12 23:20                     ` Pavel Machek
2005-04-08  7:34     ` Marcel Lanz
2005-04-08  9:23       ` Geert Uytterhoeven
2005-04-08  8:38     ` Matt Johnston
2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
2005-04-12  9:34       ` Catalin Marinas
2005-04-13  4:04       ` Ricky Beam
2005-04-08 11:42   ` Kernel SCM saga Catalin Marinas
     [not found] <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org>
2005-04-06 21:13 ` kfogel
2005-04-06 22:39   ` Jeff Garzik
2005-04-09  1:00   ` Marcin Dalecki
2005-04-06 22:37 Wolfgang Denk
2005-04-06 23:16 ` Tom Rini
2005-04-06 23:21 ` Eugene Surovegin
2005-04-06 23:33 ` Dan Malek
2005-04-07  0:13   ` Benjamin Herrenschmidt
2005-04-08 22:27 Rajesh Venkatasubramanian
2005-04-08 23:29 ` Linus Torvalds
2005-04-09  0:29   ` Linus Torvalds
2005-04-09 16:20   ` Paul Jackson
2005-04-09  4:06 Walter Landry
2005-04-09 11:02 Samium Gromoff
2005-04-09 11:29 Samium Gromoff
2005-04-10  4:20 Albert Cahalan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4256D87C.5090207@tiscali.de \
    --to=matthias.christian@tiscali.de \
    --cc=andrea@suse.de \
    --cc=cw@f00f.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.