archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <>
To: Tomas Mraz <>
Cc: Tom Lord <>,,,
Subject: Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Date: Fri, 22 Apr 2005 09:13:50 -0700 (PDT)	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <1114069758.5886.9.camel@perun.redhat.usu>

On Thu, 21 Apr 2005, Tomas Mraz wrote:
> However you're right that the original structure proposed by Linus is
> too flat.

You're wrong. 

The thing is, having 256 sybdirectories already eats one _megabyte_ of 
diskspace on common filessystems. If you expand that to be either deeper 
(ie subdirectories within subdirectories), or use more than 8 bits for the 
first level, you'll be using much more.

A megabyte of diskspace is peanuts for a project like Linux, but I think
it matters for small projects. I want git to work reasonably even for
really trivial stuff. 

For example, if you just expand the fan-out to use 12 bits instead of 8, 
you're now using 16MB of diskspace just for the directory structure, even 
for a trivially small project. I just think that sucks.

Secondly, any sane OS (and filesystem) will look up flat directories 
_faster_ than deep directories. Peter Anvid did the testing: a _totally_ 
flat directory is actually the best-performing one. 

I just don't want to go there, because while it's ok to have tens of
thousands of files in one subdirectory, I don't think it's ok to have
hundreds of thousands of files. The 8-bit initial fan-out is very much a
middle ground: we waste some space (and some time) doing it, but it does
make the really horrible case largely go away.

Trust me, the design of git didn't just come out of my *ss. Unlike pretty
much apparently any other SCM engineer in the history of mankind (judging
by the performance crap that is out there), I actually know what performs
well, and I can calculate how much space we waste, and I actually _did_ do
so, and chose a reasonably intelligent middle ground.

You can bicker about the details (should it be 9 bits? should we pack the
names more densely? should we use another algorithm for compression? why
does it bother to use ASCII headers?), but please realize that those are
_details_. And even then, they are details where I bet that I have
selected pretty reasonable initial values.

So my choices may not be optimal, but they are "reasonable across a wide
variety of different parameters". And that includes project size,
filesystem implementation, disk wastage etc etc.

The only "extreme" choice I actually made was to go with the highest
compression level of zlib. I think I made the right choice there too: it
wastes CPU-time, but it's still pretty cheap(*) and keeps getting cheaper.  
And we have a much higher read-to-write ratio that most other systems

(*) I'll also argue that one reason even "-9" is cheap is actually that
most of the files we compress are small. All the metadata files are really
quite small, and most source-files tend to be just a few kB in size too -
I personally believe that _big_ files tend to be things that really change
quite seldom (things with big tables like firmware files etc). And for a
small file, it doesn't actually matter that much, the compression window
just can't grow too much.

So people say that "gzip -9" is expensive, but it's really expensive only
for large files. Try it out, it's just a personal pet theory of mine. But
realize that I _do_ generally think things through. I don't just cobble
together random things. There's a real _reason_ why git runs like a bat
out of hell. It was _designed_.


  parent reply	other threads:[~2005-04-22 16:08 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-20 10:00 [ANNOUNCEMENT] /Arch/ embraces `git' Tom Lord
2005-04-20 10:19 ` Miles Bader
2005-04-20 17:15 ` duchier
2005-04-20 22:40   ` [Gnu-arch-users] Re: [GNU-arch-dev] " Tomas Mraz
2005-04-21  9:09     ` Denys Duchier
2005-04-21 10:21       ` Tomas Mraz
2005-04-21 11:46         ` [Gnu-arch-users] " duchier
2005-04-20 22:51   ` Tomas Mraz
2005-04-21 19:04     ` Tom Lord
2005-04-21 20:35     ` [Gnu-arch-users] Re: [GNU-arch-dev] " Tom Lord
2005-04-20 23:04   ` Tom Lord
2005-04-21  0:05     ` [Gnu-arch-users] Re: [GNU-arch-dev] " Denys Duchier
2005-04-21 20:39       ` [Gnu-arch-users] " Tom Lord
2005-04-21  7:49     ` Tomas Mraz
2005-04-21 21:51       ` [Gnu-arch-users] Re: [GNU-arch-dev] " Tom Lord
2005-04-21 21:52       ` Tom Lord
2005-04-22 16:13       ` Linus Torvalds [this message]
2005-04-22 17:39         ` Edésio Costa e Silva
2005-04-20 21:31 ` Petr Baudis
2005-04-20 21:55   ` C. Scott Ananian
2005-04-20 22:22     ` chunking (Re: [ANNOUNCEMENT] /Arch/ embraces `git') Linus Torvalds
2005-04-20 23:42       ` C. Scott Ananian
2005-04-22 21:02       ` blowing chunks (quick update) C. Scott Ananian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).