linux-kernel.vger.kernel.org archive mirror
From: Alexander Viro <viro@math.psu.edu>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [Ext2-devel] disk throughput
Date: Mon, 5 Nov 2001 18:36:09 -0500 (EST)
Message-ID: <Pine.GSO.4.21.0111051811150.27086-100000@weyl.math.psu.edu>
In-Reply-To: <9s75ku$7u2$1@penguin.transmeta.com>



On Mon, 5 Nov 2001, Linus Torvalds wrote:

> I don't particularly like behaviour that changes over time, so I would
> much rather just state clearly that the current inode allocation
> strategy is obviously complete crap. Proof: simple real-world
> benchmarks, along with some trivial thinking about seek latencies.
> 
> In particular, the way it works now, it will on purpose try to spread
> out inodes over the whole disk. Every new directory will be allocated in
> the group that has the most free inodes, which obviously on average
> means that you try to fill up all groups equally.
 
> Which makes _no_ sense. There is no advantage to trying to spread things
> out, only clear disadvantages.

Wrong.  Trivial example: create skeleton homedirs for 50 new users.
You _really_ don't want all of them in one cylinder group, because they
will be slowly filling up with files, while the directory structure is
very likely to stay more or less stable.  You want the preferred group
for a file's inode to be the same as its parent directory's.  _And_ you
want data close to the inode if we can afford that.  Worse yet, for data
allocation we use a quadratic hash, which works nicely _unless_ the
starting point for all of them sits in the same group.
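
To make the shared-starting-point problem concrete, here is a minimal
sketch of a doubling-step probe.  The step pattern (1, 2, 4, ...), the
group count and the function name are assumptions chosen for
illustration, not a copy of the ext2 code.  The point is only that the
fallback list is a pure function of the starting group.

```python
def probe_sequence(start, ngroups):
    """Return the fallback groups tried after `start`, doubling the step each time."""
    seq = []
    group = start
    step = 1
    while step < ngroups:
        group = (group + step) % ngroups
        seq.append(group)
        step <<= 1
    return seq


NGROUPS = 64  # hypothetical number of cylinder groups

# 50 allocations that all start in group 0 share one fallback list ...
same_start = {tuple(probe_sequence(0, NGROUPS)) for _ in range(50)}

# ... while 50 allocations starting in 50 different groups each get their own.
spread_start = {tuple(probe_sequence(s, NGROUPS)) for s in range(50)}

print(probe_sequence(0, NGROUPS))
print(len(same_start), len(spread_start))
```

If every homedir lives in the same group, every overflow walks the same
handful of groups in the same order, so the hash stops spreading
anything.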

See where it's going?  The real issue is the ratio of frequencies of
directory and file creation.  The "time-dependent" part is ugly, but
the thing it tries to address is very, very real.  Allocation policy
for a tree created at once is different from allocation policy for
normal use.

Ideally we would need to predict how many (and how large) files
will go into a directory.  We can't - we have no time machines.  But
the heuristic you've mentioned is clearly broken.  It will end up with
mostly empty trees squeezed into a single cylinder group, and when
they start to get populated that will be pure hell.
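
A toy simulation of the slow-growth case, as a rough sketch only.
Everything here is hypothetical: the group count, the per-group inode
capacity, the two policy names, and the linear overflow search (used
instead of the real fallback to keep the model small).  "spread" puts
each new directory in the group with the most free inodes; "cluster"
puts every new directory in the parent's group.  Files always prefer
their directory's group.

```python
def place_dirs(policy, ndirs, ngroups, capacity, parent_group=0):
    """Create `ndirs` empty directories; return their groups and the free counts."""
    free = [capacity] * ngroups
    dir_group = []
    for _ in range(ndirs):
        if policy == "spread":
            # Group with the most free inodes; ties go to the lowest index.
            g = max(range(ngroups), key=lambda i: free[i])
        else:
            # "cluster": always the parent's group.
            g = parent_group
        free[g] -= 1
        dir_group.append(g)
    return dir_group, free


def fill_slowly(dir_group, free, files_per_dir):
    """Add files one at a time to every directory; count files pushed out of their group."""
    displaced = 0
    for _ in range(files_per_dir):
        for g in dir_group:
            target = g
            if free[target] == 0:
                # Simplified overflow: first group that still has a free inode.
                target = next(h for h in range(len(free)) if free[h] > 0)
                displaced += 1
            free[target] -= 1
    return displaced


NGROUPS, CAPACITY, NDIRS, FILES_PER_DIR = 16, 1000, 50, 100

spread_dirs, spread_free = place_dirs("spread", NDIRS, NGROUPS, CAPACITY)
cluster_dirs, cluster_free = place_dirs("cluster", NDIRS, NGROUPS, CAPACITY)

spread_displaced = fill_slowly(spread_dirs, spread_free, FILES_PER_DIR)
cluster_displaced = fill_slowly(cluster_dirs, cluster_free, FILES_PER_DIR)

print("spread: ", spread_displaced, "files outside their directory's group")
print("cluster:", cluster_displaced, "files outside their directory's group")
```

With these made-up numbers the clustered homedirs exhaust their one
group long before they are full, and most files end up away from their
directories.  The model ignores data blocks and fast-growth workloads
entirely, so it says nothing about which policy wins for tar x.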

And yes, it's a more than realistic scenario.  Your strategy would make
sense if all directories were created by untarring a large archive.
Which may be fairly accurate for your boxen (or mine, for that matter -
most of the time), but it's not universal.

Benchmarks that try to stress that code tend to be something like
cvs co, tar x, yadda, yadda.  _All_ of them deal only with the
"fast-growth" pattern.  And yes, the FFS inode allocator sucks for that
scenario - no arguments here.  Unfortunately, the variant you propose
will suck for the slow-growth one, and that is going to hurt a lot.

The fact that Linux became a huge directory tree means that we tend
to deal with the fast-growth scenario quite often.  Not everyone works
on the kernel, though ;-)

