linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@digeo.com>
To: Chris Friesen <cfriesen@nortelnetworks.com>
Cc: Daniel Phillips <phillips@arcor.de>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	Oliver Neukum <oliver@neukum.name>,
	Rob Landley <landley@trommello.org>,
	Linus Torvalds <torvalds@transmeta.com>,
	linux-kernel@vger.kernel.org
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0  -  (NUMA))
Date: Mon, 07 Oct 2002 12:36:48 -0700	[thread overview]
Message-ID: <3DA1E250.1C5F7220@digeo.com> (raw)
In-Reply-To: 3DA1D969.8050005@nortelnetworks.com

Chris Friesen wrote:
> 
> Andrew Morton wrote:
> 
> > Go into ext2_new_inode, replace the call to find_group_dir with
> > find_group_other.  Then untar a kernel tree, unmount the fs,
> > remount it and see how long it takes to do a
> >
> >       `find . -type f  xargs cat > /dev/null'
> >
> > on that tree.  If your disk is like my disk, you will achieve
> > full disk bandwidth.
> 
> Pardon my ignorance, but what's the difference between find_group_dir
> and find_group_other, and why aren't we using find_group_other already
> if its so much faster?
> 

ext2 and ext3 filesystems are carved up into "block groups", aka
"cylinder groups".  Each one is 4096*8 blocks - typically 128 MB.
So you can easily have hundreds of blockgroups on a single partition.

The inode allocator is designed to arrange that files which are within the
same directory fall in the same blockgroup, for locality of reference.

But new directories are placed "far away", in block groups which have
plenty of free space.  (find_group_dir -> find a blockgroup for a
directory).

The thinking here is that files in a separate directory are related,
and files in different directories are unrelated.  So we can take 
advantage of that heuristic - go and use a new blockgroup each time
a new directory is created.  This is a levelling algorithm which
tries to keep all blockgroups at a similar occupancy level.
That's a good thing, because high occupancy levels lead to fragmentation.

find_group_other() is basically first-fit-from-start-of-disk, and
if we use that for directories as well as files, your untar-onto-a-clean-disk
simply lays everything out in a contiguous chunk.

Part of the problem here is that it has got worse over time.  The
size of a blockgroup is hardwired to blocksize*bits-in-a-byte*blocksize.
But disks keep on getting bigger.  Five years ago (when, presumably, this
algorithm was designed), a typical partition had, what?  Maybe four
blockgroups?  Now it has hundreds, and so the "levelling" is levelling
across hundreds of blockgroups and not just a handful.

I did a lot of work on this back in November 2001, mainly testing
with a trace-based workload from Keith Smith.  See
http://www.eecs.harvard.edu/~keith/usenix.1995.html

Al Viro wrote a modified allocator (which nobody understood ;))
based on Orlov's algorithm.

I ended up concluding that the current (sucky) code is indeed
best for minimising long-term fragmentation under slow-growth
scenarios.  And worst for fast-growth.

Orlov was in between on both.

Simply nuking find_group_dir() was best for fast-growth, worst
for slow-growth.

Block allocators are fertile grounds for academic papers.  It's
complex.  There is a risk that you can do something which is
cool in testing, but ends up exploding horridly after a year's
use.  By which time we have ten million deployed systems running like
dogs, damn all we can do about it.

The best solution is to use first-fit and online defrag to fix the
long-term fragmentation.  It really is.  There has been no appreciable
progress on this.

A *practical* solution is to keep a spare partition empty and do
a `cp -a' from one partition onto another once per week and
swizzle the mountpoints.  Because the big copy will unfragment
everything.

ho-hum.  I shall forward-port Orlov, and again attempt to understand
it ;)

  parent reply	other threads:[~2002-10-07 19:31 UTC|newest]

Thread overview: 206+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-09-24  1:54 [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAID device driver Larry Kessler
2002-09-24  2:22 ` Jeff Garzik
2002-09-26 15:52   ` Alan Cox
2002-09-26 22:55     ` [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver Larry Kessler
2002-09-26 22:58       ` Jeff Garzik
2002-09-26 23:07         ` Linus Torvalds
2002-09-27  2:27           ` Jeff Garzik
2002-09-27  4:45             ` Linus Torvalds
2002-09-28  7:46               ` Ingo Molnar
2002-09-28  9:16                 ` jw schultz
2002-09-30 14:05                   ` Denis Vlasenko
2002-09-30 10:22                     ` Tomas Szepe
2002-09-30 11:10                       ` jw schultz
2002-09-30 11:17                       ` Adrian Bunk
2002-09-30 19:48                       ` Rik van Riel
2002-09-30 20:30                         ` Christoph Hellwig
2002-09-28 15:40                 ` Kernel version [Was: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver] Horst von Brand
2002-09-29  1:31                 ` v2.6 vs v3.0 Linus Torvalds
2002-09-29  6:14                   ` james
2002-09-29  6:55                     ` Andre Hedrick
2002-09-29 12:59                     ` Gerhard Mack
2002-09-29 13:46                       ` Dr. David Alan Gilbert
2002-09-29 14:06                         ` Wakko Warner
2002-09-29 15:42                         ` Jens Axboe
2002-09-29 16:21                           ` Alan Cox
2002-09-29 16:17                             ` Jens Axboe
2002-09-30  0:39                             ` Jeff Chua
2002-09-29 16:22                           ` Dave Jones
2002-09-29 16:26                             ` Jens Axboe
2002-09-29 21:46                             ` Matthias Andree
2002-09-30  7:05                               ` Michael Clark
2002-09-30  7:22                                 ` Andrew Morton
2002-09-30 13:08                                   ` Kevin Corry
2002-09-30 13:05                                 ` Kevin Corry
2002-09-30 13:49                                   ` Michael Clark
2002-09-30 14:26                                     ` Kevin Corry
2002-09-30 13:59                                   ` Michael Clark
2002-09-30 15:50                                     ` Kevin Corry
2002-09-29 17:06                       ` Jochen Friedrich
2002-09-29 15:18                     ` Trever L. Adams
2002-09-29 15:45                       ` Jens Axboe
2002-09-29 15:59                         ` Trever L. Adams
2002-09-29 16:06                           ` Jens Axboe
2002-09-29 16:13                             ` Trever L. Adams
2002-09-30  6:54                               ` Kai Henningsen
2002-09-30 18:40                                 ` Bill Davidsen
2002-10-01 12:38                                   ` Matthias Andree
2002-10-04 19:58                                     ` Bill Davidsen
2002-09-29 17:42                     ` Linus Torvalds
2002-09-29 17:54                       ` Rik van Riel
2002-09-29 18:24                       ` Alan Cox
2002-09-30  7:56                         ` Jens Axboe
2002-09-30  9:53                           ` Andre Hedrick
2002-09-30 11:54                             ` Jens Axboe
2002-09-30 12:58                           ` Alan Cox
2002-09-30 13:05                             ` Jens Axboe
2002-10-01  2:17                               ` Andre Hedrick
2002-09-30 16:39                       ` jbradford
2002-09-30 16:47                     ` Pau Aliagas
2002-09-29  7:16                   ` jbradford
2002-09-29  8:08                     ` Jeff Garzik
2002-09-29  8:17                     ` David S. Miller
2002-09-29  9:12                     ` Jens Axboe
2002-09-29 11:19                       ` Murray J. Root
2002-09-29 15:50                         ` Jens Axboe
2002-09-30  7:01                           ` Kai Henningsen
2002-09-29 16:04                         ` Zwane Mwaikambo
2002-09-29 14:56                       ` Alan Cox
2002-09-29 15:38                         ` Jens Axboe
2002-09-29 16:30                           ` Dave Jones
2002-09-29 16:42                           ` Bjoern A. Zeeb
2002-09-29 21:16                           ` Russell King
2002-09-29 21:32                             ` Alan Cox
2002-09-29 21:49                             ` steve
2002-09-29 21:52                           ` Matthias Andree
2002-09-30  7:31                             ` Tomas Szepe
2002-09-30 15:33                           ` Jan Harkes
2002-09-30 18:13                           ` Jeff Willis
2002-09-29 17:48                         ` Linus Torvalds
2002-09-29 18:13                           ` Jaroslav Kysela
2002-09-30 19:32                       ` Bill Davidsen
2002-10-01  6:26                         ` Jens Axboe
2002-10-01  7:54                           ` Mikael Pettersson
2002-10-01  8:27                             ` Jens Axboe
2002-10-01  8:44                               ` jbradford
2002-10-01 11:31                             ` Alan Cox
2002-10-01 11:25                               ` Jens Axboe
2002-09-29 15:34                     ` Andi Kleen
2002-09-29 17:26                       ` Jochen Friedrich
2002-09-29 17:35                         ` Jeff Garzik
2002-09-30  0:00                         ` Andi Kleen
2002-10-01 19:28                         ` IPv6 stability (success story ;) Petr Baudis
2002-09-29  9:15                   ` v2.6 vs v3.0 Jens Axboe
2002-09-29 19:53                     ` james
2002-09-29 15:26                   ` Matthias Andree
2002-09-29 16:24                     ` Alan Cox
2002-09-29 22:00                       ` Matthias Andree
2002-09-30 19:02                       ` Bill Davidsen
2002-09-30 18:37                   ` Bill Davidsen
2002-10-03 15:51               ` [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice) jbradford
2002-10-03 15:57                 ` Linus Torvalds
2002-10-03 16:16                   ` [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem jbradford
2002-10-03 22:30                     ` Greg KH
2002-10-04  6:33                       ` jbradford
2002-10-04  6:37                         ` Greg KH
2002-10-04  7:17                           ` jbradford
2002-10-04  7:30                             ` Greg KH
2002-10-03 16:37                   ` [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice) Alan Cox
2002-10-03 16:56                     ` Linus Torvalds
2002-10-03 17:40                       ` Alan Cox
2002-10-03 19:55                       ` jlnance
2002-10-03 16:51                   ` Dave Jones
2002-10-03 17:04                     ` Alan Cox
2002-10-03 20:43                     ` Andrew Morton
2002-10-03 22:05                       ` Dave Jones
2002-10-04  3:46                         ` Andreas Boman
2002-10-04  7:44                         ` jbradford
2002-10-03 19:51                   ` Rik van Riel
2002-10-04 22:26                   ` [OT] 2.6 not 3.0 - (NUMA) Martin J. Bligh
2002-10-04 23:13                     ` Linus Torvalds
2002-10-05  0:21                       ` Martin J. Bligh
2002-10-05  0:36                         ` Linus Torvalds
2002-10-05  1:25                           ` Michael Hohnbaum
2002-10-05 20:30                       ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA)) Rob Landley
2002-10-06  2:15                         ` Andrew Morton
2002-10-06  9:42                           ` Russell King
2002-10-06 17:06                             ` Alan Cox
2002-10-06 13:44                           ` Oliver Neukum
2002-10-06 15:19                             ` Martin J. Bligh
2002-10-06 15:14                               ` Oliver Neukum
2002-10-07  8:08                               ` Helge Hafting
2002-10-07  9:18                                 ` Oliver Neukum
2002-10-07 14:11                                   ` Jan Hudec
2002-10-07 15:01                                     ` Jesse Pollard
2002-10-07 15:34                                       ` Jan Hudec
2002-10-08  3:12                                         ` [OT] " Scott Mcdermott
2002-10-10 23:49                                           ` Mike Fedyk
2002-10-07 15:15                                   ` Martin J. Bligh
2002-10-08 13:49                                   ` Helge Hafting
2002-10-07 17:43                               ` Daniel Phillips
2002-10-07 18:31                                 ` Andrew Morton
2002-10-07 18:51                                   ` Linus Torvalds
2002-10-07 20:14                                     ` Alan Cox
2002-10-07 20:31                                       ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not3.0 " Andrew Morton
2002-10-07 20:46                                         ` Linus Torvalds
2002-10-07 20:44                                       ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 " Linus Torvalds
2002-10-07 21:16                                         ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not3.0 " Andrew Morton
2002-10-07 23:47                                           ` jw schultz
2002-10-11  0:02                                           ` Mike Fedyk
2002-10-07 18:58                                   ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 " Chris Friesen
2002-10-07 19:21                                     ` Daniel Phillips
2002-10-07 19:35                                       ` Linus Torvalds
2002-10-08  0:39                                         ` Theodore Ts'o
2002-10-08  2:59                                           ` Andrew Morton
2002-10-08 16:15                                             ` Theodore Ts'o
2002-10-08 19:39                                               ` Andrew Morton
2002-10-08 17:06                                                 ` Rob Landley
2002-10-07 19:36                                     ` Andrew Morton [this message]
2002-10-08  2:36                                       ` Simon Kirby
2002-10-08  2:47                                         ` Daniel Phillips
2002-10-08  2:50                                         ` Andrew Morton
2002-10-08  2:54                                           ` Simon Kirby
2002-10-08  3:00                                             ` Andrew Morton
2002-10-08 16:17                                               ` Theodore Ts'o
2002-10-08 12:49                                           ` jlnance
2002-10-08 17:09                                             ` Andrew Morton
2002-10-10 20:53                                               ` Thomas Zimmerman
2002-10-08 13:54                                       ` Helge Hafting
2002-10-08 15:31                                         ` Andreas Dilger
2002-10-07 19:05                                   ` Daniel Phillips
2002-10-07 19:24                                     ` Linus Torvalds
2002-10-07 20:02                                       ` Daniel Phillips
2002-10-07 20:14                                         ` Andrew Morton
2002-10-07 20:22                                           ` Daniel Phillips
2002-10-07 20:28                                         ` Linus Torvalds
2002-10-07 21:16                                           ` Daniel Phillips
2002-10-07 21:55                                             ` Linus Torvalds
2002-10-07 22:02                                               ` Daniel Phillips
2002-10-07 22:12                                                 ` Andrew Morton
2002-10-08  8:49                                                   ` Padraig Brady
2002-10-07 22:14                                             ` Charles Cazabon
2002-10-30 18:26                                   ` Lee Leahu
2002-10-06  6:33                         ` Martin J. Bligh
2002-10-07  5:28                         ` John Alvord
2002-10-07  8:39                         ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 n Giuliano Pochini
2002-10-07 13:56                         ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA)) Jesse Pollard
2002-10-07 14:03                           ` Rob Landley
2002-10-08 22:14                             ` Jesse Pollard
2002-10-08 19:11                               ` Rob Landley
2002-10-09  8:17                             ` Alexander Kellett
2002-10-07 18:22                           ` Daniel Phillips
2002-10-08  8:19                             ` Jan Hudec
2002-10-11 23:53                         ` Hans Reiser
2002-10-11 20:26                           ` Rob Landley
2002-10-12  4:14                             ` Nick LeRoy
2002-10-13 17:27                               ` Rob Landley
2002-10-12 10:03                             ` Hans Reiser
2002-10-13 17:32                               ` Rob Landley
2002-10-13 23:51                                 ` Hans Reiser
2002-10-14 16:33                                   ` Rob Landley
2002-10-14  7:10                                 ` Nikita Danilov
2002-10-21 15:36                                   ` [OT] Please don't call it 3.0!! (was Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))) Calin A. Culianu
2002-10-21 16:20                                     ` Wakko Warner
2002-10-12 11:42                             ` The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA)) Matthias Andree
2002-10-12 14:56                               ` Hugh Dickins
2002-09-27 11:32       ` [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver Alan Cox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3DA1E250.1C5F7220@digeo.com \
    --to=akpm@digeo.com \
    --cc=cfriesen@nortelnetworks.com \
    --cc=landley@trommello.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@aracnet.com \
    --cc=oliver@neukum.name \
    --cc=phillips@arcor.de \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).