linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Dave Jones <davej@codemonkey.org.uk>,
	torvalds@transmeta.com, linux-kernel@vger.kernel.org
Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support
Date: Wed, 25 Sep 2002 17:34:20 -0400	[thread overview]
Message-ID: <20020925213419.GB9557@think.thunk.org> (raw)
In-Reply-To: <20020925204101.GA5420@suse.de>

On Wed, Sep 25, 2002 at 09:41:01PM +0100, Dave Jones wrote:
> Just curious.. what measurable overhead (if any) is there of indexing
> dirs with smaller numbers of files vs non-indexed ?
> If so, where would be the break-even point ?

It should be negligible to the point of being non-measurable.  If the
number of files fit within a single block, the format doesn't change
at all.  Once the directory is grows so there are two block's worth of
directory entries, then we move to an indexed format, and every file
will be found within two disk block reads, and on average we will need
to do comparisons on half a block worth of directory names.  Without
directory indexing, on average a lookup will succeed in 1.5 disk reads
(50% will require one disk block read, and 50% will require two disk
block reads), and on average we will need to do comparisons on a full
block's worth of directory entries.

So when the directory is two blocks' worth of data, it's a slight lose
if you are looking at things from the point of view of disk reads, but
we're winning already from the point of view of CPU time.  Not that
this matters; it won't be measureable because the block read
statistics only apply the first time you search the directory.  After
that, the directory blocks will be in cache, and the only thing that
matters is the CPU time.  And the amount of CPU time that it takes to
search directory entries, whether it we need to search on average 1
block or half a block worth of directory entries, is small enough to
be not an issue.

Not that it matters, but for the record though, we break-even on disk
reads once the directory is 3 blocks long.  At that point, linear
search will require on average reading 2 blocks (1*1/3 + 2*1/3 + 3*1/3
== 2) and the indexed directory will still require 2 disk reads.  On
the CPU front, the linear search will require comparisons with 1.5
blocks worth of directory entries, while the indexed directory will
still require only 0.5 blocks worth of directory comparisons.  (I'm
ignoring here the CPU time it takes to calculate the hash, but it's
the time to search the directory blocks that matters, since that's
where you'll have all of the memory cache misses that will slow down
the search.)

Oh, and finally, for a small directory, it won't be measurable because
after the first time around, all of the directory entries will fit in
the dcache.  :-)

						- Ted

  parent reply	other threads:[~2002-09-25 21:30 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-09-25 20:03 tytso
2002-09-25 20:34 ` Andreas Dilger
2002-09-25 20:41 ` Dave Jones
2002-09-25 21:08   ` Andreas Dilger
2002-09-25 21:34   ` Theodore Ts'o [this message]
2002-09-25 22:54 ` Jeff Garzik
2002-09-25 23:29   ` Theodore Ts'o
2002-09-25 23:45     ` Ryan Cumming
2002-09-26  3:27       ` Theodore Ts'o
2002-09-26  5:23         ` Ryan Cumming
2002-09-26  5:57           ` Theodore Ts'o
2002-09-26  6:22             ` Ryan Cumming
2002-09-26 14:05               ` Theodore Ts'o
2002-09-26  6:25             ` Ryan Cumming
2002-09-26 11:25               ` Daniel Egger
2002-09-26  7:41             ` Ryan Cumming
2002-09-26 13:23               ` Theodore Ts'o
2002-09-26 15:42               ` Theodore Ts'o
2002-09-26 19:08                 ` Ryan Cumming
2002-09-26 19:51                   ` Horst von Brand
2002-09-26 19:59                     ` Ryan Cumming
2002-09-26 22:04                   ` Theodore Ts'o
2002-09-26 22:53                     ` Ryan Cumming
2002-09-26 23:57                       ` Theodore Ts'o
2002-09-27  1:00                         ` Ryan Cumming
2002-09-27  3:24                           ` Theodore Ts'o
2002-09-27  4:12                         ` Andreas Dilger
2002-09-27  7:55                           ` Ryan Cumming
2002-09-28  1:20                           ` Ryan Cumming
2002-09-28  1:46                             ` Ryan Cumming
2002-09-28 14:13                             ` Theodore Ts'o
2002-09-28 14:18                               ` Theodore Ts'o
2002-09-28 22:35                                 ` Ryan Cumming
2002-09-28 17:27                               ` [Ext2-devel] " Andreas Dilger
2002-09-28 18:43                                 ` chrisl
2002-09-28 19:45                                 ` chrisl
2002-09-28 22:30                               ` Ryan Cumming
2002-09-29  7:03                               ` [PATCH] fix htree dir corrupt after fsck -fD chrisl
2002-09-29  8:16                                 ` Ryan Cumming
2002-09-29  8:36                                   ` Ryan Cumming
2002-09-30  2:46                                   ` Ryan Cumming
2002-09-29 14:13                                 ` Theodore Ts'o
2002-09-25 23:31 ` [BK PATCH] Add ext3 indexed directory (htree) support Daniel Egger
2002-09-26  0:32   ` Randy.Dunlap
2002-09-26  0:50 ` Aaron Lehmann
2002-09-26  3:28   ` Theodore Ts'o
2002-10-02  9:11 tytso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020925213419.GB9557@think.thunk.org \
    --to=tytso@mit.edu \
    --cc=davej@codemonkey.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    --subject='Re: [BK PATCH] Add ext3 indexed directory (htree) support' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).