linux-kernel.vger.kernel.org archive mirror
From: Daniel Phillips <phillips@arcor.de>
To: Alex Tomas <bzzz@tmi.comex.ru>
Cc: Alex Tomas <bzzz@tmi.comex.ru>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	ext2-devel@lists.sourceforge.net, "Theodore Ts'o" <tytso@mit.edu>,
	Andrew Morton <akpm@digeo.com>
Subject: Re: [Bug 417] New: htree much slower than regular ext3
Date: Mon, 10 Mar 2003 18:58:16 +0100
Message-ID: <20030309175353.B73FAFFE9A@mx12.arcor-online.net>
In-Reply-To: <m3r89hrp8t.fsf@lexa.home.net>

On Sun 09 Mar 03 08:08, Alex Tomas wrote:
> >>>>> Daniel Phillips (DP) writes:
>
>  DP> On Fri 07 Mar 03 16:46, Alex Tomas wrote:
>  DP> The problem I see with your approach is that the traversal is no
>  DP> longer in hash order, so a leaf split in the middle of a
>  DP> directory traversal could result in a lot of duplicate dirents.
>  DP> I'm not sure there's a way around that.
>
> 1) As far as I understand, duplicates are possible even in classic ext2
>    w/o sortdir/index. See the diagram:
>
>                     Process 1                  Process 2
>
>                     getdents(2) returns
>                     dentry1 (file1 -> Inode1)
>                     dentry2 (file2 -> Inode2)
>
> context switch -->
>                                                unlink(file1), empty dentry1
>                                                creat(file3), Inode3, use dentry1
>                                                creat(file1), Inode1, use dentry3
>
> context switch -->
>
>                     getdents(2) returns
>                     dentry3(file1 -> Inode1)
>
>
> Am I right?
>
>
> 2) Why not use hash order for traversal, as ext3_dx_readdir() does?
>    After reading several dentries within some hash set, readdir() sorts
>    them in inode order and returns them to the user.
>
>
> with best regards, Alex

You're right, but you still don't win the stuffed poodle.
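
For reference, the interleaving in the diagram can be reproduced with a toy
model. This is only a sketch under stated assumptions: the directory is a flat
list of slots, creat() takes the first free slot, and the reader's position is
a plain slot index. The helper names (readdir_from, unlink, creat) are
hypothetical and are not kernel interfaces.

```python
# Toy model of a linear (unindexed) directory: a list of slots, each either
# None (free) or a (name, inode) pair. All names here are illustrative.

def readdir_from(slots, pos, count):
    """Return up to `count` live entries starting at slot `pos`, plus the next position."""
    out = []
    while pos < len(slots) and len(out) < count:
        if slots[pos] is not None:
            out.append(slots[pos])
        pos += 1
    return out, pos

def unlink(slots, name):
    """Free the slot that holds `name`."""
    for i, entry in enumerate(slots):
        if entry is not None and entry[0] == name:
            slots[i] = None
            return
    raise FileNotFoundError(name)

def creat(slots, name, inode):
    """Use the first free slot, or append a new slot if none is free."""
    for i, entry in enumerate(slots):
        if entry is None:
            slots[i] = (name, inode)
            return
    slots.append((name, inode))

slots = [("file1", 1), ("file2", 2)]

# Process 1: the first getdents() call returns both entries.
first, pos = readdir_from(slots, 0, 2)

# Process 2 runs in between.
unlink(slots, "file1")      # slot 0 becomes free
creat(slots, "file3", 3)    # file3 reuses slot 0
creat(slots, "file1", 1)    # file1 lands in a new slot 2

# Process 1: the second getdents() call resumes at its saved position.
second, pos = readdir_from(slots, pos, 2)

seen = [name for name, _ in first + second]
print(seen)  # ['file1', 'file2', 'file1']
```

In this model file1 is returned twice and file3 is never seen, which matches
the diagram: one unlink followed by creates yields one duplicate.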

I put forth the same argument at last year's kernel workshop and was 
overruled on the grounds that, let me see:

  - Duplicates as a result of unlinks are one thing, duplicates as a
    result of creates are another, worse thing.

  - Creating one duplicate as a result of one unlink is one thing,
    creating lots of duplicates with one operation is another, worse
    thing.

  - The rules are written to allow this particular duplicate behavior
    in UFS (design inherited by Ext2/3) because at the time, UFS was
    the only game in town.

The phrase "broken by design" comes to mind.  Ted and Stephen would no doubt 
be happy to elaborate.  Anyway, there's another reason we need to return 
results in hash order, and that is telldir/seekdir, which expects a stable 
enumeration of a directory with no duplicates, even with concurrent 
operations going on, and even if the server is rebooted in the middle of a 
directory traversal.  Not just broken by design, but smashed to pieces by 
design.  And we still have to make it work, for some definition of "work".
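
To illustrate why hash order matters here, the following toy model compares a
reader that resumes by block number with one that resumes by hash value when a
leaf splits in the middle of a traversal. It is a sketch, not the HTree code:
zlib.crc32 stands in for the real directory hash, a split simply appends the
upper half of a leaf as a new block, hash collisions are ignored, and every
function name is hypothetical.

```python
import zlib

def name_hash(name):
    # Stand-in hash for illustration; the real directory hash is different.
    return zlib.crc32(name.encode())

def split_leaf(blocks, i):
    """Split block i: its upper half (by hash) moves to a new block at the end."""
    entries = sorted(blocks[i], key=name_hash)
    mid = len(entries) // 2
    blocks[i] = entries[:mid]
    blocks.append(entries[mid:])

def resume_physical(blocks, next_block):
    """Cursor is a block number: return everything from that block onward."""
    return [name for block in blocks[next_block:] for name in block]

def resume_hash_order(blocks, last_hash):
    """Cursor is a hash value: return every name that hashes above it, in hash order."""
    names = [n for block in blocks for n in block if name_hash(n) > last_hash]
    return sorted(names, key=name_hash)

# Eight names in two leaf blocks, each block covering a contiguous hash range.
names = sorted((f"f{i}" for i in range(8)), key=name_hash)
blocks = [names[:4], names[4:]]

# The reader consumes block 0 and saves both kinds of cursor.
first = list(blocks[0])
next_block = 1
last_hash = name_hash(first[-1])

# A leaf split hits block 0 before the reader continues.
split_leaf(blocks, 0)

physical = first + resume_physical(blocks, next_block)
hashed = first + resume_hash_order(blocks, last_hash)

physical_dups = len(physical) - len(set(physical))
hash_dups = len(hashed) - len(set(hashed))
print(physical_dups, hash_dups)  # 2 0
```

The block-number cursor re-reads the two entries that the split moved to the
end of the directory, while the hash cursor skips them because their hashes
are at or below the saved value. A single split duplicates half a leaf at
once, which is the "lots of duplicates with one operation" case.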

Anyway,
