linux-kernel.vger.kernel.org archive mirror
* Hashing and directories
@ 2001-02-22 23:08 Bill Crawford
  2000-01-01  2:02 ` Pavel Machek
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Bill Crawford @ 2001-02-22 23:08 UTC
  To: Linux Kernel; +Cc: H. Peter Anvin, Daniel Phillips

 I was hoping to point out that in real life, most systems that
need to access large numbers of files are already designed to do
some kind of hashing, or at least to divide-and-conquer by using
multi-level directory structures.

 A particular reason for this, apart from filesystem efficiency,
is to make it easier for people to find things, as it is usually
easier to spot what you want amongst a hundred things than among
a thousand or ten thousand.

 A couple of practical examples from work here at Netcom UK (now
Ebone :) would be, say, DNS zone files or user authentication data.
We use Solaris and NFS a lot, too, so large directories are a bad
thing in general for us. We therefore tend to subdivide things using
a very simple scheme: taking the first letter, and then sometimes
the second letter or a pair of letters, from the filename.  This
actually works extremely well in practice, and as mentioned above
provides some positive side-effects.
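
 The subdivision scheme described above can be sketched roughly as
follows. This is only an illustration, not the code actually used at
Netcom: the function name bucket_path, the root "/zones", and the
choice to pad short names with "_" are all assumptions made for the
example.

```python
import posixpath


def bucket_path(root, filename, levels=2):
    """Build root/<1st char>/<2nd char>/filename (hypothetical helper).

    `levels` is how many leading characters become directory levels.
    Names shorter than `levels` are padded with '_' (an assumption) so
    that every file ends up at the same depth.
    """
    stem = filename.lower()
    parts = [stem[i] if i < len(stem) else "_" for i in range(levels)]
    return posixpath.join(root, *parts, filename)


# "/zones" is a made-up root directory for illustration.
print(bucket_path("/zones", "example.com"))            # two levels
print(bucket_path("/zones", "example.com", levels=1))  # first letter only
```

 With one level, names are spread over a few dozen subdirectories;
with two, over several hundred, which keeps each directory small
enough to read by eye.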

 So I don't think it would actually be sensible to encourage
anyone to use massive directories for too many tasks.  It has a
fairly unfortunate impact on applying human intervention to a
broken system, for example, if it takes a long time to find a
file you're looking for.

 I guess what I really mean is that I think Linus' strategy of
generally optimizing for the "usual case" is a good thing.  It
is actually quite annoying in general to have that many files in
a single directory (think \winnt\... here).  So maybe it would
be better to focus on the normal situation of, say, a few hundred
files in a directory rather than thousands ...

 I still think it's a good idea to do anything you can to speed
up large directory operations on ext2 though :)

 On the plus side, hashes or anything resembling tree structures
would tend to improve the characteristics of insertion and removal
of entries on even moderately sized directories, which would
probably provide a net gain for many folks.
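
 As a rough illustration of why hashing helps, here is a toy model
that counts how many entries are examined to find a name in a flat,
unsorted list versus a set of hash buckets. It is not how ext2 or any
proposed directory index works; toy_hash, the 256-bucket count and
the generated file names are all assumptions for the sketch.

```python
def toy_hash(name):
    """Simple deterministic string hash (illustrative only)."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def linear_probes(entries, name):
    """Entries examined when scanning a list front to back for `name`."""
    for count, entry in enumerate(entries, start=1):
        if entry == name:
            return count
    return len(entries)


def build_buckets(entries, nbuckets):
    """Distribute entries over `nbuckets` lists by toy_hash."""
    buckets = [[] for _ in range(nbuckets)]
    for entry in entries:
        buckets[toy_hash(entry) % nbuckets].append(entry)
    return buckets


def hashed_probes(buckets, name):
    """Entries examined when only the bucket chosen by the hash is scanned."""
    return linear_probes(buckets[toy_hash(name) % len(buckets)], name)


# Made-up directory contents: 10000 names, looked up worst-case (last entry).
names = ["file%05d" % i for i in range(10000)]
buckets = build_buckets(names, 256)
target = names[-1]
print("flat list:", linear_probes(names, target))
print("hashed:   ", hashed_probes(buckets, target))
```

 The same reasoning applies to insertion and removal, since both
start by locating the entry (or checking that it does not exist).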

-- 
/* Bill Crawford, Unix Systems Developer, ebOne, formerly GTS Netcom */
#include "stddiscl.h"

* Re: Hashing and directories
@ 2001-03-07 15:56 Manfred Spraul
  2001-03-07 16:10 ` Jamie Lokier
  0 siblings, 1 reply; 31+ messages in thread
From: Manfred Spraul @ 2001-03-07 15:56 UTC
  To: Jamie Lokier; +Cc: linux-kernel

Jamie wrote:

> Linus Torvalds wrote:
> > The long-term solution for this is to create the new VM space for the
> > new process early, and add it to the list of mm_struct's that the
> > swapper knows about, and then just get rid of the pages[MAX_ARG_PAGES]
> > array completely and instead just populate the new VM directly. That
> > way the destination is swappable etc, and you can also remove the
> > "put_dirty_page()" loop later on, as the pages will already be in their
> > right places.
> >
> > It's definitely not a one-liner, but if somebody really feels strongly
> > about this, then I can tell already that the above is the only way to do
> > it sanely.

> Yup. We discussed this years ago, and nobody thought it important
> enough. mm->mmlist didn't exist then, and creating it _just_ for
> this feature seemed too intrusive. I agree it's the only sane way to
> completely remove the limit.

I'm not sure that this is the right way: It means that every exec() must
call dup_mmap(), and usually only to copy a few hundred bytes. But I
don't see a sane alternative. I won't propose to create a temporary file
in a kernel tmpfs mount ;-)

--

    Manfred






Thread overview: 31+ messages
2001-02-22 23:08 Hashing and directories Bill Crawford
2000-01-01  2:02 ` Pavel Machek
2001-03-01 20:54   ` Alexander Viro
2001-03-01 21:05     ` H. Peter Anvin
2001-03-01 21:13       ` Alexander Viro
2001-03-01 21:24         ` H. Peter Anvin
2001-03-02  9:04         ` Pavel Machek
2001-03-02 12:01           ` Oystein Viggen
2001-03-02 12:26             ` Tobias Ringstrom
2001-03-02 12:58           ` David Weinehall
2001-03-02 19:33           ` Tim Wright
2001-03-12 10:05           ` Herbert Xu
2001-03-12 10:43             ` Xavier Bestel
2001-03-01 21:23       ` Andreas Dilger
2001-03-01 21:26       ` Bill Crawford
2001-03-01 21:05     ` Tigran Aivazian
2001-03-02  8:56       ` Pavel Machek
2001-03-07  0:37         ` Jamie Lokier
2001-03-07  4:03           ` Linus Torvalds
2001-03-07 13:41             ` Jamie Lokier
2001-03-02  9:00     ` Pavel Machek
2001-03-03  0:03   ` Bill Crawford
2001-03-08 12:42   ` Goswin Brederlow
2001-04-27 16:20     ` Daniel Phillips
2001-02-22 23:22 ` H. Peter Anvin
2001-02-22 23:54   ` Bill Crawford
2001-03-10 11:22 ` Kai Henningsen
2001-03-07 15:56 Manfred Spraul
2001-03-07 16:10 ` Jamie Lokier
2001-03-07 16:23   ` Manfred Spraul
2001-03-07 18:21     ` Linus Torvalds
