linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* New version of the ext3 indexed-directory patch
@ 2002-09-20 16:41 tytso
  2002-09-20 17:06 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: tytso @ 2002-09-20 16:41 UTC (permalink / raw)
  To: linux-kernel


I've done a bunch of hacking on the ext3 indexed directory patch, and I
believe it's just about ready for integration with the 2.5 tree.
Testing and comments are appreciated.

The code can be found either via bitkeeper, at:

	bk://extfs.bkbits.net/ext3-htree-2.4
and
	bk://extfs.bkbits.net/ext3-htree-2.5

Or for those people who want straight diffs, patches against 2.4 
and 2.5 can be found at:

	http://thunk.org/tytso/linux/ext3-dxdir

						- Ted

Here are my Release Notes for my changes:
	(i.e., the changes from Christopher Li's port of Daniel
	Phillip's hashtree code)

*) The Ext3 hash-tree code uses the new TEA and half-MD4 hash that ships
	with e2fsprogs 1.28

*) The code has been massively reorganized and rewritten to make it more
	maintainable.  In particular, the horrible mess which had been
	ext3_find_entry and ext3_add_entry has been broken up into
	multiple functions.  This eliminated a lot of goto's, and more
	importantly, found some memory leaks which have been eliminated.
	Some were only in the error paths, but some were in normal code
	paths that would be executed after a split, for example.

*) As a side effect of the reorganization, the ext3_find_entry() and
	ext3_readdir() paths can now support arbitrary numbers of
	indirect levels in the tree.  (The resulting code was simpler,
	too!)   The one code path that still could use some work is the
	split handling on the insert path.  Once that is generalized, we
	will be able to remove the depth restriction on the indexed tree.

*) The error handling has been cleaned up, so that they are appropriately
	reported back up to userspace, where approach.  Ext3_std_error()
	is now called when the journal routines report an error, in line
	with all of the other journalling calls in the rest of the ext3
	codebase.

*) Ext3_readdir now traverses the directory in hash sort order.  This
	(mostly) solves the requirement that readdir not return doubled
	file entries, or skip a block of files, which could happen if
	another process managed to trigger a tree split while the
	readdir operation was going on.  Unfortunately, on 32 bit
	machines, we have to use a 31-bit sort under all conditions,
	even if telldir64() is used in userspace.  This is because the
	VFS uses a single path for sys_lseek() and sys_llseek(), and so
	the ext3 code can't tell whether to send back a 32-bit value or
	a 64-bit value.  And, if we send a 64-bit positional value when
	the 32-bit lseek() was called, lseek() will bomb out.

	This is unfortunate, but the chances of our hitting this failure
	more are relatively small.  Since readdir (well, sys_getdents64)
	returns 4k of data at at time, that means a directory with
	400,000 entries have approximately 3,000 getdents() system
	calls, with each getdents() returning approximately 133 entries.
	In order to trigger the problem, the 31-bit hash collision
	must be positioned such that it straddles one of the 4k getdents
	blocks --- while a split operation is taking place.

	(If we were willing to modify the VFS layer, so that
	ext3_readdir() knew how much space was left in the filldir
	buffer, we could be even smarter about things by stopping if
	there is only space for less than all of the entries that have
	the same hash value.  I'm not entirely convinced this is
	necessary, though.  The risk is low and in the worst case, one
	or two filenames will simply be returned twice by readdir, which
	shouldn't cause problems for most applications.)

*) Ext3_readdir(), ext3_add_entry(), and ext3_find_entry() now
	automatically fall back to the old linear directory code if they
	find a indexed directory format they don't understand.  This
	allows us to add new hash versions, and/or allow the depth of
	the tree to be increased, without losing backwards compatiblity
	with deployed kernels.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: New version of the ext3 indexed-directory patch
  2002-09-20 16:41 New version of the ext3 indexed-directory patch tytso
@ 2002-09-20 17:06 ` Andrew Morton
  2002-09-22 20:00   ` Theodore Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2002-09-20 17:06 UTC (permalink / raw)
  To: tytso; +Cc: linux-kernel

tytso@mit.edu wrote:
> 
> I've done a bunch of hacking on the ext3 indexed directory patch, and I
> believe it's just about ready for integration with the 2.5 tree.

Thanks; it's good that this is moving ahead.  I'll update to this
version in the -mm patchkit, so people can test it from there
too.

What is the status of e2fsprogs support for htree?  Is everything covered?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: New version of the ext3 indexed-directory patch
  2002-09-20 17:06 ` Andrew Morton
@ 2002-09-22 20:00   ` Theodore Ts'o
  0 siblings, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2002-09-22 20:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Fri, Sep 20, 2002 at 10:06:04AM -0700, Andrew Morton wrote:
> 
> Thanks; it's good that this is moving ahead.  I'll update to this
> version in the -mm patchkit, so people can test it from there
> too.
> 
> What is the status of e2fsprogs support for htree?  Is everything covered?

Almost.  E2fsck support is fully there.  With e2fsprogs 1.28, you
still need to manually set up the dir_index feature flag to convert a
filesystme to use the directory indexing feature:

debugfs -w -R "features dir_index" /dev/hdXX
debugfs -w -R "ssv def_hash_version tea" /dev/hdXX
debugfs -w -R "ssv hash_seed random" /dev/hdXX

In the 1.29 release, mke2fs will create filesystems with the directory
indexing flag set by default, and tune2fs -O dir_index will do set up
the directory indexing flag and the default hash version
automatically.

There is also a bug in e2fsprogs 1.28 so that the -D option to e2fsck
(which optimizes all directories) has a 1 in 512 chance of corrupting
a directory, due to a fenceport error that escaped my testing.  This
will be fixed in 1.29, which should be released very shortly.

Folks who want to play with the latest e2fsprogs get grab it here:

	bk://thunk.org:5000

							- Ted

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2002-09-22 19:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-20 16:41 New version of the ext3 indexed-directory patch tytso
2002-09-20 17:06 ` Andrew Morton
2002-09-22 20:00   ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).