linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [UPDATE] Directory index for ext2
@ 2001-05-31 16:13 Daniel Phillips
  2001-05-31 19:44 ` [Ext2-devel] " Andreas Dilger
  2001-06-20 14:59 ` Tony Gale
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Phillips @ 2001-05-31 16:13 UTC (permalink / raw)
  To: linux-kernel, ext2-devel; +Cc: Alexander Viro, Andreas Dilger

Changes:

  - Freshen to 2.4.5
  - EXT2_FEATURE_COMPAT_DIR_INDEX flag finalized
  - Break up ext2_add_entry for aesthetic reasons (Al Viro)
  - Handle more than 64K directories per directory (Andreas Dilger)
  - Bug fix: new inode no longer inherits index flag (Andreas Dilger)
  - Bug fix: correct handling of error on index create (Al Viro)

To-Do:

  - More factoring of ext2_add_entry
  - Fall back to linear search in case of corrupted index
  - Finalize hash function

The patch is available at:

    http://nl.linux.org/~phillips/htree/dx.pcache-2.4.5   

It requires Al Viro's directory-in-pcache patch:

    ftp://ftp.math.psu.edu/pub/viro/ext2-dir-patch-S5.gz

To apply:

    cd mysource/linux
    zcat ext2-dir-patch-S5.gz | patch -p1
    cat dx.pcache-2.4.5 | patch -p0

To create an indexed directory:

    mount /dev/hdxxx /test -o index
    mkdir /test/foo

No known bugs, please test, thanks in advance.

--
Daniel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] [UPDATE] Directory index for ext2
  2001-05-31 16:13 [UPDATE] Directory index for ext2 Daniel Phillips
@ 2001-05-31 19:44 ` Andreas Dilger
  2001-05-31 21:02   ` Daniel Phillips
  2001-06-03  0:19   ` Daniel Phillips
  2001-06-20 14:59 ` Tony Gale
  1 sibling, 2 replies; 17+ messages in thread
From: Andreas Dilger @ 2001-05-31 19:44 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, ext2-devel, Alexander Viro, Andreas Dilger

Daniel, you write:
>   - Fall back to linear search in case of corrupted index

OK, I have _some_ of the code needed to do this, but in one case I'm
not sure of what your intention was - in dx_probe() you check for
"unused_flags & 1" to signal a bad/unsupported index.  Why only check
the low bit instead of the whole thing?  I currently have:

	if (dir->i_size < dir->i_sb->s_blocksize * 2)
                goto fail;	// bad EXT2_INDEX_FL set, clear?
        if (IS_ERR(bh = ext2_bread (dir, 0, 0)))
                goto fail;      // FIXME error message?
        root = (struct dx_root *) bh->b_data;

	// use the following for production once hash_version is 1
        // if (root->info.hash_version & 3 == 0 || root->info.unused_flags)
        if (root->info.hash_version > 0 || root->info.unused_flags & 1)
                goto fail2;	// unsupported hash format
        if ((indirect = root->info.indirect_levels) > 1)
                goto fail2;	// unsupported hash format
        if (root->info.reserved_zero.v ||
            root->info.info_length < sizeof(root->info))
                goto fail2;	// bad root node data
	...
	if (dx_get_limit(entries) != dx_root_limit(dir, root->info.info_length))
                goto fail2;	// bad root node data
	...
		if (dx_get_limit(entries) != dx_node_limit (dir))
                        goto fail3;     // bad index node data

On lookup it is OK to fall back to linear search, but if we add an entry
via linear we would then overwrite the root index.  We probably want extra
logic so that if we have a bad interior node we overwrite that (or another
leaf instead of killing the root index).  We probably also want to make a
distinction between I/O errors and bad data (currently I just return NULL
for both).  I think Al's idea of doing the validation once on the initial
read is a good one.

Instead of keeping reserved_zero as zero and using it to detect if an
inode number was written there, we could write a magic number there to
signal a valid hash.  Alternately, instead of using hash_version to mark
the hash type, we could leave that unused and make the above magic
number the hash value, which is the hash of some fixed string, e.g.

HASH_V0 := dx_hack_hash("Daniel Phillips", 15) // constant value
HASH_V1 := dx_new_hash("Daniel Phillips", 15)  // constant value

struct dx_root
{
        struct fake_dirent dot;
        char dot_name[4];
        struct fake_dirent dotdot;
        char dotdot_name[4];
        struct dx_root_info {
                le_u32 hash_magic;
                u8 unused;
                u8 info_length; /* 8 */
                u8 indirect_levels;
                u8 unused_flags;
        } info;
        struct dx_entry entries[0];
};

/*
 * Hash value depends on existing hash type, so it is calculated here.
 * For new directories we never use this function, and simply pick the
 * default hash function when creating the new dx_root.
 */
static inline dx_frame *dx_probe(inode *dir, dentry *dentry, u32 *hash)
				 dx_frame *frame)
{
	struct dx_root *root;

	if (IS_ERR(bh = ext2_bread (dir, 0, 0)))
                goto fail;	// return error code
	root = (struct dx_root *) bh->b_data;

	switch (le32_to_cpu(root->info.hash_magic.v)) {
	case HASH_V0:
		// hash-specific dx_root validation here
		*hash = dx_hack_hash(dentry->d_name.name, dentry->d_name.len);
		return dx_probe_hash(dir, hash, frame, bh);
	case HASH_V1:
		// hash-specific dx_root validation here
		*hash = dx_new_hash(dentry->d_name.name, dentry->d_name.len);
		return dx_probe_hash(dir, hash, frame, bh);
	default: printk("unsupported hash %u in directory %lu\n",
			root->info.hash_magic, dir->i_ino);
		brelse(bh);
		*hash = 0;
	}
fail:
	return NULL;
}



>   - Finalize hash function

I noticed something interesting when running "mongo" with debugging on.
It is adding filenames which are only sequential numbers, and the hash
code is basically adding to only two blocks until those blocks are full,
at which point (I guess) the blocks split and we add to two other blocks.

I haven't looked at it closely, but for _example_ it something like:

65531 to block 113
65532 to block 51
65533 to block 51
65534 to block 113
65535 to block 113
(repeats)
65600 to block 21
65601 to block 96
65602 to block 96
65603 to block 21
65604 to block 21
(repeats)

I will have to recompile and run with debugging on again to get actual
output.

To me this would _seem_ bad, as it indicates the hash is not uniformly
distributing the files across the hash space.  However, skewed hashing
may not necessarily be the bad for performance.  It may even be that
because we never have to rebalance the hash index structure that as long
as we don't get large numbers of identical hashes it is just fine if
similar filenames produce similar hash values.  We just keep splitting
the leaf blocks until the hash moves over to a different "range".  For a
balanced tree-based structure a skewed hash would be bad, because you
would have to do full-tree rebalancing very often then.



> No known bugs, please test, thanks in advance.

Running mongo has shown up another bug, I see, but haven't had a chance to
look into yet.  It involves not being able to delete files from an indexed
directory:

rm: cannot remove `/mnt/tmp/testdir1-0-0/d0/d1/d2/d3/509.r': Input/output error

This is after the files had been renamed (.r suffix).  Do we re-hash
directory entries if the file is renamed?  If not, then that would
explain this problem.  It _looks_ like we do the right thing, but the
mongo testing wipes out the filesystem after each test, and the above
message is from a logfile only.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] [UPDATE] Directory index for ext2
  2001-05-31 19:44 ` [Ext2-devel] " Andreas Dilger
@ 2001-05-31 21:02   ` Daniel Phillips
  2001-05-31 22:42     ` Andreas Dilger
  2001-06-03  0:19   ` Daniel Phillips
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Phillips @ 2001-05-31 21:02 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-kernel, ext2-devel, Alexander Viro, Andreas Dilger

On Thursday 31 May 2001 21:44, Andreas Dilger wrote:
> Daniel, you write:
> >   - Fall back to linear search in case of corrupted index
>
> OK, I have _some_ of the code needed to do this, but in one case I'm
> not sure of what your intention was - in dx_probe() you check for
> "unused_flags & 1" to signal a bad/unsupported index.  Why only check
> the low bit instead of the whole thing?

I guess it really means I've allocated the low bit to mean 
'incompatible change here, old code should give up now', so 
"unused_flags" is a misnomer.

> I currently have:

[code that kmail fsck'd completely in the quote]

I'll incorporate it.

> On lookup it is OK to fall back to linear search, but if we add an
> entry via linear we would then overwrite the root index.  We probably
> want extra logic so that if we have a bad interior node we overwrite
> that (or another leaf instead of killing the root index).  We
> probably also want to make a distinction between I/O errors and bad
> data (currently I just return NULL for both).  I think Al's idea of
> doing the validation once on the initial read is a good one.

I'm doing that in the current patch, for leaf blocks, look at 
ext2_bread.  For index blocks, ext2_bread needs help to know that a 
block is supposed to be an index block.  Add a parameter?

The checks we're doing now aren't that expensive so I decided to take 
the lazy approach and just do them every time.  It's not the same as 
the leaf block checks, which otherwise would need to be in the inner 
loop.

[I'm still thinking about your comments on magic numbers and hash 
versions]

> >   - Finalize hash function
>
> I noticed something interesting when running "mongo" with debugging
> on. It is adding filenames which are only sequential numbers, and the
> hash code is basically adding to only two blocks until those blocks
> are full, at which point (I guess) the blocks split and we add to two
> other blocks.

It's normal for it to start by putting all the entries into the first 
two blocks, but after those are split it should be pretty uniform 
across the resulting 4, and so on.  Can you confirm it's unbalanced?

I used to have a handy hash bucket-dumping function (dx_show_buckets) 
hooked into dir->read, which normally just returns an error.  To get a 
dump you'd cat the directory.  A hokey interface, for debugging only, 
but convenient for coding and using.    This is gone from the page 
cache version and I should hook it back in somehow.

> I will have to recompile and run with debugging on again to get
> actual output.
>
> To me this would _seem_ bad, as it indicates the hash is not
> uniformly distributing the files across the hash space.  However,
> skewed hashing may not necessarily be the bad for performance.  It
> may even be that because we never have to rebalance the hash index
> structure that as long as we don't get large numbers of identical
> hashes it is just fine if similar filenames produce similar hash
> values.  We just keep splitting the leaf blocks until the hash moves
> over to a different "range".  For a balanced tree-based structure a
> skewed hash would be bad, because you would have to do full-tree
> rebalancing very often then.
>
> > No known bugs, please test, thanks in advance.

So much for that ;-)

> Running mongo has shown up another bug, I see, but haven't had a
> chance to look into yet.  It involves not being able to delete files
> from an indexed directory:
>
> rm: cannot remove `/mnt/tmp/testdir1-0-0/d0/d1/d2/d3/509.r':
> Input/output error
>
> This is after the files had been renamed (.r suffix).  Do we re-hash
> directory entries if the file is renamed?  If not, then that would
> explain this problem.  It _looks_ like we do the right thing, but the
> mongo testing wipes out the filesystem after each test, and the above
> message is from a logfile only.

The rename creates the new entry via ext2_add_entry so the hash must be 
correct.  Time to get out the bug swatter.  I'll get mongo and try it.

--
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] [UPDATE] Directory index for ext2
  2001-05-31 21:02   ` Daniel Phillips
@ 2001-05-31 22:42     ` Andreas Dilger
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Dilger @ 2001-05-31 22:42 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andreas Dilger, linux-kernel, ext2-devel, Alexander Viro

Daniel writes:
> On Thursday 31 May 2001 21:44, Andreas Dilger wrote:
> > I think Al's idea of doing the validation once on the initial
> > read is a good one.
> 
> I'm doing that in the current patch, for leaf blocks, look at 
> ext2_bread.  For index blocks, ext2_bread needs help to know that a 
> block is supposed to be an index block.  Add a parameter?

I think we should just get rid of the misconception that ext2_bread()
is a block read function.  It is only used by directory functions.
Instead we should have separate ext2_bread_leaf(), ext2_bread_root(),
ext2_bread_node() which do the appropriate validation for each type
of block.

In ext2_bread_dir() if we really think directory block prealloc is a win
(in addition to the existing in-memory contiguous block prealloc), we
may as well do it each time we split a leaf block, and make them valid
in-use leaf blocks instead of just wasting space on disk (i.e. each split
block has the hash space split into 1/N of the existing space, and we
distribute existing entries across all N blocks).

This way we don't have to split the each directory block so many times.
For indexed directories this is (probably) a net win because we avoid
N extra block splits (i.e. extra copying of leaf blocks), and make the
leaf search space smaller.  On non-indexed ext2 it would be a net loss
because we would still have to read and search each directory block,
even if they are empty.

> It's normal for it to start by putting all the entries into the first 
> two blocks, but after those are split it should be pretty uniform 
> across the resulting 4, and so on.  Can you confirm it's unbalanced?

I don't think that is what I was seeing, because the hash block numbers
were not "->1" and "->2" (which would be the case right after a split),
but rather 30's, 40's, etc.

> > Running mongo has shown up another bug, I see, but haven't had a
> > chance to look into yet.  It involves not being able to delete files
> > from an indexed directory:
> >
> > rm: cannot remove `/mnt/tmp/testdir1-0-0/d0/d1/d2/d3/509.r':
> > Input/output error
> >
> > This is after the files had been renamed (.r suffix).  Do we re-hash
> > directory entries if the file is renamed?  If not, then that would
> > explain this problem.  It _looks_ like we do the right thing, but the
> > mongo testing wipes out the filesystem after each test, and the above
> > message is from a logfile only.
> 
> The rename creates the new entry via ext2_add_entry so the hash must be 
> correct.  Time to get out the bug swatter.  I'll get mongo and try it.

One other point of information.  In the test I was running, it was
always the file "509.r" which had the I/O error (new filesystem each
test run, btw, and no IDE errors in the log).

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] [UPDATE] Directory index for ext2
  2001-05-31 19:44 ` [Ext2-devel] " Andreas Dilger
  2001-05-31 21:02   ` Daniel Phillips
@ 2001-06-03  0:19   ` Daniel Phillips
  1 sibling, 0 replies; 17+ messages in thread
From: Daniel Phillips @ 2001-06-03  0:19 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: ext2-devel, linux-kernel

On Thursday 31 May 2001 21:44, Andreas Dilger wrote:
> I noticed something interesting when running "mongo" with debugging
> on. It is adding filenames which are only sequential numbers, and the
> hash code is basically adding to only two blocks until those blocks
> are full, at which point (I guess) the blocks split and we add to two
> other blocks.
>
> I haven't looked at it closely, but for _example_ it something like:
>
> 65531 to block 113
> 65532 to block 51
> 65533 to block 51
> 65534 to block 113
> 65535 to block 113
> (repeats)
> 65600 to block 21
> 65601 to block 96
> 65602 to block 96
> 65603 to block 21
> 65604 to block 21
> (repeats)
>
> I will have to recompile and run with debugging on again to get
> actual output.
>
> To me this would _seem_ bad, as it indicates the hash is not
> uniformly distributing the files across the hash space.  However,
> skewed hashing may not necessarily be the bad for performance.  It
> may even be that because we never have to rebalance the hash index
> structure that as long as we don't get large numbers of identical
> hashes it is just fine if similar filenames produce similar hash
> values.  We just keep splitting the leaf blocks until the hash moves
> over to a different "range".  For a balanced tree-based structure a
> skewed hash would be bad, because you would have to do full-tree
> rebalancing very often then.

A skewed hash doesn't hurt the fixed-depth tree in the obvious way - it 
just tends to leave a lot of half-full buckets around, which wastes 
about 1/3 of the leaf space.  The reason for this behaviour is, when 
you create a lot of sequentially-related names a skewed hash will tend 
to dump a lot them into a tiny sliver of the hash space, and after 
splitting the little sliver it's quite unlikely that later entries will 
hit the same small range.  The only good protection against this is to 
make the hash function vary wildly if even one bit in the string 
changes.  This is what crc does, and that's why I'm interested in it.

I've rehabilitated the hack_show_dir code, which is enabled by 
#defining DX_HACK.  To kprint a dump of hash buckets and statistics do:

    cat /test_partition/indexed_dir

This dump is extremely helpful in judging the effectiveness of hash 
functions, just take a look at how the range of hash values that gets 
mapped into each leaf block.  Ideally, there should not be too much 
variance.

The format of the dump is:

   bucketnumber:blocknumber hashstart/range (entrycount)

Yusuf Goolamabbas sent me a pointer to some new work on hash functions:

   http://www.isthe.com/chongo/tech/comp/fnv/

I coded up the fnv_hash and included it in today's patch - there is a 
define to select which to use; dx_hack_hash is still the default.

fnv_hash is only a little wose than dx_hack_hash, which still produces 
the most uniform distribution of all the hash functions I've tested.   
But I can see from the dumps that it's still not optimal, it's just 
that all the others are worse.

I still have quite a few leads to follow up on the hashing question.
Next week I hope I'll get time to try crc32 as a hashing function.  I 
hope it doesn't win because I'll need a 1K table to evaluate it, and 
that would be 20% of the whole code size.

The patch:

    http://nl.linux.org/~phillips/htree/dx.pcache-2.4.5-2

--
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [UPDATE] Directory index for ext2
  2001-05-31 16:13 [UPDATE] Directory index for ext2 Daniel Phillips
  2001-05-31 19:44 ` [Ext2-devel] " Andreas Dilger
@ 2001-06-20 14:59 ` Tony Gale
  2001-06-20 16:02   ` [Ext2-devel] " Theodore Tso
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Tony Gale @ 2001-06-20 14:59 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, ext2-devel, Alexander Viro, Andreas Dilger


The main problem I have with this is that e2fsck doesn't know how to
deal with it - at least I haven't found a version that will. This makes
it rather difficult to use, especially for your root fs.

And, since I used it, and have since stopped using it, I have a problem
in what all my disk free space disappears over a couple of days - I have
to run fsck to recover it, were it appears as deleted inodes with zero
dtime. I can't say for sure that the dir index stuff is at fault though.
I am currently using 2.4.6-pre3 without the dir patch installed. I am
using the grsecurity patch though.

I have just upgraded to e2fsprogs-1.21 in the hope of sorting it out. If
that fails I'll revert to a clean 2.4.6-pre kernel. Other ideas welcome.

-tony



On 31 May 2001 18:13:43 +0200, Daniel Phillips wrote:
> Changes:
> 
>   - Freshen to 2.4.5
>   - EXT2_FEATURE_COMPAT_DIR_INDEX flag finalized
>   - Break up ext2_add_entry for aesthetic reasons (Al Viro)
>   - Handle more than 64K directories per directory (Andreas Dilger)
>   - Bug fix: new inode no longer inherits index flag (Andreas Dilger)
>   - Bug fix: correct handling of error on index create (Al Viro)
> 
> To-Do:
> 
>   - More factoring of ext2_add_entry
>   - Fall back to linear search in case of corrupted index
>   - Finalize hash function
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-20 14:59 ` Tony Gale
@ 2001-06-20 16:02   ` Theodore Tso
  2001-06-20 17:27     ` Daniel Phillips
  2001-06-20 16:58   ` Daniel Phillips
  2001-06-25  9:46   ` Tony Gale
  2 siblings, 1 reply; 17+ messages in thread
From: Theodore Tso @ 2001-06-20 16:02 UTC (permalink / raw)
  To: Tony Gale
  Cc: Daniel Phillips, linux-kernel, ext2-devel, Alexander Viro,
	Andreas Dilger

On Wed, Jun 20, 2001 at 03:59:58PM +0100, Tony Gale wrote:
> 
> The main problem I have with this is that e2fsck doesn't know how to
> deal with it - at least I haven't found a version that will. This makes
> it rather difficult to use, especially for your root fs.

Getting e2fsck to deal with directory indexing is on my todo list at
this point.  

Daniel, do you have any preliminary patches to start with, or do I
need to start from scratch?  

						- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [UPDATE] Directory index for ext2
  2001-06-20 14:59 ` Tony Gale
  2001-06-20 16:02   ` [Ext2-devel] " Theodore Tso
@ 2001-06-20 16:58   ` Daniel Phillips
  2001-06-25  9:46   ` Tony Gale
  2 siblings, 0 replies; 17+ messages in thread
From: Daniel Phillips @ 2001-06-20 16:58 UTC (permalink / raw)
  To: Tony Gale; +Cc: linux-kernel, ext2-devel, Alexander Viro, Andreas Dilger

On Wednesday 20 June 2001 16:59, Tony Gale wrote:
> The main problem I have with this is that e2fsck doesn't know how to
> deal with it - at least I haven't found a version that will. This makes
> it rather difficult to use, especially for your root fs.

Good, the file format isn't finalized, this is only recommended for use on a 
test partition.

> And, since I used it, and have since stopped using it, I have a problem
> in what all my disk free space disappears over a couple of days - I have
> to run fsck to recover it, were it appears as deleted inodes with zero
> dtime. I can't say for sure that the dir index stuff is at fault though.

It could well be, I'd appreciate any more data you have on that.

> I am currently using 2.4.6-pre3 without the dir patch installed. I am
> using the grsecurity patch though.
>
> I have just upgraded to e2fsprogs-1.21 in the hope of sorting it out. If
> that fails I'll revert to a clean 2.4.6-pre kernel. Other ideas welcome.
>
> -tony
>
> On 31 May 2001 18:13:43 +0200, Daniel Phillips wrote:
> > Changes:
> >
> >   - Freshen to 2.4.5
> >   - EXT2_FEATURE_COMPAT_DIR_INDEX flag finalized
> >   - Break up ext2_add_entry for aesthetic reasons (Al Viro)
> >   - Handle more than 64K directories per directory (Andreas Dilger)
> >   - Bug fix: new inode no longer inherits index flag (Andreas Dilger)
> >   - Bug fix: correct handling of error on index create (Al Viro)
> >
> > To-Do:
> >
> >   - More factoring of ext2_add_entry
> >   - Fall back to linear search in case of corrupted index
> >   - Finalize hash function

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-20 16:02   ` [Ext2-devel] " Theodore Tso
@ 2001-06-20 17:27     ` Daniel Phillips
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Phillips @ 2001-06-20 17:27 UTC (permalink / raw)
  To: Theodore Tso, Tony Gale
  Cc: linux-kernel, ext2-devel, Alexander Viro, Andreas Dilger

On Wednesday 20 June 2001 18:02, Theodore Tso wrote:
> On Wed, Jun 20, 2001 at 03:59:58PM +0100, Tony Gale wrote:
> > The main problem I have with this is that e2fsck doesn't know how to
> > deal with it - at least I haven't found a version that will. This makes
> > it rather difficult to use, especially for your root fs.
>
> Getting e2fsck to deal with directory indexing is on my todo list at
> this point.
>
> Daniel, do you have any preliminary patches to start with, or do I
> need to start from scratch?

No, I haven't written any user space code for this at all, the only source is 
the patch itself.  The debug code might be helpful - show_buckets.  It's 
incomplete though, it only shows one level of the tree.  I need to do 
something about that.  I'll also spiff up my pseudocode and data structure 
documentation and forward it.

--
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [UPDATE] Directory index for ext2
  2001-06-20 14:59 ` Tony Gale
  2001-06-20 16:02   ` [Ext2-devel] " Theodore Tso
  2001-06-20 16:58   ` Daniel Phillips
@ 2001-06-25  9:46   ` Tony Gale
  2001-06-25 16:10     ` Daniel Phillips
  2 siblings, 1 reply; 17+ messages in thread
From: Tony Gale @ 2001-06-25  9:46 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Heusden, Folkert van, linux-kernel, ext2-devel, Alexander Viro,
	Andreas Dilger


After some testing, removing the grsecurity patch seems to have solved
the disappearing-free-space problem. Now just need to find out why.


On 20 Jun 2001 18:58:43 +0200, Daniel Phillips wrote:
> On Wednesday 20 June 2001 16:59, Tony Gale wrote:
> > The main problem I have with this is that e2fsck doesn't know how to
> > deal with it - at least I haven't found a version that will. This makes
> > it rather difficult to use, especially for your root fs.
> 
> Good, the file format isn't finalized, this is only recommended for use on a 
> test partition.

It was a test partition, just happended to be a root one. AFAIAA, isn't
the fact that the file format may change irrelevant. You want people to
test this stuff or not? If it's not tested on a root fs then how do you
know there won't be any problems.

Sorry, but when someone reports a problem, and then you say, well don't
test it like that then, it is really not acceptable.

-tony



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [UPDATE] Directory index for ext2
  2001-06-25  9:46   ` Tony Gale
@ 2001-06-25 16:10     ` Daniel Phillips
  2001-06-25 19:51       ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Phillips @ 2001-06-25 16:10 UTC (permalink / raw)
  To: Tony Gale
  Cc: Heusden, Folkert van, linux-kernel, ext2-devel, Alexander Viro,
	Andreas Dilger

On Monday 25 June 2001 11:46, Tony Gale wrote:
> After some testing, removing the grsecurity patch seems to have solved
> the disappearing-free-space problem. Now just need to find out why.
>
> On 20 Jun 2001 18:58:43 +0200, Daniel Phillips wrote:
> > On Wednesday 20 June 2001 16:59, Tony Gale wrote:
> > > The main problem I have with this is that e2fsck doesn't know how to
> > > deal with it - at least I haven't found a version that will. This makes
> > > it rather difficult to use, especially for your root fs.
> >
> > Good, the file format isn't finalized, this is only recommended for use
> > on a test partition.
>
> It was a test partition, just happended to be a root one. AFAIAA, isn't
> the fact that the file format may change irrelevant. You want people to
> test this stuff or not? If it's not tested on a root fs then how do you
> know there won't be any problems.
>
> Sorry, but when someone reports a problem, and then you say, well don't
> test it like that then, it is really not acceptable.

Sure, if your root partition is expendable, by all means go ahead.  Ted has 
already offered to start the required changes to e2fsck, which reminds me, I 
have to send the promised docs.  For now, just use normal fsck and it will 
(in theory) turn the directory indexes back into normal file blocks, and have 
no effect on inodes.  Lets take advantage of the opportunity to test that 
feature.  The new improved e2fsck should be capable of re-adding the indexes, 
so we'll get to test that too ;-)

Lets continue this privately and see what's going on with your inode leak.

--
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-25 16:10     ` Daniel Phillips
@ 2001-06-25 19:51       ` Andreas Dilger
  2001-06-25 22:25         ` Daniel Phillips
  2001-06-26  9:27         ` Tony Gale
  0 siblings, 2 replies; 17+ messages in thread
From: Andreas Dilger @ 2001-06-25 19:51 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Tony Gale, Heusden, Folkert van, linux-kernel, ext2-devel,
	Alexander Viro

Daniel writes:
> > > On Wednesday 20 June 2001 16:59, Tony Gale wrote:
> > > > The main problem I have with this is that e2fsck doesn't know how to
> > > > deal with it - at least I haven't found a version that will. This makes
> > > > it rather difficult to use, especially for your root fs.
> 
> Sure, if your root partition is expendable, by all means go ahead.  Ted has 
> already offered to start the required changes to e2fsck, which reminds me, I 
> have to send the promised docs.  For now, just use normal fsck and it will 
> (in theory) turn the directory indexes back into normal file blocks, and have 
> no effect on inodes.

This is only true without the COMPAT_DIR_INDEX flag.  Since e2fsck _needs_
to know about every filesystem feature, it will (correctly) refuse to touch
such a system for now.  You could "tune2fs -O ^FEATURE_C4 /dev/hdX" to
turn of the COMPAT_DIR_INDEX flag and let e2fsck go to town.  That will
break all of the directory indexes, I believe.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-25 19:51       ` [Ext2-devel] " Andreas Dilger
@ 2001-06-25 22:25         ` Daniel Phillips
  2001-06-26 23:49           ` Theodore Tso
  2001-07-03 10:00           ` [Ext2-devel] Re: [UPDATE] Directory index for ext2 Tony Gale
  2001-06-26  9:27         ` Tony Gale
  1 sibling, 2 replies; 17+ messages in thread
From: Daniel Phillips @ 2001-06-25 22:25 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Tony Gale, Heusden, Folkert van, linux-kernel, ext2-devel,
	Alexander Viro

On Monday 25 June 2001 21:51, Andreas Dilger wrote:
> Daniel writes:
> > > > On Wednesday 20 June 2001 16:59, Tony Gale wrote:
> > > > > The main problem I have with this is that e2fsck doesn't know how
> > > > > to deal with it - at least I haven't found a version that will.
> > > > > This makes it rather difficult to use, especially for your root fs.
> >
> > Sure, if your root partition is expendable, by all means go ahead.  Ted
> > has already offered to start the required changes to e2fsck, which
> > reminds me, I have to send the promised docs.  For now, just use normal
> > fsck and it will (in theory) turn the directory indexes back into normal
> > file blocks, and have no effect on inodes.
>
> This is only true without the COMPAT_DIR_INDEX flag.  Since e2fsck _needs_
> to know about every filesystem feature, it will (correctly) refuse to touch
> such a system for now.  You could "tune2fs -O ^FEATURE_C4 /dev/hdX" to
> turn of the COMPAT_DIR_INDEX flag and let e2fsck go to town.  That will
> break all of the directory indexes, I believe.

This is what he wants, a workaround so he can fsck.  However, the above 
command (on version 1.2-WIP) just gives me:

   Invalid filesystem option set: ^FEATURE_C4

Maybe he should just edit the source so it doesn't set the superblock flag 
for now.

BTW, there doesn't seem to be a --version command in tune2fs.

--
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-25 19:51       ` [Ext2-devel] " Andreas Dilger
  2001-06-25 22:25         ` Daniel Phillips
@ 2001-06-26  9:27         ` Tony Gale
  1 sibling, 0 replies; 17+ messages in thread
From: Tony Gale @ 2001-06-26  9:27 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Andreas Dilger, Folkert van, linux-kernel, ext2-devel, Alexander Viro


I use debugfs to remove the flag before fsck'ing:

Start debugfs.

Type

open -f -w /dev/<part>
features -FEATURE_C5

-tony


On 26 Jun 2001 00:25:32 +0200, Daniel Phillips wrote:
> On Monday 25 June 2001 21:51, Andreas Dilger wrote:
> > Daniel writes:
> > > Sure, if your root partition is expendable, by all means go ahead.  Ted
> > > has already offered to start the required changes to e2fsck, which
> > > reminds me, I have to send the promised docs.  For now, just use normal
> > > fsck and it will (in theory) turn the directory indexes back into normal
> > > file blocks, and have no effect on inodes.
> >
> > This is only true without the COMPAT_DIR_INDEX flag.  Since e2fsck _needs_
> > to know about every filesystem feature, it will (correctly) refuse to touch
> > such a system for now.  You could "tune2fs -O ^FEATURE_C4 /dev/hdX" to
> > turn of the COMPAT_DIR_INDEX flag and let e2fsck go to town.  That will
> > break all of the directory indexes, I believe.
> 
> This is what he wants, a workaround so he can fsck.  However, the above 
> command (on version 1.2-WIP) just gives me:
> 
>    Invalid filesystem option set: ^FEATURE_C4
> 
> Maybe he should just edit the source so it doesn't set the superblock flag 
> for now.
> 
> BTW, there doesn't seem to be a --version command in tune2fs.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-25 22:25         ` Daniel Phillips
@ 2001-06-26 23:49           ` Theodore Tso
  2001-06-27  0:08             ` [RFC] Checks in ext2_new_block() Alexander Viro
  2001-07-03 10:00           ` [Ext2-devel] Re: [UPDATE] Directory index for ext2 Tony Gale
  1 sibling, 1 reply; 17+ messages in thread
From: Theodore Tso @ 2001-06-26 23:49 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Andreas Dilger, Tony Gale, Heusden, Folkert van, linux-kernel,
	ext2-devel, Alexander Viro

On Tue, Jun 26, 2001 at 12:25:32AM +0200, Daniel Phillips wrote:
> > This is only true without the COMPAT_DIR_INDEX flag.  Since e2fsck _needs_
> > to know about every filesystem feature, it will (correctly) refuse to touch
> > such a system for now.  You could "tune2fs -O ^FEATURE_C4 /dev/hdX" to
> > turn of the COMPAT_DIR_INDEX flag and let e2fsck go to town.  That will
> > break all of the directory indexes, I believe.
> 
> This is what he wants, a workaround so he can fsck.  However, the above 
> command (on version 1.2-WIP) just gives me:
> 
>    Invalid filesystem option set: ^FEATURE_C4
> 
> Maybe he should just edit the source so it doesn't set the superblock flag 
> for now.

I haven't had a chance to analyze the directory index format to see if
an-dirindexing-ignorant e2fsck could do any damage to the index.  It's
probably the case as long as the filesystem isn't corrupted, simply
modifying e2fsck to ignore the compatibility flag won't hurt.  But
it's certainly not something I would recommend for any kind of
production operation.

						- Ted


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC] Checks in ext2_new_block()
  2001-06-26 23:49           ` Theodore Tso
@ 2001-06-27  0:08             ` Alexander Viro
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Viro @ 2001-06-27  0:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-kernel, ext2-devel

	Ted, could you comment on sanity checks in ext2_new_block()?
a)
        if (tmp == le32_to_cpu(gdp->bg_block_bitmap) ||
            tmp == le32_to_cpu(gdp->bg_inode_bitmap) ||
            in_range (tmp, le32_to_cpu(gdp->bg_inode_table),
                      sb->u.ext2_sb.s_itb_per_group))
                ext2_error (sb, "ext2_new_block",
                            "Allocating block in system zone - "
                            "block = %u", tmp);

will go ahead and return the block. Looks like we can do better than that
if we mark it in use (we do that anyway), decremnt relevant free blocks
counters (global and cylinder group one) and goto repeat;

b) we don't do similar checks for blocks we grab in preallocation loop.
And ext2_alloc_block() doesn't do such checks either.

c)
        if (ext2_set_bit (j, bh->b_data)) {
                ext2_warning (sb, "ext2_new_block",
                              "bit already set for block %d", j);
                DQUOT_FREE_BLOCK(sb, inode, 1);
                goto repeat;
        }
is of the "if memory got corrupted during the last dozens of cycles" variety -
we had seen that bit 0 several lines before and we couldn't even block during
that interval (not that it mattered much, since all modifications of these
bitmaps are under lock_super() anyway).

d)
        if (j >= le32_to_cpu(es->s_blocks_count)) {
                ext2_error (sb, "ext2_new_block",
                            "block(%d) >= blocks count(%d) - "
                            "block_group = %d, es == %p ",j,
                        le32_to_cpu(es->s_blocks_count), i, es);
                goto out;
        }
is a bit too late _and_ we don't do anything similar for preallocated blocks.

The question being: which of these checks deserve to stay ((c) doesn't, IMO)
and which deserve to be extended to preallocation? If we do them for
main path, we ought to be at least consistent...


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ext2-devel] Re: [UPDATE] Directory index for ext2
  2001-06-25 22:25         ` Daniel Phillips
  2001-06-26 23:49           ` Theodore Tso
@ 2001-07-03 10:00           ` Tony Gale
  1 sibling, 0 replies; 17+ messages in thread
From: Tony Gale @ 2001-07-03 10:00 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Daniel Phillips, Andreas Dilger, Folkert van, linux-kernel,
	ext2-devel, Alexander Viro


Right, I've now disabled every grsecurity kernel config option, apart
from the overarching "Getrewted Kernel Security" one - indicating the
problem is in one of the non #ifdef parts of the patch. Could this be a
problem:

diff -ruN linux/fs/namei.c linux/fs/namei.c
--- linux/fs/namei.c    Sat May 19 18:02:45 2001
+++ linux/fs/namei.c    Tue May 29 01:23:36 2001
@@ -1851,8 +1963,6 @@
        error = vfs_rename(old_dir->d_inode, old_dentry,
                                   new_dir->d_inode, new_dentry);
        unlock_kernel();
-
-       dput(new_dentry);
 exit4:
        dput(old_dentry);
 exit3:

Thanks

-tony


On 26 Jun 2001 19:49:19 -0400, Theodore Tso wrote:
> On Tue, Jun 26, 2001 at 12:25:32AM +0200, Daniel Phillips wrote:
> > > This is only true without the COMPAT_DIR_INDEX flag.  Since e2fsck _needs_
> > > to know about every filesystem feature, it will (correctly) refuse to touch
> > > such a system for now.  You could "tune2fs -O ^FEATURE_C4 /dev/hdX" to
> > > turn of the COMPAT_DIR_INDEX flag and let e2fsck go to town.  That will
> > > break all of the directory indexes, I believe.
> > 
> > This is what he wants, a workaround so he can fsck.  However, the above 
> > command (on version 1.2-WIP) just gives me:
> > 
> >    Invalid filesystem option set: ^FEATURE_C4
> > 
> > Maybe he should just edit the source so it doesn't set the superblock flag 
> > for now.
> 
> I haven't had a chance to analyze the directory index format to see if
> an-dirindexing-ignorant e2fsck could do any damage to the index.  It's
> probably the case as long as the filesystem isn't corrupted, simply
> modifying e2fsck to ignore the compatibility flag won't hurt.  But
> it's certainly not something I would recommend for any kind of
> production operation.
> 
> 						- Ted
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2001-07-03 10:00 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-05-31 16:13 [UPDATE] Directory index for ext2 Daniel Phillips
2001-05-31 19:44 ` [Ext2-devel] " Andreas Dilger
2001-05-31 21:02   ` Daniel Phillips
2001-05-31 22:42     ` Andreas Dilger
2001-06-03  0:19   ` Daniel Phillips
2001-06-20 14:59 ` Tony Gale
2001-06-20 16:02   ` [Ext2-devel] " Theodore Tso
2001-06-20 17:27     ` Daniel Phillips
2001-06-20 16:58   ` Daniel Phillips
2001-06-25  9:46   ` Tony Gale
2001-06-25 16:10     ` Daniel Phillips
2001-06-25 19:51       ` [Ext2-devel] " Andreas Dilger
2001-06-25 22:25         ` Daniel Phillips
2001-06-26 23:49           ` Theodore Tso
2001-06-27  0:08             ` [RFC] Checks in ext2_new_block() Alexander Viro
2001-07-03 10:00           ` [Ext2-devel] Re: [UPDATE] Directory index for ext2 Tony Gale
2001-06-26  9:27         ` Tony Gale

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).