* [BK PATCH] Add ext3 indexed directory (htree) support @ 2002-09-25 20:03 tytso 2002-09-25 20:34 ` Andreas Dilger ` (4 more replies) 0 siblings, 5 replies; 47+ messages in thread From: tytso @ 2002-09-25 20:03 UTC (permalink / raw) To: torvalds; +Cc: linux-kernel Hi Linus, This changeset contains Daniel Phillip's indexed directory changes, ported to 2.5 by Christopher Li and Andrew Morton, and then extensively cleaned up by me. I also implemented the enhanced dx_readdir code which returns files would be returned in hash order. This was necessary so that concurrent tree splits would not result in filenames to be erroneously returned twice or not at all when the b-tree splits reorganized the directory out from underneath readdir(). This patch significantly increases the speed of using large directories. Creating 100,000 files in a single directory took 38 minutes without directory indexing... and 11 seconds with the directory indexing turned on. I've given this code a good bit of testing, under both 2.4 and 2.5 kernels, and believe that it is ready for prime-time. Please pull it from: bk://extfs.bkbits.net/for-linus-htree-2.5 In order to use the new directory indexing feature, please update your e2fsprogs to 1.29. Existing filesystem can be updated to use directory indexing using the command "tune2fs -O dir_index /dev/hdXXX". This can be done while the filesystem is mounted, and subsequent new directories or directories fit within a single block will be use the new (backwards compatible) dirctory indexing format when they grow beyond a single block. Existing large directories on the filesystem can be converted to use the new indexed directory format by running the following command on an unmounted filesystem "e2fsck -fD /dev/hdXXX". - Ted fs/ext3/Makefile | 2 fs/ext3/dir.c | 298 ++++++++++ fs/ext3/file.c | 3 fs/ext3/hash.c | 215 +++++++ fs/ext3/namei.c | 1305 ++++++++++++++++++++++++++++++++++++++++----- fs/ext3/super.c | 6 include/linux/ext3_fs.h | 86 ++ include/linux/ext3_fs_sb.h | 2 include/linux/ext3_jbd.h | 2 include/linux/rbtree.h | 1 lib/rbtree.c | 16 11 files changed, 1797 insertions(+), 139 deletions(-) (The changes to rbtree.c were to add a new function rb_first(), which returns the first node in the rbtree.) ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:03 [BK PATCH] Add ext3 indexed directory (htree) support tytso @ 2002-09-25 20:34 ` Andreas Dilger 2002-09-25 20:41 ` Dave Jones ` (3 subsequent siblings) 4 siblings, 0 replies; 47+ messages in thread From: Andreas Dilger @ 2002-09-25 20:34 UTC (permalink / raw) To: tytso; +Cc: torvalds, linux-kernel On Sep 25, 2002 16:03 -0400, tytso@mit.edu wrote: > This patch significantly increases the speed of using large directories. > Creating 100,000 files in a single directory took 38 minutes without > directory indexing... and 11 seconds with the directory indexing turned on. Not mentioned, but very important to note is that the dir indexing code is 100% compatible with older kernels (even back to 1.2 days) for both read and write. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:03 [BK PATCH] Add ext3 indexed directory (htree) support tytso 2002-09-25 20:34 ` Andreas Dilger @ 2002-09-25 20:41 ` Dave Jones 2002-09-25 21:08 ` Andreas Dilger 2002-09-25 21:34 ` Theodore Ts'o 2002-09-25 22:54 ` Jeff Garzik ` (2 subsequent siblings) 4 siblings, 2 replies; 47+ messages in thread From: Dave Jones @ 2002-09-25 20:41 UTC (permalink / raw) To: tytso; +Cc: torvalds, linux-kernel On Wed, Sep 25, 2002 at 04:03:44PM -0400, tytso@mit.edu wrote: > This patch significantly increases the speed of using large directories. > Creating 100,000 files in a single directory took 38 minutes without > directory indexing... and 11 seconds with the directory indexing turned on. Just curious.. what measurable overhead (if any) is there of indexing dirs with smaller numbers of files vs non-indexed ? If so, where would be the break-even point ? Dave -- | Dave Jones. http://www.codemonkey.org.uk | SuSE Labs ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:41 ` Dave Jones @ 2002-09-25 21:08 ` Andreas Dilger 2002-09-25 21:34 ` Theodore Ts'o 1 sibling, 0 replies; 47+ messages in thread From: Andreas Dilger @ 2002-09-25 21:08 UTC (permalink / raw) To: Dave Jones, tytso, torvalds, linux-kernel On Sep 25, 2002 21:41 +0100, Dave Jones wrote: > On Wed, Sep 25, 2002 at 04:03:44PM -0400, tytso@mit.edu wrote: > > > This patch significantly increases the speed of using large directories. > > Creating 100,000 files in a single directory took 38 minutes without > > directory indexing... and 11 seconds with the directory indexing turned on. > > Just curious.. what measurable overhead (if any) is there of indexing > dirs with smaller numbers of files vs non-indexed ? > If so, where would be the break-even point ? No overhead at all for directories 1 block in size. The htree code uses the existing "search leaf block" code for such a directory directly. For directories > 1 block in size, you have the index (1 block overhead), but also the benefit that you are only searching 1/N of the blocks for an entry (the leaf block searching code remains the same, just the "which block to search" code is activated. So, in summary, htree is never slower than an un-indexed directory, so there is never really a time when you wouldn't want to use it. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:41 ` Dave Jones 2002-09-25 21:08 ` Andreas Dilger @ 2002-09-25 21:34 ` Theodore Ts'o 1 sibling, 0 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-25 21:34 UTC (permalink / raw) To: Dave Jones, torvalds, linux-kernel On Wed, Sep 25, 2002 at 09:41:01PM +0100, Dave Jones wrote: > Just curious.. what measurable overhead (if any) is there of indexing > dirs with smaller numbers of files vs non-indexed ? > If so, where would be the break-even point ? It should be negligible to the point of being non-measurable. If the number of files fit within a single block, the format doesn't change at all. Once the directory is grows so there are two block's worth of directory entries, then we move to an indexed format, and every file will be found within two disk block reads, and on average we will need to do comparisons on half a block worth of directory names. Without directory indexing, on average a lookup will succeed in 1.5 disk reads (50% will require one disk block read, and 50% will require two disk block reads), and on average we will need to do comparisons on a full block's worth of directory entries. So when the directory is two blocks' worth of data, it's a slight lose if you are looking at things from the point of view of disk reads, but we're winning already from the point of view of CPU time. Not that this matters; it won't be measureable because the block read statistics only apply the first time you search the directory. After that, the directory blocks will be in cache, and the only thing that matters is the CPU time. And the amount of CPU time that it takes to search directory entries, whether it we need to search on average 1 block or half a block worth of directory entries, is small enough to be not an issue. Not that it matters, but for the record though, we break-even on disk reads once the directory is 3 blocks long. At that point, linear search will require on average reading 2 blocks (1*1/3 + 2*1/3 + 3*1/3 == 2) and the indexed directory will still require 2 disk reads. On the CPU front, the linear search will require comparisons with 1.5 blocks worth of directory entries, while the indexed directory will still require only 0.5 blocks worth of directory comparisons. (I'm ignoring here the CPU time it takes to calculate the hash, but it's the time to search the directory blocks that matters, since that's where you'll have all of the memory cache misses that will slow down the search.) Oh, and finally, for a small directory, it won't be measurable because after the first time around, all of the directory entries will fit in the dcache. :-) - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:03 [BK PATCH] Add ext3 indexed directory (htree) support tytso 2002-09-25 20:34 ` Andreas Dilger 2002-09-25 20:41 ` Dave Jones @ 2002-09-25 22:54 ` Jeff Garzik 2002-09-25 23:29 ` Theodore Ts'o 2002-09-25 23:31 ` [BK PATCH] Add ext3 indexed directory (htree) support Daniel Egger 2002-09-26 0:50 ` Aaron Lehmann 4 siblings, 1 reply; 47+ messages in thread From: Jeff Garzik @ 2002-09-25 22:54 UTC (permalink / raw) To: tytso; +Cc: torvalds, linux-kernel tytso@mit.edu wrote: > I've given this code a good bit of testing, under both 2.4 and 2.5 > kernels, and believe that it is ready for prime-time. Please pull it > from: > > bk://extfs.bkbits.net/for-linus-htree-2.5 Can you post a GNU patch too, for public lookover and independent integration? Jeff ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 22:54 ` Jeff Garzik @ 2002-09-25 23:29 ` Theodore Ts'o 2002-09-25 23:45 ` Ryan Cumming 0 siblings, 1 reply; 47+ messages in thread From: Theodore Ts'o @ 2002-09-25 23:29 UTC (permalink / raw) To: Jeff Garzik; +Cc: torvalds, linux-kernel On Wed, Sep 25, 2002 at 06:54:00PM -0400, Jeff Garzik wrote: > > Can you post a GNU patch too, for public lookover and independent > integration? > Sure. The patch is a bit big for e-mail, but you can find it at: http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.5.38 There is also a 2.4.19 patch available as well: http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.4.19-2 - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 23:29 ` Theodore Ts'o @ 2002-09-25 23:45 ` Ryan Cumming 2002-09-26 3:27 ` Theodore Ts'o 0 siblings, 1 reply; 47+ messages in thread From: Ryan Cumming @ 2002-09-25 23:45 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 25, 2002 16:29, Theodore Ts'o wrote: > There is also a 2.4.19 patch available as well: > > http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.4.19-2 > I got some pretty nasty results with that patch. After enabling the dir_index superblock flag and running e2fsck -fD, the filesystem would spontaneously remount itself read-only (I have errors=remount-ro set) after a few minutes of use. Once I disabled dir_index, e2fsck picked up many duplicate blocks. There doesn't appear to be any severe filesystem damage, however. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9kkqkLGMzRzbJfbQRAiiSAJwLFZSyHkG9WSGw6HY23g6i+N+z0QCggQEF ixJJJJLCe56O5lL51sTDgaE= =sywJ -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 23:45 ` Ryan Cumming @ 2002-09-26 3:27 ` Theodore Ts'o 2002-09-26 5:23 ` Ryan Cumming 0 siblings, 1 reply; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 3:27 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Wed, Sep 25, 2002 at 04:45:34PM -0700, Ryan Cumming wrote: > I got some pretty nasty results with that patch. After enabling the > dir_index superblock flag and running e2fsck -fD, the filesystem > would spontaneously remount itself read-only (I have > errors=remount-ro set) after a few minutes of use. Once I disabled > dir_index, e2fsck picked up many duplicate blocks. There doesn't > appear to be any severe filesystem damage, however. Well, I'm currently running with 2.4.19 with the ext3 patch, and I'm not having any problems. In fact, my mail directory is an indexed directory, and I'm replying out of it at this very moment. I've also done a number of fairly intensive bitkeeper operations (the deleted/SCCS directory gets is pretty big for the linux kernel), and it seems to be working just fine for me. Are you sure you upgraded to the latest version of e2fsprogs (version 1.29, released yesterday?). I specified version 1.29 explicitly in my last e-mail for a reason --- e2fsprogs 1.28 had a bug where e2fsck -fD had a 1 in 512 chance of corrupting each directory block which it tries to optimize. Unfortunately because the fence post error had only a 1 in 512 chance of triggering, I didn't notice it in my regression tests. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 3:27 ` Theodore Ts'o @ 2002-09-26 5:23 ` Ryan Cumming 2002-09-26 5:57 ` Theodore Ts'o 0 siblings, 1 reply; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 5:23 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 25, 2002 20:27, Theodore Ts'o wrote: > Are you sure you upgraded to the latest version of e2fsprogs (version > 1.29, released yesterday?). Yes, I compiled from source explicitly for this purpose. I made a second attempt at getting dir_index to work, with make clean in linux and e2fsprogs. 1) I ran 'tune2fs -O dir_index /dev/hda2". So good so far. 2) Upon rebooting in to single user mode, I ran 'man fsck' to figure out how to get those nifty progressbars ;) 3) While starting man(1), EXT3 began spewing messages in the form: "EXT3-fs error (device (ide0(3,2)): ext3_readdir: directory #4243459 contains a hole at offset xxxxxx" The directory number stayed constant, but the offset was variable. fsck -fD had -not- been run at this point. 4) On reboot, fsck reported: "Directory inode has unallocated block #xx" multiple times. fsck seemed to fully recover the filesystem. I rebooted again for good measure. 5) Not set back, I tried 'man fsck' again. It worked as expected this time. 6) Next, I executed 'e2fsck -CfD /dev/hda2". When it completed, I rebooted. 7) While KDE was trying to start, EXT3 dumped the following to the console: "EXT3-fs error (device ide(3,2)) in start_transaction: Journal has aborted" 8) I rebooted, and fsck said: "Directory inode 131073,block3,offset 528: Directory corrupted" I wasn't so lucky this time, and a good portion of my home directory got eaten. 9) I tried to start KDE again, and the filesystem went read-only halfway through composing an email. I didn't catch the error on the console that time. 10) fsck reported more 'Directory corrupted' errors, and more of my home directory ended up in /lost+found 11) I ran 'tune2fs -O ^dir_index', rebooted, and fsck cleaned up all the index blocks It seems to be running stable now. Linux 2.4.19, UP Athlon, GCC 3.2. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9kpnBLGMzRzbJfbQRAhlfAKCPiz7J6EWNyk/hlFsMbXjeeT7D7QCeI10u k0yuUW2maTuHhhDMsKRjO5c= =jOkd -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 5:23 ` Ryan Cumming @ 2002-09-26 5:57 ` Theodore Ts'o 2002-09-26 6:22 ` Ryan Cumming ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 5:57 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Wed, Sep 25, 2002 at 10:23:11PM -0700, Ryan Cumming wrote: > > It seems to be running stable now. Linux 2.4.19, UP Athlon, GCC 3.2. Just to humor me, can you try it with gcc 2.95.4? I just want to eliminate one variable.... > 3) While starting man(1), EXT3 began spewing messages in the form: > "EXT3-fs error (device (ide0(3,2)): ext3_readdir: directory #4243459 contains > a hole at offset xxxxxx" what directory was 4243459? You can use the debugfs's ncheck command to get back a pathname from an inode number? Are you sure the filesystem was consistent before you started this whole procedure? It sounds like you hadn't started modifying directories at this point in the procedure. Yet this error ("directory #XXXX contains a hole") is printed by the non-indexed-directory version of readdir. So that would imply that the directory with the initial error reported on it was not an indexed directory..... very strange! > The directory number stayed constant, but the offset was variable. fsck -fD > had -not- been run at this point. > 4) On reboot, fsck reported: > "Directory inode has unallocated block #xx" > multiple times. fsck seemed to fully recover the filesystem. I rebooted again > for good measure. What were the directory inode numbers, and what pathname did they map to? > 7) While KDE was trying to start, EXT3 dumped the following to the console: > "EXT3-fs error (device ide(3,2)) in start_transaction: Journal has aborted" This message will appear if previously some other part of ext2 reported a filesystem inconsistency. So it's a symptom, and not the root cause of the problem. > 8) I rebooted, and fsck said: > "Directory inode 131073,block3,offset 528: Directory corrupted" > I wasn't so lucky this time, and a good portion of my home directory got > eaten. Against, what was the pathnmae to the inode #131073? This is strange, since I'm not seeing any of the problems that you're seeing. I'm going to need a lot more information if I'm going to have a prayer of a chance of digging into it. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 5:57 ` Theodore Ts'o @ 2002-09-26 6:22 ` Ryan Cumming 2002-09-26 14:05 ` Theodore Ts'o 2002-09-26 6:25 ` Ryan Cumming 2002-09-26 7:41 ` Ryan Cumming 2 siblings, 1 reply; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 6:22 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 25, 2002 22:57, Theodore Ts'o wrote: > Just to humor me, can you try it with gcc 2.95.4? I just want to > eliminate one variable.... I'll make a brief run with 2.95.4 shortly. I hope you understand my reluctance to do any more in-depth testing after the data loss I experienced > > 3) While starting man(1), EXT3 began spewing messages in the form: > > "EXT3-fs error (device (ide0(3,2)): ext3_readdir: directory #4243459 > > contains a hole at offset xxxxxx" > > what directory was 4243459? You can use the debugfs's ncheck command > to get back a pathname from an inode number? /usr/share/man/man1 > Are you sure the filesystem was consistent before you started this > whole procedure? Fairly sure, I ran fsck to completion after my first run-in with dir_index. > > The directory number stayed constant, but the offset was variable. fsck > > -fD had -not- been run at this point. > > 4) On reboot, fsck reported: > > "Directory inode has unallocated block #xx" > > multiple times. fsck seemed to fully recover the filesystem. I rebooted > > again for good measure. > > What were the directory inode numbers, and what pathname did they map > to? Sorry, I forgot to mention that the directory inode was 4243459 (/usr/share/man/man1), the same one EXT3 complained about above. > > 7) While KDE was trying to start, EXT3 dumped the following to the > > console: "EXT3-fs error (device ide(3,2)) in start_transaction: Journal > > has aborted" > > This message will appear if previously some other part of ext2 > reported a filesystem inconsistency. So it's a symptom, and not the > root cause of the problem. > > > 8) I rebooted, and fsck said: > > "Directory inode 131073,block3,offset 528: Directory corrupted" > > I wasn't so lucky this time, and a good portion of my home directory got > > eaten. > > Against, what was the pathnmae to the inode #131073? /home/ryan (ugh) - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9kqe0LGMzRzbJfbQRAnUxAJ9NWR7sdunDTFo2brbAV4qGCdHhkgCfZ7yt TiR5s6fv6E+CDDc4KayqFgU= =ycMb -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 6:22 ` Ryan Cumming @ 2002-09-26 14:05 ` Theodore Ts'o 0 siblings, 0 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 14:05 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Wed, Sep 25, 2002 at 11:22:41PM -0700, Ryan Cumming wrote: > > > 8) I rebooted, and fsck said: > > > "Directory inode 131073,block3,offset 528: Directory corrupted" > > > I wasn't so lucky this time, and a good portion of my home directory got > > > eaten. > > > > Against, what was the pathnmae to the inode #131073? > /home/ryan (ugh) One more thought. The files in your directory weren't gone; they were just moved to the lost+found directory, and you just lost their name->inode numbers mapping. This is recoverable, however, if you had saved that information first. If you must test on filesystems with precious data (as opposed to using scratch disks or loopback mounts), something which is really useful to do in advance is to run the command: e2image -r /dev/hdXX /var/e2image/XXX.e2img This will create a dump of all of the critical filesystem metadata, including the directory blocks. I sometimes ask people to send me compressed e2image dump files when I need to debug a filesystem, since it allows me to see all of the metadata, which is what I care about, without getting need to send all of the data in the filesystem, which (a) might be a privacy problem, and (b) makes the image file significantly bigger to send over the network. It's also useful for recovery purposes. For example, you can run debugfs on the e2image file, and use the ls command to see the original directory, which would have allowed you to relatively easily reconstruct the filenames of the files left in the lost+found directory. Actually, the easiest way to do it would be to write a quick perl or python script to gether up all of the inode numbers of the files in lost+found, and pass it to the debugfs's "ncheck" command, which will map inode numbers to pathnames. You can specify all of the inodes ncheck at once: debugfs -R "ncheck ino1 ino2 ino3 ..." /var/e2image/XXX.e2img Then just parse the output, and move the files from lost+found back to their original location in your home directory. In fact, what e2image can be so helpful that a wise distribution might want to consider automatically running it at boot time, to save critical filesystem metadata. This could be used to reconstruct filesystems and recover critical data in the case of various sorts of hardware failure and/or kernel bugs. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 5:57 ` Theodore Ts'o 2002-09-26 6:22 ` Ryan Cumming @ 2002-09-26 6:25 ` Ryan Cumming 2002-09-26 11:25 ` Daniel Egger 2002-09-26 7:41 ` Ryan Cumming 2 siblings, 1 reply; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 6:25 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 25, 2002 22:57, Theodore Ts'o wrote: > On Wed, Sep 25, 2002 at 10:23:11PM -0700, Ryan Cumming wrote: > > It seems to be running stable now. Linux 2.4.19, UP Athlon, GCC 3.2. > > Just to humor me, can you try it with gcc 2.95.4? I just want to > eliminate one variable.... Before I go again, any suggestions on how to reliably capture these error messages? Because the filesystem goes read-only immediately, /var/log/messages is hardly useful. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9kqhELGMzRzbJfbQRAkENAJ42ihIhFYwW8+0ssBjgY9OJK4u3nQCeKAz5 RUgZltxYOScsTMG9HwAsdso= =au0V -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 6:25 ` Ryan Cumming @ 2002-09-26 11:25 ` Daniel Egger 0 siblings, 0 replies; 47+ messages in thread From: Daniel Egger @ 2002-09-26 11:25 UTC (permalink / raw) To: Ryan Cumming; +Cc: Theodore Ts'o, linux-kernel [-- Attachment #1: Type: text/plain, Size: 767 bytes --] Am Don, 2002-09-26 um 08.25 schrieb Ryan Cumming: > Before I go again, any suggestions on how to reliably capture these error > messages? Because the filesystem goes read-only immediately, > /var/log/messages is hardly useful. Indeed, but I'd wonder why someone whould start syslogd on a r/o filesystem, so probably /var/log/messages is not used at this time anyway. You could set up syslogd to do remote logging to another machine and fire it up before the troubles to capture anything. But of course this requires a) another machine b) syslog set up on both sides for remote logging c) a working network An alternative would be to mount another disk with a different filesystem (reiserfs/FAT) and log to there. -- Servus, Daniel [-- Attachment #2: Dies ist ein digital signierter Nachrichtenteil --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 5:57 ` Theodore Ts'o 2002-09-26 6:22 ` Ryan Cumming 2002-09-26 6:25 ` Ryan Cumming @ 2002-09-26 7:41 ` Ryan Cumming 2002-09-26 13:23 ` Theodore Ts'o 2002-09-26 15:42 ` Theodore Ts'o 2 siblings, 2 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 7:41 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 25, 2002 22:57, Theodore Ts'o wrote: > On Wed, Sep 25, 2002 at 10:23:11PM -0700, Ryan Cumming wrote: > > It seems to be running stable now. Linux 2.4.19, UP Athlon, GCC 3.2. > > Just to humor me, can you try it with gcc 2.95.4? I just want to > eliminate one variable.... Using GCC 2.95.4 seems to stabilize dir_index nicely, both before and after the hdparm -fD run. Only the kernel was recompiled with 2.95.4, I reused the original GCC 3.2 compiled e2fsprogs. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9krpHLGMzRzbJfbQRArCGAJ0V8NuAtrUQt/HcOVgbVRtXJzhDwQCeOwtS Lkkp52o/ku9W4pXoFl8nARc= =axyt -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 7:41 ` Ryan Cumming @ 2002-09-26 13:23 ` Theodore Ts'o 2002-09-26 15:42 ` Theodore Ts'o 1 sibling, 0 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 13:23 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Thu, Sep 26, 2002 at 12:41:54AM -0700, Ryan Cumming wrote: > > Using GCC 2.95.4 seems to stabilize dir_index nicely, both before and after > the hdparm -fD run. Only the kernel was recompiled with 2.95.4, I reused the > original GCC 3.2 compiled e2fsprogs. > Thanks for willing to be a Guienea Pig; now I know what to look for. I didn't have a chance to answer your question last night about how to test it without risking your data, but the answer is that that patch doesn't change how ext3 works unless the dir_index feature flag is set. So it would have been safe to boot the patched, kernel, and then create a test filesystem in a plain file, and the mount it using the loop device: dd if=/dev/zero of=test.img bs=1024k count=32 mke2fs -f -j test.img mount -o loop test.img /mnt So I'll take a look at things and try to figure out what GCC bug we might be triggering. It's strange; it's not like the code using inline instructions, or anything else that might be triggering a GCC bug or discrepancy between how GCC 2.95.4 and GCC 3.2 works. In fact, it's all relatively straightforward C code. Anyway, thanks. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 7:41 ` Ryan Cumming 2002-09-26 13:23 ` Theodore Ts'o @ 2002-09-26 15:42 ` Theodore Ts'o 2002-09-26 19:08 ` Ryan Cumming 1 sibling, 1 reply; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 15:42 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Thu, Sep 26, 2002 at 12:41:54AM -0700, Ryan Cumming wrote: > On September 25, 2002 22:57, Theodore Ts'o wrote: > > On Wed, Sep 25, 2002 at 10:23:11PM -0700, Ryan Cumming wrote: > > > It seems to be running stable now. Linux 2.4.19, UP Athlon, GCC 3.2. > > > > Just to humor me, can you try it with gcc 2.95.4? I just want to > > eliminate one variable.... > > Using GCC 2.95.4 seems to stabilize dir_index nicely, both before and after > the hdparm -fD run. Only the kernel was recompiled with 2.95.4, I reused the > original GCC 3.2 compiled e2fsprogs. Hmm... I just tried biult 2.4.19 with the ext3 patch on my UP P3 machine, using GCC 3.2, and I wasn't able to replicate your problem. (This was using Debian's gcc 3.2.1-0pre2 release from testing.) What was the precise GCC 3.2 version you were using, and for what architecture were you compiling for? As I recall P3 kernels should run on Athlons, if memory serves me correctly. (I don't have any Athlons at home, so if it's an Athlon-specific code generation bug, I'll have problems replicating it.) Could you try compiling a kernel for the P3 architecture using GCC 3.2, and see whether or not the problem is there? This is just a shot in the dark, but.... - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 15:42 ` Theodore Ts'o @ 2002-09-26 19:08 ` Ryan Cumming 2002-09-26 19:51 ` Horst von Brand 2002-09-26 22:04 ` Theodore Ts'o 0 siblings, 2 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 19:08 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 26, 2002 08:42, Theodore Ts'o wrote: > Hmm... I just tried biult 2.4.19 with the ext3 patch on my UP P3 > machine, using GCC 3.2, and I wasn't able to replicate your problem. > (This was using Debian's gcc 3.2.1-0pre2 release from testing.) The whole GCC 3.2 thing was a red herring. Although it ran stable for a few hours last night (cvs up, compiled a kernel, etc), the filesystem was once again read-only when I came to check my mail this morning. The interesting fsck errors this time were: 245782 was part of the orphaned inode list FIXED 245792 was part of the orphaned inode list FIXED 245797... 245782,245792 don't exist according to ncheck. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9k1tKLGMzRzbJfbQRAj/AAJ9/yPnuSPF3kUTlwFt2wF2JFSJlJwCfc39B ZKxRtt+hpDHQ7XuiyAYkpNM= =MHW8 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 19:08 ` Ryan Cumming @ 2002-09-26 19:51 ` Horst von Brand 2002-09-26 19:59 ` Ryan Cumming 2002-09-26 22:04 ` Theodore Ts'o 1 sibling, 1 reply; 47+ messages in thread From: Horst von Brand @ 2002-09-26 19:51 UTC (permalink / raw) To: Ryan Cumming; +Cc: Theodore Ts'o, linux-kernel Ryan Cumming <ryan@completely.kicks-ass.org> said: > On September 26, 2002 08:42, Theodore Ts'o wrote: > > Hmm... I just tried biult 2.4.19 with the ext3 patch on my UP P3 > > machine, using GCC 3.2, and I wasn't able to replicate your problem. > > (This was using Debian's gcc 3.2.1-0pre2 release from testing.) > The whole GCC 3.2 thing was a red herring. Although it ran stable for a few > hours last night (cvs up, compiled a kernel, etc), the filesystem was once > again read-only when I came to check my mail this morning. > > The interesting fsck errors this time were: > 245782 was part of the orphaned inode list FIXED > 245792 was part of the orphaned inode list FIXED > 245797... > > 245782,245792 don't exist according to ncheck. Old(ish) WD disk, perhaps PIIX IDE? At home I get filesystem corruption if DMA is enabled... -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 19:51 ` Horst von Brand @ 2002-09-26 19:59 ` Ryan Cumming 0 siblings, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 19:59 UTC (permalink / raw) To: Horst von Brand; +Cc: Theodore Ts'o, linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 26, 2002 12:51, Horst von Brand wrote: > > The interesting fsck errors this time were: > > 245782 was part of the orphaned inode list FIXED > > 245792 was part of the orphaned inode list FIXED > > 245797... > > > > 245782,245792 don't exist according to ncheck. > > Old(ish) WD disk, perhaps PIIX IDE? At home I get filesystem corruption if > DMA is enabled... No, 40GB Maxtor, VT133A chipset. This box has experienced -zero- filesystem corruption since it was purchased, up until I tried the directory index patch. And even then, it only shows up with the dir_index flag set. OTOH, much of this corruption could be because journal replay doesn't seem to be happening after a filesystem is remounted read-only due to errors. That would certainly leave the filesystem in an inconsistant state. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9k2c0LGMzRzbJfbQRAt5lAJ4mSPUgX3EPDsvzLnS0f768Rc51nwCfUpdk ZSt7Q/WjmA1kyPF/C9DS4Nc= =6uMQ -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 19:08 ` Ryan Cumming 2002-09-26 19:51 ` Horst von Brand @ 2002-09-26 22:04 ` Theodore Ts'o 2002-09-26 22:53 ` Ryan Cumming 1 sibling, 1 reply; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 22:04 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Thu, Sep 26, 2002 at 12:08:54PM -0700, Ryan Cumming wrote: > On September 26, 2002 08:42, Theodore Ts'o wrote: > > Hmm... I just tried biult 2.4.19 with the ext3 patch on my UP P3 > > machine, using GCC 3.2, and I wasn't able to replicate your problem. > > (This was using Debian's gcc 3.2.1-0pre2 release from testing.) > The whole GCC 3.2 thing was a red herring. Although it ran stable for a few > hours last night (cvs up, compiled a kernel, etc), the filesystem was once > again read-only when I came to check my mail this morning. Was there anything in the logs at all? There should be, if the filesystem was remounted read-only. > The interesting fsck errors this time were: > 245782 was part of the orphaned inode list FIXED > 245792 was part of the orphaned inode list FIXED > 245797... > > 245782,245792 don't exist according to ncheck. That's not surprising. What this means is that those inodes were deleted, but since some process still had a file descriptor open for that inode, it was placed on the orphaned inode list. But the directory entry would have already been removed, which is why ncheck couldn't find an associated pathname. The e2fsck error message simply states that these inodes had a dtime which was small enough that it was probably the next entry on the orphaned inode linked list, these inodes weren't actually on the list. At a guess, this probably happened when an error was noted in the filesystem, and the filesystem was forcibly put into the read-only state. That probably arrested some transactions which were not fully completed, and would explain these sorts of fsck errors. The real question is what was the original error that caused the ext3 filesystme to decide it needed to remount the filesystem read-only. That should be in your logs, since calls to ext3_error should always cause printk's explaining what the error was to be sent to the logs. The filesystem wouldn't happen to be running close to full either on the number of blocks or the number of inodes, would it? There's a bug in ext3 (for which Stephen has already posted bug fixes to be applied to the 2.4.20-preX kernels) where an running out of blocks or inodes is being erroneously flagged as a filesystem corruption error, which would mount the filesystem read-only. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 22:04 ` Theodore Ts'o @ 2002-09-26 22:53 ` Ryan Cumming 2002-09-26 23:57 ` Theodore Ts'o 0 siblings, 1 reply; 47+ messages in thread From: Ryan Cumming @ 2002-09-26 22:53 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 26, 2002 15:04, Theodore Ts'o wrote: > Was there anything in the logs at all? There should be, if the > filesystem was remounted read-only. There is nothing. grep -i "ext3" /var/log/messages returned a whole lot of the usual: Sep 25 15:51:00 completely kernel: EXT3-fs: mounted filesystem with ordered data mode. Sep 25 15:51:00 completely kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,2), internal journal The one mildly interesting thing was: Sep 26 11:49:06 (none) kernel: EXT3-fs: INFO: recovery required on readonly filesystem. Sep 26 11:49:06 (none) kernel: EXT3-fs: write access will be enabled during recovery. Sep 26 11:49:06 (none) kernel: EXT3-fs warning (device ide0(3,2)): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure Sep 26 11:49:06 (none) kernel: EXT3-fs warning (device ide0(3,2)): ext3_clear_journal_err: Marking fs in need of filesystem check. Sep 26 11:49:06 (none) kernel: EXT3-fs: ide0(3,2): orphan cleanup on readonly fs Sep 26 11:49:06 (none) kernel: EXT3-fs: recovery complete. Sep 26 11:49:06 (none) kernel: EXT3-fs: mounted filesystem with ordered data mode. Sep 26 11:49:06 (none) kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,2), internal journal > The real question is what was the original error that caused the ext3 > filesystme to decide it needed to remount the filesystem read-only. > That should be in your logs, since calls to ext3_error should always > cause printk's explaining what the error was to be sent to the logs. I've seen the printk's (although I didn't write down this one). However, they are never hitting the disk. I could try to log the messages to my FreeBSD mail server... > The filesystem wouldn't happen to be running close to full either on > the number of blocks or the number of inodes, would it? Inode count: 4767744 Block count: 9522528 Reserved block count: 476126 Free blocks: 2362117 Free inodes: 4490071 Seems to be fine..... - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9k4/TLGMzRzbJfbQRAgp6AJ9Xp7WviFP9ByQaUJ8Ak9sYVf1C4ACffemS tS4gB+W4WE7VrPKa+tan/68= =MhXA -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 22:53 ` Ryan Cumming @ 2002-09-26 23:57 ` Theodore Ts'o 2002-09-27 1:00 ` Ryan Cumming 2002-09-27 4:12 ` Andreas Dilger 0 siblings, 2 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 23:57 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Thu, Sep 26, 2002 at 03:53:02PM -0700, Ryan Cumming wrote: > The one mildly interesting thing was: > Sep 26 11:49:06 (none) kernel: EXT3-fs: INFO: recovery required on readonly > filesystem. > Sep 26 11:49:06 (none) kernel: EXT3-fs: write access will be enabled during > recovery. > Sep 26 11:49:06 (none) kernel: EXT3-fs warning (device ide0(3,2)): > ext3_clear_journal_err: Filesystem error recorded from previous mount: IO > failure > Sep 26 11:49:06 (none) kernel: EXT3-fs warning (device ide0(3,2)): > ext3_clear_journal_err: Marking fs in need of filesystem check. > Sep 26 11:49:06 (none) kernel: EXT3-fs: ide0(3,2): orphan cleanup on readonly > fs > Sep 26 11:49:06 (none) kernel: EXT3-fs: recovery complete. > Sep 26 11:49:06 (none) kernel: EXT3-fs: mounted filesystem with ordered data > mode. > Sep 26 11:49:06 (none) kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,2), > internal journal Wait a second. These messages would occur only if you had done a read-only mount at 11:49:06. Did you do a manual mount at that time? Do you have one or more filesystems in your /etc/fstab (in particular /dev/hda2) that are set to be mounted read-only? That's the only thing that would explain the "write access enabled during recovery of readonly filesystem" warning message. That message means that /dev/hda2 was readonly because the mount command *requested* that it be mounted read-only, not because of some error. The other strange thing from these syslog messages is that, (a) they indicate that the filesystem wasn't checked by e2fsck before they were mounted --- which is what I generally recommend: let e2fsck run the journal take take care of the recovery, and (b) the reason why the filesystem was marked as being "in error" is because the previous time the filesystem was mounted, something reported an IO error. That could be a hardware problem, but it's also used as a generic error by much of the exte filesystem code, so we really need the log entry, which should have been the previous time this filesystem had been mounted. How is your system configured vis-a-vis the /etc/fstab entry for /dev/hda2? - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 23:57 ` Theodore Ts'o @ 2002-09-27 1:00 ` Ryan Cumming 2002-09-27 3:24 ` Theodore Ts'o 2002-09-27 4:12 ` Andreas Dilger 1 sibling, 1 reply; 47+ messages in thread From: Ryan Cumming @ 2002-09-27 1:00 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 26, 2002 16:57, Theodore Ts'o wrote: > Wait a second. These messages would occur only if you had done a > read-only mount at 11:49:06. Did you do a manual mount at that time? > Do you have one or more filesystems in your /etc/fstab (in particular > /dev/hda2) that are set to be mounted read-only? Only /dev/cdrom... > That's the only > thing that would explain the "write access enabled during recovery of > readonly filesystem" warning message. That message means that > /dev/hda2 was readonly because the mount command *requested* that it > be mounted read-only, not because of some error. Would init remounting the filesystem read-only before a reboot explain that? 11:49 is around the time I came to check my mail. > How is your system configured vis-a-vis the /etc/fstab entry for > /dev/hda2? /dev/hda2 / ext3 defaults,errors=remount-ro 0 1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9k62rLGMzRzbJfbQRAlcSAKCMNMTRNN0D/T3GA7eM3HnzM9MVMgCfT1w6 V6JU4fqtHDN8pzsLSdo0mzw= =kt7X -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-27 1:00 ` Ryan Cumming @ 2002-09-27 3:24 ` Theodore Ts'o 0 siblings, 0 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-27 3:24 UTC (permalink / raw) To: Ryan Cumming; +Cc: linux-kernel On Thu, Sep 26, 2002 at 06:00:23PM -0700, Ryan Cumming wrote: > > > That's the only > > thing that would explain the "write access enabled during recovery of > > readonly filesystem" warning message. That message means that > > /dev/hda2 was readonly because the mount command *requested* that it > > be mounted read-only, not because of some error. > Would init remounting the filesystem read-only before a reboot explain that? > 11:49 is around the time I came to check my mail. No, it doesn't. The message "Filesystem recorded error from previous mount" can only happen on an initial mount (which from the other messages must have been a read-only mount), or on remount of a read-only mount to a read-write mount. Remounting read-only wouldn't explain these syslogs. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 23:57 ` Theodore Ts'o 2002-09-27 1:00 ` Ryan Cumming @ 2002-09-27 4:12 ` Andreas Dilger 2002-09-27 7:55 ` Ryan Cumming 2002-09-28 1:20 ` Ryan Cumming 1 sibling, 2 replies; 47+ messages in thread From: Andreas Dilger @ 2002-09-27 4:12 UTC (permalink / raw) To: Theodore Ts'o, Ryan Cumming, linux-kernel On Sep 26, 2002 19:57 -0400, Theodore Ts'o wrote: > On Thu, Sep 26, 2002 at 03:53:02PM -0700, Ryan Cumming wrote: > > The one mildly interesting thing was: > > Sep 26 11:49:06 (none) kernel: EXT3-fs: INFO: recovery required on readonly > > filesystem. > > Sep 26 11:49:06 (none) kernel: EXT3-fs: write access will be enabled during > > recovery. > > Sep 26 11:49:06 (none) kernel: EXT3-fs warning (device ide0(3,2)): > > ext3_clear_journal_err: Filesystem error recorded from previous mount: IO > > failure > > Sep 26 11:49:06 (none) kernel: EXT3-fs warning (device ide0(3,2)): > > ext3_clear_journal_err: Marking fs in need of filesystem check. > > Sep 26 11:49:06 (none) kernel: EXT3-fs: ide0(3,2): orphan cleanup on readonly > > fs > > Sep 26 11:49:06 (none) kernel: EXT3-fs: recovery complete. > > Sep 26 11:49:06 (none) kernel: EXT3-fs: mounted filesystem with ordered data > > mode. > > Sep 26 11:49:06 (none) kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,2), > > internal journal > > Wait a second. These messages would occur only if you had done a > read-only mount at 11:49:06. Did you do a manual mount at that time? > Do you have one or more filesystems in your /etc/fstab (in particular > /dev/hda2) that are set to be mounted read-only? That's the only > thing that would explain the "write access enabled during recovery of > readonly filesystem" warning message. That message means that > /dev/hda2 was readonly because the mount command *requested* that it > be mounted read-only, not because of some error. Ryan Cumming replied: > /dev/hda2 / ext3 defaults,errors=remount-ro Ted, this is Ryan's root filesystem, so it _has_ to be mounted read-only and have the kernel do journal recovery on it. That said, there must have been an error from the previous boot which caused it to have an error recorded in the journal. Rather than Ryan running dir-indexing on his root filesystem, I would suggest that he instead test with a loopback filesystem to see if he can reproduce the errors without destroying his data. Ryan - you should run "tune2fs -O ^dir_index /dev/hda2" to turn off indexing on your root filesystem, and run a full e2fsck on it (e2fsck -f /dev/hda2). After that, we'd be happy if you could test with a loopback filesystem: touch /tmp/foo mke2fs -j -F /tmp/foo 100000 mkdir /mnt/tmp mount -o loop /tmp/foo /mnt/tmp and run tests on /mnt/tmp instead of your root filesystem. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-27 4:12 ` Andreas Dilger @ 2002-09-27 7:55 ` Ryan Cumming 2002-09-28 1:20 ` Ryan Cumming 1 sibling, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-27 7:55 UTC (permalink / raw) To: Andreas Dilger; +Cc: Theodore Ts'o, linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 26, 2002 21:12, Andreas Dilger wrote: > After that, we'd be happy if you could test with a loopback filesystem: > > touch /tmp/foo > mke2fs -j -F /tmp/foo 100000 > mkdir /mnt/tmp > mount -o loop /tmp/foo /mnt/tmp > > and run tests on /mnt/tmp instead of your root filesystem. Okay, I followed those steps, and gave /mnt a fairly good fsstress'ing. No FS errors were encountered during the run. Upon umount and fsck, I got the following error: "Problem in HTREE directory inode 2 (/): bad block number 3617593." fsck then "optimized" inode 2 in pass 3A. Will beat up the filesystem some more ;) - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9lA8ILGMzRzbJfbQRAtSnAJwNp9um7iTwK2cEpQo4OlGOGjTp4ACgj5lo 8JPvFW0jS18DOPFN5bBccUg= =XzaF -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-27 4:12 ` Andreas Dilger 2002-09-27 7:55 ` Ryan Cumming @ 2002-09-28 1:20 ` Ryan Cumming 2002-09-28 1:46 ` Ryan Cumming 2002-09-28 14:13 ` Theodore Ts'o 1 sibling, 2 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-28 1:20 UTC (permalink / raw) To: Andreas Dilger, Theodore Ts'o, linux-kernel [-- Attachment #1: clearsigned data --] [-- Type: Text/Plain, Size: 879 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 26, 2002 21:12, Andreas Dilger wrote: > After that, we'd be happy if you could test with a loopback filesystem: Okay, got another one: "EXT3-fs error (device loop(7,0)): ext3_add_entry: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=8, name_len=0" fsck then reported: "Directory inode 2, block 166, offset 0: directory corrupted" This is while deleteing an old fsstress directory (a full fsck had been performed since the last time the fsstress directory had been touched) while running a few instances of the attached program. You guys have any idea what's going on yet? - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9lQPpLGMzRzbJfbQRAlzdAJ9mbiu8lRBbcFZXE/tD94Ad9DoAowCdFzab JkelXV/8C4fsqEqg9TfBvb0= =VA58 -----END PGP SIGNATURE----- [-- Attachment #2: dir-ream.c --] [-- Type: text/x-csrc, Size: 982 bytes --] #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/time.h> #include <time.h> #define MAX_FILE_COUNT 4000000 int main(int argc, char *argv[]) { while(1) { unsigned int seed; struct timeval tv; int i; int count; gettimeofday(&tv, NULL); seed = tv.tv_usec; count = seed % MAX_FILE_COUNT; srand(seed); printf("Creating %i files\n" , count); for(i = 0; i < count;i++) { char filename[32]; snprintf(filename, 32, "%x", rand()); close(open(filename, O_CREAT | O_RDONLY)); } printf("Performing %i random lookups\n" , count); for(i = 0; i < count;i++) { char filename[32]; struct stat useless; snprintf(filename, 32, "%x", rand()); stat(filename, &useless); } srand(seed); printf("Unlinking %i files\n" , count); for(i = 0; i < count;i++) { char filename[32]; snprintf(filename, 32, "%x", rand()); unlink(filename); } } } ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 1:20 ` Ryan Cumming @ 2002-09-28 1:46 ` Ryan Cumming 2002-09-28 14:13 ` Theodore Ts'o 1 sibling, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-28 1:46 UTC (permalink / raw) To: Andreas Dilger, Theodore Ts'o, linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 27, 2002 18:20, Ryan Cumming wrote: > Okay, got another one: > "EXT3-fs error (device loop(7,0)): ext3_add_entry: bad entry in directory > #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=8, > name_len=0" fsck then reported: > "Directory inode 2, block 166, offset 0: directory corrupted" > > This is while deleteing an old fsstress directory (a full fsck had been > performed since the last time the fsstress directory had been touched) > while running a few instances of the attached program. You can get a 6.1MiB bzip2'ed image of the broken filesystem (fsck hasn't touched this copy) at: http://completely.kicks-ass.org/image-broken.ext3.bz2 - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9lQnrLGMzRzbJfbQRAqhBAJ9EuJ6OhH13W1B4LWjnj8IhyIMX6QCgobUt gphiPMuRMSdewLp+67phv4g= =10yO -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 1:20 ` Ryan Cumming 2002-09-28 1:46 ` Ryan Cumming @ 2002-09-28 14:13 ` Theodore Ts'o 2002-09-28 14:18 ` Theodore Ts'o ` (3 more replies) 1 sibling, 4 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-28 14:13 UTC (permalink / raw) To: Ryan Cumming; +Cc: Andreas Dilger, linux-kernel, ext2-devel [-- Attachment #1: Type: text/plain, Size: 2762 bytes --] On Fri, Sep 27, 2002 at 06:20:27PM -0700, Ryan Cumming wrote: > > This is while deleteing an old fsstress directory (a full fsck had been > performed since the last time the fsstress directory had been touched) while > running a few instances of the attached program. > > You guys have any idea what's going on yet? Some ideas yes, although I don't have a complete solution yet. I've been able to replicate it now fairly reliably, with the attached shell script and 2.4.19 with the 2.4.19-2 dxdir patch. It appears to be somewhat timing dependent, as where the directory corruption occurs is not consistent, but I believe it is in the split code. Since e2fsck -fD packs all of the directories completely, it means that any attempt to add a file to directory will guarantee at least one split, and possibly two levels of tree splits. Since the -D option to e2fsck has only been relatively recently been available, I believe this is why it hasn't been noticed up until now in the testing; directories which are indexed "naturally" as they grow don't appear to trigger the problem, or are very, very unlikely to trigger the problem. (One potential avenue for exploration is that -D option perfectly sorts all of the directory entries in hash order, which doesn't normally occur for naturally grown directories, and this may be triggering a fencepost error in the split code.) The other thing which I've developed is a patch to e2fsprogs 1.29 (also attached) which fixes the directory corruption without causing files to end up in lost+found. I'm using the dxdir patch in production, and I was first able to replicate your problem after I ran e2fsck -fD on my /usr partition. At that point, /usr/bin and /usr/share/man/man8 got corrupted. So I modified e2fsck to be able to correct the problem without throwing a directory block's worth of dirents into lost+found. The nature of the corruption is that a directory entry of size 8 (which is enough room for a zero-length name) is left in the directory. This is harmless, but it should never happen normally, and so the ext3 sanity-checking code flags it as an error. With this patch, e2fsck is much smarter about salvaging corrupt directories, and so it can do so without causing any directory entries to be lost. (This corrupted, too-small directory entry appears at the beginning of the directory block, which is another reason why I strongly suspect the dx_split code.) BTW, Andreas, I've tried your stack-usage reduction patch (modified to sanely deal with out of memory conditions instead of panicking), but that doesn't seem to fix the problem. So whatever it is, it's something else, although your patch is still a good one and I've commited it to the 2.4 ext3-dxdir BK tree. - Ted [-- Attachment #2: gen-test-cdir --] [-- Type: text/plain, Size: 374 bytes --] #!/bin/sh #dd if=/dev/zero of=test.img bs=8k count=2000 mke2fs -j -F -b 1024 -N 24000 test.img mount -t ext3 -o loop test.img /mnt pushd /mnt mkdir test cd test time seq -f "gabby-qf%f" 1 10000 | xargs touch popd umount /mnt e2fsck -fD test.img mount -t ext3 -o loop test.img /mnt pushd /mnt/test time seq -f "%f" 1 62 | xargs touch popd umount /mnt e2fsck -f test.img [-- Attachment #3: e2fsprogs-patch --] [-- Type: text/plain, Size: 8472 bytes --] # This is a BitKeeper generated patch for the following project: # Project Name: Ext2 filesystem utilities # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1020 -> 1.1021 # e2fsck/pass2.c 1.45 -> 1.46 # e2fsck/ChangeLog 1.281 -> 1.282 # e2fsck/unix.c 1.66 -> 1.67 # e2fsck/rehash.c 1.3 -> 1.4 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 02/09/28 tytso@think.thunk.org 1.1021 # Add a more sophisticated algorithm to e2fsck to salvage corrupted # directories. # # Speed up e2fsck slightly by only updating the master superblock; # there is no point to update the backup superblocks. # # Fix a small bug in the rehashing code which could leave the indexed # flag set even after the directory was compressed instead of indexed. # (Not fatal, since the kernel will deal with this, but technically # it filesystem isn't consistent, and the filesystem will be marked # as being in error when the kernel comes across the directory. It # should also never happen in real life, since directories that small # will never be indexed, but better safe than sorry.) # # Also change the threshold of when directories are indexed, so that # directories of size 2 blocks will be indexed. Otherwise they will # never be indexed by the kernel when they grow. # -------------------------------------------- # diff -Nru a/e2fsck/ChangeLog b/e2fsck/ChangeLog --- a/e2fsck/ChangeLog Sat Sep 28 09:50:43 2002 +++ b/e2fsck/ChangeLog Sat Sep 28 09:50:43 2002 @@ -1,3 +1,20 @@ +2002-09-28 Theodore Ts'o <tytso@mit.edu> + + * rehash.c (write_directory): Clear the index flag if by + reoptimizing the directory, we bring it back into a + non-indexed state. + (e2fsck_rehash_dir): Allow directories that contain two + blocks to be indexed. Otherwise when they grow, they + never will be indexed by the kernel. + + * unix.c (main): Only update the master superblock; there's no + point updating the backup superblocks, and it speeds up + fsck slightly. + + * pass2.c (salvage_directory): New function called by + check_dir_block() which is much more sophisticated about + how it salvages corrupted filesystems. + 2001-09-24 Theodore Tso <tytso@mit.edu> * Release of E2fsprogs 1.29 diff -Nru a/e2fsck/pass2.c b/e2fsck/pass2.c --- a/e2fsck/pass2.c Sat Sep 28 09:50:43 2002 +++ b/e2fsck/pass2.c Sat Sep 28 09:50:43 2002 @@ -574,6 +574,55 @@ } #endif /* ENABLE_HTREE */ +/* + * Given a busted directory, try to salvage it somehow. + * + */ +static int salvage_directory(ext2_filsys fs, + struct ext2_dir_entry *dirent, + struct ext2_dir_entry *prev, + int offset) +{ + char *cp = (char *) dirent; + int left = fs->blocksize - offset - dirent->rec_len; + int prev_offset = offset - ((char *) dirent - (char *) prev); + + /* + * Special case of directory entry of size 8: copy what's left + * of the directory block up to cover up the invalid hole. + */ + if ((left >= 12) && (dirent->rec_len == 8)) { + memmove(cp, cp+8, left); + memset(cp + left, 0, 8); + return offset; + } + /* + * If the directory entry is a multiple of four, so it is + * valid, let the previous directory entry absorb the invalid + * one. + */ + if (prev && dirent->rec_len && (dirent->rec_len % 4) == 0) { + prev->rec_len += dirent->rec_len; + return prev_offset; + } + /* + * Default salvage method --- kill all of the directory + * entries for the rest of the block. We will either try to + * absorb it into the previous directory entry, or create a + * new empty directory entry the rest of the directory block. + */ + if (prev) { + prev->rec_len += fs->blocksize - offset; + return prev_offset; + } else { + dirent->rec_len = fs->blocksize - offset; + dirent->name_len = 0; + dirent->inode = 0; + return offset; + } + +} + static int check_dir_block(ext2_filsys fs, struct ext2_db_entry *db, void *priv_data) @@ -583,7 +632,7 @@ #ifdef ENABLE_HTREE struct dx_dirblock_info *dx_db = 0; #endif /* ENABLE_HTREE */ - struct ext2_dir_entry *dirent; + struct ext2_dir_entry *dirent, *prev; ext2_dirhash_t hash; int offset = 0; int dir_modified = 0; @@ -680,8 +729,8 @@ } #endif /* ENABLE_HTREE */ + prev = 0; do { - dot_state++; problem = 0; dirent = (struct ext2_dir_entry *) (buf + offset); cd->pctx.dirent = dirent; @@ -691,10 +740,10 @@ ((dirent->rec_len % 4) != 0) || (((dirent->name_len & 0xFF)+8) > dirent->rec_len)) { if (fix_problem(ctx, PR_2_DIR_CORRUPTED, &cd->pctx)) { - dirent->rec_len = fs->blocksize - offset; - dirent->name_len = 0; - dirent->inode = 0; + offset = salvage_directory(fs, dirent, + prev, offset); dir_modified++; + continue; } else return DIRENT_ABORT; } @@ -705,10 +754,10 @@ } } - if (dot_state == 1) { + if (dot_state == 0) { if (check_dot(ctx, dirent, ino, &cd->pctx)) dir_modified++; - } else if (dot_state == 2) { + } else if (dot_state == 1) { dir = e2fsck_get_dir_info(ctx, ino); if (!dir) { fix_problem(ctx, PR_2_NO_DIRINFO, &cd->pctx); @@ -749,7 +798,7 @@ * clear it. */ problem = PR_2_BB_INODE; - } else if ((dot_state > 2) && + } else if ((dot_state > 1) && ((dirent->name_len & 0xFF) == 1) && (dirent->name[0] == '.')) { /* @@ -758,7 +807,7 @@ * duplicate entry that should be removed. */ problem = PR_2_DUP_DOT; - } else if ((dot_state > 2) && + } else if ((dot_state > 1) && ((dirent->name_len & 0xFF) == 2) && (dirent->name[0] == '.') && (dirent->name[1] == '.')) { @@ -768,7 +817,7 @@ * duplicate entry that should be removed. */ problem = PR_2_DUP_DOT_DOT; - } else if ((dot_state > 2) && + } else if ((dot_state > 1) && (dirent->inode == EXT2_ROOT_INO)) { /* * Don't allow links to the root directory. @@ -777,7 +826,7 @@ * directory hasn't been created yet. */ problem = PR_2_LINK_ROOT; - } else if ((dot_state > 2) && + } else if ((dot_state > 1) && (dirent->name_len & 0xFF) == 0) { /* * Don't allow zero-length directory names. @@ -842,7 +891,7 @@ * hard link. We assume the first link is correct, * and ask the user if he/she wants to clear this one. */ - if ((dot_state > 2) && + if ((dot_state > 1) && (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) { subdir = e2fsck_get_dir_info(ctx, dirent->inode); @@ -871,7 +920,9 @@ ctx->fs_links_count++; ctx->fs_total_count++; next: + prev = dirent; offset += dirent->rec_len; + dot_state++; } while (offset < fs->blocksize); #if 0 printf("\n"); diff -Nru a/e2fsck/rehash.c b/e2fsck/rehash.c --- a/e2fsck/rehash.c Sat Sep 28 09:50:43 2002 +++ b/e2fsck/rehash.c Sat Sep 28 09:50:43 2002 @@ -522,7 +522,9 @@ return wd.err; e2fsck_read_inode(ctx, ino, &inode, "rehash_dir"); - if (!compress) + if (compress) + inode.i_flags &= ~EXT2_INDEX_FL; + else inode.i_flags |= EXT2_INDEX_FL; inode.i_size = outdir->num * fs->blocksize; inode.i_blocks -= (fs->blocksize / 512) * wd.cleared; @@ -561,7 +563,7 @@ fd.dir_size = 0; fd.compress = 0; if (!(fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX) || - (inode.i_size / fs->blocksize) < 3) + (inode.i_size / fs->blocksize) < 2) fd.compress = 1; fd.parent = 0; diff -Nru a/e2fsck/unix.c b/e2fsck/unix.c --- a/e2fsck/unix.c Sat Sep 28 09:50:43 2002 +++ b/e2fsck/unix.c Sat Sep 28 09:50:43 2002 @@ -974,8 +974,11 @@ ext2fs_mark_super_dirty(fs); /* - * Don't overwrite the backup superblock and block - * descriptors, until we're sure the filesystem is OK.... + * We only update the master superblock because (a) paranoia; + * we don't want to corrupt the backup superblocks, and (b) we + * don't need to update the mount count and last checked + * fields in the backup superblock (the kernel doesn't + * update the backup superblocks anyway). */ fs->flags |= EXT2_FLAG_MASTER_SB_ONLY; @@ -1058,9 +1061,7 @@ exit_value |= FSCK_REBOOT; } } - if (ext2fs_test_valid(fs)) - fs->flags &= ~EXT2_FLAG_MASTER_SB_ONLY; - else { + if (!ext2fs_test_valid(fs)) { printf(_("\n%s: ********** WARNING: Filesystem still has " "errors **********\n\n"), ctx->device_name); exit_value |= FSCK_UNCORRECTED; ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 14:13 ` Theodore Ts'o @ 2002-09-28 14:18 ` Theodore Ts'o 2002-09-28 22:35 ` Ryan Cumming 2002-09-28 17:27 ` [Ext2-devel] " Andreas Dilger ` (2 subsequent siblings) 3 siblings, 1 reply; 47+ messages in thread From: Theodore Ts'o @ 2002-09-28 14:18 UTC (permalink / raw) To: Ryan Cumming, Andreas Dilger, linux-kernel, ext2-devel Oh, one other thing. I'm pretty sure the rest of the errors you saw were a result of the fact that you had your filesystem set to remount the filesystem read-only after running into errors. When the error was detected, all existing updates to the filesystem are aborted (to minimize damage to the filesystem), and that can leave the filesystem in a somewhat inconsistent state, although nothing which e2fsck shouldn't be able to fix. When running with the filesystem set to continue-with-errors, the only problem which e2fsck picks up is the too-small-to-be-sane directory entry problem. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 14:18 ` Theodore Ts'o @ 2002-09-28 22:35 ` Ryan Cumming 0 siblings, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-28 22:35 UTC (permalink / raw) To: Theodore Ts'o, Andreas Dilger, linux-kernel, ext2-devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 28, 2002 07:18, Theodore Ts'o wrote: > Oh, one other thing. I'm pretty sure the rest of the errors you saw > were a result of the fact that you had your filesystem set to remount > the filesystem read-only after running into errors. When the error > was detected, all existing updates to the filesystem are aborted (to > minimize damage to the filesystem), and that can leave the filesystem > in a somewhat inconsistent state, although nothing which e2fsck > shouldn't be able to fix. The loopback filesystem was set to continue on errors. The only things fsck noticed were the short directory entries and unattached inodes. It actually tried to connect lost+found to lost+found, BTW ;) - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9li6yLGMzRzbJfbQRAtn2AKCUoXTYKuVMkfB17wd4RWua0YtTmgCfV+Uh Gg//dy/RviPA5LWvpfDYQAw= =lno1 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Ext2-devel] Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 14:13 ` Theodore Ts'o 2002-09-28 14:18 ` Theodore Ts'o @ 2002-09-28 17:27 ` Andreas Dilger 2002-09-28 18:43 ` chrisl 2002-09-28 19:45 ` chrisl 2002-09-28 22:30 ` Ryan Cumming 2002-09-29 7:03 ` [PATCH] fix htree dir corrupt after fsck -fD chrisl 3 siblings, 2 replies; 47+ messages in thread From: Andreas Dilger @ 2002-09-28 17:27 UTC (permalink / raw) To: Theodore Ts'o, Ryan Cumming, linux-kernel, ext2-devel On Sep 28, 2002 10:13 -0400, Theodore Ts'o wrote: > The nature of the corruption is that a directory entry of size 8 > (which is enough room for a zero-length name) is left in the > directory. This is harmless, but it should never happen normally, and > so the ext3 sanity-checking code flags it as an error. With this > patch, e2fsck is much smarter about salvaging corrupt directories, and > so it can do so without causing any directory entries to be lost. > (This corrupted, too-small directory entry appears at the beginning of > the directory block, which is another reason why I strongly suspect > the dx_split code.) One idea I just had but don't have time to investigate (babysitting both kids today) is if the do_split() code is creating a hash entry for unused dir entries (i.e. inode == 0 or name_len == 0). If that is the case, then it could explain the presence of this short entry. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Ext2-devel] Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 17:27 ` [Ext2-devel] " Andreas Dilger @ 2002-09-28 18:43 ` chrisl 2002-09-28 19:45 ` chrisl 1 sibling, 0 replies; 47+ messages in thread From: chrisl @ 2002-09-28 18:43 UTC (permalink / raw) To: Theodore Ts'o, Ryan Cumming, linux-kernel, ext2-devel I am looking at this also. It is very possible the case. I am doing experiment now. This happened very randomly. Same script randomly produce the bug. Each time the problem block happen at a different place. I just can't consistently reproduce the bug. I am doing some experiment now. Chris On Sat, Sep 28, 2002 at 11:27:49AM -0600, Andreas Dilger wrote: > On Sep 28, 2002 10:13 -0400, Theodore Ts'o wrote: > > The nature of the corruption is that a directory entry of size 8 > > (which is enough room for a zero-length name) is left in the > > directory. This is harmless, but it should never happen normally, and > > so the ext3 sanity-checking code flags it as an error. With this > > patch, e2fsck is much smarter about salvaging corrupt directories, and > > so it can do so without causing any directory entries to be lost. > > (This corrupted, too-small directory entry appears at the beginning of > > the directory block, which is another reason why I strongly suspect > > the dx_split code.) > > One idea I just had but don't have time to investigate (babysitting > both kids today) is if the do_split() code is creating a hash entry > for unused dir entries (i.e. inode == 0 or name_len == 0). If that > is the case, then it could explain the presence of this short entry. > > Cheers, Andreas > -- > Andreas Dilger > http://www-mddsp.enel.ucalgary.ca/People/adilger/ > http://sourceforge.net/projects/ext2resize/ > > > > ------------------------------------------------------- > This sf.net email is sponsored by:ThinkGeek > Welcome to geek heaven. > http://thinkgeek.com/sf > _______________________________________________ > Ext2-devel mailing list > Ext2-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/ext2-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Ext2-devel] Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 17:27 ` [Ext2-devel] " Andreas Dilger 2002-09-28 18:43 ` chrisl @ 2002-09-28 19:45 ` chrisl 1 sibling, 0 replies; 47+ messages in thread From: chrisl @ 2002-09-28 19:45 UTC (permalink / raw) To: Theodore Ts'o, Ryan Cumming, linux-kernel, ext2-devel That is exactly the case. Now I am looking what where is those entry come from. Remove entry should merge with the previous one. Add a check in dx_make_map will help. Chris On Sat, Sep 28, 2002 at 11:27:49AM -0600, Andreas Dilger wrote: > On Sep 28, 2002 10:13 -0400, Theodore Ts'o wrote: > > The nature of the corruption is that a directory entry of size 8 > > (which is enough room for a zero-length name) is left in the > > directory. This is harmless, but it should never happen normally, and > > so the ext3 sanity-checking code flags it as an error. With this > > patch, e2fsck is much smarter about salvaging corrupt directories, and > > so it can do so without causing any directory entries to be lost. > > (This corrupted, too-small directory entry appears at the beginning of > > the directory block, which is another reason why I strongly suspect > > the dx_split code.) > > One idea I just had but don't have time to investigate (babysitting > both kids today) is if the do_split() code is creating a hash entry > for unused dir entries (i.e. inode == 0 or name_len == 0). If that > is the case, then it could explain the presence of this short entry. > > Cheers, Andreas > -- > Andreas Dilger > http://www-mddsp.enel.ucalgary.ca/People/adilger/ > http://sourceforge.net/projects/ext2resize/ > > > > ------------------------------------------------------- > This sf.net email is sponsored by:ThinkGeek > Welcome to geek heaven. > http://thinkgeek.com/sf > _______________________________________________ > Ext2-devel mailing list > Ext2-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/ext2-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-28 14:13 ` Theodore Ts'o 2002-09-28 14:18 ` Theodore Ts'o 2002-09-28 17:27 ` [Ext2-devel] " Andreas Dilger @ 2002-09-28 22:30 ` Ryan Cumming 2002-09-29 7:03 ` [PATCH] fix htree dir corrupt after fsck -fD chrisl 3 siblings, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-28 22:30 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Andreas Dilger, linux-kernel, ext2-devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 28, 2002 07:13, Theodore Ts'o wrote: > I've been able to replicate it now fairly reliably, with the attached > shell script and 2.4.19 with the 2.4.19-2 dxdir patch. Yes, that was what I'm using > It appears to > be somewhat timing dependent, as where the directory corruption occurs > is not consistent, but I believe it is in the split code. Since > e2fsck -fD packs all of the directories completely, it means that any > attempt to add a file to directory will guarantee at least one split, > and possibly two levels of tree splits. Since the -D option to e2fsck > has only been relatively recently been available, I believe this is > why it hasn't been noticed up until now in the testing; directories > which are indexed "naturally" as they grow don't appear to trigger the > problem, or are very, very unlikely to trigger the problem. (One > potential avenue for exploration is that -D option perfectly sorts all > of the directory entries in hash order, which doesn't normally occur > for naturally grown directories, and this may be triggering a > fencepost error in the split code.) Yes, running fsck -D seemed to make my loopback filesystem more brittle. I have triggered corruption without fsck, but it's much more difficult. That seems to match your description of the problem. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9li2QLGMzRzbJfbQRAlSfAJ93gPP65tXjzsyGiOjvoDlGwk5WrACeIILY QKk+DpMb9kpq7rsiaSSuJHs= =LoWq -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH] fix htree dir corrupt after fsck -fD 2002-09-28 14:13 ` Theodore Ts'o ` (2 preceding siblings ...) 2002-09-28 22:30 ` Ryan Cumming @ 2002-09-29 7:03 ` chrisl 2002-09-29 8:16 ` Ryan Cumming 2002-09-29 14:13 ` Theodore Ts'o 3 siblings, 2 replies; 47+ messages in thread From: chrisl @ 2002-09-29 7:03 UTC (permalink / raw) To: Theodore Ts'o, Ryan Cumming, Andreas Dilger, linux-kernel, ext2-devel OK, I believe this two patch will fix the problem. When e2fsck rehash the directory index. it will left one empty dir entry at the end of block if next dir entry can not fit in. I think it is OK to do so but I also make a patch to make the empty space merge with previous entry to consistent with the kernel. dx_make_map need to skip the empty entry. So here is the patch for review before I push it back to htree BK repository. I already do the initial test and it fix the problem in kernel and e2fsck. Chris PS, Ted, is e2fsck endian free now? I notice some code in copy_dir_entries() do not have cpu_to_le32 for rec_len etc. ===== namei.c 1.8 vs edited ===== --- 1.8/fs/ext3/namei.c Fri Sep 20 00:40:12 2002 +++ edited/namei.c Sat Sep 28 23:43:56 2002 @@ -601,11 +601,13 @@ while ((char *) de < base + size) { - ext3fs_dirhash(de->name, de->name_len, &h); - map[count].hash = h.hash; - map[count].offs = (u32) ((char *) de - base); + if (de->name_len && de->inode) { + ext3fs_dirhash(de->name, de->name_len, &h); + map[count].hash = h.hash; + map[count].offs = (u32) ((char *) de - base); + count++; + } de = (ext3_dirent *) ((char *) de + le16_to_cpu(de->rec_len)); - count++; } return count; } ===== rehash.c 1.3 vs edited ===== --- 1.3/e2fsck/rehash.c Fri Sep 6 07:14:11 2002 +++ edited/rehash.c Sat Sep 28 23:19:16 2002 @@ -278,22 +278,16 @@ for (i=0; i < fd->num_array; i++) { ent = fd->harray + i; rec_len = EXT2_DIR_REC_LEN(ent->dir->name_len & 0xFF); - dirent = (struct ext2_dir_entry *) (block_start + offset); - left = fs->blocksize - offset; if (rec_len > left) { - if (left) { - dirent->rec_len = left; - dirent->inode = 0; - dirent->name_len = 0; - offset += left; - left = 0; - } + if (left) + dirent->rec_len += left; if ((retval = get_next_block(fs, outdir, &block_start))) return retval; - offset = 0; left = fs->blocksize; - dirent = (struct ext2_dir_entry *) block_start; + offset = 0; } + left = fs->blocksize - offset; + dirent = (struct ext2_dir_entry *) (block_start + offset); if (offset == 0) { if (ent->hash == prev_hash) outdir->hashes[outdir->num-1] = ent->hash | 1; On Sat, Sep 28, 2002 at 10:13:30AM -0400, Theodore Ts'o wrote: > On Fri, Sep 27, 2002 at 06:20:27PM -0700, Ryan Cumming wrote: > > > > This is while deleteing an old fsstress directory (a full fsck had been > > performed since the last time the fsstress directory had been touched) while > > running a few instances of the attached program. > > > > You guys have any idea what's going on yet? > > Some ideas yes, although I don't have a complete solution yet. > > I've been able to replicate it now fairly reliably, with the attached > shell script and 2.4.19 with the 2.4.19-2 dxdir patch. It appears to > be somewhat timing dependent, as where the directory corruption occurs > is not consistent, but I believe it is in the split code. Since > e2fsck -fD packs all of the directories completely, it means that any > attempt to add a file to directory will guarantee at least one split, > and possibly two levels of tree splits. Since the -D option to e2fsck > has only been relatively recently been available, I believe this is > why it hasn't been noticed up until now in the testing; directories > which are indexed "naturally" as they grow don't appear to trigger the > problem, or are very, very unlikely to trigger the problem. (One > potential avenue for exploration is that -D option perfectly sorts all > of the directory entries in hash order, which doesn't normally occur > for naturally grown directories, and this may be triggering a > fencepost error in the split code.) > > The other thing which I've developed is a patch to e2fsprogs 1.29 > (also attached) which fixes the directory corruption without causing > files to end up in lost+found. I'm using the dxdir patch in > production, and I was first able to replicate your problem after I ran > e2fsck -fD on my /usr partition. At that point, /usr/bin and > /usr/share/man/man8 got corrupted. So I modified e2fsck to be able to > correct the problem without throwing a directory block's worth of > dirents into lost+found. > > The nature of the corruption is that a directory entry of size 8 > (which is enough room for a zero-length name) is left in the > directory. This is harmless, but it should never happen normally, and > so the ext3 sanity-checking code flags it as an error. With this > patch, e2fsck is much smarter about salvaging corrupt directories, and > so it can do so without causing any directory entries to be lost. > (This corrupted, too-small directory entry appears at the beginning of > the directory block, which is another reason why I strongly suspect > the dx_split code.) > > BTW, Andreas, I've tried your stack-usage reduction patch (modified to > sanely deal with out of memory conditions instead of panicking), but > that doesn't seem to fix the problem. So whatever it is, it's > something else, although your patch is still a good one and I've > commited it to the 2.4 ext3-dxdir BK tree. > > - Ted > Content-Description: gen-test-cdir > #!/bin/sh > #dd if=/dev/zero of=test.img bs=8k count=2000 > mke2fs -j -F -b 1024 -N 24000 test.img > mount -t ext3 -o loop test.img /mnt > pushd /mnt > mkdir test > cd test > time seq -f "gabby-qf%f" 1 10000 | xargs touch > popd > umount /mnt > e2fsck -fD test.img > mount -t ext3 -o loop test.img /mnt > pushd /mnt/test > time seq -f "%f" 1 62 | xargs touch > popd > umount /mnt > e2fsck -f test.img > Content-Description: e2fsprogs-patch > # This is a BitKeeper generated patch for the following project: > # Project Name: Ext2 filesystem utilities > # This patch format is intended for GNU patch command version 2.5 or higher. > # This patch includes the following deltas: > # ChangeSet 1.1020 -> 1.1021 > # e2fsck/pass2.c 1.45 -> 1.46 > # e2fsck/ChangeLog 1.281 -> 1.282 > # e2fsck/unix.c 1.66 -> 1.67 > # e2fsck/rehash.c 1.3 -> 1.4 > # > # The following is the BitKeeper ChangeSet Log > # -------------------------------------------- > # 02/09/28 tytso@think.thunk.org 1.1021 > # Add a more sophisticated algorithm to e2fsck to salvage corrupted > # directories. > # > # Speed up e2fsck slightly by only updating the master superblock; > # there is no point to update the backup superblocks. > # > # Fix a small bug in the rehashing code which could leave the indexed > # flag set even after the directory was compressed instead of indexed. > # (Not fatal, since the kernel will deal with this, but technically > # it filesystem isn't consistent, and the filesystem will be marked > # as being in error when the kernel comes across the directory. It > # should also never happen in real life, since directories that small > # will never be indexed, but better safe than sorry.) > # > # Also change the threshold of when directories are indexed, so that > # directories of size 2 blocks will be indexed. Otherwise they will > # never be indexed by the kernel when they grow. > # -------------------------------------------- > # > diff -Nru a/e2fsck/ChangeLog b/e2fsck/ChangeLog > --- a/e2fsck/ChangeLog Sat Sep 28 09:50:43 2002 > +++ b/e2fsck/ChangeLog Sat Sep 28 09:50:43 2002 > @@ -1,3 +1,20 @@ > +2002-09-28 Theodore Ts'o <tytso@mit.edu> > + > + * rehash.c (write_directory): Clear the index flag if by > + reoptimizing the directory, we bring it back into a > + non-indexed state. > + (e2fsck_rehash_dir): Allow directories that contain two > + blocks to be indexed. Otherwise when they grow, they > + never will be indexed by the kernel. > + > + * unix.c (main): Only update the master superblock; there's no > + point updating the backup superblocks, and it speeds up > + fsck slightly. > + > + * pass2.c (salvage_directory): New function called by > + check_dir_block() which is much more sophisticated about > + how it salvages corrupted filesystems. > + > 2001-09-24 Theodore Tso <tytso@mit.edu> > > * Release of E2fsprogs 1.29 > diff -Nru a/e2fsck/pass2.c b/e2fsck/pass2.c > --- a/e2fsck/pass2.c Sat Sep 28 09:50:43 2002 > +++ b/e2fsck/pass2.c Sat Sep 28 09:50:43 2002 > @@ -574,6 +574,55 @@ > } > #endif /* ENABLE_HTREE */ > > +/* > + * Given a busted directory, try to salvage it somehow. > + * > + */ > +static int salvage_directory(ext2_filsys fs, > + struct ext2_dir_entry *dirent, > + struct ext2_dir_entry *prev, > + int offset) > +{ > + char *cp = (char *) dirent; > + int left = fs->blocksize - offset - dirent->rec_len; > + int prev_offset = offset - ((char *) dirent - (char *) prev); > + > + /* > + * Special case of directory entry of size 8: copy what's left > + * of the directory block up to cover up the invalid hole. > + */ > + if ((left >= 12) && (dirent->rec_len == 8)) { > + memmove(cp, cp+8, left); > + memset(cp + left, 0, 8); > + return offset; > + } > + /* > + * If the directory entry is a multiple of four, so it is > + * valid, let the previous directory entry absorb the invalid > + * one. > + */ > + if (prev && dirent->rec_len && (dirent->rec_len % 4) == 0) { > + prev->rec_len += dirent->rec_len; > + return prev_offset; > + } > + /* > + * Default salvage method --- kill all of the directory > + * entries for the rest of the block. We will either try to > + * absorb it into the previous directory entry, or create a > + * new empty directory entry the rest of the directory block. > + */ > + if (prev) { > + prev->rec_len += fs->blocksize - offset; > + return prev_offset; > + } else { > + dirent->rec_len = fs->blocksize - offset; > + dirent->name_len = 0; > + dirent->inode = 0; > + return offset; > + } > + > +} > + > static int check_dir_block(ext2_filsys fs, > struct ext2_db_entry *db, > void *priv_data) > @@ -583,7 +632,7 @@ > #ifdef ENABLE_HTREE > struct dx_dirblock_info *dx_db = 0; > #endif /* ENABLE_HTREE */ > - struct ext2_dir_entry *dirent; > + struct ext2_dir_entry *dirent, *prev; > ext2_dirhash_t hash; > int offset = 0; > int dir_modified = 0; > @@ -680,8 +729,8 @@ > } > #endif /* ENABLE_HTREE */ > > + prev = 0; > do { > - dot_state++; > problem = 0; > dirent = (struct ext2_dir_entry *) (buf + offset); > cd->pctx.dirent = dirent; > @@ -691,10 +740,10 @@ > ((dirent->rec_len % 4) != 0) || > (((dirent->name_len & 0xFF)+8) > dirent->rec_len)) { > if (fix_problem(ctx, PR_2_DIR_CORRUPTED, &cd->pctx)) { > - dirent->rec_len = fs->blocksize - offset; > - dirent->name_len = 0; > - dirent->inode = 0; > + offset = salvage_directory(fs, dirent, > + prev, offset); > dir_modified++; > + continue; > } else > return DIRENT_ABORT; > } > @@ -705,10 +754,10 @@ > } > } > > - if (dot_state == 1) { > + if (dot_state == 0) { > if (check_dot(ctx, dirent, ino, &cd->pctx)) > dir_modified++; > - } else if (dot_state == 2) { > + } else if (dot_state == 1) { > dir = e2fsck_get_dir_info(ctx, ino); > if (!dir) { > fix_problem(ctx, PR_2_NO_DIRINFO, &cd->pctx); > @@ -749,7 +798,7 @@ > * clear it. > */ > problem = PR_2_BB_INODE; > - } else if ((dot_state > 2) && > + } else if ((dot_state > 1) && > ((dirent->name_len & 0xFF) == 1) && > (dirent->name[0] == '.')) { > /* > @@ -758,7 +807,7 @@ > * duplicate entry that should be removed. > */ > problem = PR_2_DUP_DOT; > - } else if ((dot_state > 2) && > + } else if ((dot_state > 1) && > ((dirent->name_len & 0xFF) == 2) && > (dirent->name[0] == '.') && > (dirent->name[1] == '.')) { > @@ -768,7 +817,7 @@ > * duplicate entry that should be removed. > */ > problem = PR_2_DUP_DOT_DOT; > - } else if ((dot_state > 2) && > + } else if ((dot_state > 1) && > (dirent->inode == EXT2_ROOT_INO)) { > /* > * Don't allow links to the root directory. > @@ -777,7 +826,7 @@ > * directory hasn't been created yet. > */ > problem = PR_2_LINK_ROOT; > - } else if ((dot_state > 2) && > + } else if ((dot_state > 1) && > (dirent->name_len & 0xFF) == 0) { > /* > * Don't allow zero-length directory names. > @@ -842,7 +891,7 @@ > * hard link. We assume the first link is correct, > * and ask the user if he/she wants to clear this one. > */ > - if ((dot_state > 2) && > + if ((dot_state > 1) && > (ext2fs_test_inode_bitmap(ctx->inode_dir_map, > dirent->inode))) { > subdir = e2fsck_get_dir_info(ctx, dirent->inode); > @@ -871,7 +920,9 @@ > ctx->fs_links_count++; > ctx->fs_total_count++; > next: > + prev = dirent; > offset += dirent->rec_len; > + dot_state++; > } while (offset < fs->blocksize); > #if 0 > printf("\n"); > diff -Nru a/e2fsck/rehash.c b/e2fsck/rehash.c > --- a/e2fsck/rehash.c Sat Sep 28 09:50:43 2002 > +++ b/e2fsck/rehash.c Sat Sep 28 09:50:43 2002 > @@ -522,7 +522,9 @@ > return wd.err; > > e2fsck_read_inode(ctx, ino, &inode, "rehash_dir"); > - if (!compress) > + if (compress) > + inode.i_flags &= ~EXT2_INDEX_FL; > + else > inode.i_flags |= EXT2_INDEX_FL; > inode.i_size = outdir->num * fs->blocksize; > inode.i_blocks -= (fs->blocksize / 512) * wd.cleared; > @@ -561,7 +563,7 @@ > fd.dir_size = 0; > fd.compress = 0; > if (!(fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX) || > - (inode.i_size / fs->blocksize) < 3) > + (inode.i_size / fs->blocksize) < 2) > fd.compress = 1; > fd.parent = 0; > > diff -Nru a/e2fsck/unix.c b/e2fsck/unix.c > --- a/e2fsck/unix.c Sat Sep 28 09:50:43 2002 > +++ b/e2fsck/unix.c Sat Sep 28 09:50:43 2002 > @@ -974,8 +974,11 @@ > ext2fs_mark_super_dirty(fs); > > /* > - * Don't overwrite the backup superblock and block > - * descriptors, until we're sure the filesystem is OK.... > + * We only update the master superblock because (a) paranoia; > + * we don't want to corrupt the backup superblocks, and (b) we > + * don't need to update the mount count and last checked > + * fields in the backup superblock (the kernel doesn't > + * update the backup superblocks anyway). > */ > fs->flags |= EXT2_FLAG_MASTER_SB_ONLY; > > @@ -1058,9 +1061,7 @@ > exit_value |= FSCK_REBOOT; > } > } > - if (ext2fs_test_valid(fs)) > - fs->flags &= ~EXT2_FLAG_MASTER_SB_ONLY; > - else { > + if (!ext2fs_test_valid(fs)) { > printf(_("\n%s: ********** WARNING: Filesystem still has " > "errors **********\n\n"), ctx->device_name); > exit_value |= FSCK_UNCORRECTED; ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] fix htree dir corrupt after fsck -fD 2002-09-29 7:03 ` [PATCH] fix htree dir corrupt after fsck -fD chrisl @ 2002-09-29 8:16 ` Ryan Cumming 2002-09-29 8:36 ` Ryan Cumming 2002-09-30 2:46 ` Ryan Cumming 2002-09-29 14:13 ` Theodore Ts'o 1 sibling, 2 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-29 8:16 UTC (permalink / raw) To: chrisl, Theodore Ts'o, Andreas Dilger, linux-kernel, ext2-devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 29, 2002 00:03, chrisl@gnuchina.org wrote: > I already do the initial test and it fix the problem in kernel and > e2fsck. Still broken here. The short directory inodes are cleared up, but I'm still getting various errors from fsck Case 1: "Problem in HTREE directory inode 2 (/): bad block 3223649" Case 2: "Inode 2, i_blocks is 3718, should be 2280 Directory inode 2 has an unallocated block #377 Directory inode 2 has an unallocated block #378 Directory inode 2 has an unallocated block #379" etc This is a completely fresh loopback EXT3 filesystem, untouched by fsck -D, and normally unmounted. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9lrb9LGMzRzbJfbQRAhpSAKCbkbyiwM8PnpAbN2FvU6tRHM1urwCdEzFK WSwjN6jC+0QI0NnJzKc0rX8= =HHtV -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] fix htree dir corrupt after fsck -fD 2002-09-29 8:16 ` Ryan Cumming @ 2002-09-29 8:36 ` Ryan Cumming 2002-09-30 2:46 ` Ryan Cumming 1 sibling, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-29 8:36 UTC (permalink / raw) To: chrisl, Theodore Ts'o, Andreas Dilger, linux-kernel, ext2-devel [-- Attachment #1: clearsigned data --] [-- Type: Text/Plain, Size: 1035 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 29, 2002 01:16, Ryan Cumming wrote: > This is a completely fresh loopback EXT3 filesystem, untouched by fsck -D, > and normally unmounted. Oh, and I've attached the current version of my test program if anyone is interested. It spawns 8 child processes which repeatedly create up to 1,000,000 files each, stat up to 1,000,000 (probably) non-existant files, and then unlink the files they created. It also spawns another 4 processes which iterate over all of the directory entries using readdir(). For each file it encounters, it has an equal probability of renaming it, unlinking it, or truncating it to a random length. It can corrupt my loopback test filesystems in under 5 minutes. Note that it will completely destroy any data in its working directory, however. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9lrulLGMzRzbJfbQRAgxMAJ46Y/L4FA8nuwe76MyGCyhG+mSE3QCgjb5a TBJbqp55p5yOU2BY0AnW/TA= =cKn9 -----END PGP SIGNATURE----- [-- Attachment #2: fs-ream.c --] [-- Type: text/x-csrc, Size: 1982 bytes --] #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <dirent.h> #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/time.h> #include <time.h> #define BLIND_THREADS 8 #define READDIR_THREADS 4 #define MAX_FILE_COUNT_PER_THREAD 200000 #define SYNC_INTERVAL 1 #define MAX_FILE_SIZE (8 * 1024 * 1024) void blind_thread(); void readdir_thread(); int main(int argc, char *argv[]) { int i = 0; for(i = 0;i < BLIND_THREADS;i++) { if (!fork()) { blind_thread(); } } for(i = 0;i < READDIR_THREADS;i++) { if (!fork()) { readdir_thread(); } } while(1) { sleep(SYNC_INTERVAL); sync(); } return 0; } void blind_thread() { while(1) { unsigned int seed; struct timeval tv; int i; int count; gettimeofday(&tv, NULL); seed = tv.tv_usec; count = seed % MAX_FILE_COUNT_PER_THREAD; srand(seed); printf("Creating %i files\n" , count); for(i = 0; i < count;i++) { char filename[32]; snprintf(filename, 32, "%x", rand()); close(open(filename, O_CREAT | O_RDONLY)); } printf("Performing %i random lookups\n" , count); for(i = 0; i < count;i++) { char filename[32]; struct stat useless; snprintf(filename, 32, "%x", rand()); stat(filename, &useless); } srand(seed); printf("Unlinking %i files\n" , count); for(i = 0; i < count;i++) { char filename[32]; snprintf(filename, 32, "%x", rand()); unlink(filename); } } } void readdir_thread() { while(1) { DIR *dir; struct dirent *entry; dir = opendir("."); while((entry = readdir(dir))) { struct stat st; stat(entry->d_name, &st); if (S_ISREG(st.st_mode)) { switch(rand() % 3) { case 0: unlink(entry->d_name); case 1: truncate(entry->d_name, rand() % MAX_FILE_SIZE); case 2: { char filename[32]; snprintf(filename, 32, "%x", rand()); rename(entry->d_name, filename); } } } } closedir(dir); } } ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] fix htree dir corrupt after fsck -fD 2002-09-29 8:16 ` Ryan Cumming 2002-09-29 8:36 ` Ryan Cumming @ 2002-09-30 2:46 ` Ryan Cumming 1 sibling, 0 replies; 47+ messages in thread From: Ryan Cumming @ 2002-09-30 2:46 UTC (permalink / raw) To: chrisl, Theodore Ts'o, Andreas Dilger, linux-kernel, ext2-devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On September 29, 2002 01:16, Ryan Cumming wrote: > Case 1: > "Problem in HTREE directory inode 2 (/): bad block 3223649" > > Case 2: > "Inode 2, i_blocks is 3718, should be 2280 > Directory inode 2 has an unallocated block #377 > Directory inode 2 has an unallocated block #378 > Directory inode 2 has an unallocated block #379" > etc Okay, a few more datapoints: 1) Case #1 happens much more frequently than case #2 2) I can trigger filesystem corruption with nothing but non-concurrent readdir()s and unlink()s 3) Although my initial tests were reaching the filesystem's inode limit, I've been able to reproduce case #1 with 153,129 free inodes and 217,294 free blocks. 4) Case #1 persists with Chris' stack overflow fix, modified to use memmove instead of memcpy 5) I have not been able to reproduce either error without the dir_index flag set. This may be due to the fact that my test program runs -much- more slowly without directory indexing. - -Ryan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9l7slLGMzRzbJfbQRAtbnAKCo5c6NzuGY59Xy4MvCRFqwuw5/+gCgpQkr FBXfCLl2R2ii/PiJ3+0ZQR8= =droh -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] fix htree dir corrupt after fsck -fD 2002-09-29 7:03 ` [PATCH] fix htree dir corrupt after fsck -fD chrisl 2002-09-29 8:16 ` Ryan Cumming @ 2002-09-29 14:13 ` Theodore Ts'o 1 sibling, 0 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-29 14:13 UTC (permalink / raw) To: chrisl; +Cc: Ryan Cumming, Andreas Dilger, linux-kernel, ext2-devel On Sun, Sep 29, 2002 at 12:03:15AM -0700, chrisl@gnuchina.org wrote: > When e2fsck rehash the directory index. it will left one empty > dir entry at the end of block if next dir entry can not fit in. > I think it is OK to do so but I also make a patch to make the empty > space merge with previous entry to consistent with the kernel. Nice catch! Yup, e2fsck will create an empty directory entry, instead of letting the previous entry absorb the extra space. Both are legal ways of doing things, but yes, the kernel does the latter in all cases except when deleting the last directory entry in a block. (But that's OK, because the split code would never need to be invoked on an empty directory block.) > dx_make_map need to skip the empty entry. So here is the patch for > review before I push it back to htree BK repository. That patch looks fine. Push away! > Ted, is e2fsck endian free now? I notice some code in > copy_dir_entries() do not have cpu_to_le32 for rec_len etc. Yes, e2fsprogs is endian-free. The byte swapping happens in the library routine ext2fs_read_dir_block() and ext2fs_write_dir_block(). Note that you have to tell these routines whether you're planning on using the v2 version of dirent struct or not, because that affects whether or not the name_len field needs to be byte-swapped or not. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:03 [BK PATCH] Add ext3 indexed directory (htree) support tytso ` (2 preceding siblings ...) 2002-09-25 22:54 ` Jeff Garzik @ 2002-09-25 23:31 ` Daniel Egger 2002-09-26 0:32 ` Randy.Dunlap 2002-09-26 0:50 ` Aaron Lehmann 4 siblings, 1 reply; 47+ messages in thread From: Daniel Egger @ 2002-09-25 23:31 UTC (permalink / raw) To: tytso; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 324 bytes --] Am Mit, 2002-09-25 um 22.03 schrieb tytso@mit.edu: > I've given this code a good bit of testing, under both 2.4 and 2.5 > kernels, and believe that it is ready for prime-time. Please pull it > from: > bk://extfs.bkbits.net/for-linus-htree-2.5 Where can one fetch the 2.4 code? -- Servus, Daniel [-- Attachment #2: Dies ist ein digital signierter Nachrichtenteil --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 23:31 ` [BK PATCH] Add ext3 indexed directory (htree) support Daniel Egger @ 2002-09-26 0:32 ` Randy.Dunlap 0 siblings, 0 replies; 47+ messages in thread From: Randy.Dunlap @ 2002-09-26 0:32 UTC (permalink / raw) To: Daniel Egger; +Cc: tytso, linux-kernel On 26 Sep 2002, Daniel Egger wrote: | Am Mit, 2002-09-25 um 22.03 schrieb tytso@mit.edu: | | > I've given this code a good bit of testing, under both 2.4 and 2.5 | > kernels, and believe that it is ready for prime-time. Please pull it | > from: | | > bk://extfs.bkbits.net/for-linus-htree-2.5 | | Where can one fetch the 2.4 code? As Ted already said: > Can you post a GNU patch too, for public lookover and independent > integration? Sure. The patch is a bit big for e-mail, but you can find it at: http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.5.38 There is also a 2.4.19 patch available as well: http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.4.19-2 -- ~Randy ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-25 20:03 [BK PATCH] Add ext3 indexed directory (htree) support tytso ` (3 preceding siblings ...) 2002-09-25 23:31 ` [BK PATCH] Add ext3 indexed directory (htree) support Daniel Egger @ 2002-09-26 0:50 ` Aaron Lehmann 2002-09-26 3:28 ` Theodore Ts'o 4 siblings, 1 reply; 47+ messages in thread From: Aaron Lehmann @ 2002-09-26 0:50 UTC (permalink / raw) To: tytso; +Cc: linux-kernel On Wed, Sep 25, 2002 at 04:03:44PM -0400, tytso@mit.edu wrote: > In order to use the new directory indexing feature, please update your > e2fsprogs to 1.29. Existing filesystem can be updated to use directory > indexing using the command "tune2fs -O dir_index /dev/hdXXX". Do new filesystems created with e2fsprogs 1.29 use this feature by default? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BK PATCH] Add ext3 indexed directory (htree) support 2002-09-26 0:50 ` Aaron Lehmann @ 2002-09-26 3:28 ` Theodore Ts'o 0 siblings, 0 replies; 47+ messages in thread From: Theodore Ts'o @ 2002-09-26 3:28 UTC (permalink / raw) To: Aaron Lehmann; +Cc: linux-kernel On Wed, Sep 25, 2002 at 05:50:20PM -0700, Aaron Lehmann wrote: > On Wed, Sep 25, 2002 at 04:03:44PM -0400, tytso@mit.edu wrote: > > In order to use the new directory indexing feature, please update your > > e2fsprogs to 1.29. Existing filesystem can be updated to use directory > > indexing using the command "tune2fs -O dir_index /dev/hdXXX". > > Do new filesystems created with e2fsprogs 1.29 use this feature by > default? Yes, they do. The "tune2fs -O dir_index" command is only needed for filesystems created before 1.29. - Ted ^ permalink raw reply [flat|nested] 47+ messages in thread
* [BK PATCH] Add ext3 indexed directory (htree) support @ 2002-10-02 9:11 tytso 0 siblings, 0 replies; 47+ messages in thread From: tytso @ 2002-10-02 9:11 UTC (permalink / raw) To: torvalds; +Cc: linux-kernel Hi Linus, I believe the ext3 indexed directory (htree) patches are ready for integration into the 2.5 tree at this point. Please pull them from: bk://extfs.bkbits.net/for-linus-htree-2.5 Patches against 2.5.40 and 2.4.19 are available from: http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.5.40 and http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.4.19-4 This patch significantly increases the speed of using large directories in ext3, in a completely backwards and forwards compatible fashion. Creating 100,000 files in a single directory took 38 minutes without directory indexing... and 11 seconds with the directory indexing turned on. Since the last time I've submitted this changeset to you, it's received additional testing from the LKML list, and we've fixed few bugs in both the kernel code and in e2fsprogs (please use e2fsprogs 1.30-WIP-0930 from sourceforge). In addition, the code paths are essentially unchanged if the directory indexing filesystem feature flag is disabled. Hence, this is a low risk patch to apply. As before, existing filesystems can be updated to use directory indexing by using the command "tune2fs -O dir_index /dev/hdXXX", and existing large directories can be indexed by using the command "e2fsck -fD /dev/hdXXX". - Ted fs/ext3/Makefile | 2 fs/ext3/dir.c | 299 +++++++++- fs/ext3/file.c | 3 fs/ext3/hash.c | 215 +++++++ fs/ext3/namei.c | 1333 ++++++++++++++++++++++++++++++++++++++++----- fs/ext3/super.c | 6 include/linux/ext3_fs.h | 86 ++ include/linux/ext3_fs_sb.h | 2 include/linux/ext3_jbd.h | 2 include/linux/rbtree.h | 1 lib/rbtree.c | 16 11 files changed, 1827 insertions(+), 138 deletions(-) ^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2002-10-02 9:07 UTC | newest] Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-09-25 20:03 [BK PATCH] Add ext3 indexed directory (htree) support tytso 2002-09-25 20:34 ` Andreas Dilger 2002-09-25 20:41 ` Dave Jones 2002-09-25 21:08 ` Andreas Dilger 2002-09-25 21:34 ` Theodore Ts'o 2002-09-25 22:54 ` Jeff Garzik 2002-09-25 23:29 ` Theodore Ts'o 2002-09-25 23:45 ` Ryan Cumming 2002-09-26 3:27 ` Theodore Ts'o 2002-09-26 5:23 ` Ryan Cumming 2002-09-26 5:57 ` Theodore Ts'o 2002-09-26 6:22 ` Ryan Cumming 2002-09-26 14:05 ` Theodore Ts'o 2002-09-26 6:25 ` Ryan Cumming 2002-09-26 11:25 ` Daniel Egger 2002-09-26 7:41 ` Ryan Cumming 2002-09-26 13:23 ` Theodore Ts'o 2002-09-26 15:42 ` Theodore Ts'o 2002-09-26 19:08 ` Ryan Cumming 2002-09-26 19:51 ` Horst von Brand 2002-09-26 19:59 ` Ryan Cumming 2002-09-26 22:04 ` Theodore Ts'o 2002-09-26 22:53 ` Ryan Cumming 2002-09-26 23:57 ` Theodore Ts'o 2002-09-27 1:00 ` Ryan Cumming 2002-09-27 3:24 ` Theodore Ts'o 2002-09-27 4:12 ` Andreas Dilger 2002-09-27 7:55 ` Ryan Cumming 2002-09-28 1:20 ` Ryan Cumming 2002-09-28 1:46 ` Ryan Cumming 2002-09-28 14:13 ` Theodore Ts'o 2002-09-28 14:18 ` Theodore Ts'o 2002-09-28 22:35 ` Ryan Cumming 2002-09-28 17:27 ` [Ext2-devel] " Andreas Dilger 2002-09-28 18:43 ` chrisl 2002-09-28 19:45 ` chrisl 2002-09-28 22:30 ` Ryan Cumming 2002-09-29 7:03 ` [PATCH] fix htree dir corrupt after fsck -fD chrisl 2002-09-29 8:16 ` Ryan Cumming 2002-09-29 8:36 ` Ryan Cumming 2002-09-30 2:46 ` Ryan Cumming 2002-09-29 14:13 ` Theodore Ts'o 2002-09-25 23:31 ` [BK PATCH] Add ext3 indexed directory (htree) support Daniel Egger 2002-09-26 0:32 ` Randy.Dunlap 2002-09-26 0:50 ` Aaron Lehmann 2002-09-26 3:28 ` Theodore Ts'o 2002-10-02 9:11 tytso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).