linux-kernel.vger.kernel.org archive mirror
* Any known ext2 FS problems in 2.4.7?
@ 2001-08-03 17:40 David Ford
  2001-08-03 23:01 ` Andreas Dilger
  0 siblings, 1 reply; 5+ messages in thread
From: David Ford @ 2001-08-03 17:40 UTC (permalink / raw)
  To: linux-kernel

Every 2-3 days I now have to bring one particular machine down to
init 1, kill any remaining processes, remount RO, and run e2fsck on the
ext2 partition.  Over that period, disk space is eaten without being
accounted for.  'du' shows about 13 gigs used when I sum up all the
directories, but roughly 4.5 gigs more is missing.  During e2fsck there
are many, many pages of deleted inodes with zero dtime, ref count fixups,
and free inode count fixups.  When I say many, I mean that the output
scrolls for about four minutes on this PIII 667...
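
The gap described above (what the filesystem says is used versus what
can be reached through directory entries) can be measured with a short
script.  This is only a rough sketch, not something from the original
report: the function names are made up for illustration, it assumes
Linux (where st_blocks is counted in 512-byte units), and it should be
pointed at the mount point of the filesystem in question.

```python
import os


def used_bytes_statvfs(path):
    """Bytes the filesystem itself reports as used (the number df is based on)."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def _same_dev(path, dev):
    """True if path lives on device dev; False if it cannot be examined."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def used_bytes_walk(path):
    """Bytes reachable through directory entries (roughly what du -x sums)."""
    root_dev = os.lstat(path).st_dev
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        # Do not descend into other mounted filesystems.
        dirnames[:] = [d for d in dirnames
                       if _same_dev(os.path.join(dirpath, d), root_dev)]
        for entry in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                st = os.lstat(entry)
            except OSError:
                continue  # entry vanished while we were walking
            if (st.st_dev, st.st_ino) in seen:
                continue  # hard link to an inode that was already counted
            seen.add((st.st_dev, st.st_ino))
            total += st.st_blocks * 512  # st_blocks is in 512-byte units on Linux
    return total


def missing_bytes(mount_point):
    """Space the filesystem counts as used but that no directory entry explains."""
    return used_bytes_statvfs(mount_point) - used_bytes_walk(mount_point)
```

Some difference is normal (filesystem metadata, files you cannot read
as a non-root user), so only a large and growing value of
missing_bytes("/") would point at the kind of leak reported here.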

There is nothing special about this partition.  It doesn't happen while
running 2.4.5-ac15, but I can't use that kernel either because it oopses,
as I reported.  That oops was fixed in 2.4.7, but this disk space issue
is rather frustrating.  Fortunately all my other systems are reiserfs
and work fine.

/dev/ide/host0/bus0/target0/lun0/part2 on / type ext2 
(rw,usrquota=/usr/local/admin/system-info/quota-home)

I haven't mucked with any /proc settings other than writing "16384" to
/proc/sys/fs/file-max.  It's also worth noting that this machine
spontaneously reboots about once a day.  No klog, no console, no
nothing, just bewm.  Again, 2.4.5 didn't do this.

There is nothing unusual running on this machine; it's very similar to
several other machines that stay up just fine under much higher loads.

-d





* Re: Any known ext2 FS problems in 2.4.7?
  2001-08-03 17:40 Any known ext2 FS problems in 2.4.7? David Ford
@ 2001-08-03 23:01 ` Andreas Dilger
  2001-08-04  3:35   ` David Ford
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Dilger @ 2001-08-03 23:01 UTC (permalink / raw)
  To: David Ford; +Cc: linux-kernel

David Ford writes:
> I'm starting to go through a cycle every 2-3 days where I have to bring 
> one particular machine down to init 1, kill any processes and remount 
> RO, then run e2fsck on the e2fs partition.  Over that period of time, 
> disk space is eaten without accounting. 'du' shows about 13 gigs used 
> when I sum up all the directories.  Roughly 4.5 gigs is missing.  During 
> e2fsck, there are many many pages of deleted inodes with zero dtime, ref 
> count fixups, and free inode count fixups.  When I say many, I mean that 
> this pIII 667 scrolls for about four minutes...
> 
> There is nothing special about this partition, it doesn't do it while 
> running 2.4.5-ac15, but I can't use that kernel either because it OOPSes 
> as I reported.  That OOPS was fixed for 2.4.7, but this disk space issue 
> is rather frustrating.  Fortunately all my other systems are reiserfs 
> and work fine.
> 
> /dev/ide/host0/bus0/target0/lun0/part2 on / type ext2 
> (rw,usrquota=/usr/local/admin/system-info/quota-home)
> 
> I haven't mucked with any /proc settings other than "16384" 
>  >/proc/sys/fs/file-max.  It's also worthy to note that this machine 
> also likes to break and spontaneously reboot about once every day.  No 
> klog, no console, no nothing, just bewm.  Again 2.4.5 didn't do this.

Are you sure you are running e2fsck on this partition at boot time?
I mean, if it is rebooting spontaneously every day, but you need to run
e2fsck manually to clean up the filesystem every 2-3 days, the fsck after
reboot should already clean up the filesystem for you.  If you _don't_
run e2fsck on this filesystem (you need a non-zero number in the 6th
column of /etc/fstab) that would explain the problem.
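
A quick way to check that sixth column, as a sketch only: the helper
name and the sample entries in the usage note are made up for
illustration and are not taken from the machine in question.  Per
fstab(5), a missing sixth field is treated as 0, meaning the filesystem
is not checked at boot.

```python
def fsck_pass(fstab_line):
    """Return the fs_passno (6th field) of one fstab entry.

    Returns None for blank lines and comments. A missing 6th field
    counts as 0, which means "do not fsck at boot".
    """
    line = fstab_line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    return int(fields[5]) if len(fields) > 5 else 0


def unchecked_filesystems(fstab_text):
    """Mount points of all entries that will be skipped by the boot-time fsck."""
    skipped = []
    for line in fstab_text.splitlines():
        if fsck_pass(line) == 0:
            skipped.append(line.split()[1])
    return skipped
```

For example, fsck_pass("/dev/hda2 / ext2 defaults 1 1") gives 1, while
an entry ending in "0 0" gives 0 and would explain orphans surviving a
reboot.  On a real system, pass the contents of /etc/fstab to
unchecked_filesystems().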

The "missing space" you are seeing is because files are being held open
(thus not reported by "du", which can only check linked files, but reported
by "df" which shows the whole filesystem stats).  If the files are held
open at the time of a crash, then you need to run e2fsck to clean up all
of these "orphans".  You should be able to see what process is causing this
by running "lsof | grep deleted" to find open-but-deleted files.
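
To illustrate the open-but-deleted case, here is a small sketch
(assuming a POSIX system and a local temporary directory; the function
name is made up).  It unlinks a file while still holding it open and
shows that the inode keeps its size with a link count of zero, which is
exactly the state that "du" cannot see and that turns into an orphan if
the machine crashes before the descriptor is closed.

```python
import os
import tempfile


def unlinked_but_open(size=1 << 20):
    """Create a file, unlink it while open, and report what fstat still sees.

    Returns (link_count, size_in_bytes, path_still_exists).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)      # remove the only directory entry
        st = os.fstat(fd)    # the inode stays alive through the open descriptor
        return st.st_nlink, st.st_size, os.path.exists(path)
    finally:
        os.close(fd)         # last reference gone; the blocks are freed now
```

While the descriptor is open, the file has no name for "du" to find but
its blocks are still counted by "df".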

Note that reiserfs still has the same problem (AFAIK, I don't think it
is fixed in the stock kernels, although there is a patch available),
so even though it doesn't _need_ reiserfsck at boot time, you still
don't get the space back until it is run.  If the other machines don't
crash all the time, the space won't be "lost", so you may not notice it.
Ext3 cleans up orphans at boot time (no fsck needed).

Cheers, Andreas
-- 
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



* Re: Any known ext2 FS problems in 2.4.7?
  2001-08-03 23:01 ` Andreas Dilger
@ 2001-08-04  3:35   ` David Ford
  2001-08-04  7:14     ` David Ford
  0 siblings, 1 reply; 5+ messages in thread
From: David Ford @ 2001-08-04  3:35 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-kernel

Yes I'm sure, this machine has been running for a long time quite well 
save the reported problems.

a) fsck on boot, yes
b) manual fsck and reboot every few days, yes
c) fstab is correct :P
d) missing space isn't accounted for in du, lsof, or any other tool that 
I'm aware of
e) 4.5 gigs is a lot of space to eat over a couple days, particularly 
when this server's disk use varies by a couple hundred megs per day
f) only the e2fs partition has problems, the other partitions are fine, 
no journal replays or otherwise

After a fresh, clean, fsck'd start, the machine runs fine for a day or 
two, then over about one day the disk space grinds away.  When I bring 
it to single user mode with nothing running but the kernel threads and 
a few init children, lsof shows nothing but a few libs open, and no 
deleted files or files otherwise open for writing.

All my other machines run fine.  Also note, as I said, that it only does 
this under 2.4.7; if I boot into 2.4.5 it will stay up for weeks until 
the reported-and-fixed oops kills it.  No disk space issues there.

David





* Re: Any known ext2 FS problems in 2.4.7?
  2001-08-04  3:35   ` David Ford
@ 2001-08-04  7:14     ` David Ford
  2001-08-04 23:32       ` Andreas Dilger
  0 siblings, 1 reply; 5+ messages in thread
From: David Ford @ 2001-08-04  7:14 UTC (permalink / raw)
  Cc: Andreas Dilger, linux-kernel

OK, the assumed guilty party that just ate 4.5 gigs flat tonight was the 
rsync for the kernel mirror.

Here are the facts.
a) lsof didn't report any large files opened by rsync
b) lsof currently reports about 80k in deleted files
c) the directory rsync runs from and the destination directory of the 
mirror both report expected sizes; the mirror is a little over 9 gigs, 
as it should be, and the script directory is tiny.
e) dmesg shows nothing
f) du of the partitions shows the expected usage of about 13gigs.

4.5gigs just disappeared into nothingness.  I can't find it with any 
tools.  I have to shut this machine down to single user mode and run 
e2fsck to recover the space.

David







* Re: Any known ext2 FS problems in 2.4.7?
  2001-08-04  7:14     ` David Ford
@ 2001-08-04 23:32       ` Andreas Dilger
  0 siblings, 0 replies; 5+ messages in thread
From: Andreas Dilger @ 2001-08-04 23:32 UTC (permalink / raw)
  To: David Ford; +Cc: Andreas Dilger, linux-kernel

David Ford writes:
> Ok, the assumed guilty party that just ate 4.5gigs flat tonight was the 
> rsync for the kernel mirror.
> 
> Here are the facts.
> a) lsof didn't report any large files opened by rsync
> b) lsof currently reports about 80k in deleted files
> c) the directory where rsync runs from, the destination directory of the 
> mirror, both report expected sizes, the mirror is a little over 9 gigs 
> like it should be, the script directory is tiny.
> e) dmesg shows nothing
> f) du of the partitions shows the expected usage of about 13gigs.
> 
> 4.5gigs just disappeared into nothingness.  I can't find it with any 
> tools.  I have to shut this machine down to single user mode and run 
> e2fsck to recover the space.

Could you try (a) running with quotas disabled to see if it fixes the
problem, or (b) running an -ac kernel which has changes to the quota
code (along with new quota tools, from sourceforge, I believe).
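
To confirm whether quotas are actually enabled on a mount before and
after such a test, the options can be read from a "mount"-style line
like the one in the original report.  A minimal sketch follows; the
function name is made up, and it only looks for the quota, usrquota and
grpquota options.

```python
import re

QUOTA_OPTIONS = ("quota", "usrquota", "grpquota")


def quota_options(mount_line):
    """Return the quota-related options found in one line of `mount` output.

    The result maps each option name to its value, or to None when the
    option was given without "=value".
    """
    match = re.search(r"\(([^)]*)\)\s*$", mount_line)
    if not match:
        return {}
    found = {}
    for opt in match.group(1).split(","):
        key, _, value = opt.strip().partition("=")
        if key in QUOTA_OPTIONS:
            found[key] = value or None
    return found
```

Fed the line from the first message ("... on / type ext2
(rw,usrquota=/usr/local/admin/system-info/quota-home)"), this reports
usrquota with that file as its value; an empty result after remounting
would confirm quotas are off for the test.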

Cheers, Andreas


