linux-kernel.vger.kernel.org archive mirror
* filesystem bug?
@ 2003-12-15  9:25 Tsuchiya Yoshihiro
  2003-12-15  9:55 ` bert hubert
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-15  9:25 UTC (permalink / raw)
  To: linux-kernel

Hi,

Ext2 and Ext3 filesystems become inconsistent when I run a simple
test program on my system.

My test program is a script that extracts a tar+gzip archive
twice, compares the two trees, removes one of them, extracts the
archive again, and compares again. A very simple test.

Following is an Ext2 result: the inode is filled with zeros.
I think the inode has become a bad inode.

----
[root@dell04 tsuchiya]# ls -l /mnt/foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html
ls: /mnt/foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html: Input/output error


debugfs:  stat foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html
Inode: 1935297   Type: bad type    Mode:  0000   Flags: 0x0   Generation: 0
User:     0   Group:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Thu Jan  1 09:00:00 1970
atime: 0x00000000 -- Thu Jan  1 09:00:00 1970
mtime: 0x00000000 -- Thu Jan  1 09:00:00 1970
BLOCKS:

/dev/sda4 on /mnt type ext2 (rw)
----

I saw the same thing on Ext3 before.

I use Red Hat 9, whose kernel is 2.4.20-8, and I also tried
2.4.20-19.9 (a Red Hat kernel patch RPM).

I want to know whether it is a Red Hat kernel problem or a generic
ext2/ext3 problem, and in which version it is fixed.


The mkfs parameters are just the defaults of /sbin/mkfs.ext2 and
mkfs.ext3, and I use the Dell 1650's internal SCSI disks for this test:

----
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

blk: queue dfceb214, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: ST336607LC        Rev: DS04
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue dfceb414, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: ST336753LC        Rev: DX03
  Type:   Direct-Access                      ANSI SCSI revision: 03
----

I will attach my script. I use Mozilla's source tarball.
Edit the first three lines for your setup.

In the example above, the inode structure was zeroed, and
sometimes the data area was corrupted. I also saw an inode overwritten
by a deleted inode (with nlink=0 and i_dtime set).
My feeling is that the corrupted buffers were used for some other
purpose and overwritten without the buffer lock being held properly.

Here is my script:
---
#!/bin/bash

TARGETPREFIX=/mnt/foo   # filesystem that will be tested
MOZSRC=/home/tsuchiya/src/mozilla-source-1.3.tar.gz     # tgz used for test
RDIR="/tmp/xcresult"    # result directory

function _xtract+compare {
        echo "extracting directory to be compared against for $1"
        TARGETDIR=$TARGETPREFIX/$1
        mkdir -p $TARGETDIR
        cd $TARGETDIR
        tar zxf $MOZSRC
        echo "$1 done .... now the job is started."
        RESULTS=$RDIR/$1

        echo "test result will be stored under $RESULTS"
        mkdir -p $RESULTS;

        for ((i=0; i < 100000; i++))
        do
                echo "$1:$i-th trial"

                echo "test dir is $TARGETDIR";
                mkdir -p $TARGETDIR;

                cd $TARGETDIR
                mkdir dir$i
                cd dir$i
                tar zxf $MOZSRC
                diff -rq $TARGETDIR/mozilla mozilla > $RESULTS/dir$i.result 2>&1
                DIFFSIZE=`ls -l $RESULTS/dir$i.result | awk '{print $5}'`
                if [ $DIFFSIZE != 0 ];
                then
                        echo "something wrong happened at $1:$i-th trial "
                        exit;
                else
                        rm $RESULTS/dir$i.result
                        echo "test $1:$i-th passed"
                fi
                rm -rf mozilla &
        done
}

for target in aa ab ac ad ae # af ag ah ai aj ak al am an
do
        _xtract+compare $target $RDIR &
done

---
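
A possible variant of the pass/fail check in the loop above (a sketch, not part of the original script): instead of parsing the size column of `ls -l`, test the exit status of `diff -rq` directly. The function name compare_trees is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: compare a reference tree with a fresh copy and keep
# the diff output only when something is wrong. diff's exit status is
# 0 = identical, 1 = differences found, >1 = trouble (e.g. unreadable file).
compare_trees() {
    local ref="$1" copy="$2" result="$3"
    if diff -rq "$ref" "$copy" > "$result" 2>&1; then
        rm -f "$result"   # identical: nothing worth keeping
        return 0
    fi
    return 1              # differences or diff trouble: keep $result
}
```

Inside the loop, the DIFFSIZE test would then become something like `if ! compare_trees $TARGETDIR/mozilla mozilla $RESULTS/dir$i.result; then ...; exit 1; fi`. Treating diff's "trouble" status as a failure is deliberate here, since I/O errors are one of the symptoms being hunted.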


Any information would be appreciated.

Thanks,
Yoshi
---
Yoshihiro Tsuchiya





* Re: filesystem bug?
  2003-12-15  9:25 filesystem bug? Tsuchiya Yoshihiro
@ 2003-12-15  9:55 ` bert hubert
  2003-12-16 13:44 ` Stephen C. Tweedie
  2003-12-26  9:59 ` dlion
  2 siblings, 0 replies; 30+ messages in thread
From: bert hubert @ 2003-12-15  9:55 UTC (permalink / raw)
  To: Tsuchiya Yoshihiro; +Cc: linux-kernel

On Mon, Dec 15, 2003 at 06:25:17PM +0900, Tsuchiya Yoshihiro wrote:


> I use RedHat9 which kernel is 2.4.20-8 and I also tried
> 2.4.20-19.9(redhat kernel patch rpm).
> 
> I want to know whether it is a redhat kernel problem or a generic
> Ext problem and on which version it is fixed.

Red Hat patch their kernel so heavily that it is hard to tell whether this
applies to stock 2.4 as well; I suggest you compile your own version and
compare. If you want to be really helpful, try 2.6 too :-)

Wonderful testing by the way, thanks!

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: filesystem bug?
  2003-12-15  9:25 filesystem bug? Tsuchiya Yoshihiro
  2003-12-15  9:55 ` bert hubert
@ 2003-12-16 13:44 ` Stephen C. Tweedie
  2003-12-16 21:40   ` Bryan Whitehead
  2003-12-26  9:59 ` dlion
  2 siblings, 1 reply; 30+ messages in thread
From: Stephen C. Tweedie @ 2003-12-16 13:44 UTC (permalink / raw)
  To: tsuchiya; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Mon, 2003-12-15 at 09:25, Tsuchiya Yoshihiro wrote:

> Following is an Ext2 result and the inode is filled by zero.
> I think the inode becomes a badinode.

> [root@dell04 tsuchiya]# ls -l /mnt/foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html
> ls: /mnt/foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html: Input/output error

"Input/output error" can sometimes mean that the kernel has found a
filesystem problem, but it also often indicates a device-layer problem. 
Is there anything helpful in the kernel logs?
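
One quick way to answer that question is to separate filesystem-level complaints from device-level ones in the kernel log. The sketch below is an illustration only: classify_kernel_log is a hypothetical helper, and the patterns ("EXT2-fs error"/"EXT3-fs error" style lines and generic "I/O error" lines) are assumptions about what typical 2.4-era kernels print, so adjust them to the messages actually seen.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and tag each
# interesting line as a filesystem-level ("fs:") or device-level ("dev:")
# complaint. Other lines are dropped.
classify_kernel_log() {
    awk '
        /EXT[23]-fs (error|warning)/ { print "fs:  " $0; next }
        /I\/O error/                 { print "dev: " $0 }
    '
}

# Usage: dmesg | classify_kernel_log
```

Lines tagged fs: would suggest the kernel itself detected on-disk inconsistencies; lines tagged dev: point towards the driver or hardware. An empty result is informative too.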

Cheers,
 Stephen




* Re: filesystem bug?
  2003-12-16 13:44 ` Stephen C. Tweedie
@ 2003-12-16 21:40   ` Bryan Whitehead
  2003-12-16 21:50     ` Bryan Whitehead
  2003-12-16 23:31     ` Tsuchiya Yoshihiro
  0 siblings, 2 replies; 30+ messages in thread
From: Bryan Whitehead @ 2003-12-16 21:40 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: tsuchiya, linux-kernel

I get this problem all the time here at JPL. I can always get the files
back by remounting the filesystem.

For example, if /dev/sdb1 mounted on /export/project is getting weird
"Input/output" errors, I can simply run this command:
mount -o remount /dev/sdb1 /export/project

It's been about a year of these problems... I'll try running the test
Tsuchiya Yoshihiro made to reproduce it. (I have not been able to create a
test that reproduces it consistently... but the problem has sure screwed
up some data-gathering runs in the lab.)

These are all on Mandrake kernels, though (from the 9.0 series), so
that's 2.4.19 plus a ton of patches.

Stephen C. Tweedie wrote:
> Hi,
> 
> On Mon, 2003-12-15 at 09:25, Tsuchiya Yoshihiro wrote:
> 
> 
>>Following is an Ext2 result and the inode is filled by zero.
>>I think the inode becomes a badinode.
> 
> 
>>[root@dell04 tsuchiya]# ls -l /mnt/foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html
>>ls: /mnt/foo/ae/dir0/mozilla/layout/html/tests/table/bugs/bug2757.html: Input/output error
> 
> 
> "Input/output error" can sometimes mean that the kernel has found a
> filesystem problem, but it also often indicates a device-layer problem. 
> Is there anything helpful in the kernel logs?
> 
> Cheers,
>  Stephen


-- 
Bryan Whitehead
SysAdmin - JPL - Interferometry and Large Optical Systems
Phone: 818 354 2903
driver@jpl.nasa.gov



* Re: filesystem bug?
  2003-12-16 21:40   ` Bryan Whitehead
@ 2003-12-16 21:50     ` Bryan Whitehead
  2003-12-16 23:31     ` Tsuchiya Yoshihiro
  1 sibling, 0 replies; 30+ messages in thread
From: Bryan Whitehead @ 2003-12-16 21:50 UTC (permalink / raw)
  To: Bryan Whitehead; +Cc: Stephen C. Tweedie, tsuchiya, linux-kernel

BTW, this happens on every filesystem we tried: ext2, ext3, and XFS.



-- 
Bryan Whitehead
SysAdmin - JPL - Interferometry and Large Optical Systems
Phone: 818 354 2903
driver@jpl.nasa.gov



* Re: filesystem bug?
  2003-12-16 21:40   ` Bryan Whitehead
  2003-12-16 21:50     ` Bryan Whitehead
@ 2003-12-16 23:31     ` Tsuchiya Yoshihiro
  2003-12-16 23:40       ` viro
  2003-12-17 23:24       ` Tsuchiya Yoshihiro
  1 sibling, 2 replies; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-16 23:31 UTC (permalink / raw)
  To: Bryan Whitehead; +Cc: Stephen C. Tweedie, linux-kernel

Hi,

Stephen, I don't have anything helpful for debugging at this point. We
noticed the problem while debugging our SCSI driver. Then we found that
the same thing happens on generic SCSI disks and IDE as well. The problem
we observed in our driver was that while it was processing a buffer, which
should be locked by BH_LOCK, the contents of the buffer were overwritten.
The overwrite ranged from a few bytes to 1KB out of 4KB, which cannot be
caused by our driver. Then we tried a generic SCSI driver and I reproduced
the problem. I don't think it is caused by a stray pointer, because the
overwrites only happen in data buffers and other parts of memory seem OK.

Reproducing is especially easy with Ext2: it happens within a few hours
with my script. With Ext3, within a day if you are lucky.

Now I am trying 2.4.23 from kernel.org with ext3, and 2.6.0-test11 from
kernel.org with ext3. So far, after about a day, they are running nicely.
Let's see what happens.

The following combinations failed:
Redhat9 with 2.4.20-8 ext2 and ext3
Redhat9 with 2.4.20-19.9 ext2 and ext3
Redhat9 with 2.4.20-24.9 ext2

Thanks,
Yoshi
---
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2003-12-16 23:31     ` Tsuchiya Yoshihiro
@ 2003-12-16 23:40       ` viro
  2003-12-17  0:12         ` Tsuchiya Yoshihiro
  2003-12-17 23:24       ` Tsuchiya Yoshihiro
  1 sibling, 1 reply; 30+ messages in thread
From: viro @ 2003-12-16 23:40 UTC (permalink / raw)
  To: Tsuchiya Yoshihiro; +Cc: Bryan Whitehead, Stephen C. Tweedie, linux-kernel

On Wed, Dec 17, 2003 at 08:31:55AM +0900, Tsuchiya Yoshihiro wrote:
> Hi,
> 
> Stephen, I don't have anything helpful for debuging at this point. We
> noticed the problem by debuging our SCSI driver. Then we found that the
> same thing happens on generic SCSI disk and IDE also.  The problem we
> observed in our driver was that while it is processing a buffer, which
> should be locked by BH_LOCK,  the contents of the buffer were
> overwritten. The amount of overwrite is a few byte to 1KB out of 4KB,
> which cannot be done in our driver. Then, we tried a generic SCSI and I
> reproduced the problem.
> I think it is not because of a broken pointer because overwrites only
> happen in data buffers and other parts of memory seem ok.
 
Umm...  You do realize that if you have a shared writable mapping, the
buffer contents _can_ change during the IO?  Legitimately.  When a dirty
page is being written to disk, it remains mapped.  So a process can change
its contents just fine.

BH_LOCK does not prevent that and it was never supposed to...


* Re: filesystem bug?
  2003-12-16 23:40       ` viro
@ 2003-12-17  0:12         ` Tsuchiya Yoshihiro
  0 siblings, 0 replies; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-17  0:12 UTC (permalink / raw)
  To: viro; +Cc: Bryan Whitehead, Stephen C. Tweedie, linux-kernel

viro@parcelfarce.linux.theplanet.co.uk wrote:

>Umm...  You do realize that if you have a shared writable mapping, the
>buffer contents _can_ change during the IO?  Legitimately.  When dirty


Thanks for telling me about that.
But in this case, the corrupted data can be in an inode block.
Also, the test script does not share any files.
It just creates, writes, reads and removes independent files.


Yoshihiro Tsuchiya:
>SCSI disk and IDE also.  The problem we observed in our driver was that 

We haven't tried it with an IDE disk. Sorry.

Thank you,
Yoshi
---
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2003-12-16 23:31     ` Tsuchiya Yoshihiro
  2003-12-16 23:40       ` viro
@ 2003-12-17 23:24       ` Tsuchiya Yoshihiro
  2003-12-18 21:29         ` Stephen C. Tweedie
  1 sibling, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-17 23:24 UTC (permalink / raw)
  To: tsuchiya; +Cc: Bryan Whitehead, Stephen C. Tweedie, linux-kernel

Tsuchiya Yoshihiro wrote:

> Especially with Ext2 reproducing is easy, it happens in a few hours 
> with my script.
> With Ext3, in a day if you are lucky.
>
> Now I am trying 2.4.23 from kernel.org with ext3, and 2.6.0-test11 
> from kernel.org with ext3.
> So far, it's been a about a day, they are runing nicely. Let's see 
> what happens.
>
> Following is the failed combination:
> Redhat9 with 2.4.20-8 ext2 and ext3
> Redhat9 with 2.4.20-19.9 ext2 and ext3
> Redhat9 with 2.4.20-24.9 ext2

I forgot to mention that I had also been testing 2.4.20 from kernel.org...
and it has now failed!

As you can see below, /mnt/foo/ad/mozilla is gone. ad/mozilla had been
used as the reference for comparison with dir*/mozilla; it is basically
read-only and is never removed by the script. /mnt/foo/ae is OK, and
ae/mozilla is of course there.

It had been almost 2 days since the test started, and the test was on its
58th iteration. It ran on an ext2 filesystem, and the kernel was downloaded
from kernel.org.

I had seen the same problem (the read-only mozilla directory going away)
on ext3 on Red Hat kernel 2.4.20-19.9.

[root@dell04 tsuchiya]# ls /mnt/foo/ad
dir0   dir14  dir2   dir25  dir30  dir36  dir41  dir47  dir52  dir58
dir1   dir15  dir20  dir26  dir31  dir37  dir42  dir48  dir53  dir6
dir10  dir16  dir21  dir27  dir32  dir38  dir43  dir49  dir54  dir7
dir11  dir17  dir22  dir28  dir33  dir39  dir44  dir5   dir55  dir8
dir12  dir18  dir23  dir29  dir34  dir4   dir45  dir50  dir56  dir9
dir13  dir19  dir24  dir3   dir35  dir40  dir46  dir51  dir57
[root@dell04 tsuchiya]# ls /mnt/foo/ae
dir0   dir14  dir2   dir25  dir30  dir36  dir41  dir47  dir52  dir58
dir1   dir15  dir20  dir26  dir31  dir37  dir42  dir48  dir53  dir6
dir10  dir16  dir21  dir27  dir32  dir38  dir43  dir49  dir54  dir7
dir11  dir17  dir22  dir28  dir33  dir39  dir44  dir5   dir55  dir8
dir12  dir18  dir23  dir29  dir34  dir4   dir45  dir50  dir56  dir9
dir13  dir19  dir24  dir3   dir35  dir40  dir46  dir51  dir57  mozilla
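
To spot this kind of loss across all targets at once rather than by eye, a small check like the following could be run. It is a sketch: check_reference_dirs is a hypothetical name, and it assumes the layout of the test script, where $TARGETPREFIX/<target>/mozilla is extracted once and never deleted.

```shell
#!/bin/bash
# Hypothetical helper: report every target whose reference tree is missing.
# Arguments: <prefix> <source dir name> <target>...
# Returns 1 if any reference tree is gone, 0 otherwise.
check_reference_dirs() {
    local prefix="$1" src="$2"
    shift 2
    local t missing=0
    for t in "$@"; do
        if [ ! -d "$prefix/$t/$src" ]; then
            echo "missing: $prefix/$t/$src"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_reference_dirs /mnt/foo mozilla aa ab ac ad ae
```

On the listing above it would be expected to report only /mnt/foo/ad/mozilla.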

Yoshi
--
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2003-12-17 23:24       ` Tsuchiya Yoshihiro
@ 2003-12-18 21:29         ` Stephen C. Tweedie
  2003-12-21 23:15           ` Tsuchiya Yoshihiro
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen C. Tweedie @ 2003-12-18 21:29 UTC (permalink / raw)
  To: tsuchiya; +Cc: Bryan Whitehead, linux-kernel, Stephen Tweedie

Hi,

On Wed, 2003-12-17 at 23:24, Tsuchiya Yoshihiro wrote:
> Tsuchiya Yoshihiro wrote:
> 
> > Especially with Ext2 reproducing is easy, it happens in a few hours 
> > with my script.
> > With Ext3, in a day if you are lucky.

I've seen plenty of problems which seem more easily reproduced with one
fs over another but which turned out to be due to either bad hardware or
a kernel bug somewhere completely different in the system.  But the
basic knowledge that it happens on multiple filesystems is really
helpful to eliminate possibilities.

> > Following is the failed combination:
> > Redhat9 with 2.4.20-8 ext2 and ext3
> > Redhat9 with 2.4.20-19.9 ext2 and ext3
> > Redhat9 with 2.4.20-24.9 ext2
> 
> I forgot to mention that I had been testing 2.4.20 from kernel.org 
> also.... And it failed now!

This looks more and more like either bad hardware, or a specific device
driver problem.  What storage is being used here?

It could possibly be a core VFS bug, but the VFS is in general pretty
reliable under load.  We've had problems under specific edge conditions
such as races between sync and unmount, but the basic VFS behaviour
under load generally gets _lots_ of testing, so I'd definitely start by
looking elsewhere.  

I'd also like to see how your 2.4.23 and 2.6.0-test11 testing is going. 
That might give some clues, too.  There's a race between clear_inode()
and read_inode() fixed in those kernels, but that doesn't look relevant
here; there may be something else changed that's significant, though.

Cheers,
 Stephen


* Re: filesystem bug?
  2003-12-18 21:29         ` Stephen C. Tweedie
@ 2003-12-21 23:15           ` Tsuchiya Yoshihiro
  2003-12-22  1:54             ` Tsuchiya Yoshihiro
  2003-12-22  4:30             ` Tsuchiya Yoshihiro
  0 siblings, 2 replies; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-21 23:15 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie wrote:

>>>Following is the failed combination:
>>>Redhat9 with 2.4.20-8 ext2 and ext3
>>>Redhat9 with 2.4.20-19.9 ext2 and ext3
>>>Redhat9 with 2.4.20-24.9 ext2
>>>
>>I forgot to mention that I had been testing 2.4.20 from kernel.org
>>also.... And it failed now!
>
>This looks more and more like either bad hardware, or a specific device
>driver problem.  What storage is being used here?

Hi,

Stephen, I don't think it is a hardware problem, since it happens on
several different machines, and it happens both on SCSI disks and with
our own iSCSI-like device driver. I typically use:

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

blk: queue c1671674, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SEAGATE Model: ST336753LC Rev: DX03


>It could possibly be a core VFS bug, but the VFS is in general pretty
>reliable under load.  We've had problems under specific edge conditions
>such as races between sync and unmount, but the basic VFS behaviour
>under load generally gets _lots_ of testing, so I'd definitely start by
>looking elsewhere.  
>
>I'd also like to see how your 2.4.23 and 2.6.0-test11 testing is going. 
>That might give some clues, too.  There's a race between clear_inode()
>and read_inode() fixed in those kernels, but that doesn't look relevant
>here; there may be something else changed that's significant, though.
>

EXT3 on 2.4.23 and on 2.6.0-test11 both failed. My impression is that
when I make the filesystem smaller, so that its usage is 70% to 80% during
the test, the problem happens more easily.
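
To make that impression measurable, one option is to log the filesystem's usage next to each trial, so a failure can be matched against the fill level at the time. A sketch, assuming POSIX `df -P` output (header line, then one data line whose fifth column is the capacity percentage); usage_percent and the log file name in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `df -P <mountpoint>` output on stdin and print
# the capacity column of the data line without its trailing "%".
usage_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage inside the test loop (hypothetical log file):
#   echo "$1 $i $(df -P $TARGETPREFIX | usage_percent)" >> $RDIR/usage.log
```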

Yoshi
--
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2003-12-21 23:15           ` Tsuchiya Yoshihiro
@ 2003-12-22  1:54             ` Tsuchiya Yoshihiro
  2003-12-22  4:30             ` Tsuchiya Yoshihiro
  1 sibling, 0 replies; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-22  1:54 UTC (permalink / raw)
  Cc: Stephen C. Tweedie, linux-kernel


Hi,

The problems I have seen with my script are the following:

1. The disk inode is filled with zeros.
2. The disk inode is overwritten by a deleted inode.
3. A directory disappears during the test and is still gone afterwards.
4. A directory disappears during the test, but it actually
exists after the test. (The test script says "No such file or directory",
but when I look at the directory, it is there.)
5. File data is corrupted.

Problems #1 to #3 happened on both ext2 and ext3;
#5 only on ext2 and #4 only on ext3.



Thanks,
Yoshi

--
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2003-12-21 23:15           ` Tsuchiya Yoshihiro
  2003-12-22  1:54             ` Tsuchiya Yoshihiro
@ 2003-12-22  4:30             ` Tsuchiya Yoshihiro
  2003-12-22 12:03               ` Stephen C. Tweedie
  1 sibling, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-22  4:30 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel

Tsuchiya Yoshihiro wrote:

>>It could possibly be a core VFS bug, but the VFS is in general pretty
>>reliable under load.  We've had problems under specific edge conditions
>>such as races between sync and unmount, but the basic VFS behaviour
>>under load generally gets _lots_ of testing, so I'd definitely start by
>>looking elsewhere.  
>>
>>I'd also like to see how your 2.4.23 and 2.6.0-test11 testing is going. 
>>That might give some clues, too.  There's a race between clear_inode()
>>and read_inode() fixed in those kernels, but that doesn't look relevant
>>here; there may be something else changed that's significant, though.
>>
>EXT3 on 2.4.23 and 2.6.0-test11 both failed. I feel when I make the
>filesystem smaller - make the filesystem usage 70% to 80% during the test- ,
>the problem happens easyer.

I tried it with an IDE disk and it failed too. It was run on
ext2 on 2.4.23, so it's not a SCSI problem.


Thanks,
Yoshi

--
Yoshihiro Tsuchiya





* Re: filesystem bug?
  2003-12-22  4:30             ` Tsuchiya Yoshihiro
@ 2003-12-22 12:03               ` Stephen C. Tweedie
  2003-12-24  1:48                 ` Tsuchiya Yoshihiro
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen C. Tweedie @ 2003-12-22 12:03 UTC (permalink / raw)
  To: tsuchiya; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Mon, 2003-12-22 at 04:30, Tsuchiya Yoshihiro wrote:

> I tried it with IDE disk and it failed also. It was run on
> ext2 on 2.4.23. So it's not a SCSI problem.

OK, I'll try your script with a 2.4.21 or 2.4.23 kernel to see if we can
reproduce this here.  In the meantime, could you possibly try a 2.4.24
kernel, just in case the clear_inode race has something to do with this?

Thanks,
 Stephen



* Re: filesystem bug?
  2003-12-22 12:03               ` Stephen C. Tweedie
@ 2003-12-24  1:48                 ` Tsuchiya Yoshihiro
  2003-12-24 23:09                   ` Tsuchiya Yoshihiro
  0 siblings, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-24  1:48 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie wrote:

>OK, I'll try your script with a 2.4.21 or 2.4.23 kernel to see if we can
>reproduce this here.  In the mean time, could you possibly try a 2.4.24
>kernel, just in case the clear_inode race has something to do with this?
Stephen, I started running the test on ext2 and ext3 on 2.4.24-pre2.

BTW, what exactly is the clear_inode and read_inode race that you mentioned?

I am not familiar with the locking model in the Linux kernel. I found that
kernel_lock is held before ext3_rename/unlink/rmdir, so I think those are
OK. But I do not understand how it is done in the path walk.

Thanks,
Yoshi

--
Yoshihiro Tsuchiya





* Re: filesystem bug?
  2003-12-24  1:48                 ` Tsuchiya Yoshihiro
@ 2003-12-24 23:09                   ` Tsuchiya Yoshihiro
  2004-01-15  6:38                     ` Tsuchiya Yoshihiro
  0 siblings, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-24 23:09 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel

Tsuchiya Yoshihiro wrote:

>>OK, I'll try your script with a 2.4.21 or 2.4.23 kernel to see if we can
>>reproduce this here.  In the mean time, could you possibly try a 2.4.24
>>kernel, just in case the clear_inode race has something to do with this?
>>
>Stephen, I started running the test on ext2 and ext3 on 2.4.24-pre2.

The test on ext3 on 2.4.24-pre2 failed. The read-only directory is gone.
Judging from the number of files and blocks that 'df' reports, not only the
lost directory itself but also the directories and files under it are gone.
It looks as if the remove command really ran on the directory, rather than
the parent directory being corrupted.

The test on ext2 is still running.

Thanks,
Yoshi
---
Yoshihiro Tsuchiya



* Re: filesystem bug?
  2003-12-15  9:25 filesystem bug? Tsuchiya Yoshihiro
  2003-12-15  9:55 ` bert hubert
  2003-12-16 13:44 ` Stephen C. Tweedie
@ 2003-12-26  9:59 ` dlion
  2003-12-26 12:27   ` dlion
  2 siblings, 1 reply; 30+ messages in thread
From: dlion @ 2003-12-26  9:59 UTC (permalink / raw)
  To: lkml

Hello Tsuchiya,

Monday, December 15, 2003, 5:25:17 PM, you wrote:

TY> Hi,

TY> Ext2 and Ext3 filesystem go to inconsistent status by
TY> simple test program on my system.

TY> My test program is a script that extract a tar+gzip archive
TY> twice and compare them, and remove one of the tree, and then
TY> another extracting, and compare them again. A very simple test.

I tried your script on ext2 and ext3 filesystems on a ramdisk and got
errors too. It seems that this problem is unrelated to the device driver or
hardware.

The mozilla tarball is too big for a ramdisk, so I used
zhcon-0.2.1.tar.gz (4,991,350 bytes) instead.

I got only one kind of error on the ext2 filesystem: the script
said the read-only directory zhcon-0.2.1 was missing, but it _is_ there.
I used e2fsck to check the ramdisk and found no errors.

On the ext3 filesystem I got other errors, including:
1. missing files
2. corrupted files
but when I used fsck.ext3 to check the ramdisk, the result was clean.

My system is:
CPU:  AMD Athlon XP 1800+
RAM:  256M DDR333
Chipset: VIA KT400A
Linux Distribution: Fedora Core 1
Linux Kernel: kernel-2.4.22-1.2115.nptl.athlon.rpm

-- 
Best regards,
 dlion                            mailto:dlion2004_at_sina.com.cn




* Re: filesystem bug?
  2003-12-26  9:59 ` dlion
@ 2003-12-26 12:27   ` dlion
  0 siblings, 0 replies; 30+ messages in thread
From: dlion @ 2003-12-26 12:27 UTC (permalink / raw)
  To: lkml

Hello ,

d> I tried your script on ext2 and ext3 filesystem on a ramdisk. I got errors,
d> too. It seems that this problem is unrelated to device driver or
d> hardware.

d> The mozilla tarball is too big for a ramdisk. I use a
d> zhcon-0.2.1.tar.gz (4,991,350 bytes) instead.

d> I only got one kind of error on ext2 filesystem. That is, the script
d>  said the read-only directory zhcon-0.2.1 is missing, but it _is_ there.
d> I used e2fsck to check the ramdisk and found no error.

d> I got other errors on ext3 filesystem include:
d> 1. missing file
d> 2. corrupted file
d> but when I used fsck.ext3 to check the ramdisk, the result was clean.

d> My system is:
d> CPU:  AMD Athlon XP 1800+
d> RAM:  256M DDR333
d> Chipset: VIA KT400A
d> Linux Distribution: Fedora Core 1
d> Linux Kernel: kernel-2.4.22-1.2115.nptl.athlon.rpm

I did the same on kernel-2.4.22-1.2135.nptl.athlon.rpm and got the
same result.


-- 
Best regards,
 dlion                            mailto:dlion2004@sina.com.cn




* Re: filesystem bug?
  2003-12-24 23:09                   ` Tsuchiya Yoshihiro
@ 2004-01-15  6:38                     ` Tsuchiya Yoshihiro
  0 siblings, 0 replies; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2004-01-15  6:38 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel, dlion2004, Marcelo Tosatti


Hi,
I tried ramdisk again with more running process, and the script failed very
early just like Mr Dlion reported previously. It is about 20 minutes on 
my machines.

1. The script use nvi-1.79 tar ball
2. Prepare 64MB ramdisk, and mkfs on it.
3. edit the first three lines and run the script below(its name is xc-1.2)
4. wait half an hour and see the result will be in /tmp/xcresult
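
Step 2 might look like the sketch below. It is an illustration under stated assumptions, not the setup actually used: it assumes a 2.4 kernel booted with ramdisk_size=65536 (in KB) so that /dev/ram0 is 64MB, and a mount point matching TARGETPREFIX in the script. prepare_ramdisk and the DRY_RUN switch are hypothetical; it needs root unless DRY_RUN=1, in which case it only prints the commands.

```shell
#!/bin/bash
# Hypothetical helper: make a fresh filesystem on a RAM disk and mount it.
# Arguments (all optional): <fstype> <device> <mountpoint>
# Assumes the RAM disk device is already sized via the ramdisk_size boot option.
prepare_ramdisk() {
    local fstype="${1:-ext2}" dev="${2:-/dev/ram0}" mnt="${3:-/mnt/foo}"
    local run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run="echo"            # print each command instead of running it
    fi
    $run mkfs."$fstype" -q "$dev" || return 1
    $run mkdir -p "$mnt" || return 1
    $run mount -t "$fstype" "$dev" "$mnt"
}

# Usage: prepare_ramdisk ext3 /dev/ram0 /mnt/foo
```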

Thanks,
Yoshi
--------------------------------
#!/bin/bash

TARGETPREFIX=/mnt/foo   # filesystem that will be tested
#MOZSRC=/home/tsuchiya/src/mozilla-source-1.3.tar.gz    # tgz used for test
MOZSRC=/home/tsuchiya/src/nvi-1.79.tar.gz       # tgz used for test
RDIR="/tmp/xcresult"    # result directory
#SOURCE=mozilla
SOURCE=nvi-1.79

ERRORF=$RDIR/ERROR
INOFILE=$RDIR/INOF

mkdir -p $RDIR          # the result directory must exist before files are created in it
touch $ERRORF

function _xtract+compare {
        echo "extracting directory to be compared against for $1"
        TARGETDIR=$TARGETPREFIX/$1
        mkdir -p $TARGETDIR
        cd $TARGETDIR
        tar zxf $MOZSRC
        echo "$1 done .... now the job is started."
# new
#       touch $INOFILE
        pwd >> $INOFILE-$1
        ls -lid $SOURCE >> $INOFILE-$1

        RESULTS=$RDIR/$1
        echo "test result will be stored under $RESULTS"
        mkdir -p $RESULTS;
#       echo "test dir is $TARGETDIR";
        mkdir -p $TARGETDIR;

        for ((i=0; i < 100000; i++))    # ext2/3 limit 32000
        do

                cd $TARGETDIR
                mkdir $TARGETDIR/dirXC$i
                cd $TARGETDIR/dirXC$i > $RDIR/CD-ERR-$1 2>&1

                if [ -s $RDIR/CD-ERR-$1 ]
                then
                        echo "something wrong happened at $1:$i-th trial "
                        df > $RDIR/DF-$1
                        exit;
                fi

                tar zxf $MOZSRC >> $ERRORF

#                echo "test dir for $TARGETDIR" >> $INOFILE-$1
                ls -lid $SOURCE >> $INOFILE-$1

                diff -rq $TARGETDIR/$SOURCE $TARGETDIR/dirXC$i/$SOURCE > $RESULTS/dirXC$i.result 2>&1
                DIFFSIZE=`ls -l $RESULTS/dirXC$i.result | awk '{print $5}'`
                if [ $DIFFSIZE != 0 ];
                then
                        echo "something wrong happened at $1:$i-th trial "
                        df > $RDIR/DF-$1
                        exit;
                else
                        rm $RESULTS/dirXC$i.result
                        echo "test $1:$i-th passed"
                fi

                cd ..
                rm -rf $TARGETDIR/dirXC$i &
        done
}

for target in aa ab ac ad ae af #ag ah ai aj ak al am an
do
        _xtract+compare $target $RDIR &
done
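The pass/fail test in the loop above parses "ls -l" output to check whether the diff result file is empty. One possible alternative is to rely on diff's own exit status: 0 when the trees match, 1 when they differ, and 2 on trouble such as a missing directory. That also separates a real content difference from a tree that could not be read. compare_trees below is a hypothetical helper, sketched under that assumption.

```shell
# Hypothetical helper: compare two trees and keep diff's output only when
# it has something to say. Returns diff's own exit status:
#   0 = identical, 1 = differences found, 2 = trouble (e.g. missing tree)
compare_trees() {
    local ref=$1 copy=$2 result=$3 status
    diff -rq "$ref" "$copy" > "$result" 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        rm -f "$result"                      # nothing worth keeping
    fi
    return "$status"
}

# Possible use inside the loop, with the script's own variable names:
#   if ! compare_trees "$TARGETDIR/$SOURCE" "$TARGETDIR/dirXC$i/$SOURCE" \
#           "$RESULTS/dirXC$i.result"; then
#       echo "something wrong happened at $1:$i-th trial "
#       df > "$RDIR/DF-$1"
#       exit
#   fi
```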

--
Yoshihiro Tsuchiya





* Re: filesystem bug?
  2004-01-20  8:36       ` Tsuchiya Yoshihiro
@ 2004-01-20 16:27         ` Stephen C. Tweedie
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen C. Tweedie @ 2004-01-20 16:27 UTC (permalink / raw)
  To: tsuchiya; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Tue, 2004-01-20 at 08:36, Tsuchiya Yoshihiro wrote:

> >http://linux.bkbits.net:8080/linux-2.4/patch@1.1136.67.1

> 2. Sometimes i_nlink was 0 and i_dtime was set, which I think is
> somewhat related to this patch, but other times part of an inode
> block was zeroed, which I do not understand at all.

Yep.  I'd really need to see exactly which kernel versions these
specific problems reproduce on to take this much further, though.  I'll
be travelling for the next week and a half, but I'll look for more
results once I'm back from that.

Thanks,
 Stephen



* Re: filesystem bug?
  2004-01-19 13:12     ` Stephen C. Tweedie
@ 2004-01-20  8:36       ` Tsuchiya Yoshihiro
  2004-01-20 16:27         ` Stephen C. Tweedie
  0 siblings, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2004-01-20  8:36 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie wrote:

>>Except for 2.4.20-28.9, they have been running for three days and
>>look fine so far.
>>
>>What exactly is the race condition between read_inode() and
>>clear_inode() that you mentioned?
>
>This one:
>
>http://linux.bkbits.net:8080/linux-2.4/patch@1.1136.67.1
>

Thank you. I don't think this one explains all of my problems.
1. The corrupted inode was still in the parent directory. That is
strange, because unlink removes the directory entry first and only
then does iput delete the inode.

2. Sometimes i_nlink was 0 and i_dtime was set, which I think is
somewhat related to this patch, but other times part of an inode
block was zeroed, which I do not understand at all.
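Point 1 describes an ordering, and point 2 mentions an inode with i_nlink 0. A small user-space demonstration makes that ordering visible. It is Linux-specific because it reads /proc. After rm, the name is gone and the link count is 0, yet the inode stays alive until its last reference is dropped. This only shows the behaviour visible through the VFS. It says nothing about where a zeroed inode block comes from. demo_unlink_order is a hypothetical name.

```shell
# Linux-only illustration (uses /proc): unlink removes the name and drops
# the link count to 0 immediately, but the inode stays alive until the
# last reference to it, here an open file descriptor, is released.
demo_unlink_order() {
    local dir f
    dir=$(mktemp -d)
    f="$dir/victim"
    echo "data" > "$f"
    exec 3< "$f"                     # hold a reference to the inode
    rm "$f"                          # unlink: the directory entry goes now
    if [ -e "$f" ]; then echo "name still present"; else echo "name gone"; fi
    # stat -L follows the /proc fd link to the open, unlinked inode
    echo "links: $(stat -L -c %h /proc/self/fd/3)"
    cat <&3                          # the contents are still readable
    exec 3<&-                        # last reference dropped here
    rmdir "$dir"
}

demo_unlink_order
# prints:
#   name gone
#   links: 0
#   data
```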

Thank you,
Yoshi
-- 
--
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2004-01-19  7:52   ` Tsuchiya Yoshihiro
@ 2004-01-19 13:12     ` Stephen C. Tweedie
  2004-01-20  8:36       ` Tsuchiya Yoshihiro
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen C. Tweedie @ 2004-01-19 13:12 UTC (permalink / raw)
  To: tsuchiya; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Mon, 2004-01-19 at 07:52, Tsuchiya Yoshihiro wrote:

> >OK.  Under exactly what circumstances have you seen this in the past, as
> >opposed to the other problem?  I have not been able to reproduce this
> >one so far.

> Except for 2.4.20-28.9, they have been running for three days and
> look fine so far.
> 
> What exactly is the race condition between read_inode() and
> clear_inode() that you mentioned?

This one:

http://linux.bkbits.net:8080/linux-2.4/patch@1.1136.67.1

Cheers,
 Stephen



* Re: filesystem bug?
  2004-01-16 12:29 ` Stephen C. Tweedie
@ 2004-01-19  7:52   ` Tsuchiya Yoshihiro
  2004-01-19 13:12     ` Stephen C. Tweedie
  0 siblings, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2004-01-19  7:52 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel

Hello,

Stephen C. Tweedie wrote:

>OK.  Under exactly what circumstances have you seen this in the past, as
>opposed to the other problem?  I have not been able to reproduce this
>one so far.
>  
>

The kernel versions and filesystem types on which I have seen it are:
2.4.20-8 ext2
2.4.20-19.9 ext2, ext3
2.4.20-24.9 ext2
2.4.20-28.9 ext2

When I run the test with mozilla-1.3.tar.gz and 6 processes in the script,
it happens on ext2 within a few hours.

I haven't seen the problem on 2.4.20, 2.4.23 or 2.4.24.

So I am now testing the following:
2.4.24-pre2 ext2 (mozilla-1.3.tar.gz)
2.4.24 ext2 (nvi-1.79.tar.gz)
2.4.20 ext3 (mozilla-1.3.tar.gz)
2.4.23 ext3 (mozilla-1.3.tar.gz)
2.4.24 ext3 (mozilla-1.3.tar.gz)
2.4.20-28.9 ext3 (mozilla-1.3.tar.gz)

Except for 2.4.20-28.9, they have been running for three days and
look fine so far.

What exactly is the race condition between read_inode() and
clear_inode() that you mentioned?

Thanks,
Yoshi
-- 
--
Yoshihiro Tsuchiya




* Re: filesystem bug?
  2004-01-16  2:59 Tsuchiya Yoshihiro
@ 2004-01-16 12:29 ` Stephen C. Tweedie
  2004-01-19  7:52   ` Tsuchiya Yoshihiro
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen C. Tweedie @ 2004-01-16 12:29 UTC (permalink / raw)
  To: tsuchiya; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Fri, 2004-01-16 at 02:59, Tsuchiya Yoshihiro wrote:

> I tried with /bin/zsh, and it seems you are right. The script
> has been working fine for about 2 hours.

Thank you for checking.

> So next I will try to track down the EIO (inode corruption) problem.

OK.  Under exactly what circumstances have you seen this in the past, as
opposed to the other problem?  I have not been able to reproduce this
one so far.

Cheers, 
 Stephen



* Re: filesystem bug?
@ 2004-01-16  2:59 Tsuchiya Yoshihiro
  2004-01-16 12:29 ` Stephen C. Tweedie
  0 siblings, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2004-01-16  2:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Stephen C. Tweedie


Hi Stephen,

>Now, I can't tell from this whether it's a bash bug or an exit/signal
> bug, but it doesn't look like a filesystem problem for now. I'm going
> to try with a different shell to see if that helps.

I tried with /bin/zsh, and it seems you are right. The script
has been working fine for about 2 hours.

So next I will try to track down the EIO (inode corruption) problem.

Thank you so much,

Yoshi

-- 
--
Yoshihiro Tsuchiya





* Re: filesystem bug?
       [not found]       ` <0586254E-46DA-11D8-B45E-00039341E01A@ybb.ne.jp>
@ 2004-01-15 22:38         ` Stephen C. Tweedie
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen C. Tweedie @ 2004-01-15 22:38 UTC (permalink / raw)
  To: Tsuchiya Yoshihiro
  Cc: Marcelo Tosatti, tschiya, dlion2004, Stephen Tweedie, linux-kernel

Hi,

On Wed, 2004-01-14 at 21:38, Tsuchiya Yoshihiro wrote:

> I usually use two to seven boxes at a time, and I get about two problems
> out of them within about two nights.

I was able to reproduce with your script, too: even on ramfs.  Curiouser
and curiouser.

Al Viro was a help getting further, and we nailed the reason for the
failure.  I got a failure:

"something wrong (diff) happened at ae:619-th trial"

corresponding to the script code:

                mkdir $TARGETDIR/dirXC$i || echo "Error $? making $TARGETDIR/dirXC$i" > $RDIR/MD-ERR-$1 2>&1
                cd $TARGETDIR/dirXC$i > $RDIR/CD-ERR-$1 2>&1

                if [ -s $RDIR/CD-ERR-$1 ]
                then
                        echo "something wrong (cd) happened at $1:$i-th trial "
                        (df .; df -i .) > $RDIR/DF-$1
                        exit;
                fi

                tar zxf $MOZSRC >> $ERRORF

#                echo "test dir for $TARGETDIR" >> $INOFILE-$1
                ls -lid $SOURCE >> $INOFILE-$1

                diff -rq $TARGETDIR/$SOURCE $TARGETDIR/dirXC$i/$SOURCE > $RESULTS/dirXC$i.result 2>&1
                DIFFSIZE=`ls -l $RESULTS/dirXC$i.result | awk '{print $5}'`
                if [ $DIFFSIZE != 0 ];
                then
                        echo "something wrong (diff) happened at $1:$i-th trial "
                        (df .; df -i .) > $RDIR/DF-$1
                        exit;

and psacct was able to trace the order in which stuff happened; lastcomm showed:

tee                     guest    ??         5.55 secs Thu Jan 15 22:15
tar                     guest    ??         0.08 secs Thu Jan 15 22:24
gzip                    guest    ??         0.25 secs Thu Jan 15 22:24
xc-1.2                  guest    ??         0.02 secs Thu Jan 15 22:15
xc-1.2             F    guest    ??         6.37 secs Thu Jan 15 22:15
xc-1.2             F    guest    ??         0.00 secs Thu Jan 15 22:24
df                      guest    ??         0.00 secs Thu Jan 15 22:24
df                      guest    ??         0.01 secs Thu Jan 15 22:24
xc-1.2             F    guest    ??         0.01 secs Thu Jan 15 22:24
awk                     guest    ??         0.00 secs Thu Jan 15 22:24
ls                      guest    ??         0.01 secs Thu Jan 15 22:24
diff                    guest    ??         0.01 secs Thu Jan 15 22:24
rm                      guest    ??         0.02 secs Thu Jan 15 22:24
ls                      guest    ??         0.00 secs Thu Jan 15 22:24
mkdir                   guest    ??         0.01 secs Thu Jan 15 22:24

Reading from the bottom up: we get the "mkdir" and "ls -lid" of the new
directory, but not the tar; then the "rm &" of the previous iteration
completes; then there's the diff that failed, and the ls and awk from
that; then the two "df"s, then we exit. 

And *then*, after all that, the tar/gunzip finishes.  Remember, lastcomm
records the exit of each task, not its start.
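The trace is read by reversing lastcomm's output. lastcomm prints the most recently exited task first, so the reversed list shows the order in which tasks finished. exit_order below is a minimal sketch of that step. It is a hypothetical helper and assumes one record per line with the command name in the first column, as in the output above.

```shell
# lastcomm prints the most recently *exited* task first, so reversing its
# output lists the tasks in the order they finished. Hypothetical helper;
# assumes one record per line with the command name in the first column.
exit_order() {
    tac | awk 'NF { print $1 }'
}

# On a system with process accounting enabled:
#   lastcomm | exit_order
```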

So the problem seems to be that the shell is continuing beyond the "tar
xzf" before that command has finished, which is why we see ENOENT on
"cd" to the dir or on the diff.
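The test itself can be made to surface that failure mode more directly. The sketch below checks tar's exit status and confirms that every member tar lists is present before the script moves on. extract_checked is a hypothetical helper and fixes nothing in the shell or kernel. It only reports a missing file at extraction time, rather than as a later ENOENT from cd or diff. It checks presence only, so a partly written file would not be caught.

```shell
# Hypothetical helper, not part of xc-1.2. Extract a tarball into the
# current directory, then confirm that tar succeeded and that every member
# it lists is actually present. It only checks presence, not contents.
extract_checked() {
    local tarball=$1 missing
    tar zxf "$tarball" || {
        echo "tar exited with status $?" >&2
        return 1
    }
    # a member that is neither an existing path nor a symlink is missing
    missing=$(tar ztf "$tarball" | while IFS= read -r member; do
        [ -e "$member" ] || [ -L "$member" ] || echo "$member"
    done)
    if [ -n "$missing" ]; then
        echo "tar returned, but these members are missing:" >&2
        echo "$missing" >&2
        return 2
    fi
    return 0
}

# Possible use in the loop, with the script's own variable names:
#   extract_checked "$MOZSRC" 2>> "$ERRORF" || exit
```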

The trace above was on the last running thread of the test.  All other
threads had completed, so there was no interleaving of test runs in the
psacct records.

Now, I can't tell from this whether it's a bash bug or an exit/signal
bug, but it doesn't look like a filesystem problem for now.  I'm going
to try with a different shell to see if that helps.

Cheers,
 Stephen



* Re: filesystem bug?
  2003-12-26 14:30 ` dlion
@ 2003-12-28  8:26   ` dlion
  0 siblings, 0 replies; 30+ messages in thread
From: dlion @ 2003-12-28  8:26 UTC (permalink / raw)
  To: lkml

>> Hi,

 >>>I got other errors on the ext3 filesystem, including:
 >>>1. missing file
 >>>2. corrupted file
 >>>but when I used fsck.ext3 to check the ramdisk, the result was clean.

>> Dlion, what did the corrupted files look like?
>> (file size, number of blocks, etc.)

d> 3. Maybe all the corrupted files' mtimes are exactly the same
d> wrong value. They should be around 2003.12.26 21:30:00, but
d> are 2002.05.12 12:00:48 (hex value 0x3cdde8f0). ctime
d> and atime are correct. The system clock is unchanged.

Sorry, I made a mistake. The mtime is correct, not damaged;
it is set by tar.
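Decoding the raw value supports this correction. 0x3cdde8f0 is 1021176048 seconds after the epoch, which is 2002-05-12 12:00:48 at UTC+8. The UTC+8 zone is an assumption that happens to match the reported reading. By default, tar also restores each member's archived mtime on extraction. The sketch below shows both points. It assumes GNU date, touch, stat and tar, and decode_mtime and tar_keeps_mtime are hypothetical names.

```shell
# Assumes GNU date, touch, stat and tar. The shell's $(( )) accepts hex.
# TZ=UTC-8 is a POSIX TZ string for "8 hours east of UTC"; the zone is an
# assumption that happens to match the reported 12:00:48.
decode_mtime() {
    TZ=UTC-8 date -d "@$(( $1 ))" '+%Y-%m-%d %H:%M:%S'
}

# tar sets each extracted file's mtime to the one stored in the archive
# (GNU tar's -m/--touch turns this off), so an old mtime is expected.
tar_keeps_mtime() {
    local dir
    dir=$(mktemp -d)
    touch -d "@$(( $1 ))" "$dir/f"      # give the file an old mtime
    tar czf "$dir/a.tgz" -C "$dir" f
    mkdir "$dir/out"
    tar xzf "$dir/a.tgz" -C "$dir/out"
    stat -c %Y "$dir/out/f"             # mtime after extraction, in seconds
    rm -rf "$dir"
}

decode_mtime 0x3cdde8f0       # prints 2002-05-12 12:00:48
tar_keeps_mtime 0x3cdde8f0    # prints 1021176048
```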





* Re: filesystem bug?
@ 2003-12-27 14:35 Tsuchiya Yoshihiro
       [not found] ` <Pine.LNX.4.58L.0312301556380.23875@logos.cnet>
  0 siblings, 1 reply; 30+ messages in thread
From: Tsuchiya Yoshihiro @ 2003-12-27 14:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: dlion2004


Hi,

 >1. Some corrupted files are truncated to 0 bytes. Blockcount is 0.
 >
 >2. Some corrupted files are truncated, and the result is a shorter file;
 >the new size is a multiple of the block size.

I have seen these before, though.

 >3. Maybe all the corrupted files' mtimes are exactly the same
 >wrong value. They should be around 2003.12.26 21:30:00, but
 >are 2002.05.12 12:00:48 (hex value 0x3cdde8f0). ctime
 >and atime are correct. The system clock is unchanged.
 >
 >4. It seems that the corrupted files tend to be in the same
 >directory.

I hadn't noticed these. Thank you.

Yoshi
---
Yoshihiro Tsuchiya 



* Re: filesystem bug?
  2003-12-26 13:22 土屋芳浩
@ 2003-12-26 14:30 ` dlion
  2003-12-28  8:26   ` dlion
  0 siblings, 1 reply; 30+ messages in thread
From: dlion @ 2003-12-26 14:30 UTC (permalink / raw)
  To: lkml

Hello,

Friday, December 26, 2003, 9:22:25 PM, you wrote:

> Hi,

 >>I got other errors on the ext3 filesystem, including:
 >>1. missing file
 >>2. corrupted file
 >>but when I used fsck.ext3 to check the ramdisk, the result was clean.

> Dlion, what did the corrupted files look like?
> (file size, number of blocks, etc.)

1. Some corrupted files are truncated to 0 bytes. Blockcount is 0.

2. Some corrupted files are truncated, and the result is a shorter file;
the new size is a multiple of the block size.

3. Maybe all the corrupted files' mtimes are exactly the same
wrong value. They should be around 2003.12.26 21:30:00, but
are 2002.05.12 12:00:48 (hex value 0x3cdde8f0). ctime
and atime are correct. The system clock is unchanged.

4. It seems that the corrupted files tend to be in the same
directory.
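Symptoms 1 and 2 can be detected mechanically. The sketch below compares an extracted copy against the reference tree. It flags files that are missing, or that are shorter than the original with a size that is a multiple of the block size (0 bytes included). find_truncated is a hypothetical helper. The block size is an assumption: pass the one the filesystem was made with, since 1024 is only a default here. File names are assumed not to contain newlines.

```shell
# Hypothetical checker: walk the reference tree and report files that are
# missing from the copy, or shorter than the original with a size that is
# a multiple of the block size (0 bytes included). The block size is an
# assumption: pass the one the filesystem was made with.
# File names are assumed not to contain newlines.
find_truncated() {
    local ref=$1 copy=$2 bs=${3:-1024} path rel want got
    find "$ref" -type f | sort | while IFS= read -r path; do
        rel=${path#"$ref"/}
        if [ ! -e "$copy/$rel" ]; then
            echo "missing: $rel"
            continue
        fi
        want=$(stat -c %s "$path")
        got=$(stat -c %s "$copy/$rel")
        if [ "$got" -lt "$want" ] && [ $(( got % bs )) -eq 0 ]; then
            echo "truncated: $rel ($want -> $got bytes)"
        fi
    done
    return 0
}

# Example, with paths laid out the way xc-1.2 creates them:
#   find_truncated /mnt/foo/ae/nvi-1.79 /mnt/foo/ae/dirXC619/nvi-1.79 1024
```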

With a 128000 KB ramdisk you can get these results in less than
30 minutes. BTW, your test script is very good. Thank you.





* Re: filesystem bug?
@ 2003-12-26 13:22 土屋芳浩
  2003-12-26 14:30 ` dlion
  0 siblings, 1 reply; 30+ messages in thread
From: 土屋芳浩 @ 2003-12-26 13:22 UTC (permalink / raw)
  To: dlion2004; +Cc: tsuchiya, linux-kernel

Hi,

 >I got other errors on the ext3 filesystem, including:
 >1. missing file
 >2. corrupted file
 >but when I used fsck.ext3 to check the ramdisk, the result was clean.

Dlion, what did the corrupted files look like?
(file size, number of blocks, etc.)

Thanks,
Yoshi
--
Yoshihiro Tsuchiya



end of thread

Thread overview: 30+ messages
2003-12-15  9:25 filesystem bug? Tsuchiya Yoshihiro
2003-12-15  9:55 ` bert hubert
2003-12-16 13:44 ` Stephen C. Tweedie
2003-12-16 21:40   ` Bryan Whitehead
2003-12-16 21:50     ` Bryan Whitehead
2003-12-16 23:31     ` Tsuchiya Yoshihiro
2003-12-16 23:40       ` viro
2003-12-17  0:12         ` Tsuchiya Yoshihiro
2003-12-17 23:24       ` Tsuchiya Yoshihiro
2003-12-18 21:29         ` Stephen C. Tweedie
2003-12-21 23:15           ` Tsuchiya Yoshihiro
2003-12-22  1:54             ` Tsuchiya Yoshihiro
2003-12-22  4:30             ` Tsuchiya Yoshihiro
2003-12-22 12:03               ` Stephen C. Tweedie
2003-12-24  1:48                 ` Tsuchiya Yoshihiro
2003-12-24 23:09                   ` Tsuchiya Yoshihiro
2004-01-15  6:38                     ` Tsuchiya Yoshihiro
2003-12-26  9:59 ` dlion
2003-12-26 12:27   ` dlion
2003-12-26 13:22 土屋芳浩
2003-12-26 14:30 ` dlion
2003-12-28  8:26   ` dlion
2003-12-27 14:35 Tsuchiya Yoshihiro
     [not found] ` <Pine.LNX.4.58L.0312301556380.23875@logos.cnet>
     [not found]   ` <74964CA8-3B50-11D8-B879-00039341E01A@ybb.ne.jp>
     [not found]     ` <1074109164.4538.8.camel@sisko.scot.redhat.com>
     [not found]       ` <0586254E-46DA-11D8-B45E-00039341E01A@ybb.ne.jp>
2004-01-15 22:38         ` Stephen C. Tweedie
2004-01-16  2:59 Tsuchiya Yoshihiro
2004-01-16 12:29 ` Stephen C. Tweedie
2004-01-19  7:52   ` Tsuchiya Yoshihiro
2004-01-19 13:12     ` Stephen C. Tweedie
2004-01-20  8:36       ` Tsuchiya Yoshihiro
2004-01-20 16:27         ` Stephen C. Tweedie
