linux-kernel.vger.kernel.org archive mirror
* RE: Unknown Issue.
@ 2005-01-06 13:33 hard__ware
  0 siblings, 0 replies; 12+ messages in thread
From: hard__ware @ 2005-01-06 13:33 UTC
  To: linux-kernel

I'm having the same issue on a 2.6.9 kernel ...

Dell PowerEdge 2650 w/ Adaptec AACRaid
in hardware RAID 0 (stripe array)

I am finding that some of the earlier 2.6 kernels were
a bit more stable ...

I only seem to get input/output and bus errors when
the system is under high load ... e.g. during backups ...


Thanx ...


* RE: Unknown Issue.
@ 2004-12-13 18:57 Piszcz, Justin Michael
  0 siblings, 0 replies; 12+ messages in thread
From: Piszcz, Justin Michael @ 2004-12-13 18:57 UTC
  To: Eric Sandeen
  Cc: Patrick, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

> Ok, so XVM has found something wrong at this point.  Any chance the box
> had a power failure?  Write caches on ide drives can wreak havoc with
> journaling filesystems...  i.e. what happened between "the filesystem
> was working" and "i remounted the filesystem and got this"

For the main system: To make a long story short, I was attempting to hook up
a CD burner and DVD reader to SATA via SATA<->PATA adapters and enable
SATA in the kernel for the Intel ICH5 chipset, and I was trying different
drivers/options in an attempt to get them to work.  However, please note
that during the entire time, the disk that suffered FS corruption was always
hooked to an Ultra ATA/133 Promise controller.  I believe I had a kernel
panic once, and at another time, while loading either the SATA or the IDE
drivers, I had a lockup somewhere along the line and rebooted
improperly.

For the Dell GX1 system: No, all I did was upgrade the kernel [2.6.9 ->
2.6.10-rc2] and reboot; no power outages or crashes at all.  After about
an hour or so, I began to experience these problems.




* Re: Unknown Issue.
  2004-12-13 17:50 ` Eric Sandeen
@ 2004-12-13 17:56   ` Eric Sandeen
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Sandeen @ 2004-12-13 17:56 UTC
  To: Eric Sandeen
  Cc: Piszcz, Justin Michael, Patrick, linux-kernel, linux-xfs,
	Kristofer T. Karas

Eric Sandeen wrote:

>> Dec  5 08:23:53 jpiszcz kernel: XFS internal error
>> XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c.  Caller
>> 0xc021de57
> 
> (having trouble replaying the log here)
> 
> Ok, so XVM has found something wrong at this point.  

urk, make that "XFS has found..." of course :)

-Eric


* Re: Unknown Issue.
  2004-12-13 17:14 Piszcz, Justin Michael
  2004-12-13 17:17 ` Patrick
@ 2004-12-13 17:50 ` Eric Sandeen
  2004-12-13 17:56   ` Eric Sandeen
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2004-12-13 17:50 UTC
  To: Piszcz, Justin Michael
  Cc: Patrick, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

Piszcz, Justin Michael wrote:

> Ah, good question, yes I used xfs_repair, at this point I knew I had to
> restore from backup and answered "y" to all questions.  I am not sure
> but I do not recall the log being dirty.

Hm, xfs_repair does not ask any questions.

> In the logs on my main machine, it showed the following when it
> attempted to mount the two filesystems (root and boot, /dev/hde4 and
> /dev/hde1 respectively).

> Dec  5 08:23:53 jpiszcz kernel: XFS internal error
> XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c.  Caller
> 0xc021de57
(having trouble replaying the log here)

Ok, so XVM has found something wrong at this point.  Any chance the box 
had a power failure?  Write caches on ide drives can wreak havoc with 
journaling filesystems...  i.e. what happened between "the filesystem 
was working" and "i remounted the filesystem and got this"

>
> As far as bad disk/memory, I have tested both systems with memtest86 and
> the result was 0 errors, as far as the disks go, I have not experienced
> any problems with either of them until I moved to 2.6.9/2.6.10-rc{1,2}.

ok

-Eric


* Re: Unknown Issue.
  2004-12-13 17:20 Piszcz, Justin Michael
@ 2004-12-13 17:27 ` Patrick
  0 siblings, 0 replies; 12+ messages in thread
From: Patrick @ 2004-12-13 17:27 UTC
  To: Piszcz, Justin Michael
  Cc: Eric Sandeen, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

Hi, 

> So your problem was only temporary?

No, it happened randomly, though, and all the time. Generally within an hour.

> After I began having the problem, I was trying to edit some files and
> then I got the same errors as you, i.e.: /usr/bin/vi Input/output error,
> and then I tried to run or edit different programs and files and nothing
> was working.
>
> Were you also forced to re-install, or does this only happen sometimes?

I moved to FreeBSD as I require the box to actually work, which it
seems to be doing at the moment, even after a bit-o-nailing, but that
still doesn't solve the problem.

P


* RE: Unknown Issue.
@ 2004-12-13 17:20 Piszcz, Justin Michael
  2004-12-13 17:27 ` Patrick
  0 siblings, 1 reply; 12+ messages in thread
From: Piszcz, Justin Michael @ 2004-12-13 17:20 UTC
  To: Patrick, Eric Sandeen
  Cc: linux-kernel, linux-xfs, Andrew Morton, Kristofer T. Karas,
	Jeff Garzik, Linus Torvalds

So your problem was only temporary?

Or?

After I began having the problem, I was trying to edit some files and
then I got the same errors as you, i.e.: /usr/bin/vi Input/output error,
and then I tried to run or edit different programs and files and nothing
was working.  

Were you also forced to re-install, or does this only happen sometimes?



* Re: Unknown Issue.
  2004-12-13 17:14 Piszcz, Justin Michael
@ 2004-12-13 17:17 ` Patrick
  2004-12-13 17:50 ` Eric Sandeen
  1 sibling, 0 replies; 12+ messages in thread
From: Patrick @ 2004-12-13 17:17 UTC
  To: Piszcz, Justin Michael
  Cc: Eric Sandeen, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

Hi, 

> Yes, there was nothing relevant on either machine.

Same here.

> Dec  5 08:23:53 jpiszcz kernel: XFS internal error
> XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c.  Caller
> 0xc021de57
> [... call trace snipped ...]

Ok, well, I couldn't pin it down to the FS, and it looked like hardware to me.
I suppose I could redo the box with 2.6.10 and XFS again to see if I
can reproduce the problem, although I'm partially leaning towards hardware,
but that's the easiest thing to blame :)

I figure I'm going to try out another FS, maybe reiser; that should
either do the same or not, and if not, then we know where to start?

P


* RE: Unknown Issue.
@ 2004-12-13 17:14 Piszcz, Justin Michael
  2004-12-13 17:17 ` Patrick
  2004-12-13 17:50 ` Eric Sandeen
  0 siblings, 2 replies; 12+ messages in thread
From: Piszcz, Justin Michael @ 2004-12-13 17:14 UTC
  To: Eric Sandeen
  Cc: Patrick, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

> My first thought is that perhaps the filesystem has shut down due to
> some error (memory corruption, bad disk, xfs bug...); did you check your
> log messages?

Yes, there was nothing relevant on either machine.

> Justin, when you mentioned that you used xfs' fsck, I guess you used
> xfs_repair.  Was the log clean when you ran it, or did you force repair
> to zero out the log?  That could explain the large lost+found/ when you
> were done...

Ah, good question, yes I used xfs_repair, at this point I knew I had to
restore from backup and answered "y" to all questions.  I am not sure
but I do not recall the log being dirty.

In the logs on my main machine, it showed the following when it
attempted to mount the two filesystems (root and boot, /dev/hde4 and
/dev/hde1 respectively).

As far as bad disk/memory goes, I have tested both systems with memtest86 and
the result was 0 errors.  As far as the disks go, I had not experienced
any problems with either of them until I moved to 2.6.9/2.6.10-rc{1,2}.


Justin.

Dec  5 08:23:53 jpiszcz kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c.  Caller 0xc021de57
Dec  5 08:23:53 jpiszcz kernel:  [xfs_free_ag_extent+1237/2065] xfs_free_ag_extent+0x4d5/0x811
Dec  5 08:23:53 jpiszcz kernel:  [xfs_free_extent+207/242] xfs_free_extent+0xcf/0xf2
Dec  5 08:23:53 jpiszcz kernel:  [xlog_grant_push_ail+279/400] xlog_grant_push_ail+0x117/0x190
Dec  5 08:23:53 jpiszcz kernel:  [xfs_free_extent+207/242] xfs_free_extent+0xcf/0xf2
Dec  5 08:23:53 jpiszcz kernel:  [xfs_trans_get_efd+56/70] xfs_trans_get_efd+0x38/0x46
Dec  5 08:23:53 jpiszcz kernel:  [xlog_recover_process_efi+402/508] xlog_recover_process_efi+0x192/0x1fc
Dec  5 08:23:53 jpiszcz kernel:  [xlog_recover_process_efis+77/129] xlog_recover_process_efis+0x4d/0x81
Dec  5 08:23:53 jpiszcz kernel:  [xlog_recover_finish+26/194] xlog_recover_finish+0x1a/0xc2
Dec  5 08:23:53 jpiszcz kernel:  [xfs_rtmount_inodes+193/230] xfs_rtmount_inodes+0xc1/0xe6
Dec  5 08:23:53 jpiszcz kernel:  [xfs_log_mount_finish+44/48] xfs_log_mount_finish+0x2c/0x30
Dec  5 08:23:53 jpiszcz kernel:  [xfs_mountfs+2459/3995] xfs_mountfs+0x99b/0xf9b
Dec  5 08:23:53 jpiszcz kernel:  [pagebuf_iostart+143/159] pagebuf_iostart+0x8f/0x9f
Dec  5 08:23:53 jpiszcz kernel:  [atomic_dec_and_lock+39/68] atomic_dec_and_lock+0x27/0x44
Dec  5 08:23:53 jpiszcz kernel:  [xfs_readsb+417/559] xfs_readsb+0x1a1/0x22f
Dec  5 08:23:53 jpiszcz kernel:  [xfs_ioinit+27/46] xfs_ioinit+0x1b/0x2e
Dec  5 08:23:53 jpiszcz kernel:  [xfs_mount+934/1646] xfs_mount+0x3a6/0x66e
Dec  5 08:23:53 jpiszcz kernel:  [linvfs_fill_super+155/486] linvfs_fill_super+0x9b/0x1e6
Dec  5 08:23:53 jpiszcz kernel:  [snprintf+39/43] snprintf+0x27/0x2b
Dec  5 08:23:53 jpiszcz kernel:  [disk_name+98/191] disk_name+0x62/0xbf
Dec  5 08:23:53 jpiszcz kernel:  [sb_set_blocksize+46/94] sb_set_blocksize+0x2e/0x5e
Dec  5 08:23:53 jpiszcz kernel:  [get_sb_bdev+258/342] get_sb_bdev+0x102/0x156
Dec  5 08:23:53 jpiszcz kernel:  [alloc_vfsmnt+156/215] alloc_vfsmnt+0x9c/0xd7
Dec  5 08:23:53 jpiszcz kernel:  [linvfs_get_sb+47/51] linvfs_get_sb+0x2f/0x33
Dec  5 08:23:53 jpiszcz kernel:  [linvfs_fill_super+0/486] linvfs_fill_super+0x0/0x1e6
Dec  5 08:23:53 jpiszcz kernel:  [do_kern_mount+99/235] do_kern_mount+0x63/0xeb
Dec  5 08:23:53 jpiszcz kernel:  [do_new_mount+158/247] do_new_mount+0x9e/0xf7
Dec  5 08:23:53 jpiszcz kernel:  [do_mount+413/443] do_mount+0x19d/0x1bb
Dec  5 08:23:53 jpiszcz kernel:  [copy_mount_options+96/183] copy_mount_options+0x60/0xb7
Dec  5 08:23:53 jpiszcz kernel:  [sys_mount+191/291] sys_mount+0xbf/0x123
Dec  5 08:23:53 jpiszcz kernel:  [do_mount_root+47/158] do_mount_root+0x2f/0x9e
Dec  5 08:23:53 jpiszcz kernel:  [mount_block_root+96/305] mount_block_root+0x60/0x131
Dec  5 08:23:53 jpiszcz kernel:  [mount_root+101/135] mount_root+0x65/0x87
Dec  5 08:23:53 jpiszcz kernel:  [prepare_namespace+25/178] prepare_namespace+0x19/0xb2
Dec  5 08:23:53 jpiszcz kernel:  [flush_workqueue+136/180] flush_workqueue+0x88/0xb4
Dec  5 08:23:53 jpiszcz kernel:  [init+427/475] init+0x1ab/0x1db
Dec  5 08:23:53 jpiszcz kernel:  [init+0/475] init+0x0/0x1db
Dec  5 08:23:53 jpiszcz kernel:  [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
Dec  5 08:23:53 jpiszcz kernel: VFS: Mounted root (xfs filesystem) readonly.
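
Each frame line in this trace carries the same location twice: a bracketed decimal `symbol+offset/size` and a hex `symbol+0xoffset/0xsize` (for example 1237 = 0x4d5 and 2065 = 0x811). As a minimal sketch, assuming exactly that two-part line format, the hypothetical Python helper `parse_frame` below extracts one frame and checks that both forms agree:

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper for illustration; not part of any kernel or syslog tool.
# Matches e.g. "[xfs_free_ag_extent+1237/2065] xfs_free_ag_extent+0x4d5/0x811"
FRAME_RE = re.compile(
    r"\[(?P<sym>\w+)\+(?P<off>\d+)/(?P<size>\d+)\]\s+"
    r"(?P<sym2>\w+)\+0x(?P<hoff>[0-9a-f]+)/0x(?P<hsize>[0-9a-f]+)"
)


class Frame(NamedTuple):
    symbol: str
    offset: int  # bytes from the start of the function
    size: int    # total length of the function in bytes


def parse_frame(line: str) -> Optional[Frame]:
    """Return the frame on this line, or None if the line has no frame.

    Raises ValueError if the decimal and hex forms disagree.
    """
    m = FRAME_RE.search(line)
    if m is None:
        return None
    off, size = int(m["off"]), int(m["size"])
    if (m["sym"] != m["sym2"]
            or off != int(m["hoff"], 16)
            or size != int(m["hsize"], 16)):
        raise ValueError(f"inconsistent frame: {line!r}")
    return Frame(m["sym"], off, size)


if __name__ == "__main__":
    sample = ("Dec  5 08:23:53 jpiszcz kernel:  "
              "[xfs_free_ag_extent+1237/2065] xfs_free_ag_extent+0x4d5/0x811")
    print(parse_frame(sample))
```

Read this way, the first frame points 1237 bytes into the 2065-byte `xfs_free_ag_extent`, reached via `xlog_recover_process_efi` while the root filesystem was being mounted, i.e. during log recovery.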




* Re: Unknown Issue.
  2004-12-13 17:04 ` Eric Sandeen
@ 2004-12-13 17:13   ` Patrick
  0 siblings, 0 replies; 12+ messages in thread
From: Patrick @ 2004-12-13 17:13 UTC
  To: Eric Sandeen
  Cc: Piszcz, Justin Michael, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

Hi, 

> Patrick, can you reproduce on a non-gentoo kernel?  That'd be the first
> step for this audience.

I've not tried to reproduce it on a non-Gentoo kernel, as the original
one I had the problem with was a vanilla kernel ;) (as I know your
fondness for Gentoo's patch-o-lotic)

I've been abusing the box the entire day with FreeBSD, the same MySQL
config and version of mysqld, as well as the same operations (and
some more... serious ones, e.g. forkbomb, iozone, etc.), and no
problems.

There were no messages in the log, and nothing in kmesg. Anything else
I could try? Also, as far as I know I was running kernel 2.6.10_rc3,
and I'd reinstalled the box twice with new XFS filesystems both times.

P


* Re: Unknown Issue.
  2004-12-13 13:57 Piszcz, Justin Michael
@ 2004-12-13 17:04 ` Eric Sandeen
  2004-12-13 17:13   ` Patrick
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2004-12-13 17:04 UTC
  To: Piszcz, Justin Michael
  Cc: Patrick, linux-kernel, linux-xfs, Andrew Morton,
	Kristofer T. Karas, Jeff Garzik, Linus Torvalds

My first thought is that perhaps the filesystem has shut down due to 
some error (memory corruption, bad disk, xfs bug...); did you check your 
log messages?

Justin, when you mentioned that you used xfs' fsck, I guess you used 
xfs_repair.  Was the log clean when you ran it, or did you force repair 
to zero out the log?  That could explain the large lost+found/ when you 
were done...
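
The question above turns on the order of operations. As a sketch only (the ordering is a reading of the `xfs_repair(8)` manual page, not something stated in this thread): mount and unmount so the log is replayed, run `xfs_repair -n` (no-modify mode) to see what would change, then run a real repair, and keep `-L` (force log zeroing, which discards the metadata changes held in the log) as a last resort. The helper `repair_plan` and the mount point `/mnt` are hypothetical; the function only builds argument lists and executes nothing.

```python
from typing import List


# Hypothetical helper for illustration; it does not run any command.
def repair_plan(device: str, mountpoint: str,
                allow_log_zeroing: bool = False) -> List[List[str]]:
    """Return, in order, the commands one might run to check an XFS device."""
    plan = [
        # Mounting replays a dirty log; unmounting then leaves it clean.
        ["mount", "-t", "xfs", device, mountpoint],
        ["umount", mountpoint],
        # No-modify mode: only report what would be repaired.
        ["xfs_repair", "-n", device],
        # Actual repair of the unmounted filesystem.
        ["xfs_repair", device],
    ]
    if allow_log_zeroing:
        # Last resort if the log cannot be replayed: -L zeroes the log,
        # throwing away the metadata updates it contained.
        plan.append(["xfs_repair", "-L", device])
    return plan


if __name__ == "__main__":
    for cmd in repair_plan("/dev/hde4", "/mnt"):
        print(" ".join(cmd))
```

Dropping to `-L` is the scenario suspected above: zeroing a dirty log loses whatever metadata updates it held, which could explain a large `lost+found/` afterwards.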

Patrick, can you reproduce on a non-gentoo kernel?  That'd be the first 
step for this audience.

-Eric




* RE: Unknown Issue.
@ 2004-12-13 13:57 Piszcz, Justin Michael
  2004-12-13 17:04 ` Eric Sandeen
  0 siblings, 1 reply; 12+ messages in thread
From: Piszcz, Justin Michael @ 2004-12-13 13:57 UTC
  To: Patrick, linux-kernel, linux-xfs
  Cc: Andrew Morton, Kristofer T. Karas, Jeff Garzik, Linus Torvalds

Patrick,

I had the same problem on two machines with XFS.  Both slackware-current
machines.  The kernel on the Dell GX1 was built with GCC-3.4.2 and on my
main box was GCC-3.4.3.

There seems to be a bug in XFS with some configurations of 2.6.9 and
the 2.6.10-rc series.

After re-installing Slackware-10.0 and upgrading to -current, I have
installed 2.6.10-rc3 and so far, I have not been able to reproduce the
problem.

Some questions for you:

1] What kernel are you running?
2] What did you last change before you started getting these errors?

As far as severity goes, I ran XFS' fsck from a KNOPPIX CD and as a
result, I had about 500-600MB of files in my /lost+found directory when
it was finished.  Files were missing from all parts of the file system.
I had to restore from backup.  I would say stick with your previous
2.6.9 configuration (if you were running it) or go back to 2.6.8.1; some
2.6.9 configurations and 2.6.10-rc1 and/or 2.6.10-rc2 definitely cause
file corruption with XFS.  So far, however, I have not been able to
reproduce the error with 2.6.10-rc3.
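
To put a figure like "500-600MB in /lost+found" on a repaired filesystem, one option is to walk the directory and sum file sizes. A minimal Python sketch; the function name `tree_size` is hypothetical, it counts apparent sizes via `os.lstat` (so symlinks are not followed and sparse files count at full length), and reading `/lost+found` usually requires root.

```python
import os
import stat
import sys


# Hypothetical helper for illustration.
def tree_size(root: str) -> int:
    """Sum the apparent sizes, in bytes, of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir symlinks
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):  # skip symlinks, devices, fifos
                total += st.st_size
    return total


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/lost+found"
    print(f"{tree_size(path) / 2**20:.1f} MiB in {path}")
```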

Justin.



* Unknown Issue.
@ 2004-12-12 21:14 Patrick
  0 siblings, 0 replies; 12+ messages in thread
From: Patrick @ 2004-12-12 21:14 UTC
  To: linux-kernel

Hi, 

I've got a computer running Gentoo, on a clean install, where I've got
an odd problem:

After a while, the computer refuses to spawn processes anymore:

-/bin/bash: /bin/ps: Input/output error
-/bin/bash: /usr/bin/w: Input/output error
-/bin/bash: /bin/df: Input/output error
-/bin/bash: /bin/mount: Input/output error
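
One way to tell failures like these apart from, say, a missing binary or a permissions problem is to look at the errno: "Input/output error" is the message for `EIO`. The sketch below uses a hypothetical helper, `probe`, that reads the start of each file and reports its outcome. Note that a read served from the page cache can succeed even when the disk or filesystem is in trouble, so "ok" is not proof of health.

```python
import errno
from typing import Dict, Iterable


# Hypothetical helper for illustration.
def probe(paths: Iterable[str], nbytes: int = 4096) -> Dict[str, str]:
    """Try to read the first nbytes of each path and classify the outcome."""
    results = {}
    for path in paths:
        try:
            with open(path, "rb") as f:
                f.read(nbytes)
            results[path] = "ok"
        except OSError as e:
            if e.errno == errno.EIO:
                results[path] = "EIO"  # "Input/output error"
            else:
                # e.g. "ENOENT" for a missing file, "EACCES" for permissions
                results[path] = errno.errorcode.get(e.errno, str(e.errno))
    return results


if __name__ == "__main__":
    for p, status in probe(["/bin/ps", "/usr/bin/w", "/bin/df", "/bin/mount"]).items():
        print(f"{status:8} {p}")
```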

It happens randomly. I've tried everything from switching the computer
from software RAID (SCSI) to a hardware solution and reinstalling.
I've run the memory through memtest, and I've reseated the drives and
checked the RAM to make sure it was properly seated.

The only thing running on this box is MySQL, which runs perfectly at
7500 q/s (running super smack). Now, I'm not sure if this is a Linux
kernel thing, a Gentoo thing, or a hardware thing.

I've checked and I'm not running out of file descriptors (by looking
in /proc/sys/fs/file-nr), and I've increased the amount in
/proc/sys/fs/file-max (if I remember correctly) by adding a 0 to
the end of the value, thus increasing it a lot.
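
For reference, `/proc/sys/fs/file-nr` holds three numbers: allocated file handles, allocated-but-unused handles (documented as always 0 on newer kernels), and the maximum, the same limit exposed in `/proc/sys/fs/file-max`. Appending a 0 to that limit multiplies it by ten. A minimal Python sketch, with hypothetical function names, that reads the file and computes how many more handles can still be allocated system-wide:

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int
    unused: int
    maximum: int


# Hypothetical helpers for illustration.
def parse_file_nr(text: str) -> FileNr:
    """Parse the contents of /proc/sys/fs/file-nr (three whitespace-separated ints)."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return FileNr(allocated, unused, maximum)


def fd_headroom(nr: FileNr) -> int:
    """Number of additional file handles that can still be allocated."""
    in_use = nr.allocated - nr.unused
    return nr.maximum - in_use


if __name__ == "__main__":
    path = "/proc/sys/fs/file-nr"
    try:
        with open(path) as f:
            nr = parse_file_nr(f.read())
        print(nr, "headroom:", fd_headroom(nr))
    except OSError:
        print(path, "is not available on this system")
```

A headroom far from zero, as Patrick reports, makes handle exhaustion an unlikely cause of the EIO failures.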

It's running XFS on the root partition with a single partition, dual
Xeon 2.66 with hyperthreading enabled, dual Intel GbE and an Adaptec
2120S AACraid card. Dual 36GB 10krpm SCSI drives in RAID1.

Does anyone have any ideas on what I can do, what I can test, whether it's
hardware or software?

guys?

P

-- 
</N>

------
In the beginning, there was nothing. And God said, 'Let there be
Light.' And there was still nothing, but you could see a bit better.


Thread overview: 12+ messages
2005-01-06 13:33 Unknown Issue hard__ware
  -- strict thread matches above, loose matches on Subject: below --
2004-12-13 18:57 Piszcz, Justin Michael
2004-12-13 17:20 Piszcz, Justin Michael
2004-12-13 17:27 ` Patrick
2004-12-13 17:14 Piszcz, Justin Michael
2004-12-13 17:17 ` Patrick
2004-12-13 17:50 ` Eric Sandeen
2004-12-13 17:56   ` Eric Sandeen
2004-12-13 13:57 Piszcz, Justin Michael
2004-12-13 17:04 ` Eric Sandeen
2004-12-13 17:13   ` Patrick
2004-12-12 21:14 Patrick
