linux-kernel.vger.kernel.org archive mirror
* 2.4.22-pre3 and reiserfs problem (not boot)
@ 2003-07-11 14:08 "Peter Lojkin" 
  2003-07-11 14:29 ` Oleg Drokin
  0 siblings, 1 reply; 5+ messages in thread
From: "Peter Lojkin" @ 2003-07-11 14:08 UTC
  To: linux-kernel

Hello,

I am not on the list so please CC me if replying...

After a few hours of work with 2.4.22-pre3 (patched to solve the mount problem) we got this (ksyms was unavailable):

Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!
Jul 10 06:25:41 host kernel: invalid operand: 0000
Jul 10 06:25:41 host kernel: CPU:    1
Jul 10 06:25:41 host kernel: EIP:    0010:[reiserfs_panic+41/96]    Not tainted
Jul 10 06:25:41 host kernel: EFLAGS: 00010286
Jul 10 06:25:41 host kernel: eax: 00000024   ebx: c02da700   ecx: 00000097   edx: 01000000
Jul 10 06:25:41 host kernel: esi: f7e57000   edi: 00000000   ebp: f7e57000   esp: f7ed7ecc
Jul 10 06:25:41 host kernel: ds: 0018   es: 0018   ss: 0018
Jul 10 06:25:41 host kernel: Process kupdated (pid: 7, stackpage=f7ed7000)
Jul 10 06:25:41 host kernel: Stack: c02d89fa c03bf920 c02da700 f7ed7ef0 f8b10110 00000073 c01bb1df f7e57000
Jul 10 06:25:41 host kernel:        c02da700 00000001 00000012 00000010 00000000 f8b10144 f8b10138 00000074
Jul 10 06:25:41 host kernel:        00000000 00000002 eea13500 c01beacb f7e57000 f8b10110 00000001 f7ed7f8c
Jul 10 06:25:41 host kernel: Call Trace:    [flush_commit_list+675/920] [do_journal_end+1955/2668] [flush_old_commits+286/308] [reiserfs_write_super+56/104] [sync_supers+250/340]
Jul 10 06:25:41 host kernel: Code: 0f 0b 4e 01 00 8a 2d c0 68 20 f9 3b c0 85 f6 74 16 0f b7 46
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c02da700 <tails+ea4/1ff4>
>>edx; 01000000 Before first symbol
>>esi; f7e57000 <END_OF_CODE+37a5e204/????>
>>ebp; f7e57000 <END_OF_CODE+37a5e204/????>
>>esp; f7ed7ecc <END_OF_CODE+37adf0d0/????>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a
Code;  00000002 Before first symbol
   2:   4e                        dec    %esi
Code;  00000003 Before first symbol
   3:   01 00                     add    %eax,(%eax)
Code;  00000005 Before first symbol
   5:   8a 2d c0 68 20 f9         mov    0xf92068c0,%ch
Code;  0000000b Before first symbol
   b:   3b c0                     cmp    %eax,%eax
Code;  0000000d Before first symbol
   d:   85 f6                     test   %esi,%esi
Code;  0000000f Before first symbol
   f:   74 16                     je     27 <_EIP+0x27> 00000027 Before first symbol
Code;  00000011 Before first symbol
  11:   0f b7 46 00               movzwl 0x0(%esi),%eax

Jul 11 06:25:41 host kernel: kernel BUG at prints.c:341!
Jul 11 06:25:41 host kernel: invalid operand: 0000
Jul 11 06:25:41 host kernel: CPU:    0
Jul 11 06:25:41 host kernel: EIP:    0010:[reiserfs_panic+52/104]    Not tainted
Jul 11 06:25:41 host kernel: EFLAGS: 00010286
Jul 11 06:25:41 host kernel: eax: 00000037   ebx: c02dc6a0   ecx: 00000002   edx: 02000000
Jul 11 06:25:41 host kernel: esi: f7e57000   edi: 00000000   ebp: f7e57000   esp: f7ed7eb8
Jul 11 06:25:41 host kernel: ds: 0018   es: 0018   ss: 0018
Jul 11 06:25:41 host kernel: Process kupdated (pid: 7, stackpage=f7ed7000)
Jul 11 06:25:41 host kernel: Stack: c02da97f c03c5c20 c03c1b80 00000841 c02dc6a0 f7ed7ee4 f8b1017c f7a10000
Jul 11 06:25:41 host kernel:        c01bc1fe f7e57000 c02dc6a0 00000002 00000012 00000010 00000000 f8b100b4
Jul 11 06:25:41 host kernel:        f8b101b0 f8b101a4 00000077 00000000 00000003 eefcf7a0 c01c01ad f7e57000 
Jul 11 06:25:41 host kernel: Call Trace:    [flush_commit_list+658/904] [do_journal_end+1989/2764] [journal_mark_dirty+490/792] [flush_old_commits+295/320] [reiserfs_write_super+56/108]
Jul 11 06:25:41 host kernel: Code: 0f 0b 55 01 92 a9 2d c0 68 20 5c 3c c0 85 f6 74 13 0f b7 46


>>ebx; c02dc6a0 <MAX_KEY+e40/3fb8>
>>edx; 02000000 Before first symbol
>>esi; f7e57000 <END_OF_CODE+37a5e204/????>
>>ebp; f7e57000 <END_OF_CODE+37a5e204/????>
>>esp; f7ed7eb8 <END_OF_CODE+37adf0bc/????>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a
Code;  00000002 Before first symbol
   2:   55                        push   %ebp
Code;  00000003 Before first symbol
   3:   01 92 a9 2d c0 68         add    %edx,0x68c02da9(%edx)
Code;  00000009 Before first symbol
   9:   20 5c 3c c0               and    %bl,0xffffffc0(%esp,%edi,1)
Code;  0000000d Before first symbol
   d:   85 f6                     test   %esi,%esi
Code;  0000000f Before first symbol
   f:   74 13                     je     24 <_EIP+0x24> 00000024 Before first symbol
Code;  00000011 Before first symbol
  11:   0f b7 46 00               movzwl 0x0(%esi),%eax

1 warning and 1 error issued.  Results may not be reliable.
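
A note on the disassembly above: as far as I know, on i386 2.4 kernels built with verbose BUG() reporting, BUG() emits ud2a followed by a 16-bit source line number and a 32-bit pointer to the file name string. If that holds here, the instructions ksymoops prints right after ud2a are data, not code. The sketch below (Python; the helper name decode_bug_bytes is made up for illustration) decodes the two "Code:" lines under that assumption. The decoded line numbers agree with the "kernel BUG at prints.c:NNN" lines, which supports this reading.

```python
import struct


def decode_bug_bytes(code_line: str):
    """Decode an i386 'Code:' byte string that starts with ud2a (0f 0b).

    Assumption: the two bytes after ud2a are a little-endian line number
    and the next four bytes are a little-endian pointer to the file name.
    """
    raw = bytes(int(tok, 16) for tok in code_line.split())
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code bytes do not start with ud2a")
    # "<HI": unsigned 16-bit then unsigned 32-bit, little-endian, no padding
    line_no, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line_no, file_ptr


if __name__ == "__main__":
    first = "0f 0b 4e 01 00 8a 2d c0 68 20 f9 3b c0 85 f6 74 16 0f b7 46"
    second = "0f 0b 55 01 92 a9 2d c0 68 20 5c 3c c0 85 f6 74 13 0f b7 46"
    for code in (first, second):
        line_no, file_ptr = decode_bug_bytes(code)
        print(f"line {line_no}, file name string at {file_ptr:#010x}")
    # prints:
    # line 334, file name string at 0xc02d8a00
    # line 341, file name string at 0xc02da992
```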




* Re: 2.4.22-pre3 and reiserfs problem (not boot)
  2003-07-11 14:08 2.4.22-pre3 and reiserfs problem (not boot) "Peter Lojkin" 
@ 2003-07-11 14:29 ` Oleg Drokin
  2003-07-11 15:41   ` Re[2]: " "Peter Lojkin" 
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Drokin @ 2003-07-11 14:29 UTC
  To: Peter Lojkin; +Cc: linux-kernel

Hello!

On Fri, Jul 11, 2003 at 06:08:08PM +0400, "Peter Lojkin"  wrote:

> After a few hours of work with 2.4.22-pre3 (patched to solve the mount problem) we got this (ksyms was unavailable):

There was one more reiserfs message in kernel log just before this line, can you please include it?

> Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!
> Jul 10 06:25:41 host kernel: invalid operand: 0000
> Jul 10 06:25:41 host kernel: CPU:    1
> Jul 10 06:25:41 host kernel: EIP:    0010:[reiserfs_panic+41/96]    Not tainted

Bye,
    Oleg


* Re[2]: 2.4.22-pre3 and reiserfs problem (not boot)
  2003-07-11 14:29 ` Oleg Drokin
@ 2003-07-11 15:41   ` "Peter Lojkin" 
  2003-07-11 15:49     ` Oleg Drokin
  0 siblings, 1 reply; 5+ messages in thread
From: "Peter Lojkin" @ 2003-07-11 15:41 UTC
  To: "Oleg Drokin" ; +Cc: linux-kernel

Hello,

> There was one more reiserfs message in kernel log just before this
> line, can you please include it?
> 
> > Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!
> > Jul 10 06:25:41 host kernel: invalid operand: 0000
> > Jul 10 06:25:41 host kernel: CPU:    1
> > Jul 10 06:25:41 host kernel: EIP:    0010:[reiserfs_panic+41/96]    Not tainted

right. ksymoops cut it out so i missed it.

Jul 10 06:25:10 host kernel: journal-601, buffer write failed

another thing to note, both oopses happened at exactly 06:25:41 (Jul 10 and 11), and both times there was a "journal-601, buffer write failed"
message shortly before it.
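
The pairing described here (a journal-601 message followed shortly by the BUG) can be checked mechanically across a whole log. A minimal sketch in Python, assuming standard syslog timestamps that carry no year (so the year argument is an assumption supplied by the caller) and English month names; gaps_before_bug is a hypothetical helper name, not an existing tool:

```python
import re
from datetime import datetime

# Matches e.g. "Jul 10 06:25:10 host kernel: journal-601, buffer write failed"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")


def gaps_before_bug(lines, year=2003):
    """Return (bug_time, seconds_since_last_journal_601) for each kernel BUG
    that was preceded by a journal-601 message."""
    last_failure = None
    result = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if "journal-601" in msg:
            last_failure = stamp
        elif msg.startswith("kernel BUG at") and last_failure is not None:
            result.append((stamp, (stamp - last_failure).total_seconds()))
            last_failure = None  # pair each BUG with one preceding failure
    return result


if __name__ == "__main__":
    sample = [
        "Jul 10 06:25:10 host kernel: journal-601, buffer write failed",
        "Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!",
    ]
    for when, gap in gaps_before_bug(sample):
        print(f"{when}: BUG {gap:.0f}s after journal-601")
```

For the two Jul 10 lines quoted in this thread the gap comes out at 31 seconds.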

i missed a lot of info in the original message, sorry.
here it is:

the box is dual p3, serverworks le chipset, 1gb memory, integrated
dual-channel adaptec 7899a, intel e1000, 4 scsi disks, scsi promise
ide-raid box attached.

local disks form md0 with raid5 with total size of ~130gb.
promise box also in raid5 mode with total size of ~1.3tb.
both use reiserfs.

in the logs we often get messages like:
Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)
Jul 11 14:25:59 host kernel: ^INo terminal CRC packet recevied

_but_ with the 2.4.21-rc? kernel they cause no problems and no data loss.
the promise box itself doesn't detect any errors.
i've checked the list and found a couple of messages about such "parity
errors" in recent kernels, but no solution or any info about them
causing problems.
hoping to get rid of these messages i've tried 2.4.22-pre3 and got
oopses...
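
To see how often these parity errors occur and on which device, the log can be tallied per SCSI address and bus phase. A rough sketch in Python, assuming the message format is exactly the one quoted above; the pattern and the helper name count_parity_errors are illustrative only:

```python
import re
from collections import Counter

# Matches e.g. "(scsi0:A:10:0): parity error detected in Data-out phase."
PARITY_RE = re.compile(
    r"\((scsi\d+:[A-Z]:\d+:\d+)\): parity error detected in (\S+) phase"
)


def count_parity_errors(lines):
    """Count parity-error messages per (device address, bus phase)."""
    counts = Counter()
    for line in lines:
        m = PARITY_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected "
        "in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)",
    ]
    for (device, phase), n in count_parity_errors(sample).items():
        print(f"{device} {phase}: {n}")
```

Running the same tally over logs from the older and the newer kernel would show whether the error rate itself changed or only how the errors are handled.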




* Re: 2.4.22-pre3 and reiserfs problem (not boot)
  2003-07-11 15:41   ` Re[2]: " "Peter Lojkin" 
@ 2003-07-11 15:49     ` Oleg Drokin
  2003-07-11 16:06       ` "Peter Lojkin" 
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Drokin @ 2003-07-11 15:49 UTC
  To: Peter Lojkin; +Cc: linux-kernel

Hello!

On Fri, Jul 11, 2003 at 07:41:03PM +0400, "Peter Lojkin"  wrote:

> > There was one more reiserfs message in kernel log just before this
> > line, can you please include it?
> right. ksymoops cut it out so i missed it.
> Jul 10 06:25:10 host kernel: journal-601, buffer write failed

Well, the write to the journal failed. Reiserfs panics in that case because it does not
know how to recover (Jeff Mahoney at SuSE is working on remounting r/o
when this happens).

> another thing to note, both oopses happened at exactly 06:25:41 (Jul 10 and 11), and both times there was a "journal-601, buffer write failed"
> message shortly before it.

Well, how about some i/o error messages from block device drivers?

> in the logs we often get messages like:
> Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)
> Jul 11 14:25:59 host kernel: ^INo terminal CRC packet recevied

Hm, can that lead to an i/o error being propagated up to reiserfs? If yes, then that's the problem.

> _but_ with the 2.4.21-rc? kernel they cause no problems and no data loss.
> the promise box itself doesn't detect any errors.
> i've checked the list and found a couple of messages about such "parity
> errors" in recent kernels, but no solution or any info about them
> causing problems.
> hoping to get rid of these messages i've tried 2.4.22-pre3 and got
> oopses...

Hm, I guess you need to stop the driver from propagating i/o errors upstream
(perhaps find the recent change that started doing this).
There is nothing to do from the reiserfs perspective (except for better error handling,
which would not do you any good anyway).

Bye,
    Oleg


* Re: 2.4.22-pre3 and reiserfs problem (not boot)
  2003-07-11 15:49     ` Oleg Drokin
@ 2003-07-11 16:06       ` "Peter Lojkin" 
  0 siblings, 0 replies; 5+ messages in thread
From: "Peter Lojkin" @ 2003-07-11 16:06 UTC
  To: "Oleg Drokin" ; +Cc: linux-kernel

> > Jul 10 06:25:10 host kernel: journal-601, buffer write failed
> 
> Well, the write to the journal failed. Reiserfs panics in that case because it does not
> know how to recover (Jeff Mahoney at SuSE is working on remounting r/o
> when this happens).
yes, once i found the "buffer write failed" i knew it wasn't a random
reiserfs oops. just missed it first time, sorry.

and i think the close timing of the oopses was caused by a cron job
that starts at this time, the one that searches through the entire fs tree...

> Well, how about some i/o error messages from block device drivers?
> 
> > in the logs we often get messages like:
> > Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)
> > Jul 11 14:25:59 host kernel: ^INo terminal CRC packet recevied
> 
> Hm, can that lead to an i/o error being propagated up to reiserfs? If yes,
> then that's the problem.
sure if there were real errors, but with earlier kernels we get
these errors in the logs but no problems or data loss. strange...

> Hm, I guess you need to stop the driver from propagating i/o errors upstream
> (perhaps find the recent change that started doing this).
> There is nothing to do from the reiserfs perspective (except for better error handling,
> which would not do you any good anyway).
well i cannot do a lot of reboots on this box, so i guess i'll just
try moving the promise box to another host with another scsi hba and see if
it works...

Big thanks for the quick reply!

