* panic in raid1_end_write_request
From: Norman Gaywood @ 2005-01-28 21:23 UTC
  To: linux-kernel

I have a Dell PE2650, Dual Xeon, 1G memory and several software raid1
partitions, ext3. Main duties include NFS, DHCP and samba. A Fedora
kernel 2.6.10-1.747_FC3smp which includes 2.6.10-ac10.

This system panics frequently, between several hours to several days. It
does not seem to be related to load. Hardware and memory tests indicate
a good system.

Panic messages are similar to:

Unable to handle kernel NULL pointer dereference at virtual address 00000038
 printing eip:
f882940f
*pde = 379c9001
Oops: 0000 [#1]
SMP 
Modules linked in: iptable_filter ip_tables nfsd exportfs md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc microcode dm_mod video button battery ac cfi_probe gen_probe scb2_flash mtdcore chipreg map_funcs tg3 floppy sg ext3 jbd raid1 aic7xxx sd_mod scsi_mod
CPU:    3
EIP:    0060:[<f882940f>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-1.747_FC3smp) 
EIP is at raid1_end_write_request+0x8e/0xb2 [raid1]
eax: 00000000   ebx: f7dda400   ecx: f79e78a0   edx: 00000000
esi: 00000018   edi: f7dd6e00   ebp: f7dda400   esp: c03aef18
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03ae000 task=f7f5fa40)
Stack: f7fbd100 00001000 f8829381 00000000 c01564ce 00001000 f7fbd100 00000000 
       c03aef60 c0217b6f f7bcca24 00000000 00000000 00000000 00001000 f7bcca24 
       f7d4b33c f78f4080 00000001 f88435ec 00000001 e4d10b80 f7bcca24 f78f4080 
Call Trace:
 [<f8829381>] raid1_end_write_request+0x0/0xb2 [raid1]
 [<c01564ce>] bio_endio+0x50/0x55
 [<c0217b6f>] __end_that_request_first+0xea/0x1ab
 [<f88435ec>] scsi_end_request+0x1b/0x9d [scsi_mod]
 [<f88439a7>] scsi_io_completion+0x206/0x40f [scsi_mod]
 [<c011a394>] __wake_up+0x29/0x3c
 [<f883fadd>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<f883fa02>] scsi_softirq+0xb6/0xbe [scsi_mod]
 [<c0121f60>] __do_softirq+0x4c/0xb1
 [<c0105d9f>] do_softirq+0x41/0x48
 =======================
 [<c0105cd0>] do_IRQ+0x74/0x7e
 [<c010467e>] common_interrupt+0x1a/0x20
 [<c0102018>] default_idle+0x0/0x2f
 [<c02b007b>] xfrm_sk_policy_lookup+0x2cd/0x355
 [<c0102041>] default_idle+0x29/0x2f
 [<c01020a0>] cpu_idle+0x26/0x3b
Code: 53 08 89 44 0e 04 89 54 0e 08 f0 ff 0b 0f 94 c0 84 c0 74 0f 8b 43 14 e8 bf 5f a3 c7 89 d8 e8 15 fe ff ff 8b 47 04 8b 1f 8b 04 06 <8b> 48 38 f0 ff 48 48 0f 94 c2 84 d2 74 0d 85 c9 74 09 f0 0f ba 
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 

-- 
Norman Gaywood, Systems Administrator
School of Mathematics, Statistics and Computer Science
University of New England, Armidale, NSW 2351, Australia

norm@turing.une.edu.au            Phone: +61 (0)2 6773 2412
http://turing.une.edu.au/~norm    Fax:   +61 (0)2 6773 3312

Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html


* Re: panic in raid1_end_write_request
From: Mark Rustad @ 2005-01-28 22:34 UTC
  To: Norman Gaywood; +Cc: linux-kernel

Norman,

I used to get these running SuSE SLES 9 and also with a variety of 
kernel.org kernels. The crash was triggered by a media error on a 
RAID1. A patch that I got from SuSE fixed it for me. The patch is below 
your message excerpt.

On Jan 28, 2005, at 3:23 PM, Norman Gaywood wrote:

> I have a Dell PE2650, Dual Xeon, 1G memory and several software raid1
> partitions, ext3. Main duties include NFS, DHCP and samba. A Fedora
> kernel 2.6.10-1.747_FC3smp which includes 2.6.10-ac10.
>
> This system panics frequently, between several hours to several days. It
> does not seem to be related to load. Hardware and memory tests indicate
> a good system.
>
> Panic messages are similar to:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000038
>  printing eip:
> f882940f
> *pde = 379c9001
> Oops: 0000 [#1]

<snip>

Here is the patch:

--- linux-2.6.5/fs/bio.c~	2004-11-24 12:42:10.532343678 +0100
+++ linux-2.6.5/fs/bio.c	2004-11-24 12:46:49.308021403 +0100
@@ -98,12 +98,7 @@

  	BIO_BUG_ON(pool_idx >= BIOVEC_NR_POOLS);

-	/*
-	 * cloned bio doesn't own the veclist
-	 */
-	if (!bio_flagged(bio, BIO_CLONED))
-		mempool_free(bio->bi_io_vec, bp->pool);
-
+	mempool_free(bio->bi_io_vec, bp->pool);
  	mempool_free(bio, bio_pool);
  }

@@ -212,7 +207,9 @@
   */
  inline void __bio_clone(struct bio *bio, struct bio *bio_src)
  {
-	bio->bi_io_vec = bio_src->bi_io_vec;
+	request_queue_t *q = bdev_get_queue(bio_src->bi_bdev);
+
+	memcpy(bio->bi_io_vec, bio_src->bi_io_vec, bio_src->bi_max_vecs * sizeof(struct bio_vec));

  	bio->bi_sector = bio_src->bi_sector;
  	bio->bi_bdev = bio_src->bi_bdev;
@@ -224,21 +221,9 @@
  	 * for the clone
  	 */
  	bio->bi_vcnt = bio_src->bi_vcnt;
-	bio->bi_idx = bio_src->bi_idx;
-	if (bio_flagged(bio, BIO_SEG_VALID)) {
-		bio->bi_phys_segments = bio_src->bi_phys_segments;
-		bio->bi_hw_segments = bio_src->bi_hw_segments;
-		bio->bi_flags |= (1 << BIO_SEG_VALID);
-	}
  	bio->bi_size = bio_src->bi_size;
-
-	/*
-	 * cloned bio does not own the bio_vec, so users cannot fiddle with
-	 * it. clear bi_max_vecs and clear the BIO_POOL_BITS to make this
-	 * apparent
-	 */
-	bio->bi_max_vecs = 0;
-	bio->bi_flags &= (BIO_POOL_MASK - 1);
+	bio_phys_segments(q, bio);
+	bio_hw_segments(q, bio);
  }

  /**
@@ -250,7 +235,7 @@
   */
  struct bio *bio_clone(struct bio *bio, int gfp_mask)
  {
-	struct bio *b = bio_alloc(gfp_mask, 0);
+	struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);

  	if (b)
  		__bio_clone(b, bio);

-- 
Mark Rustad, MRustad@mac.com



* Re: panic in raid1_end_write_request
From: Norman Gaywood @ 2005-01-28 22:56 UTC
  To: Mark Rustad; +Cc: linux-kernel

Thanks Mark,

On Fri, Jan 28, 2005 at 04:34:01PM -0600, Mark Rustad wrote:
> I used to get these running SuSE SLES 9 and also with a variety of 
> kernel.org kernels. The crash was triggered by a media error on a 
> RAID1.

Were there any media errors logged? My system does not log any such errors.

> A patch that I got from SuSE fixed it for me. The patch is below
> your message excerpt.

That looks like the "bio clone memory corruption" patch, which should
already be in 2.6.10-1.747_FC3smp because that kernel includes
2.6.10-ac10.

I was hoping that would solve my problem as well, but it didn't.
