* panic in raid1_end_write_request
@ 2005-01-28 21:23 Norman Gaywood
2005-01-28 22:34 ` Mark Rustad
0 siblings, 1 reply; 3+ messages in thread
From: Norman Gaywood @ 2005-01-28 21:23 UTC
To: linux-kernel
I have a Dell PE2650 (dual Xeon, 1 GB memory) with several software RAID1
partitions formatted as ext3. Its main duties are NFS, DHCP and Samba. It runs
Fedora kernel 2.6.10-1.747_FC3smp, which includes 2.6.10-ac10.
This system panics frequently, at intervals ranging from several hours to
several days. The panics do not seem to be related to load, and hardware and
memory tests indicate a good system.
Panic messages are similar to:
Unable to handle kernel NULL pointer dereference at virtual address 00000038
printing eip:
f882940f
*pde = 379c9001
Oops: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables nfsd exportfs md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc microcode dm_mod video button battery ac cfi_probe gen_probe scb2_flash mtdcore chipreg map_funcs tg3 floppy sg ext3 jbd raid1 aic7xxx sd_mod scsi_mod
CPU: 3
EIP: 0060:[<f882940f>] Not tainted VLI
EFLAGS: 00010246 (2.6.10-1.747_FC3smp)
EIP is at raid1_end_write_request+0x8e/0xb2 [raid1]
eax: 00000000 ebx: f7dda400 ecx: f79e78a0 edx: 00000000
esi: 00000018 edi: f7dd6e00 ebp: f7dda400 esp: c03aef18
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03ae000 task=f7f5fa40)
Stack: f7fbd100 00001000 f8829381 00000000 c01564ce 00001000 f7fbd100 00000000
c03aef60 c0217b6f f7bcca24 00000000 00000000 00000000 00001000 f7bcca24
f7d4b33c f78f4080 00000001 f88435ec 00000001 e4d10b80 f7bcca24 f78f4080
Call Trace:
[<f8829381>] raid1_end_write_request+0x0/0xb2 [raid1]
[<c01564ce>] bio_endio+0x50/0x55
[<c0217b6f>] __end_that_request_first+0xea/0x1ab
[<f88435ec>] scsi_end_request+0x1b/0x9d [scsi_mod]
[<f88439a7>] scsi_io_completion+0x206/0x40f [scsi_mod]
[<c011a394>] __wake_up+0x29/0x3c
[<f883fadd>] scsi_finish_command+0xad/0xb1 [scsi_mod]
[<f883fa02>] scsi_softirq+0xb6/0xbe [scsi_mod]
[<c0121f60>] __do_softirq+0x4c/0xb1
[<c0105d9f>] do_softirq+0x41/0x48
=======================
[<c0105cd0>] do_IRQ+0x74/0x7e
[<c010467e>] common_interrupt+0x1a/0x20
[<c0102018>] default_idle+0x0/0x2f
[<c02b007b>] xfrm_sk_policy_lookup+0x2cd/0x355
[<c0102041>] default_idle+0x29/0x2f
[<c01020a0>] cpu_idle+0x26/0x3b
Code: 53 08 89 44 0e 04 89 54 0e 08 f0 ff 0b 0f 94 c0 84 c0 74 0f 8b 43 14 e8 bf 5f a3 c7 89 d8 e8 15 fe ff ff 8b 47 04 8b 1f 8b 04 06 <8b> 48 38 f0 ff 48 48 0f 94 c2 84 d2 74 0d 85 c9 74 09 f0 0f ba
<0>Kernel panic - not syncing: Fatal exception in interrupt
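As a cross-check on the oops, the faulting instruction can be decoded by hand. In the Code: line the bytes marked <8b> 48 38 are the instruction at EIP. On i386 they encode a 32-bit load from [eax + 0x38], and eax is 00000000 in the register dump, which matches the reported fault address 00000038. The sketch below is a small user-space C helper written purely for illustration: the names mov_disp8, decode_mov_disp8, fault_address and explain_oops are hypothetical, and it understands only this one instruction form, so it is not a general disassembler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded form of a 3-byte "mov r32, [r32 + disp8]" (opcode 0x8b, mod = 01). */
struct mov_disp8 {
    unsigned dst_reg;   /* ModRM.reg field: destination register number */
    unsigned base_reg;  /* ModRM.rm field: base register number */
    int8_t   disp;      /* signed 8-bit displacement */
};

/* i386 32-bit register names, indexed by their 3-bit encoding. */
static const char *const reg_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Returns 0 on success, -1 if the bytes are not the form handled here. */
static int decode_mov_disp8(const uint8_t insn[3], struct mov_disp8 *out)
{
    uint8_t modrm = insn[1];

    if (insn[0] != 0x8b)        /* only "mov r32, r/m32" */
        return -1;
    if ((modrm >> 6) != 1)      /* only mod = 01: [reg + disp8] */
        return -1;
    if ((modrm & 7) == 4)       /* rm = 100 means a SIB byte follows */
        return -1;

    out->dst_reg  = (modrm >> 3) & 7;
    out->base_reg = modrm & 7;
    out->disp     = (int8_t)insn[2];
    return 0;
}

/* Address the load touches, given the base register's value at the fault. */
static uint32_t fault_address(uint32_t base_value, const struct mov_disp8 *m)
{
    return base_value + (uint32_t)(int32_t)m->disp;
}

/* Decode the marked bytes from the Code: line, using eax from the oops. */
static void explain_oops(void)
{
    static const uint8_t insn[3] = { 0x8b, 0x48, 0x38 };
    const uint32_t eax = 0x00000000u;   /* "eax: 00000000" in the dump */
    struct mov_disp8 m;
    int rc = decode_mov_disp8(insn, &m);

    assert(rc == 0);
    (void)rc;
    printf("mov %s, [%s + %d] -> fault address %08x\n",
           reg_names[m.dst_reg], reg_names[m.base_reg], (int)m.disp,
           (unsigned)fault_address(eax, &m));
}
```

Calling explain_oops() from a main() prints the decoded instruction and the address it would touch. The point is only that the handler dereferenced a NULL structure pointer at offset 0x38; which structure that is cannot be read off the oops alone.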
--
Norman Gaywood, Systems Administrator
School of Mathematics, Statistics and Computer Science
University of New England, Armidale, NSW 2351, Australia
norm@turing.une.edu.au Phone: +61 (0)2 6773 2412
http://turing.une.edu.au/~norm Fax: +61 (0)2 6773 3312
Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
* Re: panic in raid1_end_write_request
From: Mark Rustad @ 2005-01-28 22:34 UTC
To: Norman Gaywood; +Cc: linux-kernel
Norman,
I used to get these running SuSE SLES 9 and also with a variety of
kernel.org kernels. The crash was triggered by a media error on a
RAID1. A patch that I got from SuSE fixed it for me. The patch is below
your message excerpt.
On Jan 28, 2005, at 3:23 PM, Norman Gaywood wrote:
> I have a Dell PE2650, Dual Xeon, 1G memory and several software raid1
> partitions, ext3. Main duties include NFS, DHCP and samba. A Fedora
> kernel 2.6.10-1.747_FC3smp which includes 2.6.10-ac10.
>
> This system panics frequently, between several hours to several days. It
> does not seem to be related to load. Hardware and memory tests indicate
> a good system.
>
> Panic messages are similar to:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000038
> printing eip:
> f882940f
> *pde = 379c9001
> Oops: 0000 [#1]
<snip>
Here is the patch:
--- linux-2.6.5/fs/bio.c~ 2004-11-24 12:42:10.532343678 +0100
+++ linux-2.6.5/fs/bio.c 2004-11-24 12:46:49.308021403 +0100
@@ -98,12 +98,7 @@
BIO_BUG_ON(pool_idx >= BIOVEC_NR_POOLS);
- /*
- * cloned bio doesn't own the veclist
- */
- if (!bio_flagged(bio, BIO_CLONED))
- mempool_free(bio->bi_io_vec, bp->pool);
-
+ mempool_free(bio->bi_io_vec, bp->pool);
mempool_free(bio, bio_pool);
}
@@ -212,7 +207,9 @@
*/
inline void __bio_clone(struct bio *bio, struct bio *bio_src)
{
- bio->bi_io_vec = bio_src->bi_io_vec;
+ request_queue_t *q = bdev_get_queue(bio_src->bi_bdev);
+
+ memcpy(bio->bi_io_vec, bio_src->bi_io_vec, bio_src->bi_max_vecs * sizeof(struct bio_vec));
bio->bi_sector = bio_src->bi_sector;
bio->bi_bdev = bio_src->bi_bdev;
@@ -224,21 +221,9 @@
* for the clone
*/
bio->bi_vcnt = bio_src->bi_vcnt;
- bio->bi_idx = bio_src->bi_idx;
- if (bio_flagged(bio, BIO_SEG_VALID)) {
- bio->bi_phys_segments = bio_src->bi_phys_segments;
- bio->bi_hw_segments = bio_src->bi_hw_segments;
- bio->bi_flags |= (1 << BIO_SEG_VALID);
- }
bio->bi_size = bio_src->bi_size;
-
- /*
- * cloned bio does not own the bio_vec, so users cannot fiddle with
- * it. clear bi_max_vecs and clear the BIO_POOL_BITS to make this
- * apparent
- */
- bio->bi_max_vecs = 0;
- bio->bi_flags &= (BIO_POOL_MASK - 1);
+ bio_phys_segments(q, bio);
+ bio_hw_segments(q, bio);
}
/**
@@ -250,7 +235,7 @@
*/
struct bio *bio_clone(struct bio *bio, int gfp_mask)
{
- struct bio *b = bio_alloc(gfp_mask, 0);
+ struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
if (b)
__bio_clone(b, bio);
--
Mark Rustad, MRustad@mac.com
* Re: panic in raid1_end_write_request
From: Norman Gaywood @ 2005-01-28 22:56 UTC
To: Mark Rustad; +Cc: linux-kernel
Thanks Mark,
On Fri, Jan 28, 2005 at 04:34:01PM -0600, Mark Rustad wrote:
> I used to get these running SuSE SLES 9 and also with a variety of
> kernel.org kernels. The crash was triggered by a media error on a
> RAID1.
Were there any media errors logged? My system does not log any such errors.
> A patch that I got from SuSE fixed it for me. The patch is below
> your message excerpt.
That looks like the "bio clone memory corruption" patch, which is
supposed to be in 2.6.10-1.747_FC3smp already, since that kernel
includes 2.6.10-ac10.
I was hoping that would solve my problem as well, but it didn't.
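For readers who have not looked at that patch closely: the quoted diff changes __bio_clone from pointing the clone at the source's bi_io_vec array to copying the array with memcpy, and makes bio_clone allocate bi_max_vecs entries for the clone. The difference can be pictured with a simplified user-space analogue. This is a sketch under stated assumptions, not kernel code: struct vec and struct buf are hypothetical stand-ins for struct bio_vec and struct bio, and the owns_vecs flag exists only so that both clone styles can be freed safely side by side.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct bio_vec. */
struct vec {
    void    *page;
    unsigned len;
    unsigned offset;
};

/* Hypothetical stand-in for struct bio. */
struct buf {
    struct vec *vecs;       /* analogue of bi_io_vec */
    unsigned    vcnt;       /* analogue of bi_vcnt */
    unsigned    max_vecs;   /* analogue of bi_max_vecs */
    int         owns_vecs;  /* illustration only: may we free vecs? */
};

static struct buf *buf_alloc(unsigned max_vecs)
{
    struct buf *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    if (max_vecs) {
        b->vecs = calloc(max_vecs, sizeof(*b->vecs));
        if (!b->vecs) {
            free(b);
            return NULL;
        }
        b->owns_vecs = 1;
    }
    b->max_vecs = max_vecs;
    return b;
}

/* Pre-patch style: the clone borrows the source's vector array. */
static struct buf *buf_clone_shared(const struct buf *src)
{
    struct buf *b = buf_alloc(0);

    if (!b)
        return NULL;
    b->vecs = src->vecs;    /* shared: changes to either side show in both */
    b->vcnt = src->vcnt;
    return b;
}

/* Post-patch style: the clone gets its own copy of the vector array. */
static struct buf *buf_clone_copied(const struct buf *src)
{
    struct buf *b = buf_alloc(src->max_vecs);

    if (!b)
        return NULL;
    if (src->max_vecs)
        memcpy(b->vecs, src->vecs, src->max_vecs * sizeof(*src->vecs));
    b->vcnt = src->vcnt;
    return b;
}

static void buf_free(struct buf *b)
{
    if (!b)
        return;
    if (b->owns_vecs)
        free(b->vecs);
    free(b);
}
```

With the shared style, anything that modifies the source's vectors while the clone is still in flight is visible through the clone; with the copied style the clone keeps its own snapshot. The sketch shows only that general hazard and says nothing about whether it is the cause of the panic reported here.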