* raid5 crash
@ 2004-12-22 22:04 Kristian Eide
  2004-12-22 22:26 ` Norbert van Nobelen
  2004-12-22 23:08 ` Neil Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Kristian Eide @ 2004-12-22 22:04 UTC (permalink / raw)
  To: linux-kernel

I am running kernel 2.6.9-gentoo-r10 on an Athlon XP 2400+ computer with a SiI
3114 SATA controller hosting 4 WD2500JD-00G drives. I have combined these
drives into a raid5 array using software raid, but unfortunately the array is
not stable. I have tried several filesystems (ext3, reiserfs, xfs), but
copying several gigabytes of data into the array (using scp) and then reading
it back (using rsync to compare over the network) always results in data
corruption. Here is the output from 'dmesg':

kernel BUG at drivers/md/raid5.c:813!
invalid operand: 0000 [#1]
Modules linked in: sata_sil libata sbp2 ohci1394 ieee1394 usb_storage ehci_hcd 
usbcore
CPU:    0
EIP:    0060:[<c039cdd2>]    Not tainted VLI
EFLAGS: 00010006   (2.6.9-gentoo-r10)
EIP is at add_stripe_bio+0x1c2/0x200
eax: 00045168   ebx: d3974b00   ecx: d3974980   edx: 00000000
esi: 00045140   edi: 00000000   ebp: e33200a4   esp: f0a05ac4
ds: 007b   es: 007b   ss: 0068
Process rsync (pid: 32092, threadinfo=f0a04000 task=f6c10020)
Stack: 00000000 00000296 00000140 e3320028 00045140 00000000 d3974980 c039e092
       e3320028 d3974980 00000000 00000000 00000000 f0a05b1c de3e1ae0 00045158
       00000000 00000003 00000004 de3e1ae0 dfe90e00 00000000 00000003 f7d85088
Call Trace:
 [<c039e092>] make_request+0x122/0x200
 [<c032dc6f>] generic_make_request+0x15f/0x1e0
 [<c011d590>] autoremove_wake_function+0x0/0x60
 [<c032dd4d>] submit_bio+0x5d/0x100
 [<c0172d43>] mpage_bio_submit+0x23/0x40
 [<c0173170>] do_mpage_readpage+0x2d0/0x480
 [<c012367d>] __do_softirq+0x7d/0x90
 [<c02b54df>] radix_tree_node_alloc+0x1f/0x60
 [<c02b5762>] radix_tree_insert+0xe2/0x100
 [<c0136e54>] add_to_page_cache+0x54/0x80
 [<c017346b>] mpage_readpages+0x14b/0x180
 [<c018f1f0>] reiserfs_get_block+0x0/0x1450
 [<c013ddf4>] read_pages+0x134/0x140
 [<c018f1f0>] reiserfs_get_block+0x0/0x1450
 [<c013b390>] __alloc_pages+0x1d0/0x370
 [<c01081c5>] do_IRQ+0xc5/0xe0
 [<c013e04f>] do_page_cache_readahead+0xcf/0x130
 [<c013e19f>] page_cache_readahead+0xef/0x1e0
 [<c013765c>] do_generic_mapping_read+0x11c/0x4d0
 [<c0137cae>] __generic_file_aio_read+0x1be/0x1f0
 [<c0137a10>] file_read_actor+0x0/0xe0
 [<c0137e1a>] generic_file_read+0xba/0xe0
 [<c011ac24>] do_page_fault+0x194/0x591
 [<c011d590>] autoremove_wake_function+0x0/0x60
 [<c0126f6b>] update_wall_time+0xb/0x40
 [<c012739f>] do_timer+0xdf/0xf0
 [<c015270c>] vfs_read+0xbc/0x170
 [<c012367d>] __do_softirq+0x7d/0x90
 [<c0152a71>] sys_read+0x51/0x80
 [<c010603b>] syscall_call+0x7/0xb
Code: 72 08 0f ba a8 90 00 00 00 02 83 c4 0c 5b 5e 5f 5d c3 89 cb e9 cd fe ff
ff 8b 5d 00 e9 c5 fe ff ff 77 08 39 f0 0f 86 94 fe ff ff <0f> 0b 2d 03 70 92
44 c0 e9 87 fe ff ff 0f 87 a8 fe ff ff 39 f0

Any idea whether this is a kernel bug or a hardware problem?
Please CC any replies to me.

Sincerely,

-- 
Kristian

* Re: raid5 crash
  2004-12-22 22:04 raid5 crash Kristian Eide
@ 2004-12-22 22:26 ` Norbert van Nobelen
  2004-12-22 23:05   ` Kristian Eide
  2004-12-22 23:08 ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Norbert van Nobelen @ 2004-12-22 22:26 UTC (permalink / raw)
  To: Kristian Eide; +Cc: linux-kernel

The SiI 3114 is a RAID controller by itself.

Is it not conflicting somewhere (like running software RAID5 and hardware
RAID at the same time)?

On Wednesday 22 December 2004 23:04, you wrote:
> I am running kernel 2.6.9-gentoo-r10 on an Athlon XP 2400+ computer with a
> SiI 3114 SATA controller hosting 4 WD2500JD-00G drives. I have combined
> these drives into a raid5 array using software raid, but unfortunately the
> array is not stable.
> [...]
> kernel BUG at drivers/md/raid5.c:813!
> [...]
> Any idea whether this is a kernel bug or a hardware problem?
> Please CC any replies to me.

* Re: raid5 crash
  2004-12-22 22:26 ` Norbert van Nobelen
@ 2004-12-22 23:05   ` Kristian Eide
  0 siblings, 0 replies; 9+ messages in thread
From: Kristian Eide @ 2004-12-22 23:05 UTC (permalink / raw)
  To: Norbert van Nobelen; +Cc: linux-kernel

> The SiI 3114 is a RAID controller by itself.
> Is it not conflicting somewhere (like running software RAID5 and hardware
> RAID at the same time)?

No. The SiI 3114 is only being used as a SATA controller; I have not
configured any hardware raid.

Sincerely,

-- 
Kristian

* Re: raid5 crash
  2004-12-22 22:04 raid5 crash Kristian Eide
  2004-12-22 22:26 ` Norbert van Nobelen
@ 2004-12-22 23:08 ` Neil Brown
  2004-12-23  9:51   ` Prakash K. Cheemplavam
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Neil Brown @ 2004-12-22 23:08 UTC (permalink / raw)
  To: Kristian Eide; +Cc: linux-kernel

On Wednesday December 22, kreide@online.no wrote:
> I am running kernel 2.6.9-gentoo-r10 on an Athlon XP 2400+ computer with a
> SiI 3114 SATA controller hosting 4 WD2500JD-00G drives. I have combined
> these drives into a raid5 array using software raid, but unfortunately the
> array is not stable.
> [...]
> kernel BUG at drivers/md/raid5.c:813!

This BUG happens when there are two outstanding read (or write)
requests for the same piece of storage (more accurately, two "bio"s
that overlap).
raid5 cannot currently handle this situation.
Most filesystems would never make requests like this.
I note that you are using reiserfs in this case.  It is possible that
reiserfs with tail-packing enabled could do this.

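For reference, the check that fires at drivers/md/raid5.c:813 (it appears in
context in the patch later in this thread) walks the per-device bio chain,
which is kept sorted by start sector, and BUGs as soon as an already-queued
bio that starts earlier extends past the start of the new one; bi_size is in
bytes, so ">> 9" converts it to 512-byte sectors:

	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
		BUG_ON((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector);
		bip = & (*bip)->bi_next;
	}
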
I doubt very much that this would happen with ext3.  I don't know
about xfs, but I doubt it would happen there either.

When using some other filesystem, what sort of data corruption are you
getting?

NeilBrown

* Re: raid5 crash
  2004-12-22 23:08 ` Neil Brown
@ 2004-12-23  9:51   ` Prakash K. Cheemplavam
  2004-12-23 19:45   ` Kristian Eide
  2005-01-13 14:58   ` raid5 crash Stephen C. Tweedie
  2 siblings, 0 replies; 9+ messages in thread
From: Prakash K. Cheemplavam @ 2004-12-23  9:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: Kristian Eide, linux-kernel

Neil Brown wrote:
> On Wednesday December 22, kreide@online.no wrote:
> 
>>I am running kernel 2.6.9-gentoo-r10 on an Athlon XP 2400+ computer with a
>>SiI 3114 SATA controller hosting 4 WD2500JD-00G drives.
>>[...]
>>kernel BUG at drivers/md/raid5.c:813!

Do you have a BIOS option called "ext-p2p discard time"? Try setting it
higher. I posted another thread on lkml about this issue with the SiI 3112...

Prakash

* Re: raid5 crash
  2004-12-22 23:08 ` Neil Brown
  2004-12-23  9:51   ` Prakash K. Cheemplavam
@ 2004-12-23 19:45   ` Kristian Eide
  2005-01-03 23:30     ` raid5 crash (possible VM problem???) Neil Brown
  2005-01-13 14:58   ` raid5 crash Stephen C. Tweedie
  2 siblings, 1 reply; 9+ messages in thread
From: Kristian Eide @ 2004-12-23 19:45 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

> I doubt very much that this would happen with ext3.  I don't know
> about xfs, but I doubt it would happen there either.
> When using some other filesystem, what sort of data corruption are you
> getting?

This is with ext3:

kernel BUG at drivers/md/raid5.c:813!
invalid operand: 0000 [#1]
Modules linked in: sata_sil libata sbp2 ohci1394 ieee1394 usb_storage ehci_hcd 
usbcore
CPU:    0
EIP:    0060:[<c039cdd2>]    Not tainted VLI
EFLAGS: 00010016   (2.6.9-gentoo-r10)
EIP is at add_stripe_bio+0x1c2/0x200
eax: 493c4b40   ebx: cbc5eaa0   ecx: e8257da0   edx: 00000000
esi: 493c4b18   edi: 00000000   ebp: f58958b0   esp: f5995a98
ds: 007b   es: 007b   ss: 0068
Process rsync (pid: 8803, threadinfo=f5994000 task=d841aaa0)
Stack: c01551f7 f7ddf200 00000118 f58957c8 493c4b18 00000000 e8257da0 c039e092
       f58957c8 e8257da0 00000001 00000000 00000000 f5995af0 f5fde2e0 493c4b30
       00000000 00000003 00000004 f5fde2e0 dfe90e00 00000001 00000000 f7d84088
Call Trace:
 [<c01551f7>] __getblk+0x37/0x70
 [<c039e092>] make_request+0x122/0x200
 [<c032dc6f>] generic_make_request+0x15f/0x1e0
 [<c032dd4d>] submit_bio+0x5d/0x100
 [<c01b5ec4>] ext3_get_block+0x64/0xb0
 [<c0172d43>] mpage_bio_submit+0x23/0x40
 [<c0173170>] do_mpage_readpage+0x2d0/0x480
 [<c0332b08>] as_next_request+0x38/0x50
 [<c032a706>] elv_next_request+0x16/0x110
 [<c02b4c1e>] kobject_put+0x1e/0x30
 [<c02b4bf0>] kobject_release+0x0/0x10
 [<c02b54df>] radix_tree_node_alloc+0x1f/0x60
 [<c02b5762>] radix_tree_insert+0xe2/0x100
 [<c0136e54>] add_to_page_cache+0x54/0x80
 [<c017346b>] mpage_readpages+0x14b/0x180
 [<c01b5e60>] ext3_get_block+0x0/0xb0
 [<c013ddf4>] read_pages+0x134/0x140
 [<c01b5e60>] ext3_get_block+0x0/0xb0
 [<c013b390>] __alloc_pages+0x1d0/0x370
 [<c013e04f>] do_page_cache_readahead+0xcf/0x130
 [<c013e234>] page_cache_readahead+0x184/0x1e0
 [<c013765c>] do_generic_mapping_read+0x11c/0x4d0
 [<c0137cae>] __generic_file_aio_read+0x1be/0x1f0
 [<c0137a10>] file_read_actor+0x0/0xe0
 [<c02b4c1e>] kobject_put+0x1e/0x30
 [<c0137d3a>] generic_file_aio_read+0x5a/0x80
 [<c015261e>] do_sync_read+0xbe/0xf0
 [<f92cdc66>] ata_qc_complete+0x46/0xe0 [libata]
 [<c011d590>] autoremove_wake_function+0x0/0x60
 [<c036b9aa>] scsi_finish_command+0x7a/0xc0
 [<c011bb9f>] recalc_task_prio+0x8f/0x190
 [<c015270c>] vfs_read+0xbc/0x170
 [<c040e363>] schedule+0x2a3/0x470
 [<c0152a71>] sys_read+0x51/0x80
 [<c010603b>] syscall_call+0x7/0xb
Code: 72 08 0f ba a8 90 00 00 00 02 83 c4 0c 5b 5e 5f 5d c3 89 cb e9 cd fe ff
ff 8b 5d 00 e9 c5 fe ff ff 77 08 39 f0 0f 86 94 fe ff ff <0f> 0b 2d 03 70 92
44 c0 e9 87 fe ff ff 0f 87 a8 fe ff ff 39 f0

So apparently it can happen with ext3 as well.

-- 
Kristian

* Re: raid5 crash (possible VM problem???)
  2004-12-23 19:45   ` Kristian Eide
@ 2005-01-03 23:30     ` Neil Brown
  2005-01-16 18:33       ` Kristian Eide
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2005-01-03 23:30 UTC (permalink / raw)
  To: Kristian Eide; +Cc: linux-kernel

On Thursday December 23, kreide@online.no wrote:
> > I doubt very much that this would happen with ext3.  I don't know
> > about xfs, but I doubt it would happen there either.
> > When using some other filesystem, what sort of data corruption are you
> > getting?
> 
> This is with ext3:
> 
> kernel BUG at drivers/md/raid5.c:813!

This is very suspect...
This BUG is triggered if raid5 receives a read request while there is
already an outstanding read request that overlaps the new one.
i.e. this->sector  < that->sector and
     this->sector + this->size > that->sector

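In C, with bi_size in bytes (hence ">> 9" to get 512-byte sectors), a
symmetric form of that test looks like the following sketch (illustrative
only, not the exact raid5 code):

	#include <linux/bio.h>

	/* Two bios overlap when each one starts before the other ends. */
	static inline int bios_overlap(struct bio *a, struct bio *b)
	{
		return a->bi_sector < b->bi_sector + (b->bi_size >> 9) &&
		       b->bi_sector < a->bi_sector + (a->bi_size >> 9);
	}
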
I admit that my understanding of ext3 isn't perfect, but I would be VERY
surprised if ext3 would ever submit overlapping requests.  
I guess it could happen if the filesystem were corrupted (and two
files claimed to own the same block) but I doubt that is the case
here.

I suspect there is a problem somewhere else.. in the VM maybe??

You could try this patch.  It might hide the real problem, or it might
cause it to manifest in some other way.
It changes the raid5 behaviour so that if an overlap is found, it
doesn't BUG, but instead waits a little while and tries again.

NeilBrown

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid5.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2004-12-20 12:24:19.000000000 +1100
+++ ./drivers/md/raid5.c	2004-12-28 17:02:44.000000000 +1100
@@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
 }
 
 static void unplug_slaves(mddev_t *mddev);
+static void raid5_unplug_device(request_queue_t *q);
 
 static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
 					     int pd_idx, int noblock) 
@@ -793,7 +794,7 @@ static void compute_parity(struct stripe
  * toread/towrite point to the first in a chain. 
  * The bi_next chain must be in order.
  */
-static void add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
+static int add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
 {
 	struct bio **bip;
 	raid5_conf_t *conf = sh->raid_conf;
@@ -810,10 +811,10 @@ static void add_stripe_bio (struct strip
 	else
 		bip = &sh->dev[dd_idx].toread;
 	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
-		BUG_ON((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector);
+		if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
+			return 0; /* cannot add just now due to overlap */
 		bip = & (*bip)->bi_next;
 	}
-/* FIXME do I need to worry about overlapping bion */
 	if (*bip && bi->bi_next && (*bip) != bi->bi_next)
 		BUG();
 	if (*bip)
@@ -840,6 +841,7 @@ static void add_stripe_bio (struct strip
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
 	}
+	return 1;
 }
 
 
@@ -1413,7 +1415,15 @@ static int make_request (request_queue_t
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
 
-			add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK));
+			while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+				/* add failed due to overlap.  Flush everything
+				 * and wait a while
+				 * FIXME - overlapping requests should be handled better
+				 */
+				raid5_unplug_device(mddev->queue);
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(1);
+			}
 
 			raid5_plug_device(conf);
 			handle_stripe(sh);

* Re: raid5 crash
  2004-12-22 23:08 ` Neil Brown
  2004-12-23  9:51   ` Prakash K. Cheemplavam
  2004-12-23 19:45   ` Kristian Eide
@ 2005-01-13 14:58   ` Stephen C. Tweedie
  2 siblings, 0 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 2005-01-13 14:58 UTC (permalink / raw)
  To: Neil Brown; +Cc: Stephen Tweedie, Kristian Eide, linux-kernel

Hi,

On Wed, 2004-12-22 at 23:08, Neil Brown wrote:

> > kernel BUG at drivers/md/raid5.c:813!
> 
> This BUG happens when there are two outstanding read (or write)
> requests for the same piece of storage (more accurately, two "bio"s
> that overlap).

Ouch. 

> raid5 cannot currently handle this situation.
> Most filesystems would never make requests like this.

I don't think there's anything to stop me doing two O_DIRECT read()s
from the same bit of a file at the same time.  As far as I can see, this
should be easily triggered by user-space.

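For concreteness, here is a minimal user-space sketch of such a trigger (the
file path is hypothetical, error handling is omitted, and O_DIRECT requires
sector-aligned buffers, hence the posix_memalign; build with cc -pthread):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void *reader(void *arg)
{
        int fd = *(int *)arg;
        void *buf;

        /* O_DIRECT needs an aligned buffer; 4096 also covers 512B sectors */
        posix_memalign(&buf, 4096, 65536);
        pread(fd, buf, 65536, 0);       /* both threads read the same region */
        free(buf);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;
        int fd = open("/mnt/raid5/somefile", O_RDONLY | O_DIRECT);

        /* Two concurrent O_DIRECT reads over the same 64KB of the file */
        pthread_create(&t1, NULL, reader, &fd);
        pthread_create(&t2, NULL, reader, &fd);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        close(fd);
        return 0;
}
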
And if you get corruption in a filesystem such that two files share the
same block, then this possibility arises again.  That can happen due to
corruption in an indirect block (ext2/3) or in the reiserfs tree; or
more commonly due to a bitmap block getting corrupted, resulting in the
same block being allocated twice.

This is a situation we really need to handle.  ext3 goes to great
lengths to make sure that if such cases happen, the worst that results
should be the filesystem taking itself readonly cleanly.

It's really bad behaviour for a fault such as a bad IDE cable to be able
to oops the entire kernel.

> I doubt very much that this would happen with ext3.

It certainly shouldn't do so for buffered IO if things are running fine,
but as soon as you get corrupt data, or start using O_DIRECT, it's
possible.

Cheers,
 Stephen

* Re: raid5 crash (possible VM problem???)
  2005-01-03 23:30     ` raid5 crash (possible VM problem???) Neil Brown
@ 2005-01-16 18:33       ` Kristian Eide
  0 siblings, 0 replies; 9+ messages in thread
From: Kristian Eide @ 2005-01-16 18:33 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

> You could try this patch.  It might hide the real problem, or it might
> cause it to manifest in some other way.

I've applied the patch, but I now get another error. It might not be related
to raid5; however, I have tested the individual SATA disks without getting
any errors, and the problem only happens when combining them into a raid5
array (other raid levels not tested).

ReiserFS: md3: journal params: device md3, size 8192, journal first block 18, 
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md3: checking transaction log (md3)
ReiserFS: md3: Using r5 hash to sort names
attempt to access beyond end of device
md3: rw=0, want=18446744063991695384, limit=1465175040
attempt to access beyond end of device
md3: rw=0, want=18446744062355374704, limit=1465175040
attempt to access beyond end of device
md3: rw=0, want=4913837584, limit=1465175040
attempt to access beyond end of device
md3: rw=0, want=18446744071656162744, limit=1465175040

I have 4 250GB SATA disks combined into one raid5 volume (kernel 2.6.10), and
this error happens after copying a few gigabytes of data into the volume and
then trying to read them back.
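
One observation on the want= values above: interpreted as signed 64-bit
sector numbers, three of the four are negative, i.e. they look like negative
block numbers sign-extended to 64 bits rather than merely out-of-range
requests. A quick user-space sketch (values copied from the dmesg output)
to decode them:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
        /* The want= sector numbers from the dmesg output above. */
        uint64_t want[] = { 18446744063991695384ULL, 18446744062355374704ULL,
                            4913837584ULL, 18446744071656162744ULL };
        int i;

        for (i = 0; i < 4; i++) /* prints -9717856232, -11354176912, ... */
                printf("want=%" PRIu64 " as signed: %" PRId64 "\n",
                       want[i], (int64_t)want[i]);
        return 0;
}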

-- 
Kristian

Thread overview: 9 messages
2004-12-22 22:04 raid5 crash Kristian Eide
2004-12-22 22:26 ` Norbert van Nobelen
2004-12-22 23:05   ` Kristian Eide
2004-12-22 23:08 ` Neil Brown
2004-12-23  9:51   ` Prakash K. Cheemplavam
2004-12-23 19:45   ` Kristian Eide
2005-01-03 23:30     ` raid5 crash (possible VM problem???) Neil Brown
2005-01-16 18:33       ` Kristian Eide
2005-01-13 14:58   ` raid5 crash Stephen C. Tweedie
