linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
@ 2001-01-22  0:18 Bernd Eckenfels
  2001-01-22  6:37 ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Bernd Eckenfels @ 2001-01-22  0:18 UTC (permalink / raw)
  To: linux-kernel

In article <14955.19182.663691.194031@notabene.cse.unsw.edu.au> you wrote:
> There have been assorted reports of filesystem corruption on raid5 in
> 2.4.0, and I have finally got a patch - see below.
> I don't know if it addresses everybody's problems, but it fixed a very
> real problem that is very reproducible.

Do you know whether it is safe with 2.4.0 kernels to swap on degraded soft RAID arrays?
There is a discussion about this on the debian-devel list. Currently, Debian systems do
not enable swap at boot if a RAID partition is resyncing.

Greetings
Bernd

* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
  2001-01-22  0:18 [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+ Bernd Eckenfels
@ 2001-01-22  6:37 ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2001-01-22  6:37 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel, linux-raid

On Monday January 22, inka-user@lina.inka.de wrote:
> In article <14955.19182.663691.194031@notabene.cse.unsw.edu.au> you wrote:
> > There have been assorted reports of filesystem corruption on raid5 in
> > 2.4.0, and I have finally got a patch - see below.
> > I don't know if it addresses everybody's problems, but it fixed a very
> > real problem that is very reproducible.
> 
> Do you know whether it is safe with 2.4.0 kernels to swap on degraded soft RAID arrays?
> There is a discussion about this on the debian-devel list. Currently, Debian systems do
> not enable swap at boot if a RAID partition is resyncing.

In 2.2 it was not safe to swap to a RAID array that was being resynced
or was having a spare reconstructed, but in 2.4 it is perfectly safe.

It would be entirely appropriate for the init.d script to check for
version >= 2.3.9pre8 (I think that is when the new resync code went
in) - or probably just >= 2.4.0, and bypass any fancy checks if that
is the case.
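
A minimal sketch of such a version check, assuming a Bourne-shell init
script that matches the output of uname -r with a shell glob (the variable
name is hypothetical):

#!/bin/sh
# Hypothetical init.d fragment: only hold back swapon during a resync on
# kernels older than 2.4, where swapping to a resyncing array was unsafe.
case "`uname -r`" in
    2.[0-3].*)
        # old resync code: wait until /proc/mdstat shows no resync
        # before enabling swap on a RAID device
        WAIT_FOR_RESYNC=yes
        ;;
    *)
        # 2.4.0 and later: bypass the fancy checks, swapping is safe
        WAIT_FOR_RESYNC=no
        ;;
esac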

NeilBrown


> 
> Greetings
> Bernd

* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
  2001-01-21 20:47 Neil Brown
                   ` (2 preceding siblings ...)
  2001-01-22 19:36 ` Edward
@ 2001-01-23  8:21 ` Holger Kiehl
  3 siblings, 0 replies; 9+ messages in thread
From: Holger Kiehl @ 2001-01-23  8:21 UTC (permalink / raw)
  To: Neil Brown
  Cc: Otto Meier, Hans Reiser, edward, Ed Tomlinson, Nils Rennebarth,
	Manfred Spraul, David Willmore, Linus Torvalds, Alan Cox,
	linux-kernel, linux-raid



On Mon, 22 Jan 2001, Neil Brown wrote:

>
> There have been assorted reports of filesystem corruption on raid5 in
> 2.4.0, and I have finally got a patch - see below.
> I don't know if it addresses everybody's problems, but it fixed a very
> real problem that is very reproducible.
>
> The problem is that parity can be calculated wrongly when doing a
> read-modify-write update cycle.  If you have a fully functional array, you
> won't notice this problem, as the parity block is never used to return
> data.  But if you have a degraded array, you will get corruption very
> quickly.
> So I think this will solve the reported corruption with ext2fs, as I
> think they were mostly on degraded arrays.  I have no idea whether it
> will address the reiserfs problems, as I don't think anybody reporting
> those problems described their array.
>
> In any case, please apply, and let me know of any further problems.
>
I tested this patch with 2.4.1-pre9 for about 16 hours and I no
longer get the ext2 errors in syslog. Though I must say that both machines
I tested on did not have any degraded arrays (but they do show corruption
without the patch). During my last test, a disk on one of the nodes
started to get "medium errors", but everything worked fine: the raid
code removed the bad disk, started recalculating parity to bring in
the spare disk, and everything kept on running with no intervention
and no errors in syslog. Very nice! However, forcing a check with
e2fsck -f still produces the following:

   root@florix:~# !e2fsck
   e2fsck -f /dev/md2
   e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
   Pass 1: Checking inodes, blocks, and sizes
   Special (device/socket/fifo) inode 3630145 has non-zero size.  Fix<y>? yes

   Special (device/socket/fifo) inode 3630156 has non-zero size.  Fix<y>? yes

   Special (device/socket/fifo) inode 3630176 has non-zero size.  Fix<y>? yes

   Special (device/socket/fifo) inode 3630184 has non-zero size.  Fix<y>? yes

   Pass 2: Checking directory structure
   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Pass 5: Checking group summary information
   Block bitmap differences:  -3394 -3395 -3396 -3397 -3398 -3399 -3400 -3429 -3430 -3431 -3432 -3433 -3434 -3435 -3466 -3467 -3468 -3469 -3470 -3471 -3472 -3477 -3478 -3479 -3480 -3481 -3482 -3483 -3586 -3587 -3588 -3589 -3590 -3591 -3592 -3627 -3628 -3629 -3630 -3631 -3632 -3633 -3668 -3669 -3670 -3671 -3672 -3673 -3674 -3745 -3746 -3747 -3748 -3749 -3750 -3751 -3756 -3757 -3758 -3759 -3760 -3761 -3762 -3765 -3766 -3767 -3768 -3769 -3770 -3771 -3840 -3841 -3842 -3843 -3844 -3845 -3846
   Fix<y>? yes

   Free blocks count wrong for group #0 (27874, counted=27951).
   Fix<y>? yes

   Free blocks count wrong (7802000, counted=7802077).
   Fix<y>? yes


   /dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
   /dev/md2: 7463/4006240 files (12.7% non-contiguous), 206243/8008320 blocks


Is this something I need to worry about? Yesterday I already reported
that I sometimes only get the "has non-zero size" messages. What do
they mean?

Another thing I observed in the syslog is the following:

   Jan 22 23:48:21 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 22 23:48:42 cube last message repeated 32 times
   Jan 22 23:49:54 cube last message repeated 48 times
   Jan 22 23:58:09 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 22 23:58:13 cube last message repeated 12 times
   Jan 23 00:11:08 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 23 00:11:10 cube last message repeated 43 times
   Jan 23 00:19:35 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 23 00:19:39 cube last message repeated 30 times
   Jan 23 00:40:05 cube -- MARK --
   Jan 23 00:53:36 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 23 00:53:50 cube last message repeated 16 times

This happens under a very high load (120) and is probably not RAID related.
What does it mean?

Thanks,
Holger


* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
  2001-01-21 20:47 Neil Brown
  2001-01-21 21:14 ` Manfred Spraul
  2001-01-22  9:23 ` Hans Reiser
@ 2001-01-22 19:36 ` Edward
  2001-01-23  8:21 ` Holger Kiehl
  3 siblings, 0 replies; 9+ messages in thread
From: Edward @ 2001-01-22 19:36 UTC (permalink / raw)
  To: Neil Brown
  Cc: Otto Meier, Holger Kiehl, Hans Reiser, Ed Tomlinson,
	Nils Rennebarth, Manfred Spraul, David Willmore, Linus Torvalds,
	Alan Cox, linux-kernel, linux-raid

Neil Brown wrote:
> 
> There have been assorted reports of filesystem corruption on raid5 in
> 2.4.0, and I have finally got a patch - see below.
> I don't know if it addresses everybody's problems, but it fixed a very
> real problem that is very reproducible.
> 
> The problem is that parity can be calculated wrongly when doing a
> read-modify-write update cycle.  If you have a fully functional array, you
> won't notice this problem, as the parity block is never used to return
> data.  But if you have a degraded array, you will get corruption very
> quickly.
> So I think this will solve the reported corruption with ext2fs, as I
> think they were mostly on degraded arrays.  I have no idea whether it
> will address the reiserfs problems, as I don't think anybody reporting
> those problems described their array.
 
But we were dealing with a fully functional array.
Nevertheless, this patch fixed the reiserfs corruption.
Thanks.
Edward.

> 
> In any case, please apply, and let me know of any further problems.
> 
> --- ./drivers/md/raid5.c        2001/01/21 04:01:57     1.1
> +++ ./drivers/md/raid5.c        2001/01/21 20:36:05     1.2
> @@ -714,6 +714,11 @@
>                 break;
>         }
>         spin_unlock_irq(&conf->device_lock);
> +       if (count>1) {
> +               xor_block(count, bh_ptr);
> +               count = 1;
> +       }
> +
>         for (i = disks; i--;)
>                 if (chosen[i]) {
>                         struct buffer_head *bh = sh->bh_cache[i];

* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
@ 2001-01-22 18:09 Otto Meier
  0 siblings, 0 replies; 9+ messages in thread
From: Otto Meier @ 2001-01-22 18:09 UTC (permalink / raw)
  To: linux-raid, linux-kernel

With this patch I rebuilt my raid5 from scratch. So far it still runs in degraded mode,
to honor the father of invention.
The system is an SMP dual Celeron with kernel 2.4.0. I copied 18 GB of data from my
backup to it. So far I have not seen any corruption messages.
Last time I did that I got a lot of them. Seems the fix has improved things for me.
 
Otto

On Mon, 22 Jan 2001 07:47:42 +1100 (EST), Neil Brown wrote:

>
>There have been assorted reports of filesystem corruption on raid5 in
>2.4.0, and I have finally got a patch - see below.
>I don't know if it addresses everybody's problems, but it fixed a very
>real problem that is very reproducible.
>
>The problem is that parity can be calculated wrongly when doing a
>read-modify-write update cycle.  If you have a fully functional array, you
>won't notice this problem, as the parity block is never used to return
>data.  But if you have a degraded array, you will get corruption very
>quickly.
>So I think this will solve the reported corruption with ext2fs, as I
>think they were mostly on degraded arrays.  I have no idea whether it
>will address the reiserfs problems, as I don't think anybody reporting
>those problems described their array.
>
>In any case, please apply, and let me know of any further problems.
>
>
>--- ./drivers/md/raid5.c	2001/01/21 04:01:57	1.1
>+++ ./drivers/md/raid5.c	2001/01/21 20:36:05	1.2
>@@ -714,6 +714,11 @@
> 		break;
> 	}
> 	spin_unlock_irq(&conf->device_lock);
>+	if (count>1) {
>+		xor_block(count, bh_ptr);
>+		count = 1;
>+	}
>+	
> 	for (i = disks; i--;)
> 		if (chosen[i]) {
> 			struct buffer_head *bh = sh->bh_cache[i];
>
>
> From my notes for this patch:
>
>   For the read-modify-write cycle, we need to calculate the xor of a
>   bunch of old blocks and a bunch of new versions of those blocks.  The
>   old and new blocks occupy the same buffer space, and because xoring
>   is delayed until we have lots of buffers, it could get delayed too
>   much, so that parity doesn't get calculated until after the data has been
>   overwritten.
>
>   This patch flushes any pending xor's before copying over old buffers.
>
>
>Everybody running raid5 on 2.4.0 or 2.4.1-pre really should apply this
>patch, and then arrange to get parity checked and corrected on their
>array.
>There currently isn't a clean way to correct parity.
>One way would be to shut down to single user, remount all filesystems
>readonly, or unmount them, and then pull the plug.
>On reboot, raid will rebuild parity, but the filesystems should be
>clean.
>An alternative is to rerun mkraid giving exactly the right configuration.
>This doesn't require pulling the plug, but if you get the config file
>wrong, you could lose your data.
>
>NeilBrown
>







* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
  2001-01-21 21:14 ` Manfred Spraul
@ 2001-01-22 11:19   ` Holger Kiehl
  0 siblings, 0 replies; 9+ messages in thread
From: Holger Kiehl @ 2001-01-22 11:19 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Neil Brown, Otto Meier, Hans Reiser, edward, Ed Tomlinson,
	Nils Rennebarth, David Willmore, Linus Torvalds, Alan Cox,
	linux-kernel, linux-raid



On Sun, 21 Jan 2001, Manfred Spraul wrote:

> I've attached Holger's testcase (ext2, SMP, raid5):
> boot with "mem=64M" and run the attached script.
> The script creates and deletes 9 directories with 10,000 files in each.
> Neil, could you run it? I don't have a raid5 array - SMP+ext2 without
> raid5 is OK.
>
> Holger, what's your ext2 block size, and do you run with a degraded
> array?
>
No, I do not have a degraded array and the blocksize of ext2 is 4096. Here is
what /proc/mdstat looks like:

     afdbench@florix:~/testdir$ cat /proc/mdstat
     Personalities : [raid1] [raid5]
     read_ahead 1024 sectors
     md3 : active raid1 sdc1[1] sdb1[0]
           136448 blocks [2/2] [UU]

     md4 : active raid1 sde1[1] sdd1[0]
           136448 blocks [2/2] [UU]

     md0 : active raid1 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
           24000 blocks [5/5] [UUUUU]

     md1 : active raid5 sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
           3148288 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]

     md2 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
           32033280 blocks level 5, 32k chunk, algorithm 0 [5/5] [UUUUU]

     unused devices: <none>

What I do have is a spare disk, and I am running swap on raid1. However,
my machine at home, which experiences the same problems, does not have swap
on raid and is also not degraded.

I applied Neil's patch to 2.4.1-pre9 and reran the test, again with
filesystem corruption. I then pressed the reset button, had all parity
recalculated under 2.2.18, and rebooted into 2.4.1-pre9 to rerun
the test. Now I no longer see filesystem corruption in syslog;
however, forcing a check with e2fsck produces the following:

   root@florix:~# !e2fsck
   e2fsck -f /dev/md2
   e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
   Pass 1: Checking inodes, blocks, and sizes
   Special (device/socket/fifo) inode 3630145 has non-zero size.  Fix<y>? yes

   Special (device/socket/fifo) inode 3630156 has non-zero size.  Fix<y>? yes

   Pass 2: Checking directory structure
   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Pass 5: Checking group summary information

   /dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
   /dev/md2: 20002/4006240 files (4.8% non-contiguous), 219556/8008320 blocks

Doing this three times, two runs reported the same inodes with non-zero
size. One test went without any problem (the first time ever under 2.4.x).
Now, I am not sure whether this still is filesystem corruption, and why
the corruption was so bad before the parity recalculation under
2.2.18. I do remember that the first time I ran 2.4.x with a much larger
testset, it corrupted my system so badly that I had to push the reset
button, and parity was recalculated under 2.4.1-pre3.

I will now run my other testset, but this always takes 8 hours. When
it is done I will report back.

Holger


* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
  2001-01-21 20:47 Neil Brown
  2001-01-21 21:14 ` Manfred Spraul
@ 2001-01-22  9:23 ` Hans Reiser
  2001-01-22 19:36 ` Edward
  2001-01-23  8:21 ` Holger Kiehl
  3 siblings, 0 replies; 9+ messages in thread
From: Hans Reiser @ 2001-01-22  9:23 UTC (permalink / raw)
  To: Neil Brown
  Cc: Otto Meier, Holger Kiehl, edward, Ed Tomlinson, Nils Rennebarth,
	Manfred Spraul, David Willmore, Linus Torvalds, Alan Cox,
	linux-kernel, linux-raid, reiserfs-list

We'll test and get back to you.

Hans

Neil Brown wrote:
> 
> There have been assorted reports of filesystem corruption on raid5 in
> 2.4.0, and I have finally got a patch - see below.
> I don't know if it addresses everybody's problems, but it fixed a very
> real problem that is very reproducible.
> 
> The problem is that parity can be calculated wrongly when doing a
> read-modify-write update cycle.  If you have a fully functional array, you
> won't notice this problem, as the parity block is never used to return
> data.  But if you have a degraded array, you will get corruption very
> quickly.
> So I think this will solve the reported corruption with ext2fs, as I
> think they were mostly on degraded arrays.  I have no idea whether it
> will address the reiserfs problems, as I don't think anybody reporting
> those problems described their array.
> 
> In any case, please apply, and let me know of any further problems.
> 
> --- ./drivers/md/raid5.c        2001/01/21 04:01:57     1.1
> +++ ./drivers/md/raid5.c        2001/01/21 20:36:05     1.2
> @@ -714,6 +714,11 @@
>                 break;
>         }
>         spin_unlock_irq(&conf->device_lock);
> +       if (count>1) {
> +               xor_block(count, bh_ptr);
> +               count = 1;
> +       }
> +
>         for (i = disks; i--;)
>                 if (chosen[i]) {
>                         struct buffer_head *bh = sh->bh_cache[i];
> 
>  From my notes for this patch:
> 
>    For the read-modify-write cycle, we need to calculate the xor of a
>    bunch of old blocks and a bunch of new versions of those blocks.  The
>    old and new blocks occupy the same buffer space, and because xoring
>    is delayed until we have lots of buffers, it could get delayed too
>    much, so that parity doesn't get calculated until after the data has been
>    overwritten.
> 
>    This patch flushes any pending xor's before copying over old buffers.
> 
> Everybody running raid5 on 2.4.0 or 2.4.1-pre really should apply this
> patch, and then arrange to get parity checked and corrected on their
> array.
> There currently isn't a clean way to correct parity.
> One way would be to shut down to single user, remount all filesystems
> readonly, or unmount them, and then pull the plug.
> On reboot, raid will rebuild parity, but the filesystems should be
> clean.
> An alternative is to rerun mkraid giving exactly the right configuration.
> This doesn't require pulling the plug, but if you get the config file
> wrong, you could lose your data.
> 
> NeilBrown

* Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
  2001-01-21 20:47 Neil Brown
@ 2001-01-21 21:14 ` Manfred Spraul
  2001-01-22 11:19   ` Holger Kiehl
  2001-01-22  9:23 ` Hans Reiser
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Manfred Spraul @ 2001-01-21 21:14 UTC (permalink / raw)
  To: Neil Brown
  Cc: Otto Meier, Holger Kiehl, Hans Reiser, edward, Ed Tomlinson,
	Nils Rennebarth, David Willmore, Linus Torvalds, Alan Cox,
	linux-kernel, linux-raid

[-- Attachment #1: Type: text/plain, Size: 343 bytes --]

I've attached Holger's testcase (ext2, SMP, raid5):
boot with "mem=64M" and run the attached script.
The script creates and deletes 9 directories with 10,000 files in each.
Neil, could you run it? I don't have a raid5 array - SMP+ext2 without
raid5 is OK.

Holger, what's your ext2 block size, and do you run with a degraded
array?

--
	Manfred

[-- Attachment #2: fsd.c --]
[-- Type: text/plain, Size: 2855 bytes --]

#include <stdio.h>
#include <stdlib.h>     /* atoi(), exit() */
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <dirent.h>
#include <errno.h>

static void create_files(int, int, char *),
            delete_files(char *);


/*$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ fsd $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$*/
int
main(int argc, char *argv[])
{
   int  no_of_files,
        file_size;
   char dirname[1024];

   if (argc == 4)
   {
      no_of_files = atoi(argv[1]);
      file_size = atoi(argv[2]);
      (void)strcpy(dirname, argv[3]);
   }
   else
   {
      (void)fprintf(stderr,
                    "Usage: %s <number of files> <file size> <directory>\n",
                    argv[0]);
      exit(1);
   }
   create_files(no_of_files, file_size, dirname);
   delete_files(dirname);
   exit(0);
}


/*+++++++++++++++++++++++++++ create_files() ++++++++++++++++++++++++++++*/
static void
create_files(int no_of_files, int file_size, char *dirname)
{
   int  i, fd;
   char *ptr;

   ptr = dirname + strlen(dirname);
   *ptr++ = '/';
   for (i = 0; i < no_of_files; i++)
   {
      (void)sprintf(ptr, "this_is_dummy_file_%d", i);
      if ((fd = open(dirname, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR)) == -1)
      {
         (void)fprintf(stderr, "Failed to open() %s : %s\n",
                       dirname, strerror(errno));
         exit(1);
      }
      if (lseek(fd, file_size - 1, SEEK_SET) == -1)
      {
         (void)fprintf(stderr, "Failed to lseek() %s : %s\n",
                       dirname, strerror(errno));
         exit(1);
      }
      if (write(fd, "", 1) != 1)
      {
         (void)fprintf(stderr, "Failed to write() to %s : %s\n",
                       dirname, strerror(errno));
         exit(1);
      }
      if (close(fd) == -1)
      {
         (void)fprintf(stderr, "Failed to close() %s : %s\n",
                       dirname, strerror(errno));
      }
   }
   ptr[-1] = 0;
   return;
}


/*++++++++++++++++++++++++++++ delete_files +++++++++++++++++++++++++++++*/
static void
delete_files(char *dirname)
{
   char          *ptr;
   struct dirent *dirp;
   DIR           *dp;

   ptr = dirname + strlen(dirname);
   if ((dp = opendir(dirname)) == NULL)
   {
      (void)fprintf(stderr, "Failed to opendir() %s : %s\n",
                    dirname, strerror(errno));
      exit(1);
   }
   *ptr++ = '/';
   while ((dirp = readdir(dp)) != NULL)
   {
      if (dirp->d_name[0] != '.')
      {
         (void)strcpy(ptr, dirp->d_name);
         if (unlink(dirname) == -1)
         {
            (void)fprintf(stderr, "Failed to open() %s : %s\n",
                          dirname, strerror(errno));
            exit(1);
         }
      }
   }
   ptr[-1] = 0;
   if (closedir(dp) == -1)
   {
      (void)fprintf(stderr, "Failed to closedir() %s : %s\n",
                    dirname, strerror(errno));
   }
   return;
}

[-- Attachment #3: start_fsd --]
[-- Type: text/plain, Size: 261 bytes --]

#!/bin/sh

NO_OF_PROCESS=9
NUMBER_OF_FILES=10000
FILE_SIZE=2048

counter=0
while [ $counter -lt $NO_OF_PROCESS ]
do
   if [ ! -d $counter ]
   then
      mkdir $counter
   fi
   ./fsd $NUMBER_OF_FILES $FILE_SIZE $counter &
   counter=`expr "$counter" + 1`
done


* [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+
@ 2001-01-21 20:47 Neil Brown
  2001-01-21 21:14 ` Manfred Spraul
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Neil Brown @ 2001-01-21 20:47 UTC (permalink / raw)
  To: Otto Meier, Holger Kiehl, Hans Reiser, edward, Ed Tomlinson,
	Nils Rennebarth, Manfred Spraul, David Willmore, Linus Torvalds,
	Alan Cox
  Cc: linux-kernel, linux-raid


There have been assorted reports of filesystem corruption on raid5 in
2.4.0, and I have finally got a patch - see below.
I don't know if it addresses everybody's problems, but it fixed a very
real problem that is very reproducible.

The problem is that parity can be calculated wrongly when doing a
read-modify-write update cycle.  If you have a fully functional array, you
won't notice this problem, as the parity block is never used to return
data.  But if you have a degraded array, you will get corruption very
quickly.
So I think this will solve the reported corruption with ext2fs, as I
think they were mostly on degraded arrays.  I have no idea whether it
will address the reiserfs problems, as I don't think anybody reporting
those problems described their array.

In any case, please apply, and let me know of any further problems.


--- ./drivers/md/raid5.c	2001/01/21 04:01:57	1.1
+++ ./drivers/md/raid5.c	2001/01/21 20:36:05	1.2
@@ -714,6 +714,11 @@
 		break;
 	}
 	spin_unlock_irq(&conf->device_lock);
+	if (count>1) {
+		xor_block(count, bh_ptr);
+		count = 1;
+	}
+	
 	for (i = disks; i--;)
 		if (chosen[i]) {
 			struct buffer_head *bh = sh->bh_cache[i];


 From my notes for this patch:

   For the read-modify-write cycle, we need to calculate the xor of a
   bunch of old blocks and a bunch of new versions of those blocks.  The
   old and new blocks occupy the same buffer space, and because xoring
   is delayed until we have lots of buffers, it could get delayed too
   much, so that parity doesn't get calculated until after the data has been
   overwritten.

   This patch flushes any pending xor's before copying over old buffers.
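
As a rough illustration of the failure described above - small integers
standing in for whole blocks, with made-up values - the read-modify-write
parity update should be old_parity ^ old_data ^ new_data, but if the old
data has already been overwritten when the deferred xor finally runs, the
update cancels itself out and stale parity is written back:

#!/bin/sh
# Toy sketch only, not the kernel code path.
OLD_DATA=$((0x0F)); NEW_DATA=$((0x35)); OLD_PARITY=$((0xA3))

# Correct ordering: fold the old data into parity before it is overwritten.
GOOD=$(( OLD_PARITY ^ OLD_DATA ^ NEW_DATA ))    # 0x99

# Deferred xor after the buffer already holds the new data: the same value
# is xored twice, so the "new" parity degenerates to the stale old parity.
BAD=$(( OLD_PARITY ^ NEW_DATA ^ NEW_DATA ))     # 0xa3, unchanged

printf 'good=0x%x bad=0x%x\n' "$GOOD" "$BAD"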


Everybody running raid5 on 2.4.0 or 2.4.1-pre really should apply this
patch, and then arrange to get parity checked and corrected on their
array.
There currently isn't a clean way to correct parity.
One way would be to shut down to single user, remount all filesystems
readonly, or unmount them, and then pull the plug.
On reboot, raid will rebuild parity, but the filesystems should be
clean.
An alternative is to rerun mkraid giving exactly the right configuration.
This doesn't require pulling the plug, but if you get the config file
wrong, you could lose your data.
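
A rough sketch of the first (dirty-shutdown) approach, assuming the array is
/dev/md2 mounted on a hypothetical /export and has a persistent superblock
(treat this as an outline of the steps above, not a tested recipe):

#!/bin/sh
telinit 1                      # drop to single-user mode
mount -o remount,ro /export    # or: umount /export
sync
# ...then cut the power without a clean shutdown.  On the next boot the md
# driver notices the array was not shut down cleanly and rebuilds parity,
# while the read-only filesystem itself stays clean.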

NeilBrown

end of thread, other threads:[~2001-01-23  9:16 UTC | newest]

Thread overview: 9+ messages
2001-01-22  0:18 [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+ Bernd Eckenfels
2001-01-22  6:37 ` Neil Brown
  -- strict thread matches above, loose matches on Subject: below --
2001-01-22 18:09 Otto Meier
2001-01-21 20:47 Neil Brown
2001-01-21 21:14 ` Manfred Spraul
2001-01-22 11:19   ` Holger Kiehl
2001-01-22  9:23 ` Hans Reiser
2001-01-22 19:36 ` Edward
2001-01-23  8:21 ` Holger Kiehl
