* Content Of Files May Be Changed After One Disk Is Failed In RAID5
@ 2012-09-07  1:40 clplayer
  2012-09-07  2:33 ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: clplayer @ 2012-09-07  1:40 UTC (permalink / raw)
  To: linux-kernel

I am stress-testing the RAID5 code on my desktop.

I installed 8 hard disks, 4 of which are on internal SATA ports while
the other 4 are connected via eSATA.

The operating system on the desktop is Ubuntu 12.04.1 LTS 64-bit.

I wrote a script that checks the files in the RAID array while disks
are failing.

The actions are as below:

1. creating an 8-disk RAID5 array, with one of the 8 disks set as a spare.
2. making an ext4 file system on the array and mounting it.
3. generating a 1GB file from /dev/urandom in the root file system.
4. calculating the checksum of the file with the "cksum" command.
5. making 10 duplicates of the file, storing them in the array, and
calculating the checksum of each duplicate.
6. failing one of the disks in the array after the 10 duplicates are
stored and checked.
7. immediately recalculating the checksums of the duplicates in parallel.

Curiously, several files usually end up changed, and their checksums
are no longer consistent.

I then tried the same scenario with an 8-disk RAID5 with no spare, and
the result is the same.

I have also tried RAID1 and RAID6, and the checksums stay consistent
with both.

It looks like something is wrong in the raid5 code. I am tracing
raid5.c but cannot figure out the root cause yet.

Would someone please suggest any ideas? Thank you very much.

My script is attached below:

#!/bin/sh

TESTSEQ="0 1 2 3 4 5 6 7 8 9"

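# create a 7-drive RAID5 with one hot spare; -z caps each member at
# 10485760 KiB (10 GiB), -f forces creation, -R starts the array without
# confirmation, and --assume-clean skips the initial resync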
mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
      --spare-devices=1 /dev/sd[a-h]3 --assume-clean -z 10485760 -f -R

mkfs.ext4 /dev/md0

mount /dev/md0 /mnt

#duplicating the source file and calculating the checksum
for ITEM in $TESTSEQ
do
        echo "copying 1Gr.${ITEM}..."
        cp /1Gr /mnt/1Gr.${ITEM}

        cksum /mnt/1Gr.${ITEM} > /tmp/cksum_org.${ITEM}
        read orgcksum rest < /tmp/cksum_org.${ITEM}
        echo "checksum is ${orgcksum}"
done

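# flush all writes to the array and let it settle before failing a member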
sync

sleep 10

mdadm -f /dev/md0 /dev/sdb3

echo "producing checksum..."
for ITEM in $TESTSEQ
do
        cksum /mnt/1Gr.${ITEM} > /tmp/cksum_out.${ITEM} &
done

# wait for all 10 background cksum processes to finish
wait

echo "checking the result..."
for ITEM in $TESTSEQ
do
        read item rest < /tmp/cksum_out.${ITEM}

        # the value 2606882893 was pre-calculated manually
        if [ "$item" != "2606882893" ]
        then
                echo "got wrong cksum on ${ITEM}"
        else
                rm /tmp/cksum_out.${ITEM}
        fi
done

Thanks.
Peng.


* Re: Content Of Files May Be Changed After One Disk Is Failed In RAID5
  2012-09-07  1:40 Content Of Files May Be Changed After One Disk Is Failed In RAID5 clplayer
@ 2012-09-07  2:33 ` NeilBrown
  2012-09-07  6:30   ` clplayer
  0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2012-09-07  2:33 UTC (permalink / raw)
  To: clplayer; +Cc: linux-kernel


On Fri, 7 Sep 2012 09:40:18 +0800 clplayer <cl.player@gmail.com> wrote:

> I am stress-testing the RAID5 code on my desktop.
> 
> I installed 8 hard disks, 4 of which are on internal SATA ports while
> the other 4 are connected via eSATA.
> 
> The operating system on the desktop is Ubuntu 12.04.1 LTS 64-bit.
> 
> I wrote a script that checks the files in the RAID array while disks
> are failing.
> 
> The actions are as below:
> 
> 1. creating an 8-disk RAID5 array, with one of the 8 disks set as a spare.
> 2. making an ext4 file system on the array and mounting it.
> 3. generating a 1GB file from /dev/urandom in the root file system.
> 4. calculating the checksum of the file with the "cksum" command.
> 5. making 10 duplicates of the file, storing them in the array, and
> calculating the checksum of each duplicate.
> 6. failing one of the disks in the array after the 10 duplicates are
> stored and checked.
> 7. immediately recalculating the checksums of the duplicates in parallel.
> 
> Curiously, several files usually end up changed, and their checksums
> are no longer consistent.
> 
> I then tried the same scenario with an 8-disk RAID5 with no spare, and
> the result is the same.
> 
> I have also tried RAID1 and RAID6, and the checksums stay consistent
> with both.
> 
> It looks like something is wrong in the raid5 code. I am tracing
> raid5.c but cannot figure out the root cause yet.
> 
> Would someone please suggest any ideas? Thank you very much.
> 
> My script is attached below:
> 
> #!/bin/sh
> 
> TESTSEQ="0 1 2 3 4 5 6 7 8 9"
> 
> mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
>       --spare-devices=1 /dev/sd[a-h]3 --assume-clean -z 10485760 -f -R

--assume-clean is not safe with RAID5 unless the array actually is clean.
It is safe with RAID1 and RAID6 due to details of the specific implementation.
So I suspect that is the cause of the corruption.
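
For example (a sketch, using the device names from your script), dropping
--assume-clean and letting the initial resync finish before the test
avoids the problem:

mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
      --spare-devices=1 /dev/sd[a-h]3 -z 10485760 -f -R
mdadm --wait /dev/md0   # blocks until the initial resync completes
cat /proc/mdstat        # confirm the array is active and in sync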

NeilBrown

> 
> mkfs.ext4 /dev/md0
> 
> mount /dev/md0 /mnt
> 
> #duplicating the source file and calculating the checksum
> for ITEM in $TESTSEQ
> do
>         echo "copying 1Gr.${ITEM}..."
>         cp /1Gr /mnt/1Gr.${ITEM}
> 
>         cksum /mnt/1Gr.${ITEM} > /tmp/cksum_org.${ITEM}
>         read orgcksum rest < /tmp/cksum_org.${ITEM}
>         echo "checksum is ${orgcksum}"
> done
> 
> sync
> 
> sleep 10
> 
> mdadm -f /dev/md0 /dev/sdb3
> 
> echo "producing checksum..."
> for ITEM in $TESTSEQ
> do
>         cksum /mnt/1Gr.${ITEM} > /tmp/cksum_out.${ITEM} &
> done
> 
> # wait for all 10 background cksum processes to finish
> wait
> 
> echo "checking the result..."
> for ITEM in $TESTSEQ
> do
>         read item rest < /tmp/cksum_out.${ITEM}
> 
>         # the value 2606882893 was pre-calculated manually
>         if [ "$item" != "2606882893" ]
>         then
>                 echo "got wrong cksum on ${ITEM}"
>         else
>                 rm /tmp/cksum_out.${ITEM}
>         fi
> done
> 
> Thanks.
> Peng.




* Re: Content Of Files May Be Changed After One Disk Is Failed In RAID5
  2012-09-07  2:33 ` NeilBrown
@ 2012-09-07  6:30   ` clplayer
  2012-09-07  6:48     ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: clplayer @ 2012-09-07  6:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-kernel

> --assume-clean is not safe with RAID5 unless the array actually is clean.
> It is safe with RAID1 and RAID6 due to details of the specific implementation.
> So I suspect that is the cause of the corruption.
>
> NeilBrown
>

Thank you for the information.

I removed --assume-clean from the script and ran the stress test after
the RAID5 array had completed its resync.

The files were all consistent in the remaining tests.

I am now wondering: what is different between the RAID5 and RAID6
implementations?

Could you give some hints about where to look in the implementation?

Thank you,
Peng.


* Re: Content Of Files May Be Changed After One Disk Is Failed In RAID5
  2012-09-07  6:30   ` clplayer
@ 2012-09-07  6:48     ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2012-09-07  6:48 UTC (permalink / raw)
  To: clplayer; +Cc: linux-kernel


On Fri, 7 Sep 2012 14:30:56 +0800 clplayer <cl.player@gmail.com> wrote:

> > --assume-clean is not safe with RAID5 unless the array actually is clean.
> > It is safe with RAID1 and RAID6 due to details of the specific implementation.
> > So I suspect that is the cause of the corruption.
> >
> > NeilBrown
> >
> 
> Thank you for the information.
> 
> I removed --assume-clean from the script and ran the stress test after
> the RAID5 array had completed its resync.
> 
> The files were all consistent in the remaining tests.
> 
> I am now wondering: what is different between the RAID5 and RAID6
> implementations?
> 
> Could you give some hints about where to look in the implementation?
> 
> Thank you,
> Peng.

RAID5 will sometimes update the parity block by:
  read old parity and data blocks
  subtract old data block from parity block
  add new data block to parity block
  write new data and parity.

When it does this, if the parity was wrong before, it will still be wrong
afterwards.

RAID6 doesn't do that, because 'subtracting' old data from the 'Q' parity
block is complicated and hasn't been implemented.  So RAID6 always calculates
the 2 parity blocks from the actual data.  So every time you write a parity
block you can be sure it is correct.
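
To see the arithmetic with single bytes (a toy sketch, not the actual
raid5.c code; "add" and "subtract" are both XOR for the P block):

#!/bin/sh
d0=90; d1=195                   # two data "blocks"
p_stale=$(( (d0 ^ d1) ^ 255 ))  # on-disk parity that is wrong by 255

d0_new=17                       # overwrite the first data block

# RAID5 read-modify-write: subtract old data, add new data
p_rmw=$(( p_stale ^ d0 ^ d0_new ))
# RAID6-style full recompute from the actual data
p_full=$(( d0_new ^ d1 ))

echo "full recompute: $p_full"  # correct by construction
echo "RMW update:     $p_rmw"   # still wrong by the same 255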

NeilBrown


