* [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
@ 2016-07-12 21:50 Goffredo Baroncelli
  2016-07-14 21:20 ` Chris Mason
  2016-07-16 15:51 ` [BUG] Btrfs scrub sometime recalculate wrong parity in raid5 Jarkko Lavinen
  0 siblings, 2 replies; 13+ messages in thread
From: Goffredo Baroncelli @ 2016-07-12 21:50 UTC (permalink / raw)
  To: linux-btrfs

Hi All,

I developed a new btrfs command "btrfs insp phy"[1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trigger the bug. The bug is not always triggered, but it is most of the time.

Basically, the script creates a raid5 filesystem (using three loop devices backed by three files called disk[123].img); on this filesystem a file is created. Then, using "btrfs insp phy", the physical placement of the data on the devices is computed.

First the script checks that the data on disk is correct (for data1, data2 and parity), then it corrupts the data:

test1: the parity is corrupted, then scrub is run. Then the (data1, data2, parity) data on the disk are checked. This test passes every time.

test2: data2 is corrupted, then scrub is run. Then the (data1, data2, parity) data on the disk are checked. This test fails most of the time: the data on the disk is not correct; the parity is wrong. Scrub sometimes reports "WARNING: errors detected during scrubbing, corrected" and sometimes reports "ERROR: there are uncorrectable errors", but this seems unrelated to whether the data actually ends up corrupted or not.
test3: like test2, but data1 is corrupted. The results are the same as above.


test4: data2 is corrupted, then the file is read. The read doesn't return an error (the data appears to be fine), but data2 on the disk is still corrupted.


Note: data1, data2 and parity are the disk elements of the raid5 stripe.
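
As a quick cross-check of the values the script verifies below: with a two-data-strip raid5 layout, the parity strip is simply the byte-wise XOR of data1 and data2. An illustration only (Python 2, in the same one-liner style the script already uses):

# data1 starts with "adaaa" and data2 with "bdbbb"; their XOR gives the expected parity prefix
python -c "d1='adaaa'; d2='bdbbb'; print ' '.join('%02x' % (ord(a)^ord(b)) for a,b in zip(d1,d2))"
# -> 03 00 03 03 03, which xxd renders as "0300 0303" -- the string check_fs compares against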

Conclusion:

Most of the time, it seems that btrfs raid5 is not capable of rebuilding parity and data. Worse, the message returned by scrub is inconsistent with the actual status on the disk. The tests don't fail every time, which complicates the diagnosis; however, my script fails most of the time.

BR
G.Baroncelli

----

root="$(pwd)"
disks="disk1.img disk2.img disk3.img"
imgsize=500M
BTRFS=../btrfs-progs/btrfs


#
# returns all the loopback devices
#
loop_disks() {
	sudo losetup | grep $root | awk '{ print $1 }'
}

#init the fs

init_fs() {
	#destroy fs
	echo umount mnt
	sudo umount mnt
	for i in $( loop_disks ); do
		echo "losetup -d $i"
		sudo losetup -d $i
	done

	for i in $disks; do
		rm -f $i
		truncate -s $imgsize $i
		sudo losetup -f $i
	done

	loops="$(loop_disks)"
	loop1="$(echo $loops | awk '{ print $1 }')"
	echo "loops=$loops; loop1=$loop1"

	sudo mkfs.btrfs -d raid5 -m raid5 $loops
	sudo mount $loop1 mnt/

	python -c "print 'ad'+'a'*65534+'bd'+'b'*65533" | sudo tee mnt/out.txt >/dev/null

	ls -l mnt/out.txt

	sudo umount mnt
	sync; sync
}

check_fs() {
	sudo mount $loop1 mnt
	data="$(sudo $BTRFS insp phy mnt/out.txt)"

	data1_off="$(echo "$data" | grep "DATA$" | awk '{ print $5 }')"
	data2_off="$(echo "$data" | grep "OTHER$" | awk '{ print $5 }')"
	parity_off="$(echo "$data" | grep "PARITY$" | awk '{ print $5 }')"
	data1_dev="$(echo "$data" | grep "DATA$" | awk '{ print $3 }')"
	data2_dev="$(echo "$data" | grep "OTHER$" | awk '{ print $3 }')"
	parity_dev="$(echo "$data" | grep "PARITY$" | awk '{ print $3 }')"

	sudo umount mnt

	# check
	d="$(dd 2>/dev/null if=$data1_dev bs=1 skip=$data1_off count=5)"
	if [ "$d" != "adaaa" ]; then
		echo "******* Wrong data on disk:off $data1_dev:$data1_off (data1)"
		return 1
	fi

	d="$(dd 2>/dev/null if=$data2_dev bs=1 skip=$data2_off count=5)"
	if [ "$d" != "bdbbb" ]; then
		echo "******* Wrong data on disk:off $data2_dev:$data2_off (data2)"
		return 1
	fi

	d="$(dd 2>/dev/null if=$parity_dev bs=1 skip=$parity_off count=5 | 
                xxd | dd 2>/dev/null bs=1 count=9 skip=10)"
	if [ "x$d" != "x0300 0303" ]; then
		echo "******* Wrong data on disk:off $parity_dev:$parity_off (parity)"
		return 1
	fi

	return 0
}

test_corrupt_parity() {
	echo "--- test 1: corrupt parity"
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo dd 2>/dev/null if=/dev/zero of=$parity_dev bs=1 \
		seek=$parity_off count=5
	check_fs &>/dev/null && {
			echo Corruption failed
			exit 100
		}
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo mount $loop1 mnt
	sudo btrfs scrub start mnt/.
	sync; sync
	cat mnt/out.txt &>/dev/null || echo "Read FAIL"
	sudo umount mnt
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	check_fs || return 1
	echo "--- test1: OK"
	return 0
}



test_corrupt_data2() {
	echo "--- test 2: corrupt data2"
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo dd 2>/dev/null if=/dev/zero of=$data2_dev bs=1 \
		seek=$data2_off count=5
	check_fs &>/dev/null && {
			echo Corruption failed
			exit 100
		}
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo mount $loop1 mnt
	sudo btrfs scrub start mnt/.
	sync; sync
	cat mnt/out.txt &>/dev/null || echo "Read FAIL"
	sudo umount mnt
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	check_fs || return 1
	echo "--- test2: OK"
	return 0
}

test_corrupt_data1() {
	echo "--- test 3: corrupt data1"
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo dd 2>/dev/null if=/dev/zero of=$data1_dev bs=1 \
		seek=$data1_off count=5
	check_fs &>/dev/null && {
			echo Corruption failed
			exit 100
		}
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo mount $loop1 mnt
	sudo btrfs scrub start mnt/.
	sync; sync
	cat mnt/out.txt &>/dev/null || echo "Read FAIL"
	sudo umount mnt
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	check_fs || return 1
	echo "--- test3: OK"
	return 0
}

test_corrupt_data2_wo_scrub() {
	echo "--- test 4: corrupt data2; read without scrub"
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo dd 2>/dev/null if=/dev/zero of=$data2_dev bs=1 \
		seek=$data2_off count=5
	check_fs &>/dev/null && {
			echo Corruption failed
			exit 100
		}
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	sudo mount $loop1 mnt
	cat mnt/out.txt &>/dev/null || echo "Read FAIL"
	sudo umount mnt
	echo 3 | sudo tee >/dev/null /proc/sys/vm/drop_caches
	check_fs || return 1
	echo "--- test 4: OK"
	return 0
}


for t in test_corrupt_parity test_corrupt_data2 test_corrupt_data1 \
    test_corrupt_data2_wo_scrub; do
    
        init_fs &>/dev/null
        if ! check_fs &>/dev/null; then 
             echo Integrity test failed
             exit 100
        fi

        $t
        echo
    
done


-----------------




[1] See email "New btrfs sub command: btrfs inspect physical-find"
[2] See email "[BUG] Btrfs scrub sometime recalculate wrong parity in raid5"



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-12 21:50 [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two Goffredo Baroncelli
@ 2016-07-14 21:20 ` Chris Mason
  2016-07-15  4:39   ` Andrei Borzenkov
  2016-07-15 16:28   ` Goffredo Baroncelli
  2016-07-16 15:51 ` [BUG] Btrfs scrub sometime recalculate wrong parity in raid5 Jarkko Lavinen
  1 sibling, 2 replies; 13+ messages in thread
From: Chris Mason @ 2016-07-14 21:20 UTC (permalink / raw)
  To: kreijack, linux-btrfs



On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
> Hi All,
>
> I developed a new btrfs command "btrfs insp phy"[1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trigger the bug. The bug is not always triggered, but most of time yes.
>
> Basically the script create a raid5 filesystem (using three loop-device on three file called disk[123].img); on this filesystem  it is create a file. Then using "btrfs insp phy", the physical placement of the data on the device are computed.
>
> First the script checks that the data are the right one (for data1, data2 and parity), then it corrupt the data:
>
> test1: the parity is corrupted, then scrub is ran. Then the (data1, data2, parity) data on the disk are checked. This test goes fine all the times
>
> test2: data2 is corrupted, then scrub is ran. Then the (data1, data2, parity) data on the disk are checked. This test fail most of the time: the data on the disk is not correct; the parity is wrong. Scrub sometime reports "WARNING: errors detected during scrubbing, corrected" and sometime reports "ERROR: there are uncorrectable errors". But this seems unrelated to the fact that the data is corrupetd or not
> test3: like test2, but data1 is corrupted. The result are the same as above.
>
>
> test4: data2 is corrupted, the the file is read. The system doesn't return error (the data seems to be fine); but the data2 on the disk is still corrupted.
>
>
> Note: data1, data2, parity are the disk-element of the raid5 stripe-
>
> Conclusion:
>
> most of the time, it seems that btrfs-raid5 is not capable to rebuild parity and data. Worse the message returned by scrub is incoherent by the status on the disk. The tests didn't fail every time; this complicate the diagnosis. However my script fails most of the time.

Interesting, thanks for taking the time to write this up.  Is the 
failure specific to scrub?  Or is parity rebuild in general also failing 
in this case?

-chris


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-14 21:20 ` Chris Mason
@ 2016-07-15  4:39   ` Andrei Borzenkov
  2016-07-15 13:20     ` Chris Mason
  2016-07-15 16:30     ` Goffredo Baroncelli
  2016-07-15 16:28   ` Goffredo Baroncelli
  1 sibling, 2 replies; 13+ messages in thread
From: Andrei Borzenkov @ 2016-07-15  4:39 UTC (permalink / raw)
  To: Chris Mason, kreijack, linux-btrfs

15.07.2016 00:20, Chris Mason wrote:
> 
> 
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>> Hi All,
>>
>> I developed a new btrfs command "btrfs insp phy"[1] to further
>> investigate this bug [2]. Using "btrfs insp phy" I developed a script
>> to trigger the bug. The bug is not always triggered, but most of time
>> yes.
>>
>> Basically the script create a raid5 filesystem (using three
>> loop-device on three file called disk[123].img); on this filesystem 

Are those devices themselves on btrfs? Just to avoid any sort of
possible side effects?

>> it is create a file. Then using "btrfs insp phy", the physical
>> placement of the data on the device are computed.
>>
>> First the script checks that the data are the right one (for data1,
>> data2 and parity), then it corrupt the data:
>>
>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>> data2, parity) data on the disk are checked. This test goes fine all
>> the times
>>
>> test2: data2 is corrupted, then scrub is ran. Then the (data1, data2,
>> parity) data on the disk are checked. This test fail most of the time:
>> the data on the disk is not correct; the parity is wrong. Scrub
>> sometime reports "WARNING: errors detected during scrubbing,
>> corrected" and sometime reports "ERROR: there are uncorrectable
>> errors". But this seems unrelated to the fact that the data is
>> corrupetd or not
>> test3: like test2, but data1 is corrupted. The result are the same as
>> above.
>>
>>
>> test4: data2 is corrupted, the the file is read. The system doesn't
>> return error (the data seems to be fine); but the data2 on the disk is
>> still corrupted.
>>
>>
>> Note: data1, data2, parity are the disk-element of the raid5 stripe-
>>
>> Conclusion:
>>
>> most of the time, it seems that btrfs-raid5 is not capable to rebuild
>> parity and data. Worse the message returned by scrub is incoherent by
>> the status on the disk. The tests didn't fail every time; this
>> complicate the diagnosis. However my script fails most of the time.
> 
> Interesting, thanks for taking the time to write this up.  Is the
> failure specific to scrub?  Or is parity rebuild in general also failing
> in this case?
> 

How do you rebuild parity without scrub as long as all devices appear to
be present?




* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-15  4:39   ` Andrei Borzenkov
@ 2016-07-15 13:20     ` Chris Mason
  2016-07-15 15:10       ` Andrei Borzenkov
  2016-07-15 16:30     ` Goffredo Baroncelli
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Mason @ 2016-07-15 13:20 UTC (permalink / raw)
  To: Andrei Borzenkov, kreijack, linux-btrfs



On 07/15/2016 12:39 AM, Andrei Borzenkov wrote:
> 15.07.2016 00:20, Chris Mason wrote:
>>
>>
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy"[1] to further
>>> investigate this bug [2]. Using "btrfs insp phy" I developed a script
>>> to trigger the bug. The bug is not always triggered, but most of time
>>> yes.
>>>
>>> Basically the script create a raid5 filesystem (using three
>>> loop-device on three file called disk[123].img); on this filesystem
>
> Are those devices themselves on btrfs? Just to avoid any sort of
> possible side effects?
>
>>> it is create a file. Then using "btrfs insp phy", the physical
>>> placement of the data on the device are computed.
>>>
>>> First the script checks that the data are the right one (for data1,
>>> data2 and parity), then it corrupt the data:
>>>
>>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>>> data2, parity) data on the disk are checked. This test goes fine all
>>> the times
>>>
>>> test2: data2 is corrupted, then scrub is ran. Then the (data1, data2,
>>> parity) data on the disk are checked. This test fail most of the time:
>>> the data on the disk is not correct; the parity is wrong. Scrub
>>> sometime reports "WARNING: errors detected during scrubbing,
>>> corrected" and sometime reports "ERROR: there are uncorrectable
>>> errors". But this seems unrelated to the fact that the data is
>>> corrupetd or not
>>> test3: like test2, but data1 is corrupted. The result are the same as
>>> above.
>>>
>>>
>>> test4: data2 is corrupted, the the file is read. The system doesn't
>>> return error (the data seems to be fine); but the data2 on the disk is
>>> still corrupted.
>>>
>>>
>>> Note: data1, data2, parity are the disk-element of the raid5 stripe-
>>>
>>> Conclusion:
>>>
>>> most of the time, it seems that btrfs-raid5 is not capable to rebuild
>>> parity and data. Worse the message returned by scrub is incoherent by
>>> the status on the disk. The tests didn't fail every time; this
>>> complicate the diagnosis. However my script fails most of the time.
>>
>> Interesting, thanks for taking the time to write this up.  Is the
>> failure specific to scrub?  Or is parity rebuild in general also failing
>> in this case?
>>
>
> How do you rebuild parity without scrub as long as all devices appear to
> be present?

If one block is corrupted, the crcs will fail and the kernel will 
rebuild parity when you read the file.  You can also use balance instead 
of scrub.
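
For instance, using the mount point from the test script, something along these lines exercises all three repair paths (a sketch only; depending on the btrfs-progs version, a filterless balance may ask for --full-balance):

sudo mount $loop1 mnt
cat mnt/out.txt > /dev/null     # read path: a csum mismatch triggers reconstruction from parity
sudo btrfs scrub start -B mnt   # scrub path (-B waits for completion)
sudo btrfs balance start mnt    # balance path: rewrites block groups, recomputing parity
sudo umount mnt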

-chris


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-15 13:20     ` Chris Mason
@ 2016-07-15 15:10       ` Andrei Borzenkov
  2016-07-15 15:21         ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Borzenkov @ 2016-07-15 15:10 UTC (permalink / raw)
  To: Chris Mason, kreijack, linux-btrfs

15.07.2016 16:20, Chris Mason wrote:
>>>
>>> Interesting, thanks for taking the time to write this up.  Is the
>>> failure specific to scrub?  Or is parity rebuild in general also failing
>>> in this case?
>>>
>>
>> How do you rebuild parity without scrub as long as all devices appear to
>> be present?
> 
> If one block is corrupted, the crcs will fail and the kernel will
> rebuild parity when you read the file.  You can also use balance instead
> of scrub.
> 

As we have seen recently, btrfs does not compute, store or verify
checksums of RAID56 parity. So if the parity is corrupted, the only way to
detect and correct it is to use scrub. Balance may work as a side effect,
because it simply recomputes parity for the data it rewrites, but it will
not fix wrong parity on existing data.

I agree that if a data block is corrupted it will be detected, but then
you do not need to recompute the parity in the first place.


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-15 15:10       ` Andrei Borzenkov
@ 2016-07-15 15:21         ` Chris Mason
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Mason @ 2016-07-15 15:21 UTC (permalink / raw)
  To: Andrei Borzenkov, kreijack, linux-btrfs



On 07/15/2016 11:10 AM, Andrei Borzenkov wrote:
> 15.07.2016 16:20, Chris Mason wrote:
>>>>
>>>> Interesting, thanks for taking the time to write this up.  Is the
>>>> failure specific to scrub?  Or is parity rebuild in general also failing
>>>> in this case?
>>>>
>>>
>>> How do you rebuild parity without scrub as long as all devices appear to
>>> be present?
>>
>> If one block is corrupted, the crcs will fail and the kernel will
>> rebuild parity when you read the file.  You can also use balance instead
>> of scrub.
>>
>
> As we have seen recently, btrfs does not compute, stores or verifies
> checksum of RAID56 parity. So if parity is corrupted, the only way to
> detect and correct it is to use scrub. Balance may work by side effect,
> because it simply recomputes parity on new data, but it will not fix
> wrong parity on existing data.

Ah, I misread your question.  Yes, this is definitely where scrub is the
best tool.  But even if we have to add debugging to force parity 
recomputation, we should see if the problem is only in scrub or deeper.

-chris


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-14 21:20 ` Chris Mason
  2016-07-15  4:39   ` Andrei Borzenkov
@ 2016-07-15 16:28   ` Goffredo Baroncelli
  2016-07-15 16:29     ` Chris Mason
  1 sibling, 1 reply; 13+ messages in thread
From: Goffredo Baroncelli @ 2016-07-15 16:28 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On 2016-07-14 23:20, Chris Mason wrote:
> 
> 
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>> Hi All,
>> 
>> I developed a new btrfs command "btrfs insp phy"[1] to further
>> investigate this bug [2]. Using "btrfs insp phy" I developed a
>> script to trigger the bug. The bug is not always triggered, but
>> most of time yes.
>> 
>> Basically the script create a raid5 filesystem (using three
>> loop-device on three file called disk[123].img); on this filesystem
>> it is create a file. Then using "btrfs insp phy", the physical
>> placement of the data on the device are computed.
>> 
>> First the script checks that the data are the right one (for data1,
>> data2 and parity), then it corrupt the data:
>> 
>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>> data2, parity) data on the disk are checked. This test goes fine
>> all the times
>> 
>> test2: data2 is corrupted, then scrub is ran. Then the (data1,
>> data2, parity) data on the disk are checked. This test fail most of
>> the time: the data on the disk is not correct; the parity is wrong.
>> Scrub sometime reports "WARNING: errors detected during scrubbing,
>> corrected" and sometime reports "ERROR: there are uncorrectable
>> errors". But this seems unrelated to the fact that the data is
>> corrupetd or not test3: like test2, but data1 is corrupted. The
>> result are the same as above.
>> 
>> 
>> test4: data2 is corrupted, the the file is read. The system doesn't
>> return error (the data seems to be fine); but the data2 on the disk
>> is still corrupted.
>> 
>> 
>> Note: data1, data2, parity are the disk-element of the raid5
>> stripe-
>> 
>> Conclusion:
>> 
>> most of the time, it seems that btrfs-raid5 is not capable to
>> rebuild parity and data. Worse the message returned by scrub is
>> incoherent by the status on the disk. The tests didn't fail every
>> time; this complicate the diagnosis. However my script fails most
>> of the time.
> 
> Interesting, thanks for taking the time to write this up.  Is the
> failure specific to scrub?  Or is parity rebuild in general also
> failing in this case?

Test #4 handles this case: I corrupt the data, and when I read it
back the data is good. So the parity is used, but the data on the
platter is still bad.

However, I have to point out that this kind of test is very
difficult to do: the page cache could lead to reading stale data, so
suggestions about how to flush the cache are welcome (I do some syncs,
unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches",
but sometimes it seems not to be enough).



> 
> -chris
> 

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-15 16:28   ` Goffredo Baroncelli
@ 2016-07-15 16:29     ` Chris Mason
  2016-07-15 16:34       ` Andrei Borzenkov
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Mason @ 2016-07-15 16:29 UTC (permalink / raw)
  To: kreijack, linux-btrfs



On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote:
> On 2016-07-14 23:20, Chris Mason wrote:
>>
>>
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy"[1] to further
>>> investigate this bug [2]. Using "btrfs insp phy" I developed a
>>> script to trigger the bug. The bug is not always triggered, but
>>> most of time yes.
>>>
>>> Basically the script create a raid5 filesystem (using three
>>> loop-device on three file called disk[123].img); on this filesystem
>>> it is create a file. Then using "btrfs insp phy", the physical
>>> placement of the data on the device are computed.
>>>
>>> First the script checks that the data are the right one (for data1,
>>> data2 and parity), then it corrupt the data:
>>>
>>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>>> data2, parity) data on the disk are checked. This test goes fine
>>> all the times
>>>
>>> test2: data2 is corrupted, then scrub is ran. Then the (data1,
>>> data2, parity) data on the disk are checked. This test fail most of
>>> the time: the data on the disk is not correct; the parity is wrong.
>>> Scrub sometime reports "WARNING: errors detected during scrubbing,
>>> corrected" and sometime reports "ERROR: there are uncorrectable
>>> errors". But this seems unrelated to the fact that the data is
>>> corrupetd or not test3: like test2, but data1 is corrupted. The
>>> result are the same as above.
>>>
>>>
>>> test4: data2 is corrupted, the the file is read. The system doesn't
>>> return error (the data seems to be fine); but the data2 on the disk
>>> is still corrupted.
>>>
>>>
>>> Note: data1, data2, parity are the disk-element of the raid5
>>> stripe-
>>>
>>> Conclusion:
>>>
>>> most of the time, it seems that btrfs-raid5 is not capable to
>>> rebuild parity and data. Worse the message returned by scrub is
>>> incoherent by the status on the disk. The tests didn't fail every
>>> time; this complicate the diagnosis. However my script fails most
>>> of the time.
>>
>> Interesting, thanks for taking the time to write this up.  Is the
>> failure specific to scrub?  Or is parity rebuild in general also
>> failing in this case?
>
> Test #4 handles this case: I corrupt the data, and when I read
> it the data is good. So parity is used but the data on the platter
> are still bad.
>
> However I have to point out that this kind of test is very
> difficult to do: the file-cache could lead to read an old data, so please
> suggestion about how flush the cache are good (I do some sync,
> unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches",
> but sometime it seems not enough).

O_DIRECT should handle the cache flushing for you.
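
For example, something like this re-reads the file while bypassing the page cache (a sketch; O_DIRECT needs aligned I/O, which dd's iflag=direct with bs=4096 provides):

sudo mount $loop1 mnt
dd if=mnt/out.txt iflag=direct bs=4096 2>/dev/null | head -c 5   # should print "adaaa" on intact data
sudo umount mnt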

-chris



* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-15  4:39   ` Andrei Borzenkov
  2016-07-15 13:20     ` Chris Mason
@ 2016-07-15 16:30     ` Goffredo Baroncelli
  1 sibling, 0 replies; 13+ messages in thread
From: Goffredo Baroncelli @ 2016-07-15 16:30 UTC (permalink / raw)
  To: Andrei Borzenkov, Chris Mason, linux-btrfs

On 2016-07-15 06:39, Andrei Borzenkov wrote:
> 15.07.2016 00:20, Chris Mason wrote:
>>
>>
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy"[1] to further
>>> investigate this bug [2]. Using "btrfs insp phy" I developed a script
>>> to trigger the bug. The bug is not always triggered, but most of time
>>> yes.
>>>
>>> Basically the script create a raid5 filesystem (using three
>>> loop-device on three file called disk[123].img); on this filesystem 
> 
> Are those devices themselves on btrfs? Just to avoid any sort of
> possible side effects?

Good question. However, the files are stored on an ext4 filesystem (but I don't
know if this is better or worse).

> 
>>> it is create a file. Then using "btrfs insp phy", the physical
>>> placement of the data on the device are computed.
>>>
>>> First the script checks that the data are the right one (for data1,
>>> data2 and parity), then it corrupt the data:
>>>
>>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>>> data2, parity) data on the disk are checked. This test goes fine all
>>> the times
>>>
>>> test2: data2 is corrupted, then scrub is ran. Then the (data1, data2,
>>> parity) data on the disk are checked. This test fail most of the time:
>>> the data on the disk is not correct; the parity is wrong. Scrub
>>> sometime reports "WARNING: errors detected during scrubbing,
>>> corrected" and sometime reports "ERROR: there are uncorrectable
>>> errors". But this seems unrelated to the fact that the data is
>>> corrupetd or not
>>> test3: like test2, but data1 is corrupted. The result are the same as
>>> above.
>>>
>>>
>>> test4: data2 is corrupted, the the file is read. The system doesn't
>>> return error (the data seems to be fine); but the data2 on the disk is
>>> still corrupted.
>>>
>>>
>>> Note: data1, data2, parity are the disk-element of the raid5 stripe-
>>>
>>> Conclusion:
>>>
>>> most of the time, it seems that btrfs-raid5 is not capable to rebuild
>>> parity and data. Worse the message returned by scrub is incoherent by
>>> the status on the disk. The tests didn't fail every time; this
>>> complicate the diagnosis. However my script fails most of the time.
>>
>> Interesting, thanks for taking the time to write this up.  Is the
>> failure specific to scrub?  Or is parity rebuild in general also failing
>> in this case?
>>
> 
> How do you rebuild parity without scrub as long as all devices appear to
> be present?

I corrupted the data, then I read the file back. The data should come back
correct, rebuilt from the parity. Even in this case I found problems.

> 
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
  2016-07-15 16:29     ` Chris Mason
@ 2016-07-15 16:34       ` Andrei Borzenkov
  0 siblings, 0 replies; 13+ messages in thread
From: Andrei Borzenkov @ 2016-07-15 16:34 UTC (permalink / raw)
  To: Chris Mason, kreijack, linux-btrfs

15.07.2016 19:29, Chris Mason wrote:
>
>> However I have to point out that this kind of test is very
>> difficult to do: the file-cache could lead to read an old data, so please
>> suggestion about how flush the cache are good (I do some sync,
>> unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches",
>> but sometime it seems not enough).
> 
> O_DIRECT should handle the cache flushing for you.
> 

There is also BLKFLSBUF ioctl (blockdev --flushbufs on shell level).
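
For example (a sketch reusing the loop_disks helper from the test script):

for d in $(loop_disks); do
	sudo blockdev --flushbufs $d   # BLKFLSBUF: invalidate the cached buffers of this block device
done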


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5
  2016-07-12 21:50 [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two Goffredo Baroncelli
  2016-07-14 21:20 ` Chris Mason
@ 2016-07-16 15:51 ` Jarkko Lavinen
  2016-07-17 19:46   ` Jarkko Lavinen
  2016-07-18 18:56   ` Goffredo Baroncelli
  1 sibling, 2 replies; 13+ messages in thread
From: Jarkko Lavinen @ 2016-07-16 15:51 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
> Using "btrfs insp phy" I developed a script to trigger the bug.

Thank you for the script, and thanks to all for sharing the raid5 and scrubbing issues. I have been using two raid5 arrays, running scrub occasionally without any problems lately, and had been lulled into a false sense of confidence. I successfully converted the raid5 arrays into raid10 without any glitch.

I tried to modify the shell script so that instead of corrupting data with dd, a simulated bad block is created with device mapper. Modern disks are likely to either return the correct data or, if they cannot, return an error.
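
Roughly, the idea is to wrap one of the loop devices in a device-mapper table whose middle segment uses the "error" target. A sketch only (not the attached script); it assumes the $data2_dev/$data2_off variables from Goffredo's script and a 4K-aligned offset:

bad=$(( data2_off / 512 ))                  # first unreadable sector
len=8                                       # 4 KiB of unreadable sectors
size=$(sudo blockdev --getsz $data2_dev)    # device size in 512-byte sectors
sudo dmsetup create loop2_bad <<EOF
0 $bad linear $data2_dev 0
$bad $len error
$(( bad + len )) $(( size - bad - len )) linear $data2_dev $(( bad + len ))
EOF

The filesystem is then accessed through the /dev/mapper device, so any I/O touching that range fails with an error instead of returning wrong data.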

The modified script behaves very much like the original dd version. With the dd version I see wrong data instead of the expected data. With the simulated bad block I see no data at all instead of the expected data, since dd quits on the read error.

Jarkko Lavinen

[-- Attachment #2: h.sh --]
[-- Type: application/x-sh, Size: 6526 bytes --]


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5
  2016-07-16 15:51 ` [BUG] Btrfs scrub sometime recalculate wrong parity in raid5 Jarkko Lavinen
@ 2016-07-17 19:46   ` Jarkko Lavinen
  2016-07-18 18:56   ` Goffredo Baroncelli
  1 sibling, 0 replies; 13+ messages in thread
From: Jarkko Lavinen @ 2016-07-17 19:46 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5491 bytes --]

On Sat, Jul 16, 2016 at 06:51:11PM +0300, Jarkko Lavinen wrote:
>  The modified script behaves very much like the original dd version.

Not quite. The bad-sector simulation works like old hard drives without error correction and bad-block remapping. This changes the error behaviour.

My script now prints kernel messages once check_fs fails. The time range of the messages runs from the adding of the bad-sector device to the point when check_fs fails.

The parity test, which often passes with Goffredo's script, always fails with my bad-sector version, and scrub says the error is uncorrectable. In the kernel messages there are two buffer I/O read errors but no write error, as if scrub quits before writing?

In the data2 test scrub again says the error is uncorrectable, but according to the kernel messages the bad sector is read 4 times and written twice during the scrub. In my bad-sector version data2 is still corrupted and the parity is ok, since the bad sector cannot be written and scrub likely quits earlier than in Goffredo's script. In his script data2 gets fixed but the parity gets corrupted.

Jarkko Lavinen

$ bash h2.sh
--- test 1: corrupt parity
scrub started on mnt/., fsid 2625e2d0-420c-40b6-befa-97fc18eaed48 (pid=32490)
ERROR: there are uncorrectable errors
******* Wrong data on disk:off /dev/mapper/loop0:61931520 (parity)
Data read ||, expected |0300 0303|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-0, logical block 15120, async page read
Scrub started
Second Check_fs started
Buffer I/O error on dev dm-0, logical block 15120, async page read

--- test 2: corrupt data2
scrub started on mnt/., fsid 8e506268-16c7-48fa-b176-0a8877f2a7aa (pid=434)
ERROR: there are uncorrectable errors
******* Wrong data on disk:off /dev/mapper/loop2:81854464 (data2)
Data read ||, expected |bdbbb|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-2, logical block 19984, async page read
Scrub started
BTRFS warning (device dm-0): i/o error at logical 142802944 on dev /dev/mapper/loop2, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
BTRFS warning (device dm-0): i/o error at logical 142802944 on dev /dev/mapper/loop2, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142802944 on dev /dev/mapper/loop2
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142802944 on dev /dev/mapper/loop2
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 3, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
Second Check_fs started
BTRFS info (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
Buffer I/O error on dev dm-2, logical block 19984, async page read

--- test 3: corrupt data1
scrub started on mnt/., fsid f8a4ecca-2475-4e5e-9651-65d9478b56fe (pid=856)
ERROR: there are uncorrectable errors
******* Wrong data on disk:off /dev/mapper/loop1:61931520 (data1)
Data read ||, expected |adaaa|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-1, logical block 15120, async page read
Scrub started
BTRFS warning (device dm-0): i/o error at logical 142737408 on dev /dev/mapper/loop1, sector 120960, root 5, inode 257, offset 0, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
BTRFS warning (device dm-0): i/o error at logical 142737408 on dev /dev/mapper/loop1, sector 120960, root 5, inode 257, offset 0, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142737408 on dev /dev/mapper/loop1
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142737408 on dev /dev/mapper/loop1
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 3, flush 0, corrupt 0, gen 0
Second Check_fs started
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
BTRFS info (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
Buffer I/O error on dev dm-1, logical block 15120, async page read

--- test 4: corrupt data2; read without scrub
******* Wrong data on disk:off /dev/mapper/loop2:81854464 (data2)
Data read ||, expected |bdbbb|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-2, logical block 19984, async page read
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Second Check_fs started
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
BTRFS info (device dm-0): bdev /dev/mapper/loop2 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Buffer I/O error on dev dm-2, logical block 19984, async page read

[-- Attachment #2: h2.sh --]
[-- Type: application/x-sh, Size: 7811 bytes --]


* Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5
  2016-07-16 15:51 ` [BUG] Btrfs scrub sometime recalculate wrong parity in raid5 Jarkko Lavinen
  2016-07-17 19:46   ` Jarkko Lavinen
@ 2016-07-18 18:56   ` Goffredo Baroncelli
  1 sibling, 0 replies; 13+ messages in thread
From: Goffredo Baroncelli @ 2016-07-18 18:56 UTC (permalink / raw)
  To: Jarkko Lavinen, linux-btrfs

Hi

On 2016-07-16 17:51, Jarkko Lavinen wrote:
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>> Using "btrfs insp phy" I developed a script to trigger the bug.
> 
> Thank you for the script and all for sharing the raid5 and scrubbing
> issues. I have been using two raid5 arrays and ran scrub occasionally
> without any problems lately and been in false confidence. I converted
> successfully raid5 arrays into raid10 without any glitch.
> 
> I tried to modify the shell script so that instead of corrupting data
> with dd, a simulated bad block is created with device mapper. Modern
> disks are likely to either return the correct data or an error if
> they cannot.


You are right; but doing so we complicate the test case further:
- my tests show what happens when there is a corruption, but the drive behaves well
- your tests show what happens when there is a corruption AND the drive has a failure

I agree that your simulation is more realistic, but I fear that this way we make the bug harder to pin down.

> 
> The modified script behaves very much like the original dd version.
> With dd version I see wrong data instead of expected data. 

When you say "I see wrong data", do you mean with
1) "cat mnt/out.txt"
or 2) with "dd if=/dev/loop....." ?

In the first case I always see good data; in the second case I see wrong data but, of course, no read error.

> With simulated bad block I see no data at all instead of expected data
> since dd quits on read error.
> 
> Jarkko Lavinen
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5




Thread overview: 13+ messages
2016-07-12 21:50 [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two Goffredo Baroncelli
2016-07-14 21:20 ` Chris Mason
2016-07-15  4:39   ` Andrei Borzenkov
2016-07-15 13:20     ` Chris Mason
2016-07-15 15:10       ` Andrei Borzenkov
2016-07-15 15:21         ` Chris Mason
2016-07-15 16:30     ` Goffredo Baroncelli
2016-07-15 16:28   ` Goffredo Baroncelli
2016-07-15 16:29     ` Chris Mason
2016-07-15 16:34       ` Andrei Borzenkov
2016-07-16 15:51 ` [BUG] Btrfs scrub sometime recalculate wrong parity in raid5 Jarkko Lavinen
2016-07-17 19:46   ` Jarkko Lavinen
2016-07-18 18:56   ` Goffredo Baroncelli
