Subject: Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
To: linux-btrfs
References: <46299275-04a2-d9f9-c47b-7917b04c9484@inwind.it>
From: Chris Mason
Message-ID: <4bdb4c42-ed13-3add-1da7-46a1acd8390e@fb.com>
Date: Fri, 15 Jul 2016 12:29:46 -0400
In-Reply-To: <46299275-04a2-d9f9-c47b-7917b04c9484@inwind.it>

On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote:
> On 2016-07-14 23:20, Chris Mason wrote:
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command, "btrfs insp phy" [1], to further
>>> investigate this bug [2]. Using "btrfs insp phy" I wrote a script
>>> to trigger the bug. The bug is not always triggered, but most of
>>> the time it is.
>>>
>>> Basically, the script creates a raid5 filesystem (using three loop
>>> devices backed by three files called disk[123].img) and creates a
>>> file on it. Then, using "btrfs insp phy", the physical placement
>>> of the data on the devices is computed.
>>>
>>> First the script checks that the on-disk data is correct (for
>>> data1, data2 and parity); then it corrupts the data:
>>>
>>> test1: the parity is corrupted, then scrub is run. Then the
>>> (data1, data2, parity) data on the disk are checked. This test
>>> passes every time.
>>>
>>> test2: data2 is corrupted, then scrub is run. Then the (data1,
>>> data2, parity) data on the disk are checked. This test fails most
>>> of the time: the data on the disk is not correct; the parity is
>>> wrong. Scrub sometimes reports "WARNING: errors detected during
>>> scrubbing, corrected" and sometimes reports "ERROR: there are
>>> uncorrectable errors", but this seems unrelated to whether the
>>> data is actually corrupted or not.
>>>
>>> test3: like test2, but data1 is corrupted. The results are the
>>> same as above.
>>>
>>> test4: data2 is corrupted, then the file is read. The system
>>> doesn't return an error (the data seems to be fine), but data2 on
>>> the disk is still corrupted.
>>>
>>> Note: data1, data2 and parity are the disk elements of the raid5
>>> stripe.
>>>
>>> Conclusion:
>>>
>>> Most of the time, it seems that btrfs-raid5 is not capable of
>>> rebuilding parity and data. Worse, the message returned by scrub
>>> is inconsistent with the status on the disk. The tests don't fail
>>> every time, which complicates the diagnosis; however, my script
>>> fails most of the time.
>>
>> Interesting, thanks for taking the time to write this up. Is the
>> failure specific to scrub? Or is parity rebuild in general also
>> failing in this case?
>
> Test #4 handles this case: I corrupt the data, and when I read it
> the data is good. So parity is used, but the data on the platter is
> still bad.
>
> However, I have to point out that this kind of test is very
> difficult to do: the file cache could lead to reading old data, so
> suggestions about how to flush the cache are welcome (I do some
> syncs, unmount the filesystem and perform
> "echo 3 >/proc/sys/vm/drop_caches", but sometimes it seems not
> enough).

O_DIRECT should handle the cache flushing for you.

-chris
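
For context, the invariant Goffredo's script verifies on disk is the
basic RAID5 one: with three devices, the parity element is the bytewise
XOR of the two data elements. A minimal sketch of that check in C,
assuming a 64 KiB stripe element size (the element size and the
function name are illustrative, not taken from the thread):

    #include <stdint.h>
    #include <stddef.h>

    #define STRIPE_LEN 65536  /* assumed raid5 stripe element size */

    /*
     * Returns 1 if the parity element is consistent with the two
     * data elements, i.e. parity[i] == data1[i] ^ data2[i] for every
     * byte; returns 0 at the first mismatch.
     */
    static int parity_ok(const uint8_t *data1, const uint8_t *data2,
                         const uint8_t *parity)
    {
            for (size_t i = 0; i < STRIPE_LEN; i++)
                    if (parity[i] != (uint8_t)(data1[i] ^ data2[i]))
                            return 0;
            return 1;
    }

In test2 and test3, after a successful scrub this predicate should hold
again for the freshly rewritten stripe; the reported failures mean it
often does not.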
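
And a minimal sketch of the O_DIRECT read Chris suggests, which
bypasses the page cache so the returned bytes really come off the
devices (and through parity reconstruction when an element is bad)
rather than from cached pages. The 4096-byte alignment and the 64 KiB
read size are assumptions; O_DIRECT requires the buffer, offset and
length to be aligned to the device's logical block size:

    #define _GNU_SOURCE  /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            const size_t align = 4096;   /* assumed logical block size */
            const size_t len = 65536;    /* read size, multiple of align */
            void *buf;
            ssize_t n;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <file>\n", argv[0]);
                    return 1;
            }

            /* O_DIRECT: read bypasses the page cache entirely. */
            fd = open(argv[1], O_RDONLY | O_DIRECT);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* O_DIRECT needs a suitably aligned buffer. */
            if (posix_memalign(&buf, align, len) != 0) {
                    fprintf(stderr, "posix_memalign failed\n");
                    close(fd);
                    return 1;
            }

            n = read(fd, buf, len);
            if (n < 0)
                    perror("read");
            else
                    printf("read %zd bytes, bypassing the page cache\n", n);

            free(buf);
            close(fd);
            return 0;
    }

Reading the test file this way avoids the sync/unmount/drop_caches
dance, since there are no cached pages to serve stale data from.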