Subject: Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
To: linux-btrfs
References: <46299275-04a2-d9f9-c47b-7917b04c9484@inwind.it>
From: Chris Mason
Message-ID: <4bdb4c42-ed13-3add-1da7-46a1acd8390e@fb.com>
Date: Fri, 15 Jul 2016 12:29:46 -0400
In-Reply-To: <46299275-04a2-d9f9-c47b-7917b04c9484@inwind.it>

On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote:
> On 2016-07-14 23:20, Chris Mason wrote:
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command, "btrfs insp phy" [1], to further
>>> investigate this bug [2]. Using "btrfs insp phy" I wrote a script
>>> to trigger the bug. The bug is not always triggered, but most of
>>> the time it is.
>>>
>>> Basically, the script creates a raid5 filesystem (using three loop
>>> devices backed by three files called disk[123].img) and creates a
>>> file on it. Then, using "btrfs insp phy", the physical placement
>>> of the data on the devices is computed.
>>>
>>> First the script checks that the on-disk data is correct (for
>>> data1, data2 and parity); then it corrupts the data:
>>>
>>> test1: the parity is corrupted, then scrub is run. Then the
>>> (data1, data2, parity) data on the disk are checked. This test
>>> passes every time.
>>>
>>> test2: data2 is corrupted, then scrub is run. Then the (data1,
>>> data2, parity) data on the disk are checked. This test fails most
>>> of the time: the data on the disk is not correct; the parity is
>>> wrong. Scrub sometimes reports "WARNING: errors detected during
>>> scrubbing, corrected" and sometimes reports "ERROR: there are
>>> uncorrectable errors", but this seems unrelated to whether the
>>> data is actually corrupted or not.
>>>
>>> test3: like test2, but data1 is corrupted. The results are the
>>> same as above.
>>>
>>> test4: data2 is corrupted, then the file is read. The system
>>> doesn't return an error (the data seems to be fine), but data2 on
>>> the disk is still corrupted.
>>>
>>> Note: data1, data2 and parity are the disk elements of the raid5
>>> stripe.
>>>
>>> Conclusion:
>>>
>>> Most of the time, it seems that btrfs-raid5 is not capable of
>>> rebuilding parity and data. Worse, the message returned by scrub
>>> is inconsistent with the status on the disk. The tests don't fail
>>> every time, which complicates the diagnosis; however, my script
>>> fails most of the time.
>>
>> Interesting, thanks for taking the time to write this up. Is the
>> failure specific to scrub? Or is parity rebuild in general also
>> failing in this case?
>
> Test #4 handles this case: I corrupt the data, and when I read it
> the data is good. So parity is used, but the data on the platter is
> still bad.
>
> However, I have to point out that this kind of test is very
> difficult to do: the file cache could lead to reading old data, so
> suggestions about how to flush the cache are welcome (I do some
> syncs, unmount the filesystem and perform
> "echo 3 >/proc/sys/vm/drop_caches", but sometimes it seems not
> enough).

O_DIRECT should handle the cache flushing for you.

-chris
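
For context, the invariant Goffredo's script verifies on disk is the
basic RAID5 one: with three devices, the parity element is the bytewise
XOR of the two data elements. A minimal sketch of that check in C,
assuming a 64 KiB stripe element size (the element size and the
function name are illustrative, not taken from the thread):

    #include <stdint.h>
    #include <stddef.h>

    #define STRIPE_LEN 65536  /* assumed raid5 stripe element size */

    /*
     * Returns 1 if the parity element is consistent with the two
     * data elements, i.e. parity[i] == data1[i] ^ data2[i] for every
     * byte; returns 0 at the first mismatch.
     */
    static int parity_ok(const uint8_t *data1, const uint8_t *data2,
                         const uint8_t *parity)
    {
            for (size_t i = 0; i < STRIPE_LEN; i++)
                    if (parity[i] != (uint8_t)(data1[i] ^ data2[i]))
                            return 0;
            return 1;
    }

In test2 and test3, after a successful scrub this predicate should hold
again for the freshly rewritten stripe; the reported failures mean it
often does not.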
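
And a minimal sketch of the O_DIRECT read Chris suggests, which
bypasses the page cache so the returned bytes really come off the
devices (and through parity reconstruction when an element is bad)
rather than from cached pages. The 4096-byte alignment and the 64 KiB
read size are assumptions; O_DIRECT requires the buffer, offset and
length to be aligned to the device's logical block size:

    #define _GNU_SOURCE  /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            const size_t align = 4096;   /* assumed logical block size */
            const size_t len = 65536;    /* read size, multiple of align */
            void *buf;
            ssize_t n;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <file>\n", argv[0]);
                    return 1;
            }

            /* O_DIRECT: read bypasses the page cache entirely. */
            fd = open(argv[1], O_RDONLY | O_DIRECT);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* O_DIRECT needs a suitably aligned buffer. */
            if (posix_memalign(&buf, align, len) != 0) {
                    fprintf(stderr, "posix_memalign failed\n");
                    close(fd);
                    return 1;
            }

            n = read(fd, buf, len);
            if (n < 0)
                    perror("read");
            else
                    printf("read %zd bytes, bypassing the page cache\n", n);

            free(buf);
            close(fd);
            return 0;
    }

Reading the test file this way avoids the sync/unmount/drop_caches
dance, since there are no cached pages to serve stale data from.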