From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx2.suse.de ([195.135.220.15]:49668 "EHLO mx1.suse.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1731556AbeGTRZD (ORCPT );
        Fri, 20 Jul 2018 13:25:03 -0400
Date: Fri, 20 Jul 2018 18:35:51 +0200
From: David Sterba
To: Qu Wenruo
Cc: David Sterba, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/4] 3- and 4- copy RAID1
Message-ID: <20180720163551.GF26141@twin.jikos.cz>
Reply-To: dsterba@suse.cz
References: <88531904-288b-f73e-1157-560845f8e72d@gmx.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To: <88531904-288b-f73e-1157-560845f8e72d@gmx.com>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Thu, Jul 19, 2018 at 03:27:17PM +0800, Qu Wenruo wrote:
> On 2018/07/14 02:46, David Sterba wrote:
> > Hi,
> >
> > I have some goodies that go toward the RAID56 problem; although they
> > don't implement all the remaining features, they can be useful
> > independently.
> >
> > This time my hackweek project
> >
> > https://hackweek.suse.com/17/projects/do-something-about-btrfs-and-raid56
> >
> > aimed to implement a fix for the write hole problem, but I spent more
> > time on analysis and design of the solution and don't have a working
> > prototype for that yet.
> >
> > This patchset brings a feature that will be used by the raid56 log:
> > the log has to be at the same redundancy level, and thus we need
> > 3-copy replication for raid6. As it was easy to extend to higher
> > replication, I've added 4-copy replication, which would allow a
> > triple-copy raid (which does not have a standardized name).
>
> So will this special level be used only for RAID56 for now?
> Or will it also be possible to use it for metadata, just like the
> current RAID1?

It's a new profile usable in the same way as raid1, i.e. for data or
metadata. The patch that adds support to btrfs-progs has an mkfs
example.

The raid56 will use that to store the log, essentially data forcibly
stored on the n-copy raid1 chunk and used only for logging.

> If the latter, the metadata scrub problem will need more consideration.
>
> With more copies in RAID1, there is a higher chance of one or two
> devices going missing and then being scrubbed.
> For metadata scrub, the inline csum can't ensure a copy is the latest
> one.
>
> So for such a RAID1 scrub, we need to read out all copies and compare
> their generations to find the correct copy.
> At least from the changeset, it doesn't look like this is addressed yet.

Nothing like this is implemented in the patches, but I don't understand
how this differs from the current raid1 with one missing device. Sure,
we can't have 2 missing devices, so the existing copy is automatically
considered correct and up to date.

There are more corner-case recovery scenarios where there could be 3
copies slightly out of date due to device loss and a scrub attempt, so
yes, this would need to be addressed.

> And this also reminds me that the current scrub is not as flexible as
> balance. I'd really like to be able to filter which block groups to
> scrub, just like balance, and to scrub on a block-group basis rather
> than a per-devid basis.
> That is to say, for a block-group scrub, we don't really care which
> device we're scrubbing; we just need to ensure every device in the
> block group stores correct data.

Right, a subset of the balance filters would be nice.
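
Just to make the "compare the generations" idea concrete, here is a
small standalone toy model of it (nothing from this patchset, and all
names in it are made up): scrub on an n-copy raid1 chunk would read
every mirror of a metadata block, drop the ones that fail their
checksum, and trust the one with the newest generation, rewriting the
rest from it.

    /*
     * Toy model only, not kernel code: each mirror of a metadata block
     * carries a checksum and a generation in its header; scrub of an
     * n-copy profile trusts the newest copy that passes its checksum.
     */
    #include <stdio.h>
    #include <stdint.h>

    struct copy_state {
            int csum_ok;            /* checksum of this mirror verified */
            uint64_t generation;    /* transid stored in the block header */
    };

    /* Return the index of the copy to trust, or -1 if all copies are bad. */
    static int pick_copy(const struct copy_state *copies, int ncopies)
    {
            int best = -1;

            for (int i = 0; i < ncopies; i++) {
                    if (!copies[i].csum_ok)
                            continue;
                    if (best < 0 ||
                        copies[i].generation > copies[best].generation)
                            best = i;
            }
            return best;
    }

    int main(void)
    {
            /* 3-copy chunk: one stale copy (device was missing for a
             * while), one corrupted copy, one good and current copy. */
            struct copy_state copies[] = {
                    { .csum_ok = 1, .generation = 1000 },   /* stale but valid */
                    { .csum_ok = 0, .generation = 1042 },   /* csum mismatch */
                    { .csum_ok = 1, .generation = 1042 },   /* current */
            };
            int best = pick_copy(copies, 3);

            printf("trust copy %d, repair the others from it\n", best);
            return 0;
    }

The interesting part is exactly the corner case mentioned above: a copy
can pass its checksum and still be stale after a period of device loss,
so generation comparison (not just csum verification) would be needed
to decide which mirror is authoritative.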