From: Chris Murphy
Date: Wed, 21 Jun 2017 17:19:54 -0600
Subject: Re: Exactly what is wrong with RAID5/6
To: Goffredo Baroncelli
Cc: Chris Murphy, Qu Wenruo, waxhead, Btrfs BTRFS
In-Reply-To: <34ac2dd7-88ba-6de7-d8e2-061c283bb9c1@inwind.it>
References: <1f5a4702-d264-51c6-aadd-d2cf521a45eb@dirtcellar.net> <60421001-5d74-2fb4-d916-7a397f246f20@cn.fujitsu.com> <34ac2dd7-88ba-6de7-d8e2-061c283bb9c1@inwind.it>

On Wed, Jun 21, 2017 at 2:12 PM, Goffredo Baroncelli wrote:
>
> Generally speaking, when you write "two failure" this means two failure at the same time. But the write hole happens even if these two failures are not at the same time:
>
> Event #1: power failure between the data stripe write and the parity stripe write. The stripe is incoherent.
> Event #2: a disk is failing: if you try to read the data from the remaining data and the parity you have wrong data.
>
> The likelihood of these two event at the same time (power failure and in the next boot a disk is failing) is quite low. But in the life of a filesystem, these two event likely happens.
>
> However BTRFS has an advantage: a simple scrub may (crossing finger) recover from event #1.

Event #3: the stripe is read, missing a data strip due to event #2, and is wrongly reconstructed due to event #1. Btrfs computes crc32c on the reconstructed data and compares it to the extent csum; the comparison fails and the read returns EIO.

Btrfs is susceptible to the write hole happening on disk.
But it's still detected and corrupt data isn't propagated upward.

-- 
Chris Murphy
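
For anyone who wants to see events #1 through #3 in miniature, here is a rough Python sketch. It is a toy model, not Btrfs code: a single stripe with two data strips and one XOR parity strip, plus a stored crc32c standing in for the extent csum. The names (crc32c, xor, checked_read, simulate) and the 4-byte strips are made up for illustration.

```python
import errno


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length strips (RAID5-style parity in this toy model)."""
    return bytes(x ^ y for x, y in zip(a, b))


def checked_read(data: bytes, stored_csum: int) -> bytes:
    """Return data only if its crc32c matches the stored csum, else raise EIO."""
    if crc32c(data) != stored_csum:
        raise OSError(errno.EIO, "csum mismatch on reconstructed data")
    return data


def simulate() -> str:
    # Consistent stripe: two data strips and their parity, csum kept for d1.
    d0_old, d1 = b"AAAA", b"BBBB"
    parity = xor(d0_old, d1)
    csum_d1 = crc32c(d1)

    # Event #1: d0 is rewritten, power fails before parity is updated.
    d0_new = b"CCCC"  # parity is now stale relative to d0_new

    # Event #2: the device holding d1 is lost; rebuild d1 from what remains.
    rebuilt_d1 = xor(d0_new, parity)  # wrong, because parity is stale

    # Event #3: the read path verifies the rebuilt strip against the csum.
    try:
        checked_read(rebuilt_d1, csum_d1)
    except OSError as exc:
        return "EIO" if exc.errno == errno.EIO else "other error"
    return "ok" if rebuilt_d1 == d1 else "silently wrong data"
```

In this model simulate() ends in the EIO branch: the rebuilt strip differs from the original d1, the csum comparison catches it, and the bad data never reaches the caller. Without the csum check the same sequence would hand back wrong data silently.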