From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.jrs-s.net ([173.230.137.22]:54880 "EHLO mail.jrs-s.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753166AbaACWZx (ORCPT ); Fri, 3 Jan 2014 17:25:53 -0500
Received: from [192.168.0.50] (mail.coastalscience.com [66.83.151.234])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	(Authenticated sender: jim@jrs-s.net)
	by mail.jrs-s.net (Postfix) with ESMTPSA id 6236CD2A4
	for ; Fri, 3 Jan 2014 17:25:53 -0500 (EST)
Message-ID: <52C73987.7000106@jrs-s.net>
Date: Fri, 03 Jan 2014 17:28:23 -0500
From: Jim Salter
MIME-Version: 1.0
To: linux-btrfs@vger.kernel.org
Subject: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel and the
btrfs-progs from Debian Sid (since the ones in Ubuntu are ancient).

I discovered to my horror during testing today that neither raid1 nor
raid10 arrays are fault tolerant of losing an actual disk:

mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
mkdir /test
mount /dev/vdb /test
echo "test" > /test/test
btrfs filesystem sync /test
shutdown -hP now

After shutting down the VM, removing ANY one of the drives from the
btrfs raid10 array leaves me unable to mount the array. In this case
I removed the drive that was at /dev/vde, then restarted the VM.

btrfs fi show
Label: none  uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
	Total devices 4 FS bytes used 156.00KB
	devid    3 size 1.00GB used 212.75MB path /dev/vdd
	devid    2 size 1.00GB used 212.75MB path /dev/vdc
	devid    1 size 1.00GB used 232.75MB path /dev/vdb
	*** Some devices missing

OK, we have three of the four raid10 devices present. Should be fine.
Let's mount it:

mount -t btrfs /dev/vdb /test
mount: wrong fs type, bad option, bad superblock on /dev/vdb,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

What's the kernel log got to say about it?

dmesg | tail -n 4
[ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb
[ 536.700515] btrfs: disk space caching is enabled
[ 536.703491] btrfs: failed to read the system array on vdd
[ 536.708337] btrfs: open_ctree failed

The same behavior persists whether I create a raid1 or a raid10 array,
and whether I create it at that raid level with mkfs.btrfs or convert
it afterwards with btrfs balance start -dconvert=raidn -mconvert=raidn.
It also persists even if I both scrub AND sync the array before
shutting the machine down and removing one of the disks.

What's up with this? This is a MASSIVE bug, and I haven't seen anybody
else talking about it... has nobody tried actually failing out a disk
yet, or what?
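
P.S. For concreteness, the conversion and scrub steps I mention above
were along these lines, run against the same /test mount (raid1 shown
as an example; raid10 is the same with the profile name swapped, and
the -B on the scrub just makes it wait for completion instead of
backgrounding):

btrfs balance start -dconvert=raid1 -mconvert=raid1 /test
btrfs scrub start -B /test
btrfs filesystem sync /test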