* btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
@ 2014-01-03 22:28 Jim Salter
  2014-01-03 22:42 ` Emil Karlson
  2014-01-03 22:43 ` Joshua Schüler
  0 siblings, 2 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-03 22:28 UTC (permalink / raw)
  To: linux-btrfs

I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the 
btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient).

I discovered to my horror during testing today that neither raid1 nor 
raid10 arrays are fault tolerant of losing an actual disk.

mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
mkdir /test
mount /dev/vdb /test
echo "test" > /test/test
btrfs filesystem sync /test
shutdown -hP now

After shutting down the VM, I can remove ANY of the drives from the 
btrfs raid10 array, and be unable to mount the array. In this case, I 
removed the drive that was at /dev/vde, then restarted the VM.

btrfs fi show
Label: none  uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
         Total devices 4 FS bytes used 156.00KB
          devid    3 size 1.00GB used 212.75MB path /dev/vdd
          devid    3 size 1.00GB used 212.75MB path /dev/vdc
          devid    3 size 1.00GB used 232.75MB path /dev/vdb
          *** Some devices missing

OK, we have three of four raid10 devices present. Should be fine. Let's 
mount it:

mount -t btrfs /dev/vdb /test
mount: wrong fs type, bad option, bad superblock on /dev/vdb,
        missing codepage or helper program, or other error
        In some cases useful info is found in syslog - try
        dmesg | tail or so

What's the kernel log got to say about it?

dmesg | tail -n 4
[  536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 
transid 7 /dev/vdb
[  536.700515] btrfs: disk space caching is enabled
[  536.703491] btrfs: failed to read the system array on vdd
[  536.708337] btrfs: open_ctree failed

Same behavior persists whether I create a raid1 or raid10 array, and 
whether I create it as that raid level using mkfs.btrfs or convert it 
afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. 
Also persists even if I both scrub AND sync the array before shutting 
the machine down and removing one of the disks.

What's up with this? This is a MASSIVE bug, and I haven't seen anybody 
else talking about it... has nobody tried actually failing out a disk 
yet, or what?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 22:28 btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT Jim Salter
@ 2014-01-03 22:42 ` Emil Karlson
  2014-01-03 22:43 ` Joshua Schüler
  1 sibling, 0 replies; 40+ messages in thread
From: Emil Karlson @ 2014-01-03 22:42 UTC (permalink / raw)
  To: Jim Salter; +Cc: Linux Btrfs

> mount -t btrfs /dev/vdb /test
> mount: wrong fs type, bad option, bad superblock on /dev/vdb,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so

IIRC you need mount option degraded here.
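
For example, with the device paths from your report, something along the lines of:

mount -t btrfs -o degraded /dev/vdb /test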

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 22:28 btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT Jim Salter
  2014-01-03 22:42 ` Emil Karlson
@ 2014-01-03 22:43 ` Joshua Schüler
  2014-01-03 22:56   ` Jim Salter
  1 sibling, 1 reply; 40+ messages in thread
From: Joshua Schüler @ 2014-01-03 22:43 UTC (permalink / raw)
  To: jim; +Cc: linux-btrfs

Am 03.01.2014 23:28, schrieb Jim Salter:
> I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the
> btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient).
> 
> I discovered to my horror during testing today that neither raid1 nor
> raid10 arrays are fault tolerant of losing an actual disk.
> 
> mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
> mkdir /test
> mount /dev/vdb /test
> echo "test" > /test/test
> btrfs filesystem sync /test
> shutdown -hP now
> 
> After shutting down the VM, I can remove ANY of the drives from the
> btrfs raid10 array, and be unable to mount the array. In this case, I
> removed the drive that was at /dev/vde, then restarted the VM.
> 
> btrfs fi show
> Label: none  uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
>         Total devices 4 FS bytes used 156.00KB
>          devid    3 size 1.00GB used 212.75MB path /dev/vdd
>          devid    3 size 1.00GB used 212.75MB path /dev/vdc
>          devid    3 size 1.00GB used 232.75MB path /dev/vdb
>          *** Some devices missing
> 
> OK, we have three of four raid10 devices present. Should be fine. Let's
> mount it:
> 
> mount -t btrfs /dev/vdb /test
> mount: wrong fs type, bad option, bad superblock on /dev/vdb,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so
> 
> What's the kernel log got to say about it?
> 
> dmesg | tail -n 4
> [  536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1
> transid 7 /dev/vdb
> [  536.700515] btrfs: disk space caching is enabled
> [  536.703491] btrfs: failed to read the system array on vdd
> [  536.708337] btrfs: open_ctree failed
> 
> Same behavior persists whether I create a raid1 or raid10 array, and
> whether I create it as that raid level using mkfs.btrfs or convert it
> afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn.
> Also persists even if I both scrub AND sync the array before shutting
> the machine down and removing one of the disks.
> 
> What's up with this? This is a MASSIVE bug, and I haven't seen anybody
> else talking about it... has nobody tried actually failing out a disk
> yet, or what?

Hey Jim,

keep calm and read the wiki ;)
https://btrfs.wiki.kernel.org/

You need to mount with -o degraded to tell btrfs a disk is missing.


Joshua



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 22:43 ` Joshua Schüler
@ 2014-01-03 22:56   ` Jim Salter
  2014-01-03 23:04     ` Hugo Mills
                       ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-03 22:56 UTC (permalink / raw)
  To: Joshua Schüler; +Cc: linux-btrfs

I actually read the wiki pretty obsessively before blasting the list - 
could not successfully find anything answering the question, by scanning 
the FAQ or by Googling.

You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine.

HOWEVER - this won't allow a root filesystem to mount. How do you deal 
with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root 
filesystem? Few things are scarier than seeing the "cannot find init" 
message in GRUB and being faced with a BusyBox prompt... which is 
actually how I initially got my scare; I was trying to do a walkthrough 
for setting up a raid1 / for an article in a major online magazine and 
it wouldn't boot at all after removing a device; I backed off and tested 
with a non root filesystem before hitting the list.

I did find the -o degraded argument in the wiki now that you mentioned 
it - but it's not prominent enough if you ask me. =)



On 01/03/2014 05:43 PM, Joshua Schüler wrote:
> Am 03.01.2014 23:28, schrieb Jim Salter:
>> I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the
>> btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient).
>>
>> I discovered to my horror during testing today that neither raid1 nor
>> raid10 arrays are fault tolerant of losing an actual disk.
>>
>> mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
>> mkdir /test
>> mount /dev/vdb /test
>> echo "test" > /test/test
>> btrfs filesystem sync /test
>> shutdown -hP now
>>
>> After shutting down the VM, I can remove ANY of the drives from the
>> btrfs raid10 array, and be unable to mount the array. In this case, I
>> removed the drive that was at /dev/vde, then restarted the VM.
>>
>> btrfs fi show
>> Label: none  uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
>>          Total devices 4 FS bytes used 156.00KB
>>           devid    3 size 1.00GB used 212.75MB path /dev/vdd
>>           devid    3 size 1.00GB used 212.75MB path /dev/vdc
>>           devid    3 size 1.00GB used 232.75MB path /dev/vdb
>>           *** Some devices missing
>>
>> OK, we have three of four raid10 devices present. Should be fine. Let's
>> mount it:
>>
>> mount -t btrfs /dev/vdb /test
>> mount: wrong fs type, bad option, bad superblock on /dev/vdb,
>>         missing codepage or helper program, or other error
>>         In some cases useful info is found in syslog - try
>>         dmesg | tail or so
>>
>> What's the kernel log got to say about it?
>>
>> dmesg | tail -n 4
>> [  536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1
>> transid 7 /dev/vdb
>> [  536.700515] btrfs: disk space caching is enabled
>> [  536.703491] btrfs: failed to read the system array on vdd
>> [  536.708337] btrfs: open_ctree failed
>>
>> Same behavior persists whether I create a raid1 or raid10 array, and
>> whether I create it as that raid level using mkfs.btrfs or convert it
>> afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn.
>> Also persists even if I both scrub AND sync the array before shutting
>> the machine down and removing one of the disks.
>>
>> What's up with this? This is a MASSIVE bug, and I haven't seen anybody
>> else talking about it... has nobody tried actually failing out a disk
>> yet, or what?
> Hey Jim,
>
> keep calm and read the wiki ;)
> https://btrfs.wiki.kernel.org/
>
> You need to mount with -o degraded to tell btrfs a disk is missing.
>
>
> Joshua
>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 22:56   ` Jim Salter
@ 2014-01-03 23:04     ` Hugo Mills
  2014-01-03 23:04     ` Joshua Schüler
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Hugo Mills @ 2014-01-03 23:04 UTC (permalink / raw)
  To: Jim Salter; +Cc: Joshua Schüler, linux-btrfs


On Fri, Jan 03, 2014 at 05:56:42PM -0500, Jim Salter wrote:
> I actually read the wiki pretty obsessively before blasting the list
> - could not successfully find anything answering the question, by
> scanning the FAQ or by Googling.
> 
> You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine.
> 
> HOWEVER - this won't allow a root filesystem to mount. How do you
> deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your
> root filesystem? Few things are scarier than seeing the "cannot find
> init" message in GRUB and being faced with a BusyBox prompt...

   Use grub's command-line editing to add rootflags=degraded to it.

   Hugo.

> which
> is actually how I initially got my scare; I was trying to do a
> walkthrough for setting up a raid1 / for an article in a major
> online magazine and it wouldn't boot at all after removing a device;
> I backed off and tested with a non root filesystem before hitting
> the list.
> 
> I did find the -o degraded argument in the wiki now that you
> mentioned it - but it's not prominent enough if you ask me. =)
> 
> 
> 
> On 01/03/2014 05:43 PM, Joshua Schüler wrote:
> >Am 03.01.2014 23:28, schrieb Jim Salter:
> >>I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the
> >>btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient).
> >>
> >>I discovered to my horror during testing today that neither raid1 nor
> >>raid10 arrays are fault tolerant of losing an actual disk.
> >>
> >>mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
> >>mkdir /test
> >>mount /dev/vdb /test
> >>echo "test" > /test/test
> >>btrfs filesystem sync /test
> >>shutdown -hP now
> >>
> >>After shutting down the VM, I can remove ANY of the drives from the
> >>btrfs raid10 array, and be unable to mount the array. In this case, I
> >>removed the drive that was at /dev/vde, then restarted the VM.
> >>
> >>btrfs fi show
> >>Label: none  uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
> >>         Total devices 4 FS bytes used 156.00KB
> >>          devid    3 size 1.00GB used 212.75MB path /dev/vdd
> >>          devid    3 size 1.00GB used 212.75MB path /dev/vdc
> >>          devid    3 size 1.00GB used 232.75MB path /dev/vdb
> >>          *** Some devices missing
> >>
> >>OK, we have three of four raid10 devices present. Should be fine. Let's
> >>mount it:
> >>
> >>mount -t btrfs /dev/vdb /test
> >>mount: wrong fs type, bad option, bad superblock on /dev/vdb,
> >>        missing codepage or helper program, or other error
> >>        In some cases useful info is found in syslog - try
> >>        dmesg | tail or so
> >>
> >>What's the kernel log got to say about it?
> >>
> >>dmesg | tail -n 4
> >>[  536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1
> >>transid 7 /dev/vdb
> >>[  536.700515] btrfs: disk space caching is enabled
> >>[  536.703491] btrfs: failed to read the system array on vdd
> >>[  536.708337] btrfs: open_ctree failed
> >>
> >>Same behavior persists whether I create a raid1 or raid10 array, and
> >>whether I create it as that raid level using mkfs.btrfs or convert it
> >>afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn.
> >>Also persists even if I both scrub AND sync the array before shutting
> >>the machine down and removing one of the disks.
> >>
> >>What's up with this? This is a MASSIVE bug, and I haven't seen anybody
> >>else talking about it... has nobody tried actually failing out a disk
> >>yet, or what?
> >Hey Jim,
> >
> >keep calm and read the wiki ;)
> >https://btrfs.wiki.kernel.org/
> >
> >You need to mount with -o degraded to tell btrfs a disk is missing.
> >
> >
> >Joshua
> >
> >
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---      
                     headline (possibly apocryphal)                      


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 22:56   ` Jim Salter
  2014-01-03 23:04     ` Hugo Mills
@ 2014-01-03 23:04     ` Joshua Schüler
  2014-01-03 23:13       ` Jim Salter
  2014-01-03 23:19     ` Chris Murphy
       [not found]     ` <CAOjFWZ7zC3=4oH6=SBZA+PhZMrSK1KjxoRN6L2vqd=GTBKKTQA@mail.gmail.com>
  3 siblings, 1 reply; 40+ messages in thread
From: Joshua Schüler @ 2014-01-03 23:04 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs

Am 03.01.2014 23:56, schrieb Jim Salter:
> I actually read the wiki pretty obsessively before blasting the list -
> could not successfully find anything answering the question, by scanning
> the FAQ or by Googling.
> 
> You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine.
don't forget to
btrfs device delete missing <path>
See
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
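
A rough sketch of the full replacement sequence, untested here and with
/dev/vdf standing in for whatever new disk gets added (and /test for the
mountpoint from your test):

mount -t btrfs -o degraded /dev/vdb /test   # mount what's left of the array
btrfs device add /dev/vdf /test             # add the (hypothetical) replacement disk
btrfs device delete missing /test           # drop the record of the dead device

A balance afterwards is optional but evens out chunk allocation across the devices.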
> 
> HOWEVER - this won't allow a root filesystem to mount. How do you deal
> with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root
> filesystem? Few things are scarier than seeing the "cannot find init"
> message in GRUB and being faced with a BusyBox prompt... which is
> actually how I initially got my scare; I was trying to do a walkthrough
> for setting up a raid1 / for an article in a major online magazine and
> it wouldn't boot at all after removing a device; I backed off and tested
> with a non root filesystem before hitting the list.
Add -o degraded to the boot-options in GRUB.

If your filesystem is more heavily corrupted then you either need the
btrfs tools in your initrd or a rescue cd
> 
> I did find the -o degraded argument in the wiki now that you mentioned
> it - but it's not prominent enough if you ask me. =)
> 
> 

[snip]

Joshua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:04     ` Joshua Schüler
@ 2014-01-03 23:13       ` Jim Salter
  2014-01-03 23:18         ` Hugo Mills
  2014-01-03 23:22         ` Chris Murphy
  0 siblings, 2 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-03 23:13 UTC (permalink / raw)
  To: Joshua Schüler; +Cc: linux-btrfs

Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda 
black magic to me, and I don't think I'm supposed to be editing it 
directly at all anymore anyway, if I remember correctly...
>> HOWEVER - this won't allow a root filesystem to mount. How do you deal
>> with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root
>> filesystem? Few things are scarier than seeing the "cannot find init"
>> message in GRUB and being faced with a BusyBox prompt... which is
>> actually how I initially got my scare; I was trying to do a walkthrough
>> for setting up a raid1 / for an article in a major online magazine and
>> it wouldn't boot at all after removing a device; I backed off and tested
>> with a non root filesystem before hitting the list.
> Add -o degraded to the boot-options in GRUB.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:13       ` Jim Salter
@ 2014-01-03 23:18         ` Hugo Mills
  2014-01-03 23:25           ` Jim Salter
  2014-01-03 23:22         ` Chris Murphy
  1 sibling, 1 reply; 40+ messages in thread
From: Hugo Mills @ 2014-01-03 23:18 UTC (permalink / raw)
  To: Jim Salter; +Cc: Joshua Schüler, linux-btrfs


On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote:
> Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still
> kinda black magic to me, and I don't think I'm supposed to be
> editing it directly at all anymore anyway, if I remember
> correctly...

   You don't need to edit grub.cfg -- when you boot, grub has an edit
option, so you can do it at boot time without having to use a rescue
disk.

   Regardless, the thing you need to edit is the line starting
"linux", and will look something like this:

linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root

   If there's a rootflags= option already (as above), add ",degraded"
to the end. If there isn't, add "rootflags=degraded".
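
   So the edited version of the example line above would end up as (all
one line):

linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root,degraded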

   Hugo.

> >>HOWEVER - this won't allow a root filesystem to mount. How do you deal
> >>with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root
> >>filesystem? Few things are scarier than seeing the "cannot find init"
> >>message in GRUB and being faced with a BusyBox prompt... which is
> >>actually how I initially got my scare; I was trying to do a walkthrough
> >>for setting up a raid1 / for an article in a major online magazine and
> >>it wouldn't boot at all after removing a device; I backed off and tested
> >>with a non root filesystem before hitting the list.
> >Add -o degraded to the boot-options in GRUB.
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---      
                     headline (possibly apocryphal)                      


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 22:56   ` Jim Salter
  2014-01-03 23:04     ` Hugo Mills
  2014-01-03 23:04     ` Joshua Schüler
@ 2014-01-03 23:19     ` Chris Murphy
       [not found]     ` <CAOjFWZ7zC3=4oH6=SBZA+PhZMrSK1KjxoRN6L2vqd=GTBKKTQA@mail.gmail.com>
  3 siblings, 0 replies; 40+ messages in thread
From: Chris Murphy @ 2014-01-03 23:19 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jan 3, 2014, at 3:56 PM, Jim Salter <jim@jrs-s.net> wrote:

> I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling.
> 
> You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine.
> 
> HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? 

I'd say that it's not ready for unattended/auto degraded mounting, and that this is intended to be a red-flag show stopper to get the attention of the user. Before automatic degraded mounts, which md and LVM raid do now, there probably needs to be notification support in desktops, e.g. GNOME will report degraded state for at least md arrays (maybe LVM too, not sure). There's also a list of other multiple-device work on the to-do list, some of which maybe should be done before auto degraded mount, for example the hot spare work.

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Multiple_Devices


Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:13       ` Jim Salter
  2014-01-03 23:18         ` Hugo Mills
@ 2014-01-03 23:22         ` Chris Murphy
  2014-01-04  6:10           ` Duncan
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Murphy @ 2014-01-03 23:22 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jan 3, 2014, at 4:13 PM, Jim Salter <jim@jrs-s.net> wrote:

> Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly…

Don't edit the grub.cfg directly. At the grub menu, only highlight the entry you want to boot, then hit 'e', and then edit the existing linux/linuxefi line. If you already have rootfs on a subvolume, you'll have an existing parameter on that line rootflags=subvol=<rootname> and you can change this to rootflags=subvol=<rootname>,degraded

I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even if not necessary it could have some negative effects (?)


Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:18         ` Hugo Mills
@ 2014-01-03 23:25           ` Jim Salter
  2014-01-03 23:32             ` Chris Murphy
  0 siblings, 1 reply; 40+ messages in thread
From: Jim Salter @ 2014-01-03 23:25 UTC (permalink / raw)
  To: Hugo Mills, Joshua Schüler, linux-btrfs

Yep - had just figured that out and successfully booted with it, and was 
in the process of typing up instructions for the list (and posterity).

One thing that concerns me is that edits made directly to grub.cfg will 
get wiped out with every kernel upgrade when update-grub is run - any 
idea where I'd put this in /etc/grub.d to have a persistent change?

I have to tell you, I'm not real thrilled with this behavior either way 
- it means I can't have the option to automatically mount degraded 
filesystems without the filesystems in question ALWAYS showing as being 
mounted degraded, whether the disks are all present and working fine or 
not. That's kind of blecchy. =\


On 01/03/2014 06:18 PM, Hugo Mills wrote:
> On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote:
>> Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still
>> kinda black magic to me, and I don't think I'm supposed to be
>> editing it directly at all anymore anyway, if I remember
>> correctly...
>     You don't need to edit grub.cfg -- when you boot, grub has an edit
> option, so you can do it at boot time without having to use a rescue
> disk.
>
>     Regardless, the thing you need to edit is the line starting
> "linux", and will look something like this:
>
> linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root
>
>     If there's a rootflags= option already (as above), add ",degraded"
> to the end. If there isn't, add "rootflags=degraded".
>
>     Hugo.
>
>>>> HOWEVER - this won't allow a root filesystem to mount. How do you deal
>>>> with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root
>>>> filesystem? Few things are scarier than seeing the "cannot find init"
>>>> message in GRUB and being faced with a BusyBox prompt... which is
>>>> actually how I initially got my scare; I was trying to do a walkthrough
>>>> for setting up a raid1 / for an article in a major online magazine and
>>>> it wouldn't boot at all after removing a device; I backed off and tested
>>>> with a non root filesystem before hitting the list.
>>> Add -o degraded to the boot-options in GRUB.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:25           ` Jim Salter
@ 2014-01-03 23:32             ` Chris Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Chris Murphy @ 2014-01-03 23:32 UTC (permalink / raw)
  To: Jim Salter; +Cc: Hugo Mills, Joshua Schüler, linux-btrfs


On Jan 3, 2014, at 4:25 PM, Jim Salter <jim@jrs-s.net> wrote:

> 
> One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change?

/etc/default/grub

I don't recommend making it persistent. At this stage of development, a disk failure should cause mount failure so you're alerted to the problem.

> I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\

If you need something that comes up degraded automatically by design as a supported use case, use md (or possibly LVM which uses different user space tools and monitoring but uses the md kernel driver code and supports raid 0,1,5,6 - quite nifty). I haven't tried this yet, but I think that's also supported with the thin provisioning work, which even if you don't use thin provisioning gets you the significantly more efficient snapshot behavior.

Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
       [not found]     ` <CAOjFWZ7zC3=4oH6=SBZA+PhZMrSK1KjxoRN6L2vqd=GTBKKTQA@mail.gmail.com>
@ 2014-01-03 23:42       ` Jim Salter
  2014-01-03 23:45         ` Jim Salter
  2014-01-04  0:27         ` Chris Murphy
  0 siblings, 2 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-03 23:42 UTC (permalink / raw)
  To: Freddie Cash; +Cc: Joshua Schüler, linux-btrfs, Chris Murphy

For anybody else interested, if you want your system to automatically 
boot a degraded btrfs array, here are my crib notes, verified working:

***************************** boot degraded

1. edit /etc/grub.d/10_linux, add degraded to the rootflags

     GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} 
${GRUB_CMDLINE_LINUX}


2. add degraded to options in /etc/fstab also

UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 /               btrfs 
defaults,degraded,subvol=@       0       1


3. Update and reinstall GRUB to all boot disks

update-grub
grub-install /dev/vda
grub-install /dev/vdb

Now you have a system which will automatically start a degraded array.


******************************************************

Side note: sorry, but I absolutely don't buy the argument that "the 
system won't boot without you driving down to its physical location, 
standing in front of it, and hammering panickily at a BusyBox prompt" is 
the best way to find out your array is degraded.  I'll set up a Nagios 
module to check for degraded arrays using btrfs fi list instead, thanks...


On 01/03/2014 06:06 PM, Freddie Cash wrote:
> Why is manual intervention even needed?  Why isn't the filesystem 
> "smart" enough to mount in a degraded mode automatically?​
>
> -- 
> Freddie Cash
> fjwcash@gmail.com <mailto:fjwcash@gmail.com>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:42       ` Jim Salter
@ 2014-01-03 23:45         ` Jim Salter
  2014-01-04  0:27         ` Chris Murphy
  1 sibling, 0 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-03 23:45 UTC (permalink / raw)
  To: Freddie Cash; +Cc: Joshua Schüler, linux-btrfs, Chris Murphy

Minor correction: you need to close the double-quotes at the end of the 
GRUB_CMDLINE_LINUX line:

     GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} 
${GRUB_CMDLINE_LINUX}"


On 01/03/2014 06:42 PM, Jim Salter wrote:
> For anybody else interested, if you want your system to automatically 
> boot a degraded btrfs array, here are my crib notes, verified working:
>
> ***************************** boot degraded
>
> 1. edit /etc/grub.d/10_linux, add degraded to the rootflags
>
>     GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} 
> ${GRUB_CMDLINE_LINUX}
>
>
> 2. add degraded to options in /etc/fstab also
>
> UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 /               btrfs 
> defaults,degraded,subvol=@       0       1
>
>
> 3. Update and reinstall GRUB to all boot disks
>
> update-grub
> grub-install /dev/vda
> grub-install /dev/vdb
>
> Now you have a system which will automatically start a degraded array.
>
>
> ******************************************************
>
> Side note: sorry, but I absolutely don't buy the argument that "the 
> system won't boot without you driving down to its physical location, 
> standing in front of it, and hammering panickily at a BusyBox prompt" 
> is the best way to find out your array is degraded.  I'll set up a 
> Nagios module to check for degraded arrays using btrfs fi list 
> instead, thanks...
>
>
> On 01/03/2014 06:06 PM, Freddie Cash wrote:
>> Why is manual intervention even needed? Why isn't the filesystem 
>> "smart" enough to mount in a degraded mode automatically?​
>>
>> -- 
>> Freddie Cash
>> fjwcash@gmail.com <mailto:fjwcash@gmail.com>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:42       ` Jim Salter
  2014-01-03 23:45         ` Jim Salter
@ 2014-01-04  0:27         ` Chris Murphy
  2014-01-04  2:59           ` Jim Salter
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Murphy @ 2014-01-04  0:27 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs, Hugo Mills


On Jan 3, 2014, at 4:42 PM, Jim Salter <jim@jrs-s.net> wrote:

> For anybody else interested, if you want your system to automatically boot a degraded btrfs array, here are my crib notes, verified working:
> 
> ***************************** boot degraded
> 
> 1. edit /etc/grub.d/10_linux, add degraded to the rootflags
> 
>    GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}

This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.


> 2. add degraded to options in /etc/fstab also
> 
> UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 /               btrfs defaults,degraded,subvol=@       0       1


I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.

The correct way to automate this before Btrfs developers get around to it is to create a systemd unit that checks for the mount failure, determines that there's a missing device, and generates a modified sysroot.mount job that includes degraded.
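
As a very rough, untested sketch of the shape that could take (unit and file names here are hypothetical, and since sysroot.mount runs from the initramfs the pieces would really have to be baked in there rather than sit in /etc):

# /etc/systemd/system/sysroot.mount.d/degraded-fallback.conf  (hypothetical drop-in)
[Unit]
OnFailure=sysroot-degraded.service

# /etc/systemd/system/sysroot-degraded.service  (hypothetical fallback unit)
[Unit]
Description=Retry the root mount with -o degraded after the normal mount fails
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=/bin/mount -o degraded /dev/disk/by-uuid/YOUR-ROOT-UUID /sysroot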


> Side note: sorry, but I absolutely don't buy the argument that "the system won't boot without you driving down to its physical location, standing in front of it, and hammering panickily at a BusyBox prompt" is the best way to find out your array is degraded.

You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a work around. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.


>  I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks…

That's a good idea, except that it's show rather than list.



Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  0:27         ` Chris Murphy
@ 2014-01-04  2:59           ` Jim Salter
  2014-01-04  5:57             ` Dave
  2014-01-04 19:18             ` Chris Murphy
  0 siblings, 2 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-04  2:59 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs


On 01/03/2014 07:27 PM, Chris Murphy wrote:
> This is the wrong way to solve this. /etc/grub.d/10_linux is subject 
> to being replaced on updates. It is not recommended it be edited, same 
> as for grub.cfg. The correct way is as I already stated, which is to 
> edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub. 
Fair enough - though since I already have to monkey-patch 00_header, I 
kind of already have an eye on grub.d so it doesn't seem as onerous as 
it otherwise would. There is definitely a lot of work that needs to be 
done on the boot sequence for btrfs IMO.
> I think it's bad advice to recommend always persistently mounting a 
> good volume with this option. There's a reason why degraded is not the 
> default mount option, and why there isn't yet automatic degraded mount 
> functionality. That fstab contains other errors.
What other errors does it contain? Aside from adding the "degraded" 
option, that's a bone-stock fstab entry from an Ubuntu Server installation.
> The correct way to automate this before Btrfs developers get around to 
> it is to create a systemd unit that checks for the mount failure, 
> determines that there's a missing device, and generates a modified 
> sysroot.mount job that includes degraded. 
Systemd is not the boot system in use for my distribution, and using it 
would require me to build a custom kernel, among other things. We're 
going to have to agree to disagree that that's an appropriate 
workaround, I think.
> You're simply dissatisfied with the state of Btrfs development and are 
> suggesting bad hacks as a work around. That's my argument. Again, if 
> your use case requires automatic degraded mounts, use a technology 
> that's mature and well tested for that use case. Don't expect a lot of 
> sympathy if these bad hacks cause you problems later. 
You're suggesting the wrong alternatives here (mdraid, LVM, etc) - they 
don't provide the features that I need or are accustomed to (true 
snapshots, copy on write, self-correcting redundant arrays, and on down 
the line). If you're going to shoo me off, the correct way to do it is 
to wave me in the direction of ZFS, in which case I can tell you I've 
been a happy user of ZFS for 5+ years now on hundreds of systems. ZFS 
and btrfs are literally the *only* options available that do what I want 
to do, and have been doing for years now. (At least aside from 
six-figure-and-up proprietary systems, which I have neither the budget 
nor the inclination for.)

I'm testing btrfs heavily in throwaway virtual environments and in a few 
small, heavily-monitored "test production" instances because ZFS on 
Linux has its own set of problems, both technical and licensing, and I 
think it's clear btrfs is going to take the lead in the very near future 
- in many ways, it does already.
>>   I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks…
> That's a good idea, except that it's show rather than list.
Yup, that's what I meant all right. I frequently still get the syntax 
backwards between btrfs fi show and btrfs subv list.
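
For the record, the check I have in mind is roughly this (untested sketch; it 
keys off the "Some devices missing" line that btrfs fi show prints, as seen 
earlier in the thread):

#!/bin/sh
# rough Nagios-style check: alert if any btrfs filesystem reports a missing device
# (needs to run as root so 'btrfs fi show' can see all the devices)
if btrfs fi show 2>&1 | grep -q "Some devices missing"; then
    echo "CRITICAL: btrfs reports missing devices (degraded array)"
    exit 2
fi
echo "OK: all btrfs devices present"
exit 0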

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  2:59           ` Jim Salter
@ 2014-01-04  5:57             ` Dave
  2014-01-04 11:28               ` Chris Samuel
  2014-01-04 19:18             ` Chris Murphy
  1 sibling, 1 reply; 40+ messages in thread
From: Dave @ 2014-01-04  5:57 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs

On Fri, Jan 3, 2014 at 9:59 PM, Jim Salter <jim@jrs-s.net> wrote:
> You're suggesting the wrong alternatives here (mdraid, LVM, etc) - they
> don't provide the features that I need or are accustomed to (true snapshots,
> copy on write, self-correcting redundant arrays, and on down the line). If
> you're going to shoo me off, the correct way to do it is to wave me in the
> direction of ZFS, in which case I can tell you I've been a happy user of ZFS
> for 5+ years now on hundreds of systems. ZFS and btrfs are literally the
> *only* options available that do what I want to do, and have been doing for
> years now. (At least aside from six-figure-and-up proprietary systems, which
> I have neither the budget nor the inclination for.)

Jim, there's nothing stopping you from creating a Btrfs filesystem on
top of an mdraid array.  I'm currently running three WD Red 3TB drives
in a raid5 configuration under a Btrfs filesystem.  This configuration
works pretty well and fills the feature gap you're describing.
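
(The shape of that setup, with hypothetical device names, is simply:

mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
mkfs.btrfs /dev/md0    # single-device btrfs on top; md handles the redundancy
mount /dev/md0 /mnt

so you keep checksums, snapshots and CoW from Btrfs while md takes care of
device failure handling.)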

I will say, though, that the whole tone of your email chain leaves a
bad taste in my mouth; kind of like a poorly adjusted relative who
shows up once a year for Thanksgiving and makes everyone feel
uncomfortable.  I find myself annoyed by the constant disclaimers I
read on this list, about the experimental status of Btrfs, but it's
apparent that this hasn't sunk in for everyone.  Your poor budget
doesn't a production filesystem make.

I and many others on this list who have been using Btrfs will tell
you without hesitation that, given the current maturity of the code, Btrfs
should be making NO assumptions in the event of a failure, and
everything should come to a screeching halt.  I've seen it all: the
infamous 120 second process hangs, csum errors, multiple separate
catastrophic failures (search me on this list).  Things are MOSTLY
stable but you simply have to glance at a few weeks of history on this
list to see the experimental status is fully justified.  I use Btrfs
because of its intoxicating feature set.  As an IT director though,
I'd never subject my company to these rigors.  If Btrfs on mdraid
isn't an acceptable solution for you, then ZFS is the only responsible
alternative.
-- 
-=[dave]=-

Entropy isn't what it used to be.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-03 23:22         ` Chris Murphy
@ 2014-01-04  6:10           ` Duncan
  2014-01-04 11:20             ` Chris Samuel
                               ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Duncan @ 2014-01-04  6:10 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:

> I would not make this option persistent by putting it permanently in the
> grub.cfg; although I don't know the consequence of always mounting with
> degraded even if not necessary it could have some negative effects (?)

Degraded only actually does anything if it's actually needed.  On a 
normal array it'll be a NOOP, so should be entirely safe for /normal/ 
operation, but that doesn't mean I'd /recommend/ it for normal operation, 
since it bypasses checks that are there for a reason, thus silently 
bypassing information that an admin needs to know before he boots it 
anyway, in order to recover.

However, I've some other comments to add:

1) Like you, I'm uncomfortable with the whole idea of adding degraded 
permanently at this point.

Mention was made of having to drive down to the data center and actually 
stand in front of the box if something goes wrong, otherwise.  At the 
moment, for btrfs' development state at this point, fine.  Btrfs remains 
under development and there are clear warnings about using it without 
backups one hasn't tested recovery from or are not otherwise prepared to 
actually use.  It's stated in multiple locations on the wiki; it's stated 
on the kernel btrfs config option, and it's stated in mkfs.btrfs output 
when you create the filesystem.  If after all that people are using it in 
a remote situation where they're not prepared to drive down to the data 
center and stab at the keys if they have to, they're possibly using the 
right filesystem, but at too early a point in its development for their 
needs at this moment.


2) As the wiki explains, certain configurations require at least a 
minimum number of devices in order to work "undegraded".  The example 
given in the OP was a 4-device raid10, already at the minimum for that 
profile, with one device dropped out, taking it below the minimum 
required to mount undegraded, so of /course/ it wouldn't mount 
without that option.

If five or six devices had been used, a device could have been dropped 
and the remaining number of devices would still have been at or above 
the minimum needed to run an undegraded raid10, and the result would 
likely have been different, since there are still enough devices to 
mount writable with proper redundancy, even if existing information 
doesn't have that redundancy until a rebalance is done to take care of 
the missing device.
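
(For example, a six-device version of the OP's array would be created with

mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf /dev/vdg

following the device naming of the original report.)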

Similarly with raid1 and its minimum of two devices.  Configure with 
three, then drop one, and it should still work, since what remains still 
meets raid1's two-device minimum.  Configure with two and drop one, and 
you'll have to mount degraded (and it'll drop to read-only if it happens 
in operation), since there's no second device to write the second copy 
to, as raid1 requires.

3) Frankly, this whole thread smells of going off half cocked, posting 
before doing the proper research.  I know when I took a look at btrfs 
here, I read up on the wiki, reading the multiple devices stuff, the faq, 
the problem faq, the gotchas, the use cases, the sysadmin guide, the 
getting started and mount options... loading the pages multiple times as 
I followed links back and forth between them.

Because I care about my data and want to understand what I'm doing with 
it before I do it!

And even now I often reread specific parts as I'm trying to help others 
with questions on this list....

Then I still had some questions about how it worked that I couldn't find 
answers for on the wiki, and as is traditional with mailing lists and 
newsgroups before them, I read several weeks' worth of posts (on an 
archive, in the case of lists) before actually posting my questions, to 
see if they were FAQs already answered on the list.

Then and only then did I post the questions to the list, and when I did, 
it was, "Questions I haven't found answers for on the wiki or list", not 
"THE WORLD IS GOING TO END, OH NOS!!111!!111111!!!!!111!!!"

Now later on I did post some behavior that had me rather upset, but that 
was AFTER I had already engaged the list in general, and was pretty sure 
by that point that what I was seeing was NOT covered on the wiki, and was 
reasonably new information for at least SOME list users.

4) As a matter of fact, AFAIK that behavior remains relevant today, and 
may well be of interest to the OP.

FWIW my background was Linux kernel md/raid, so I approached the btrfs 
raid expecting similar behavior.  What I found in my testing (and NOT 
covered on the WIKI or in the various documentation other than in a few 
threads on list to this day, AFAIK), however...

Test:  

a) Create a two device btrfs raid1.

b) Mount it and write some data to it.

c) Unmount it, unplug one device, mount degraded the remaining device.

d) Write some data to a test file on it, noting the path/filename and 
data.

e) Unmount again, switch plugged devices so the formerly unplugged one is 
now the plugged one, and again mount degraded.

f) Write some DIFFERENT data to the SAME path/file as in (d), so the two 
versions each on its own device have now incompatibly forked.

g) Unmount, plug both devices in and mount, now undegraded.
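
In rough command form (device names and mountpoint hypothetical), that 
sequence is:

mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc              # a) two-device raid1
mount /dev/sdb /mnt; cp -a /some/data /mnt; umount /mnt     # b) write some data
# c)+d) unplug sdc, then write one variant of a test file:
mount -o degraded /dev/sdb /mnt
echo "variant one" > /mnt/testfile; umount /mnt
# e)+f) swap the plugged device (sdb out, sdc back in), write a different variant:
mount -o degraded /dev/sdc /mnt
echo "variant two" > /mnt/testfile; umount /mnt
# g) plug both back in and mount normally:
mount /dev/sdb /mnt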

What I discovered back then, and to my knowledge the same behavior exists 
today, is that entirely unexpectedly from and in contrast to my mdraid 
experience, THE FILESYSTEM MOUNTED WITHOUT PROTEST!!

h) I checked the file and one variant as written was returned.  STILL NO 
WARNING!  While I didn't test it, I'm assuming based on the PID-based 
round-robin read-assignment that I now know btrfs uses, that which copy I 
got would depend on whether the PID of the reading thread was even or 
odd, as that's what determines what device of the pair is read.  (There 
has actually been some discussion of that as it's not a particularly 
intelligent balancing scheme and it's on the list to change, but the 
current even/odd works well enough for an initial implementation while 
the filesystem remains under development.)

i) Were I rerunning the test today, I'd try a scrub and see what it did 
with the difference.  But I was early enough in my btrfs learning that I 
didn't know to run it at that point, so didn't do so.  I'd still be 
interested in how it handled that, tho based on what I know of btrfs 
behavior in general, I can /predict/ that which copy it'd scrub out and 
which it would keep, would again depend on the PID of the scrub thread, 
since both copies would appear valid (would verify against their checksum 
on the same device) when read, and it's only when matched against the 
other that a problem, presumably with the other copy, would be detected.

My conclusions were two:  

x) Make *VERY* sure I don't actually do that in practice!  If for some 
reason I mount degraded, make sure I consistently use the same device, so 
I don't get incompatible divergence.

y) If which version of the data you keep really matters, in the event of 
a device dropout and would-be re-add, it may be worthwhile to discard/
trim/wipe the entire to-be-re-added device and btrfs device add it, then 
balance, as if it were an entirely new device addition, since that's the 
only way I know of to be sure that the wrong copy isn't picked.
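
In command form that's something like the following, untested, with /dev/sdc 
standing in for the device being re-added and /dev/sdb for the survivor; note 
it probably also wants a "device delete missing" so the filesystem stops 
tracking the old, stale entry:

wipefs -a /dev/sdc                  # clear the stale btrfs signature first
mount -o degraded /dev/sdb /mnt
btrfs device add /dev/sdc /mnt      # re-add it as if it were a brand new disk
btrfs device delete missing /mnt    # drop the record of the old copy
btrfs balance start /mnt            # restore redundancy across both devices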

This is VERY VERY different behavior than mdraid would exhibit.  But the 
purpose and use-cases for btrfs raid1 are different as well.  For my 
particular use-case of checksummed file integrity and ensuring /some/ 
copy of the data survived, and since I had tested and found this behavior 
BEFORE actual deployment, I not entirely happily accepted it.  I'm not 
happy with it, but at least I found out about it in my pre-testing, and 
could adapt my recovery practices accordingly.

But that /does/ mean one can't simply pull a device from a running raid, 
then plug it back in and re-add it, and expect everything to just work, 
as one could do (and I tested!) with mdraid.  One must be 
rather more careful with btrfs raid, at least at this point, unless of 
course the object is to test full restore procedures as well!

OTOH, from a more philosophical perspective multi-device mdraid handling 
has been around for rather longer than multi-device btrfs, and I did see 
mdraid markedly improve in the years I used it.  I expect btrfs raid 
handling will be rather more robust and mature in another decade or so, 
too, and I've already seen reasonable improvement in the six or eight 
months I've been using it (and the 6-8 months before that too, since when 
I first looked at btrfs I decided it simply wasn't mature enough for me 
to run, yet, so I kicked back for a few months and came at it again). =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  6:10           ` Duncan
@ 2014-01-04 11:20             ` Chris Samuel
  2014-01-04 13:03               ` Duncan
  2014-01-04 14:51             ` Chris Mason
  2014-01-04 21:22             ` Jim Salter
  2 siblings, 1 reply; 40+ messages in thread
From: Chris Samuel @ 2014-01-04 11:20 UTC (permalink / raw)
  To: linux-btrfs


On Sat, 4 Jan 2014 06:10:14 AM Duncan wrote:

> Btrfs remains under development and there are clear warnings
> about using it without  backups one hasn't tested recovery from
> or are not otherwise prepared to  actually use.  It's stated in
> multiple locations on the wiki; it's stated on the kernel btrfs
> config option, and it's stated in mkfs.btrfs output when you
> create the filesystem.

Actually the scary warnings are gone from the Kconfig file for what will be the 
3.13 kernel.  Removed by this commit:

commit 4204617d142c0887e45fda2562cb5c58097b918e
Author: David Sterba <dsterba@suse.cz>
Date:   Wed Nov 20 14:32:34 2013 +0100

    btrfs: update kconfig help text
    
    Reflect the current status. Portions of the text taken from the
    wiki pages.
    
    Signed-off-by: David Sterba <dsterba@suse.cz>
    Signed-off-by: Chris Mason <chris.mason@fusionio.com>


-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  5:57             ` Dave
@ 2014-01-04 11:28               ` Chris Samuel
  2014-01-04 14:56                 ` Chris Mason
  0 siblings, 1 reply; 40+ messages in thread
From: Chris Samuel @ 2014-01-04 11:28 UTC (permalink / raw)
  To: linux-btrfs


On Sat, 4 Jan 2014 12:57:02 AM Dave wrote:

> I find myself annoyed by the constant disclaimers I
> read on this list, about the experimental status of Btrfs, but it's
> apparent that this hasn't sunk in for everyone.

Btrfs will no longer be marked as experimental in the kernel as of 3.13.

Unless someone submits a patch to fix it first. :-)

Can we also keep things polite here please.

thanks,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 11:20             ` Chris Samuel
@ 2014-01-04 13:03               ` Duncan
  0 siblings, 0 replies; 40+ messages in thread
From: Duncan @ 2014-01-04 13:03 UTC (permalink / raw)
  To: linux-btrfs

Chris Samuel posted on Sat, 04 Jan 2014 22:20:20 +1100 as excerpted:

> On Sat, 4 Jan 2014 06:10:14 AM Duncan wrote:
> 
>> Btrfs remains under development and there are clear warnings about
>> using it without  backups one hasn't tested recovery from or are not
>> otherwise prepared to  actually use.  It's stated in multiple locations
>> on the wiki; it's stated on the kernel btrfs config option, and it's
>> stated in mkfs.btrfs output when you create the filesystem.
> 
> Actually the scary warnings are gone from the Kconfig file for what will
> be the 3.13 kernel.  Removed by this commit:
> 
> commit 4204617d142c0887e45fda2562cb5c58097b918e

FWIW, I'd characterize that as toned down somewhat, not /gone/.  You 
don't see ext4 or other "mature" filesystems saying "The filesystem disk 
format is no longer unstable, and it's not expected to change 
unless" ..., do you?

"Not expected to change" and etc is definitely toned down from what it 
was, no argument there, but it still isn't exactly what one would expect 
in a description from a stable filesystem.  If there's still some chance 
of the disk format changing, what does that say about the code /dealing/ 
with that disk format?  That doesn't sound exactly like something I'd be 
comfortable staking my reputation as a sysadmin on as judged fully 
reliable and ready for my mission-critical data, for sure!

Tho agreed, one certainly has to read between the lines a bit more for 
the kernel option now than they did.

But the real kicker for me was when I redid several of my btrfs 
partitions to take advantage of newer features, 16 KiB nodes, etc, and 
saw the warning it's giving, yes, in btrfs-progs 3.12 after all the 
recent documentation changes, etc.  Not everybody builds their own 
kernel, but it's kind of hard to get a btrfs filesystem without making 
one!  (Yes, I know the installers make the filesystem for many people, 
and may well hide the output, but if so and the distros don't provide a 
similar warning when people choose btrfs, that's entirely on the distros 
at that point.  Not much btrfs as upstream can do about that.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  6:10           ` Duncan
  2014-01-04 11:20             ` Chris Samuel
@ 2014-01-04 14:51             ` Chris Mason
  2014-01-04 15:23               ` Goffredo Baroncelli
  2014-01-04 20:08               ` Duncan
  2014-01-04 21:22             ` Jim Salter
  2 siblings, 2 replies; 40+ messages in thread
From: Chris Mason @ 2014-01-04 14:51 UTC (permalink / raw)
  To: 1i5t5.duncan; +Cc: linux-btrfs

On Sat, 2014-01-04 at 06:10 +0000, Duncan wrote:
> Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:
> 
> > I would not make this option persistent by putting it permanently in the
> > grub.cfg; although I don't know the consequence of always mounting with
> > degraded even if not necessary it could have some negative effects (?)
> 
> Degraded only actually does anything if it's actually needed.  On a 
> normal array it'll be a NOOP, so should be entirely safe for /normal/ 
> operation, but that doesn't mean I'd /recommend/ it for normal operation, 
> since it bypasses checks that are there for a reason, thus silently 
> bypassing information that an admin needs to know before he boots it 
> anyway, in order to recover.
> 

> However, I've some other comments to add:
> 
> 1) Like you, I'm uncomfortable with the whole idea of adding degraded 
> permanently at this point.
> 

I added mount -o degraded just because I wanted the admin to be notified
of failures.  Right now it's still the most reliable way to notify them,
but I definitely agree we can do better.  Leaving it on all the time?  I
don't think this is a great long term solution, unless you are actively
monitoring the system to make sure there are no failures.

Also, as Neil Brown pointed out it does put you at risk of transient
device detection failures getting things out of sync.

> Test:  
> 
> a) Create a two device btrfs raid1.
> 
> b) Mount it and write some data to it.
> 
> c) Unmount it, unplug one device, mount degraded the remaining device.
> 
> d) Write some data to a test file on it, noting the path/filename and 
> data.
> 
> e) Unmount again, switch plugged devices so the formerly unplugged one is 
> now the plugged one, and again mount degraded.
> 
> f) Write some DIFFERENT data to the SAME path/file as in (d), so the two 
> versions each on its own device have now incompatibly forked.
> 
> g) Unmount, plug both devices in and mount, now undegraded.
> 
> What I discovered back then, and to my knowledge the same behavior exists 
> today, is that entirely unexpectedly from and in contrast to my mdraid 
> experience, THE FILESYSTEM MOUNTED WITHOUT PROTEST!!
> 
> h) I checked the file and one variant as written was returned.  STILL NO 
> WARNING!  While I didn't test it, I'm assuming based on the PID-based 
> round-robin read-assignment that I now know btrfs uses, that which copy I 
> got would depend on whether the PID of the reading thread was even or 
> odd, as that's what determines what device of the pair is read.  (There 
> has actually been some discussion of that as it's not a particularly 
> intelligent balancing scheme and it's on the list to change, but the 
> current even/odd works well enough for an initial implementation while 
> the filesystem remains under development.)
> 
> i) Were I rerunning the test today, I'd try a scrub and see what it did 
> with the difference.  But I was early enough in my btrfs learning that I 
> didn't know to run it at that point, so didn't do so.  I'd still be 
> interested in how it handled that, tho based on what I know of btrfs 
> behavior in general, I can /predict/ that which copy it'd scrub out and 
> which it would keep, would again depend on the PID of the scrub thread, 
> since both copies would appear valid (would verify against their checksum 
> on the same device) when read, and it's only when matched against the 
> other that a problem, presumably with the other copy, would be detected.
> 

It'll pick the latest generation number and use that one as the one true
source.  For the others you'll get crc errors which make it fall back to
the latest one.  If the two have exactly the same generation number,
we'll have a hard time picking the best one.
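
If you want to eyeball the generation numbers yourself, something like 
this works (tool name as of btrfs-progs of this era; newer versions 
spell it btrfs inspect-internal dump-super):

  btrfs-show-super /dev/vdb | grep generation
  btrfs-show-super /dev/vdc | grep generation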

Ilya has a series of changes from this year's GSOC that we need to clean
up and integrate.  It detects offline devices and brings them up to date
automatically.

He targeted the pull-one-drive use case explicitly.

-chris


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 11:28               ` Chris Samuel
@ 2014-01-04 14:56                 ` Chris Mason
  2014-01-05  9:20                   ` Chris Samuel
  0 siblings, 1 reply; 40+ messages in thread
From: Chris Mason @ 2014-01-04 14:56 UTC (permalink / raw)
  To: chris; +Cc: linux-btrfs

On Sat, 2014-01-04 at 22:28 +1100, Chris Samuel wrote:
> On Sat, 4 Jan 2014 12:57:02 AM Dave wrote:
> 
> > I find myself annoyed by the constant disclaimers I
> > read on this list, about the experimental status of Btrfs, but it's
> > apparent that this hasn't sunk in for everyone.
> 
> Btrfs will no longer marked as experimental in the kernel as of 3.13.
> 
> Unless someone submits a patch to fix it first. :-)
> 
> Can we also keep things polite here please.

Seconded ;)  We're really focused on nailing down these problems instead
of hiding behind the experimental flag.  I know we won't be perfect
overnight, but it's time to focus on production workloads.

-chris


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 14:51             ` Chris Mason
@ 2014-01-04 15:23               ` Goffredo Baroncelli
  2014-01-04 20:08               ` Duncan
  1 sibling, 0 replies; 40+ messages in thread
From: Goffredo Baroncelli @ 2014-01-04 15:23 UTC (permalink / raw)
  To: Chris Mason; +Cc: 1i5t5.duncan, linux-btrfs

On 2014-01-04 15:51, Chris Mason wrote:
> I added mount -o degraded just because I wanted the admin to be notified
> of failures.  Right now it's still the most reliable way to notify them,
> but I definitely agree we can do better.  

I think that we should align with what the other raid subsystems (md
and dm) do in these cases.
Reading the mdadm man page, it seems to me that an array is assembled
even when some disks are missing; the only requirement is that the
disks which are present have to be valid (i.e. not out of sync).

> Leaving it on all the time?  I
> don't think this is a great long term solution, unless you are actively
> monitoring the system to make sure there are no failures.

Anyway, mdadm has a "monitor" mode, which reports this kind of error.
From the mdadm man page:
"Follow or Monitor
              Monitor one or more md devices and act on any state
              changes.  This is only meaningful for RAID1, 4, 5, 6, 10
              or multipath arrays, as only these have interesting
              state.  RAID0 or Linear never have missing, spare, or
              failed drives, so there is nothing to monitor.
"

Best regards
GB



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  2:59           ` Jim Salter
  2014-01-04  5:57             ` Dave
@ 2014-01-04 19:18             ` Chris Murphy
  2014-01-04 21:16               ` Jim Salter
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Murphy @ 2014-01-04 19:18 UTC (permalink / raw)
  To: linux-btrfs


On Jan 3, 2014, at 7:59 PM, Jim Salter <jim@jrs-s.net> wrote:

> 
> On 01/03/2014 07:27 PM, Chris Murphy wrote:
>> This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub. 
> Fair enough - though since I already have to monkey-patch 00_header, I kind of already have an eye on grub.d so it doesn't seem as onerous as it otherwise would. There is definitely a lot of work that needs to be done on the boot sequence for btrfs IMO.

Most of this work has been done for a while in current versions of GRUB 2.00, and there are a few more fixes due in 2.02. There are some logical challenges in making snapshots bootable in a coherent way. But a major advantage of Btrfs is that the functionality is contained in one place, so once the kernel is booted things usually just work. So I'm not sure what else you're referring to? 


>> I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.
> What other errors does it contain? Aside from adding the "degraded" option, that's a bone-stock fstab entry from an Ubuntu Server installation.

fs_passno is 1, which doesn't apply to Btrfs.
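
A Btrfs root entry would normally carry 0 in that last field, e.g. 
(the UUID and subvol here are placeholders):

  UUID=<fs-uuid>  /  btrfs  defaults,subvol=@  0  0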


>> You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a work around. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later. 
> You're suggesting the wrong alternatives here (mdraid, LVM, etc) - they don't provide the features that I need or are accustomed to (true snapshots, copy on write, self-correcting redundant arrays, and on down the line).

Well, actually LVM thinp does have fast snapshots without requiring preallocation, and uses COW. I'm not sure what you mean by self-correcting, but if the drive reports a read error, md, lvm, and Btrfs raid1+ will all get the missing data from mirror/parity reconstruction, and write corrected data back to the bad sector. All offer scrubbing (except Btrfs raid5/6). If you mean an independent means of verifying data via checksumming, then true, you're looking at Btrfs, ZFS, or T10 PI.
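
(A rough sketch of the thinp workflow, with made-up VG/LV names and sizes:

  lvcreate -L 20G --thinpool pool0 vg0       # create a thin pool
  lvcreate -V 10G --thin vg0/pool0 -n data   # thin LV, space allocated on demand
  lvcreate -s vg0/data -n data_snap          # thin snapshot: instant, no preallocated COW area

Old-style LVM snapshots needed a preallocated COW area; thin snapshots don't.)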

> If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS

There's no shooing, I'm just making observations.


Chris Murphy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 14:51             ` Chris Mason
  2014-01-04 15:23               ` Goffredo Baroncelli
@ 2014-01-04 20:08               ` Duncan
  1 sibling, 0 replies; 40+ messages in thread
From: Duncan @ 2014-01-04 20:08 UTC (permalink / raw)
  To: linux-btrfs

Chris Mason posted on Sat, 04 Jan 2014 14:51:23 +0000 as excerpted:

> It'll pick the latest generation number and use that one as the one true
> source.  For the others you'll get crc errors which make it fall back to
> the latest one.  If the two have exactly the same generation number,
> we'll have a hard time picking the best one.
> 
> Ilya has a series of changes from this year's GSOC that we need to clean
> up and integrate.  It detects offline devices and brings them up to date
> automatically.
> 
> He targeted the pull-one-drive use case explicitly.

Thanks for the explanation and bits to look forward to.

I'll be looking forward to seeing that GSOC stuff then, as having 
dropouts and re-adds auto-handled would be a sweet feature to add to the 
raid featureset, improving things from a sysadmin's prepared-to-deal-with-
recovery perspective quite a bit. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 19:18             ` Chris Murphy
@ 2014-01-04 21:16               ` Jim Salter
  2014-01-05 20:25                 ` Chris Murphy
  0 siblings, 1 reply; 40+ messages in thread
From: Jim Salter @ 2014-01-04 21:16 UTC (permalink / raw)
  To: Chris Murphy, linux-btrfs


On 01/04/2014 02:18 PM, Chris Murphy wrote:
> I'm not sure what else you're referring to?(working on boot 
> environment of btrfs)

Just the string of caveats regarding mounting at boot time: needing to 
monkey-patch 00_header to avoid the bogus sparse-file error (which, 
worse, tells you to press a key when pressing a key does nothing), 
followed by this, in my opinion completely unexpected, behavior when a 
disk is missing from a fault-tolerant array, which also requires 
monkey-patching in fstab and now elsewhere in GRUB to avoid.
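
(For the record, the GRUB side of that workaround is just a rootflags 
tweak in /etc/default/grub; shown only to illustrate the mechanism, not 
to recommend it, given the discussion above about persistent degraded:

  GRUB_CMDLINE_LINUX="rootflags=degraded"
  update-grub
)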

Please keep in mind - I think we got off on the wrong foot here, and I'm 
sorry for my part in that; it was unintentional. I *love* btrfs, and 
think the devs are doing incredible work. I'm excited about it. I'm 
aware it's not intended for production yet. However, it's just on the 
cusp, with distributions not only including it in their installers but a 
couple teetering on the fence about declaring it their next default FS 
(Oracle Unbreakable Linux, openSUSE, hell, even Red Hat was flirting 
with the idea), so it seems to me some extra testing with an eye towards 
production isn't a bad thing. That's why I'm here. Not to crap on 
anybody, but to get involved, hopefully helpfully.

> fs_passno is 1 which doesn't apply to Btrfs.
Again, that's the distribution's default, so the argument should be with 
them, not me... with that said, I'd respectfully argue that fs_passno 1 
is correct for any root filesystem; if the filesystem itself declines to 
run an fsck, that's up to the filesystem, but it's correct to specify 
fs_passno 1 if the filesystem is to be mounted as root in the first place.

I'm open to hearing why that's a bad idea, if you have a specific reason?

> Well actually LVM thinp does have fast snapshots without requiring 
> preallocation, and uses COW.

LVM's snapshots aren't very useful for me - there's a performance 
penalty while you have them in place, so they're best used as a 
transient use-then-immediately-delete feature, for instance for 
rsync'ing off a database binary. Until recently, there also wasn't a 
good way to roll back an LV to a snapshot, and even now, that can be 
pretty problematic. Finally, there's no way to get a partial copy of an 
LV snapshot out of the snapshot and back into production, so if, e.g., 
you have virtual machines of significant size, you could be looking at 
*hours* of file copy operations to restore an individual VM out of a 
snapshot (if you even have the drive space available for it), as 
compared to btrfs' cp --reflink=always operation, which allows you to do 
the same thing instantaneously.
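
(That reflink restore looks something like this; the paths are made up:

  cp -a --reflink=always /snapshots/daily/vm-disk.img /var/lib/vms/vm-disk.img

The data blocks are shared, so the "copy" is effectively instant and 
takes no extra space until either file is modified.)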

FWIW, I think the ability to do cp --reflink=always is one of the big 
killer features that makes btrfs more attractive than zfs (which, again 
FWIW, I have 5+ years of experience with, and is my current primary 
storage system).

> I'm not sure what you mean by self-correcting, but if the drive 
> reports a read error md, lvm, and Btrfs raid1+ all will get missing 
> data from mirror/parity reconstruction, and write corrected data back 
> to the bad sector.

You're assuming that the drive will actually *report* a read error, 
which is frequently not the case. I have a production ZFS array right 
now on which I need to replace an Intel SSD - the SSD has thrown >10K 
checksum errors in six months, with zero read or write errors. Neither 
hardware RAID nor mdraid nor LVM would have helped me there.

Since I started running filesystems that do block-level checksumming, I 
have become aware that bitrot happens without hardware errors being 
thrown FAR more frequently than I would have thought before I had the 
tools to spot it. ZFS, and now btrfs, are the only tools at hand that 
can actually prevent it.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04  6:10           ` Duncan
  2014-01-04 11:20             ` Chris Samuel
  2014-01-04 14:51             ` Chris Mason
@ 2014-01-04 21:22             ` Jim Salter
  2014-01-05 11:01               ` Duncan
  2 siblings, 1 reply; 40+ messages in thread
From: Jim Salter @ 2014-01-04 21:22 UTC (permalink / raw)
  To: Duncan, linux-btrfs


On 01/04/2014 01:10 AM, Duncan wrote:
> The example given in the OP was of a 4-device raid10, already the 
> minimum number to work undegraded, with one device dropped out, to 
> below the minimum required number to mount undegraded, so of /course/ 
> it wouldn't mount without that option.

The issue was not realizing that a degraded fault-tolerant array would 
refuse to mount without being passed an -o degraded option. Yes, it's on 
the wiki - but it's on the wiki under *replacing* a device, not in the 
FAQ, not at the head of the "multiple devices" section, etc.; and no 
coherent message is logged either on the console or in the kernel log 
when you do attempt to mount a degraded array without the correct argument.

IMO that's a bug. =)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 14:56                 ` Chris Mason
@ 2014-01-05  9:20                   ` Chris Samuel
  2014-01-05 11:16                     ` Duncan
  0 siblings, 1 reply; 40+ messages in thread
From: Chris Samuel @ 2014-01-05  9:20 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]

On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:

> Seconded ;)  We're really focused on nailing down these problems instead
> of hiding behind the experimental flag.  I know we won't be perfect
> overnight, but it's time to focus on production workloads.

Perhaps an option here is to remove the need to specify the degraded flag: 
if the filesystem notices that it is mounting a RAID array that would 
otherwise fail, it sets the degraded flag itself and carries on?

That way the fact it was degraded would be visible in /proc/mounts and could 
be detected with health check scripts like NRPE for icinga/nagios.
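
Something as simple as this would probably do as a first cut (a sketch only):

  #!/bin/sh
  # warn if any btrfs filesystem is currently mounted with the degraded option
  if grep -E 'btrfs.*degraded' /proc/mounts >/dev/null; then
      echo "WARNING: degraded btrfs mount detected"; exit 1
  fi
  echo "OK: no degraded btrfs mounts"; exit 0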

Looking at the code this would be in read_one_dev() in fs/btrfs/volumes.c ?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 482 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 21:22             ` Jim Salter
@ 2014-01-05 11:01               ` Duncan
  0 siblings, 0 replies; 40+ messages in thread
From: Duncan @ 2014-01-05 11:01 UTC (permalink / raw)
  To: linux-btrfs

Jim Salter posted on Sat, 04 Jan 2014 16:22:53 -0500 as excerpted:


> On 01/04/2014 01:10 AM, Duncan wrote:
>> The example given in the OP was of a 4-device raid10, already the
>> minimum number to work undegraded, with one device dropped out, to
>> below the minimum required number to mount undegraded, so of /course/
>> it wouldn't mount without that option.
> 
> The issue was not realizing that a degraded fault-tolerant array would
> refuse to mount without being passed an -o degraded option. Yes, it's on
> the wiki - but it's on the wiki under *replacing* a device, not in the
> FAQ, not in the head of the "multiple devices" section, etc; and no
> coherent message is thrown either on the console or in the kernel log
> when you do attempt to mount a degraded array without the correct
> argument.
> 
> IMO that's a bug. =)

I'd agree; it's a usability bug, one of many rough "it works, but it's 
not easy to work with" edges still being smoothed out.

FWIW I'm seeing progress in that area, now.  The rush of functional bugs 
and fixes for them has finally slowed down to the point where there's 
beginning to be time to focus on the usability and rough edges bugs.  I 
believe I saw a post in October or November from Chris Mason, where he 
said yes, the maturing of btrfs has been predicted before, but it really 
does seem like the functional bugs are slowing down to the point where 
the usability bugs can finally be addressed, and 2014 really does look 
like the year that btrfs will finally start shaping up into a mature 
looking and acting filesystem, including in usability, etc.

And Chris mentioned the GSoC project that worked on one angle of this 
specific issue, too.  Getting that code integrated and having btrfs 
finally be able to recognize a dropped and re-added device and 
automatically trigger a resync... that'd be a pretty sweet improvement to 
get. =:^)  While they're working on that they may well take a look at at 
least giving the admin more information on a degraded-needed mount 
failure, too, tweaking the kernel log messages, etc., and possibly taking 
a second look at whether refusing to mount entirely is the best behavior 
in that situation.

Actually, I wonder... what about mounting in such a situation, but read-
only, and refusing to go writable unless degraded is added too?  That 
would preserve the "first, do no harm, don't make the problem worse" 
ideal, while being not /quite/ as drastic as refusing to mount entirely 
unless degraded is added.  I actually think that, plus some better 
logging saying hey, we don't have enough devices to write with the 
requested raid level, so remount rw,degraded and either add another 
device or reconfigure the raid mode to something suitable for the number 
of devices, would be a reasonable way to go.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-05  9:20                   ` Chris Samuel
@ 2014-01-05 11:16                     ` Duncan
  0 siblings, 0 replies; 40+ messages in thread
From: Duncan @ 2014-01-05 11:16 UTC (permalink / raw)
  To: linux-btrfs

Chris Samuel posted on Sun, 05 Jan 2014 20:20:26 +1100 as excerpted:

> On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:
> 
>> Seconded ;)  We're really focused on nailing down these problems
>> instead of hiding behind the experimental flag.  I know we won't be
>> perfect overnight, but it's time to focus on production workloads.
> 
> Perhaps an option here is to remove the need to specify the degraded
> flag but if the filesystem notice that it is mounting a RAID array and
> would otherwise fail it then sets the degraded flag itself and carries
> on?
> 
> That way the fact it was degraded would be visible in /proc/mounts and
> could be detected with health check scripts like NRPE for icinga/nagios.
> 
> Looking at the code this would be in read_one_dev() in
> fs/btrfs/volumes.c ?

The idea I came up elsewhere was to mount read-only, with a dmesg to the 
effect that the filesystem was configured for a raid-level that the 
current number of devices couldn't support, so mount rw,degraded to 
accept that temporarily and to make changes, either by adding a new 
device to fill out the required number for the configured raid level, or 
by reducing the configured raid level to match reality.

The read-only mount would be better than not mounting at all, while 
preserving the "first, do no further harm" ideal, since mounted read-
only, the existing situation should at least remain stable.  It would 
also alert the admin to problems, with a reasonable log message saying 
how to fix them, while letting the admin at least access the filesystem 
in read-only mode, thereby giving him tools access to manage whatever 
maintenance tasks are necessary, should it be the rootfs.  The admin 
could then take the action they deemed appropriate, whether that was 
getting the data backed up, or mounting degraded,rw in order to either 
add a device and bring it back to fully functional, or rebalance to a 
lower data/metadata redundancy level to match the number of devices.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-04 21:16               ` Jim Salter
@ 2014-01-05 20:25                 ` Chris Murphy
  2014-01-06 10:20                   ` Chris Samuel
  0 siblings, 1 reply; 40+ messages in thread
From: Chris Murphy @ 2014-01-05 20:25 UTC (permalink / raw)
  To: linux-btrfs


On Jan 4, 2014, at 2:16 PM, Jim Salter <jim@jrs-s.net> wrote:

> 
> On 01/04/2014 02:18 PM, Chris Murphy wrote:
>> I'm not sure what else you're referring to?(working on boot environment of btrfs)
> 
> Just the string of caveats regarding mounting at boot time - needing to monkeypatch 00_header to avoid the bogus sparse file error

I don't know what "bogus sparse file error" refers to. What version of GRUB? I'm seeing Ubuntu 12.04 precise-updates listing GRUB 1.99, which is rather old.


> (which, worse, tells you to press a key when pressing a key does nothing) followed by this, in my opinion completely unexpected, behavior when missing a disk in a fault-tolerant array, which also requires monkey-patching in fstab and now elsewhere in GRUB to avoid.

and…

> I'm aware it's not intended for production yet.

On the one hand you say you're aware, yet on the other hand you say the missing disk behavior is completely unexpected.

Some parts of Btrfs, in certain contexts, are production ready. But the developmental state of Btrfs places a burden on the user to know more details about that state than he might otherwise be expected to know with more stable/mature file systems.

My opinion is that it's inappropriate for degraded mounts to be made automatic when there's no method of notifying user space of this state change. Gnome-shell, via udisks, will inform users of a degraded md array. Something equivalent to that is needed before Btrfs should enable a scenario where a user boots a computer in a degraded state without being informed, as if there's nothing wrong at all. That's demonstrably far worse than a "scary" boot failure, during which one copy of the data is still likely safe, unlike permitting uninformed degraded rw operation.



> However, it's just on the cusp, with distributions not only including it in their installers but a couple teetering on the fence with declaring it their next default FS (Oracle Unbreakable, OpenSuse, hell even RedHat was flirting with the idea) that it seems to me some extra testing with an eye towards production isn't a bad thing.

Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?

> That's why I'm here. Not to crap on anybody, but to get involved, hopefully helpfully.

I think you're better off using something more developmental; the support necessarily needs to exist there first, before it can trickle down to an LTS release.

> 
>> fs_passno is 1 which doesn't apply to Btrfs.
> Again, that's the distribution's default, so the argument should be with them, not me…

Yes so you'd want to file a bug? That's how you get involved.

> with that said, I'd respectfully argue that fs_passno 1 is correct for any root file system; if the file system itself declines to run an fsck that's up to the filesystem, but it's correct to specify fs_passno 1 if the filesystem is to be mounted as root in the first place.
> 
> I'm open to hearing why that's a bad idea, if you have a specific reason?

It's a minor point, but it shows that fs_passno has become quaint, like grandma's iron cozy. It's not applicable for either XFS or Btrfs. It's arguably inapplicable for ext3/4 but its fsck program has an optimization to skip fully checking the file system if the journal replay succeeds. There is no unattended fsck for either XFS or Btrfs.

On systemd systems, systemd reads fstab, and if fs_passno is non-zero it checks for the existence of /sbin/fsck.<fstype>; if that helper doesn't exist, it doesn't run fsck for that entry. This topic was recently brought up and is in the archives.
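
(Easy enough to check on a given box; whether a no-op fsck.btrfs helper 
is shipped at all depends on the btrfs-progs packaging:

  test -x /sbin/fsck.btrfs && echo "helper present" || echo "no helper, passno is ignored"
)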


>> Well actually LVM thinp does have fast snapshots without requiring preallocation, and uses COW.
> 
> LVM's snapshots aren't very useful for me - there's a performance penalty while you have them in place, so they're best used as a transient use-then-immediately-delete feature, for instance for rsync'ing off a database binary. Until recently, there also wasn't a good way to roll back an LV to a snapshot, and even now, that can be pretty problematic.

This describes old LVM snapshots, not LVM thinp snapshots.

> Finally, there's no way to get a partial copy of an LV snapshot out of the snapshot and back into production, so if eg you have virtual machines of significant size, you could be looking at *hours* of file copy operations to restore an individual VM out of a snapshot (if you even have the drive space available for it), as compared to btrfs' cp --reflink=always operation, which allows you to do the same thing instantaneously.

LVM isn't a file system, so limitations compared to Btrfs are expected.

> 
>> I'm not sure what you mean by self-correcting, but if the drive reports a read error md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector.
> 
> You're assuming that the drive will actually *report* a read error, which is frequently not the case.

This is discussed in significant detail in the linux-raid@ list archives. I'm not aware of data that explicitly concludes or proposes a ratio between ECC error detection with non-correction (resulting in a read error) vs silent data corruption. I've seen quite a few read errors from drives compared to what I think was SDC - but that's not a scientific sample. Polluting a lot of the data is a mismatch between default drive ERC timeouts compared to SCSI block layer timeouts, so when a drive ECC isn't able to produce a result within the SCSI block layer timeout time, we get a link reset. Now we don't know what the drive would have reported, a read error? Or bogus data?


> I have a production ZFS array right now that I need to replace an Intel SSD on - the SSD has thrown > 10K checksum errors in six months. Zero read or write errors. Neither hardware RAID nor mdraid nor LVM would have helped me there.

Of course, that's not their design goal. But I don't think the Btrfs devs are suggesting a design goal is to compensate for spectacular failure of the drive's ECC, because if all drives in your Btrfs volume behaved the way this one SSD you're reporting behaves, you'd inevitably still lose data. Btrfs checksumming isn't a substitute for drive ECC. What you're reporting is a significant ECC fail.


> Since running filesystems that do block-level checksumming, I have become aware that bitrot happens without hardware errors getting thrown FAR more frequently than I would have thought before having the tools to spot it. ZFS, and now btrfs, are the only tools at hand that can actually prevent it.

There are other tools than ZFS and Btrfs, they just aren't open source.

10K checksum errors in six months without a single read error is not bitrot; it's a more significant failure. Bitrot is one kind of silent data corruption, but not all SDC is due to bitrot; there are a lot of other sources of data corruption in the storage stack.

Yes, it's good we have ZFS and Btrfs for additional protection, but I don't see these file systems as getting manufacturers off the hook with respect to ECC. That needs to get better, they know it needs to get better, and that's one of the major reasons why spinning drives have moved to 4K physical sectors. So moving to checksumming file systems isn't the only way to prevent these problems.


Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-05 20:25                 ` Chris Murphy
@ 2014-01-06 10:20                   ` Chris Samuel
  2014-01-06 18:30                     ` Chris Murphy
  0 siblings, 1 reply; 40+ messages in thread
From: Chris Samuel @ 2014-01-06 10:20 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

On Sun, 5 Jan 2014 01:25:19 PM Chris Murphy wrote:

> Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1
> volume?

I doubt it, given the alpha for 14.04 doesn't seem to have the concept yet. 
:-)

https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266200

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 482 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-06 10:20                   ` Chris Samuel
@ 2014-01-06 18:30                     ` Chris Murphy
  2014-01-06 19:25                       ` Jim Salter
  2014-01-06 19:31                       ` correct way to rollback a root filesystem? Jim Salter
  0 siblings, 2 replies; 40+ messages in thread
From: Chris Murphy @ 2014-01-06 18:30 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs


On Jan 6, 2014, at 3:20 AM, Chris Samuel <chris@csamuel.org> wrote:

> On Sun, 5 Jan 2014 01:25:19 PM Chris Murphy wrote:
> 
>> Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1
>> volume?
> 
> I doubt it, given the alpha for 14.04 doesn't seem to have the concept yet. 
> :-)
> 
> https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266200

Color me surprised.

Fedora 20 lets you create Btrfs raid1/raid0 for the rootfs, but due to a long-standing grubby bug [1] /boot can't be on Btrfs, so it's ext4 only. That means only one of your disks will get a grub.cfg, which means that if it dies, you won't boot without user intervention that also requires esoteric GRUB knowledge. 

/boot needs to be on Btrfs or it gets messy. The messy alternative, where each drive has its own ext4 boot partition, means kernel updates have to be written to each drive, and each drive's separate /boot/grub/grub.cfg needs to be updated. That's kinda ick x2.  Yes, they could be made md raid1 to solve part of this.

It gets slightly more amusing on UEFI, where the installer needs to be smart enough to create (or reuse) the EFI System partition on each device [2] for the bootloader but NOT for the grub.cfg [3], otherwise we have separate grub.cfgs on each ESP to update when there are kernel updates.

And if a disk fails, and is replaced, while grub-install works on BIOS, it doesn't work on UEFI because it'll only install a bootloader if the ESP is mounted in the right location.

So until every duck is in a row, I think we can hardly point fingers when it comes to making a degraded system bootable without any human intervention.

[1] grubby fatal error updating grub.cfg when /boot is btrfs
https://bugzilla.redhat.com/show_bug.cgi?id=864198

[2] RFE: always create required bootloader partitions in custom partitioning
https://bugzilla.redhat.com/show_bug.cgi?id=1022316

[3] On EFI, grub.cfg should be in /boot/grub not /boot/efi/EFI/fedora
https://bugzilla.redhat.com/show_bug.cgi?id=1048999


Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-06 18:30                     ` Chris Murphy
@ 2014-01-06 19:25                       ` Jim Salter
  2014-01-06 22:05                         ` Chris Murphy
  2014-01-07  5:43                         ` Chris Samuel
  2014-01-06 19:31                       ` correct way to rollback a root filesystem? Jim Salter
  1 sibling, 2 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-06 19:25 UTC (permalink / raw)
  To: Chris Murphy, Chris Samuel; +Cc: linux-btrfs

FWIW, Ubuntu (and I presume Debian) will work just fine with a single / 
on btrfs, single or multi disk.

I currently have two machines booting to a btrfs-raid10 / with no 
separate /boot, one booting to a btrfs single disk / with no /boot, and 
one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

On 01/06/2014 01:30 PM, Chris Murphy wrote:
> Color me surprised. Fedora 20 lets you create Btrfs raid1/raid0 for 
> rootfs, but due to a long standing grubby bug [1] /boot can't be on 
> Btrfs, so it's only ext4. 



^ permalink raw reply	[flat|nested] 40+ messages in thread

* correct way to rollback a root filesystem?
  2014-01-06 18:30                     ` Chris Murphy
  2014-01-06 19:25                       ` Jim Salter
@ 2014-01-06 19:31                       ` Jim Salter
  2014-01-07 11:55                         ` Sander
  1 sibling, 1 reply; 40+ messages in thread
From: Jim Salter @ 2014-01-06 19:31 UTC (permalink / raw)
  To: linux-btrfs

Hi list -

I tried a kernel upgrade with moderately disastrous (non-btrfs-related) 
results this morning; after the kernel upgrade Xorg was completely 
borked beyond my ability to get it working properly again through any 
normal means. I do have hourly snapshots being taken by cron, though, so 
I'm successfully X'ing again on the machine in question right now.

It was quite a fight getting back to where I started even so, though - 
I'm embarrassed to admit I finally ended up just doing a cp --reflink=always 
/mnt/@/.snapshots/snapshotname /mnt/@/ from the initramfs BusyBox 
prompt.  Which WORKED well enough, but obviously isn't ideal.

I tried the btrfs sub set-default command - again from BusyBox - and it 
didn't seem to want to work for me; I got an inappropriate ioctl error 
(which may be because I tried to use / instead of /mnt, where the root 
volume was CURRENTLY mounted, as an argument?). Before that, I'd tried 
setting subvol=@root (which is the writeable snapshot I created from the 
original read-only hourly snapshot I had) in GRUB and in fstab... but 
that's what landed me in BusyBox to begin with.

When I DID mount the filesystem in BusyBox on /mnt, I saw that @ and 
@home were listed under /mnt, but no other "directories" were - which 
explains why mounting -o subvol=@root didn't work. I guess the question 
is, WHY couldn't I see @root in there, since I had a working, readable, 
writeable snapshot which showed its own name as "root" when doing a 
btrfs sub show /.snapshots/root ?

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-06 19:25                       ` Jim Salter
@ 2014-01-06 22:05                         ` Chris Murphy
  2014-01-06 22:24                           ` Jim Salter
  2014-01-07  5:43                         ` Chris Samuel
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Murphy @ 2014-01-06 22:05 UTC (permalink / raw)
  To: Jim Salter; +Cc: Chris Samuel, linux-btrfs


On Jan 6, 2014, at 12:25 PM, Jim Salter <jim@jrs-s.net> wrote:

> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk.
> 
> I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

Did you create the multiple device layouts outside of the installer first?

What I'm seeing in the Ubuntu 12.04 installer is a choice of which disk to put the bootloader on. If that UI is reliable, then it won't put it on both disks, which means a single point of failure; in that case, -o degraded not being automatic with Btrfs is essentially moot if we don't have a bootloader anyway. I also see no way in the UI to even create a Btrfs raid of any sort.

Chris Murphy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-06 22:05                         ` Chris Murphy
@ 2014-01-06 22:24                           ` Jim Salter
  0 siblings, 0 replies; 40+ messages in thread
From: Jim Salter @ 2014-01-06 22:24 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Chris Samuel, linux-btrfs

No, the installer is completely unaware. What I was getting at is that 
rebalancing (and installing the bootloader) is dead easy, so it doesn't 
bug me personally much.  It'd be nice to eventually get something in the 
installer to make it obvious to the oblivious that it can be done and 
how, but in the meantime, it's frankly easier to set up btrfs-raid 
WITHOUT installer support than it is to set up mdraid WITH installer support.

Install process for 4-drive btrfs-raid10 root on Ubuntu (desktop or server):

1. do a single-disk install on the first disk, default all the way through 
except picking btrfs instead of ext4 for /
2. sfdisk -d /dev/sda | sfdisk /dev/sdb
   sfdisk -d /dev/sda | sfdisk /dev/sdc
   sfdisk -d /dev/sda | sfdisk /dev/sdd
3. btrfs dev add /dev/sdb1 /dev/sdc1 /dev/sdd1 /
4. btrfs balance start -dconvert=raid10 -mconvert=raid10 /
5. grub-install /dev/sdb ; grub-install /dev/sdc ; grub-install /dev/sdd

Done. The rebalancing takes less than a minute, and the system's 
responsive while it happens.  Once you've done the grub-install on the 
additional drives, you're good to go - Ubuntu already uses the UUID 
instead of a device ID for GRUB and fstab, so the btrfs mount will scan 
all drives and find any that are there. The only hitch is the need to 
mount with -o degraded when a disk goes missing, which I Chicken Littled 
about earlier so loudly. =)
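
A quick sanity check once the balance in step 4 finishes (exact output 
varies by btrfs-progs version):

  btrfs filesystem df /      # Data/Metadata lines should now say RAID10
  btrfs filesystem show      # all four devices should be listed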

On 01/06/2014 05:05 PM, Chris Murphy wrote:
> On Jan 6, 2014, at 12:25 PM, Jim Salter <jim@jrs-s.net> wrote:
>
>> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk.
>>
>> I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.
> Did you create the multiple device layouts outside of the installer first?
>
> What I'm seeing in the Ubuntu 12.04 installer is a choice of which disk to put the bootloader on. If that UI is reliable, then it won't put it on both disks, which means a single point of failure; in that case, -o degraded not being automatic with Btrfs is essentially moot if we don't have a bootloader anyway. I also see no way in the UI to even create a Btrfs raid of any sort.
>
> Chris Murphy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
  2014-01-06 19:25                       ` Jim Salter
  2014-01-06 22:05                         ` Chris Murphy
@ 2014-01-07  5:43                         ` Chris Samuel
  1 sibling, 0 replies; 40+ messages in thread
From: Chris Samuel @ 2014-01-07  5:43 UTC (permalink / raw)
  To: linux-btrfs

On 07/01/14 06:25, Jim Salter wrote:

> FWIW, Ubuntu (and I presume Debian) will work just fine with a single /
> on btrfs, single or multi disk.
> 
> I currently have two machines booting to a btrfs-raid10 / with no
> separate /boot, one booting to a btrfs single disk / with no /boot, and
> one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

Actually I've run into a problem with grub where a fresh install cannot
boot from a btrfs /boot if your first partition is not 1MB aligned
(sector 2048): there is then not enough space for grub to store its
btrfs code. :-(

https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266195

I don't want to move my first partition as it's a Dell special (type
'de') and I'm not sure what the impact would be, so I just created an
ext4 /boot and the install then worked.

Regarding RAID, yes, I realise it's easy to do after the fact; in fact,
on the same test system I added an external USB2 drive to the root
filesystem and rebalanced as RAID-1, which worked nicely.

I'm planning on adding dual SSDs as my OS disks to my desktop and this
experiment was to learn whether the Kubuntu installer handled it yet and
if not to do a quick practice of setting it up by hand. :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: correct way to rollback a root filesystem?
  2014-01-06 19:31                       ` correct way to rollback a root filesystem? Jim Salter
@ 2014-01-07 11:55                         ` Sander
  0 siblings, 0 replies; 40+ messages in thread
From: Sander @ 2014-01-07 11:55 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs

Jim Salter wrote (ao):
> I tried a kernel upgrade with moderately disastrous
> (non-btrfs-related) results this morning; after the kernel upgrade
> Xorg was completely borked beyond my ability to get it working
> properly again through any normal means. I do have hourly snapshots
> being taken by cron, though, so I'm successfully X'ing again on the
> machine in question right now.
> 
> It was quite a fight getting back to where I started even so, though
> - I'm embarrassed to admit I finally ended up just doing a cp
> --reflink=always /mnt/@/.snapshots/snapshotname /mnt/@/ from the
> initramfs BusyBox prompt.  Which WORKED well enough, but obviously
> isn't ideal.
> 
> I tried the btrfs sub set-default command - again from BusyBox - and
> it didn't seem to want to work for me; I got an inappropriate ioctl
> error (which may be because I tried to use / instead of /mnt, where
> the root volume was CURRENTLY mounted, as an argument?). Before
> that, I'd tried setting subvol=@root (which is the writeable
> snapshot I created from the original read-only hourly snapshot I
> had) in GRUB and in fstab... but that's what landed me in BusyBox to
> begin with.
> 
> When I DID mount the filesystem in BusyBox on /mnt, I saw that @ and
> @home were listed under /mnt, but no other "directories" were -
> which explains why mounting -o subvol=@root didn't work. I guess the
> question is, WHY couldn't I see @root in there, since I had a
> working, readable, writeable snapshot which showed its own name as
> "root" when doing a btrfs sub show /.snapshots/root ?

I don't quite get what your setup looks like.

In my setup, all subvolumes and snapshots are under /.root/

# cat /etc/fstab
LABEL=panda   /  btrfs  subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd  0  0
LABEL=panda   /home           btrfs   subvol=home                                   0  0
LABEL=panda   /root           btrfs   subvol=root                                   0  0
LABEL=panda   /var            btrfs   subvol=var                                    0  0
LABEL=panda   /holding        btrfs   subvol=.holding                               0  0
LABEL=panda   /.root          btrfs   subvolid=0                                    0  0
/Varlib       /var/lib        none    bind                                          0  0


In case of an OS upgrade gone wrong, I would mount subvolid=0, move
subvolume 'rootvolume' out of the way, and move (rename) the last known
good snapshot to 'rootvolume'.

Not sure if that works though. Never tried.
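
A rough sketch of that rollback, using the layout from the fstab above 
(the snapshot name is made up; /.root is already mounted subvolid=0 per 
the fstab):

  mv /.root/rootvolume /.root/rootvolume.broken    # keep the bad root around, just in case
  mv /.root/snapshot-last-good /.root/rootvolume   # rename the known-good snapshot into place
  reboot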

	Sander

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2014-01-07 11:55 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-03 22:28 btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT Jim Salter
2014-01-03 22:42 ` Emil Karlson
2014-01-03 22:43 ` Joshua Schüler
2014-01-03 22:56   ` Jim Salter
2014-01-03 23:04     ` Hugo Mills
2014-01-03 23:04     ` Joshua Schüler
2014-01-03 23:13       ` Jim Salter
2014-01-03 23:18         ` Hugo Mills
2014-01-03 23:25           ` Jim Salter
2014-01-03 23:32             ` Chris Murphy
2014-01-03 23:22         ` Chris Murphy
2014-01-04  6:10           ` Duncan
2014-01-04 11:20             ` Chris Samuel
2014-01-04 13:03               ` Duncan
2014-01-04 14:51             ` Chris Mason
2014-01-04 15:23               ` Goffredo Baroncelli
2014-01-04 20:08               ` Duncan
2014-01-04 21:22             ` Jim Salter
2014-01-05 11:01               ` Duncan
2014-01-03 23:19     ` Chris Murphy
     [not found]     ` <CAOjFWZ7zC3=4oH6=SBZA+PhZMrSK1KjxoRN6L2vqd=GTBKKTQA@mail.gmail.com>
2014-01-03 23:42       ` Jim Salter
2014-01-03 23:45         ` Jim Salter
2014-01-04  0:27         ` Chris Murphy
2014-01-04  2:59           ` Jim Salter
2014-01-04  5:57             ` Dave
2014-01-04 11:28               ` Chris Samuel
2014-01-04 14:56                 ` Chris Mason
2014-01-05  9:20                   ` Chris Samuel
2014-01-05 11:16                     ` Duncan
2014-01-04 19:18             ` Chris Murphy
2014-01-04 21:16               ` Jim Salter
2014-01-05 20:25                 ` Chris Murphy
2014-01-06 10:20                   ` Chris Samuel
2014-01-06 18:30                     ` Chris Murphy
2014-01-06 19:25                       ` Jim Salter
2014-01-06 22:05                         ` Chris Murphy
2014-01-06 22:24                           ` Jim Salter
2014-01-07  5:43                         ` Chris Samuel
2014-01-06 19:31                       ` correct way to rollback a root filesystem? Jim Salter
2014-01-07 11:55                         ` Sander
