raid1: cannot add disk to replace faulty because can only mount fs as read-only.

* raid1:  cannot add disk to replace faulty because can only mount fs as read-only.
@ 2017-01-24 18:57 Hans Deragon
  2017-01-24 19:48 ` Adam Borowski
       [not found] ` <W75Sc6PDCBok7W75TcCgc7@videotron.ca>
  0 siblings, 2 replies; 19+ messages in thread
From: Hans Deragon @ 2017-01-24 18:57 UTC
  To: linux-btrfs

Greetings,

Warning: Btrfs user here; no knowledge of the inner workings of btrfs.
If I am on the wrong mailing list, please redirect me and accept my
apologies.

At home, lacking disks and free SATA ports, I created a raid1 btrfs
filesystem by converting an existing single btrfs instance into a
degraded raid1, then added the other drive. The exact commands I used
have been lost.

It worked well until one of my drives died. Total death; the OS does not
detect it anymore. I bought another drive, but alas, I cannot add it:

# btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/brtfs-raid1-b": Read-only file system

Here is the command I used to mount it:

mount -t btrfs -o ro,degraded,recovery,nosuid,nodev,nofail,x-gvfs-show \
    /dev/disk/by-uuid/975bdbb3-9a9c-4a72-ad67-6cda545fda5e \
    /mnt/brtfs-raid1-b

If I remove 'ro' from the option, I cannot get the filesystem mounted 
because of the following error:

BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed

So I am stuck. I can only mount the filesystem as read-only, which
prevents me from adding a disk.

It seems related to this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=60594

I am using Ubuntu 16.04 LTS with kernel 4.4.0-59-generic. Is there any
hope of adding a disk? Otherwise, can I recreate a raid1 with only one
disk and add another, without ever suffering from the same problem
again? I did not lose any data, but I do have some serious downtime
because of this. I wish that if a drive fails, the btrfs filesystem
would still mount rw and leave the OS running, but warn the user of the
failing disk and easily allow the addition of a new drive to reintroduce
redundancy. However, this scenario seems impossible with the current
state of affairs. Am I right?

Best regards and thank you for your contribution to the open source 
movement,
Hans Deragon

* Re: raid1:  cannot add disk to replace faulty because can only mount fs as read-only.
  2017-01-24 18:57 raid1: cannot add disk to replace faulty because can only mount fs as read-only Hans Deragon
@ 2017-01-24 19:48 ` Adam Borowski
       [not found] ` <W75Sc6PDCBok7W75TcCgc7@videotron.ca>
  1 sibling, 0 replies; 19+ messages in thread
From: Adam Borowski @ 2017-01-24 19:48 UTC
  To: Hans Deragon; +Cc: linux-btrfs

On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
> If I remove 'ro' from the option, I cannot get the filesystem mounted
> because of the following error:
> 
> BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not
> allowed
> 
> So I am stuck. I can only mount the filesystem as read-only, which prevents
> me to add a disk.

A known problem: you get only one shot at fixing the filesystem, but that's
not because of some damage but because the check of whether the fs is in
good enough shape to mount is oversimplistic.

Here's a patch; if you apply it and recompile, you'll be able to mount
degraded rw.

Note that it removes a safety harness: here, the harness got tangled up and
keeps you from recovering when it shouldn't, but it _has_ valid uses.


Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11

[-- Attachment #2: 0001-NOT-FOR-MERGING-btrfs-make-too-many-missing-devices-.patch --]
[-- Type: text/x-diff, Size: 1559 bytes --]

From 1367d3da6b0189797f6090b11d8716a1cc136593 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilobyte@angband.pl>
Date: Mon, 23 Jan 2017 19:03:20 +0100
Subject: [PATCH] [NOT-FOR-MERGING] btrfs: make "too many missing devices"
 check non-fatal

It breaks degraded mounts of multi-device filesystems that have any single
blocks, which are naturally created if it has been mounted degraded before.
Obviously, any further device loss will result in data loss, but the user
has already specified -odegraded so that's understood.

For a real fix, we'd want to check whether any of single blocks are missing,
as that would allow telling apart broken JBOD filesystems from bona-fide
degraded RAIDs.

(This patch is for the benefit of folks who'd have to recreate a filesystem
just because it got degraded.)
---
 fs/btrfs/disk-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 18004169552c..1b25b9e24662 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3060,10 +3060,9 @@ int open_ctree(struct super_block *sb,
 	     fs_info->num_tolerated_disk_barrier_failures &&
 	    !(sb->s_flags & MS_RDONLY)) {
 		btrfs_warn(fs_info,
-"missing devices (%llu) exceeds the limit (%d), writeable mount is not allowed",
+"missing devices (%llu) exceeds the limit (%d), add more or risk data loss",
 			fs_info->fs_devices->missing_devices,
 			fs_info->num_tolerated_disk_barrier_failures);
-		goto fail_sysfs;
 	}
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
-- 
2.11.0


* Re: raid1:  cannot add disk to replace faulty because can only mount fs as read-only.
       [not found] ` <W75Sc6PDCBok7W75TcCgc7@videotron.ca>
@ 2017-01-27 16:47   ` Hans Deragon
  2017-01-27 20:03     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Deragon @ 2017-01-27 16:47 UTC
  To: linux-btrfs; +Cc: Adam Borowski

On 2017-01-24 14:48, Adam Borowski wrote:

> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
> 
>> If I remove 'ro' from the option, I cannot get the filesystem mounted 
>> because of the following error: BTRFS: missing devices(1) exceeds the 
>> limit(0), writeable mount is not allowed So I am stuck. I can only 
>> mount the filesystem as read-only, which prevents me to add a disk.
> 
> A known problem: you get only one shot at fixing the filesystem, but 
> that's
> not because of some damage but because the check whether the fs is in a
> shape is good enough to mount is oversimplistic.
> 
> Here's a patch, if you apply it and recompile, you'll be able to mount
> degraded rw.
> 
> Note that it removes a safety harness: here, the harness got tangled up 
> and
> keeps you from recovering when it shouldn't, but it _has_ valid uses 
> that.
> 
> Meow!

Greetings,

Ok, that solution will solve my problem in the short run, i.e. getting 
my raid1 up again.

However, as a user, I am seeking an easy, no-maintenance raid
solution.  I wish that if a drive fails, the btrfs filesystem still
mounts rw and leaves the OS running, but warns the user of the failing
disk and easily allows the addition of a new drive to reintroduce
redundancy.  Are there any plans within the btrfs community to implement
such a feature?  A year from now, when the other drive fails, will I hit
this problem again, i.e. my OS failing to start, booting into a
terminal, and my being unable to introduce a new drive without
recompiling the kernel?

Best regards,
Hans Deragon

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-01-27 16:47   ` Hans Deragon
@ 2017-01-27 20:03     ` Austin S. Hemmelgarn
  2017-01-27 20:28       ` Adam Borowski
  2017-01-28  9:17       ` Andrei Borzenkov
  0 siblings, 2 replies; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-27 20:03 UTC
  To: Hans Deragon, linux-btrfs; +Cc: Adam Borowski

On 2017-01-27 11:47, Hans Deragon wrote:
> On 2017-01-24 14:48, Adam Borowski wrote:
>
>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>
>>> If I remove 'ro' from the option, I cannot get the filesystem mounted
>>> because of the following error: BTRFS: missing devices(1) exceeds the
>>> limit(0), writeable mount is not allowed So I am stuck. I can only
>>> mount the filesystem as read-only, which prevents me to add a disk.
>>
>> A known problem: you get only one shot at fixing the filesystem, but
>> that's
>> not because of some damage but because the check whether the fs is in a
>> shape is good enough to mount is oversimplistic.
>>
>> Here's a patch, if you apply it and recompile, you'll be able to mount
>> degraded rw.
>>
>> Note that it removes a safety harness: here, the harness got tangled
>> up and
>> keeps you from recovering when it shouldn't, but it _has_ valid uses
>> that.
>>
>> Meow!
>
> Greetings,
>
> Ok, that solution will solve my problem in the short run, i.e. getting
> my raid1 up again.
>
> However, as a user, I am seeking for an easy, no maintenance raid
> solution.  I wish that if a drive fails, the btrfs filesystem still
> mounts rw and leaves the OS running, but warns the user of the failing
> disk and easily allow the addition of a new drive to reintroduce
> redundancy.  Are there any plans within the btrfs community to implement
> such a feature?  In a year from now, when the other drive will fail,
> will I hit again this problem, i.e. my OS failing to start, booting into
> a terminal, and cannot reintroduce a new drive without recompiling the
> kernel?
Before I make any suggestions regarding this, I should point out that 
mounting read-write when a device is missing is what caused this issue 
in the first place.  Doing so is extremely dangerous in any RAID setup, 
regardless of your software stack.  The filesystem is expected to store 
things reliably when a write succeeds, and if you've got a broken RAID 
array, claiming that you can store things reliably is generally a lie. 
MD and LVM both have things in place to mitigate most of the risk, but 
even there it's still risky.  Yes, it's not convenient to have to deal 
with a system that won't boot, but it's at least a whole lot easier from 
Linux than it is in most other operating systems.

Now, the first step to reliable BTRFS usage is using up-to-date kernels.
If you're actually serious about using BTRFS, you should be doing this
anyway though.  Assuming you're keeping up-to-date on the kernel, then
you won't hit this same problem again (or at least you shouldn't, since
multiple people now have checks for this in their regression testing
suites for BTRFS).

The second is proper monitoring.  A well set up monitoring system will 
let you know when the disk is failing before it gets to the point of 
just disappearing from the system most of the time.  There is currently 
no specific monitoring tool for BTRFS, but it's really easy to set up 
automated monitoring for stuff like this.  It's impractical for me to 
cover exact configuration here, since I don't know how much background 
you have dealing with stuff like this (and you're probably using systemd 
since it's Ubuntu, and I have near zero background dealing with
recurring task scheduling with that).  I can however cover a list of 
what you should be monitoring and roughly how often:
1. SMART status from the storage devices.  You'll need smartmontools for 
this.  In general, I'd suggest using smartctl through cron or a systemd 
timer unit to monitor this instead of smartd.  Basic command-line that 
will work on all modern SATA disks to perform the checks you want is:
smartctl -H /dev/sda
You'll need one call for each disk, just replace /dev/sda with each 
device.  Note that this should be the device itself, not the partitions.
If that command spits out a warning (or returns with an exit code
other than 0), something's wrong and you should at least investigate 
(and possibly look at replacing the disk).  I would suggest checking 
SMART status at least daily, and potentially much more frequently. 
When the self-checks in the disk firmware start failing (which is what 
this is checking), it generally means that failure is imminent, usually 
within a couple of days at most.
2. BTRFS scrub.  If you're serious about data safety, you should be
running a scrub on the filesystem regularly.  As a general rule, once a 
week is reasonable unless you have marginal hardware or are seriously 
paranoid.  Make sure to check the results later with the 'btrfs scrub 
status' command.  It will tell you if it found any errors, and how many 
it was able to fix.  Isolated single errors are generally not a sign of 
imminent failure, it's when they start happening regularly or you see a 
whole lot at once that you're in trouble.  Scrub will also fix most 
synchronization issues between devices in a RAID set.
3. BTRFS device stats.  BTRFS stores per-device error counters in the 
filesystem.  These track cumulative errors since the last time they were 
reset, including errors encountered during normal operation.  You should 
be checking these regularly.  I'm a bit paranoid, so most of my systems
check every hour.  Daily is usually sufficient for most people.  There
are a couple of options for checking these.  The newest versions of
btrfs-progs (which are not in Ubuntu yet) have a switch that will change
the exit code if any counter is non-zero.  The other option (which works
regardless of btrfs-progs version) is to use a script to check the output.
4. Filesystem mount flags.  When BTRFS encounters a severe error (I'm 
not sure about the full list that will trigger this, except that it 
doesn't include read errors if they get corrected (which they should if 
you're using RAID)), it will remount the filesystem read-only.  This is 
a safety measure to prevent the kernel or the rest of the system from 
making any issues with the filesystem worse.  You should monitor the mount
options for the filesystem to know when this happens (note that the
response _SHOULD NOT_ be remounting the FS writable again; if the kernel
remounted it read-only, something is seriously wrong).  A number of
monitoring tools can actually automate checking this one for you (as
well as other stuff like disk usage), but it's also pretty easy to find
scripts that can do this on the internet because this is pretty standard
behavior among a wide variety of Linux filesystems.
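
As a rough illustration of items 3 and 4 above, here is a minimal
Python sketch of the "use a script to check the output" approach.  It
assumes that `btrfs device stats <mountpoint>` prints one counter per
line in the form `[device].counter_name value`, and that mount options
are read from /proc/mounts.  The function names and the sample data are
hypothetical and only serve the example; this is not an official
monitoring tool.

```python
import re

# Assumed line format of `btrfs device stats <mountpoint>`:
#   [/dev/sdb].read_io_errs     3
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_output):
    """Return {(device, counter_name): value} for every non-zero counter."""
    found = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match.group("value")) != 0:
            key = (match.group("dev"), match.group("name"))
            found[key] = int(match.group("value"))
    return found


def readonly_btrfs_mounts(proc_mounts_text):
    """Return mount points of btrfs filesystems mounted with the 'ro' option.

    Each /proc/mounts line has the fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    result = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype == "btrfs" and "ro" in options.split(","):
            result.append(mountpoint)
    return result


# Hypothetical sample data, standing in for real command output.
SAMPLE_STATS = """\
[/dev/sdb].write_io_errs    0
[/dev/sdb].read_io_errs     3
[/dev/sdb].flush_io_errs    0
[/dev/sdb].corruption_errs  0
[/dev/sdb].generation_errs  0
"""

SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb /mnt/data btrfs ro,degraded 0 0
"""

problems = nonzero_counters(SAMPLE_STATS)
readonly = readonly_btrfs_mounts(SAMPLE_MOUNTS)
```

In a cron job or systemd timer, the sample strings would be replaced by
the real output of `btrfs device stats` (for example captured with
subprocess.run) and the contents of /proc/mounts, and the job would send
mail or exit non-zero whenever either result is non-empty.  Note that a
filesystem deliberately mounted read-only would also be reported, so the
list of mount points to watch probably needs to be explicit.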

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-01-27 20:03     ` Austin S. Hemmelgarn
@ 2017-01-27 20:28       ` Adam Borowski
  2017-01-28  9:17       ` Andrei Borzenkov
  1 sibling, 0 replies; 19+ messages in thread
From: Adam Borowski @ 2017-01-27 20:28 UTC
  To: Austin S. Hemmelgarn; +Cc: Hans Deragon, linux-btrfs

On Fri, Jan 27, 2017 at 03:03:18PM -0500, Austin S. Hemmelgarn wrote:
> On 2017-01-27 11:47, Hans Deragon wrote:
> > However, as a user, I am seeking for an easy, no maintenance raid
> > solution.  I wish that if a drive fails, the btrfs filesystem still
> > mounts rw and leaves the OS running, but warns the user of the failing
> > disk and easily allow the addition of a new drive to reintroduce
> > redundancy.

> Before I make any suggestions regarding this, I should point out that
> mounting read-write when a device is missing is what caused this issue in
> the first place.  Doing so is extremely dangerous in any RAID setup,
> regardless of your software stack.  The filesystem is expected to store
> things reliably when a write succeeds, and if you've got a broken RAID
> array, claiming that you can store things reliably is generally a lie. MD
> and LVM both have things in place to mitigate most of the risk, but even
> there it's still risky.  Yes, it's not convenient to have to deal with a
> system that won't boot, but it's at least a whole lot easier from Linux than
> it is in most other operating systems.

Now, now.  Other RAID implementations already have this feature that you're
clamoring for!  When it is degraded, they will continue without a hitch, and
perform their duties not even bothering the user.  Then a couple years
later, the other disk will fail.  Obviously, there are no backups -- "we
have RAID".  This is when I get a call.

> The second is proper monitoring.  A well set up monitoring system will let
> you know when the disk is failing before it gets to the point of just
> disappearing from the system most of the time.

No problem, the second busted disk I mentioned above will include a full
mbox with a mail from mdadm for every single day.  They were either unread,
or read by an admin who ignored them and perhaps even wrote a filter to send
them to /dev/null.  Because the system still works, what's the hurry?


Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-01-27 20:03     ` Austin S. Hemmelgarn
  2017-01-27 20:28       ` Adam Borowski
@ 2017-01-28  9:17       ` Andrei Borzenkov
  2017-01-30 12:18         ` Austin S. Hemmelgarn
       [not found]         ` <YAvBcoM9EImXYYAvCcegSf@videotron.ca>
  1 sibling, 2 replies; 19+ messages in thread
From: Andrei Borzenkov @ 2017-01-28  9:17 UTC
  To: Austin S. Hemmelgarn, Hans Deragon, linux-btrfs; +Cc: Adam Borowski

27.01.2017 23:03, Austin S. Hemmelgarn wrote:
> On 2017-01-27 11:47, Hans Deragon wrote:
>> On 2017-01-24 14:48, Adam Borowski wrote:
>>
>>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>>
>>>> If I remove 'ro' from the option, I cannot get the filesystem mounted
>>>> because of the following error: BTRFS: missing devices(1) exceeds the
>>>> limit(0), writeable mount is not allowed So I am stuck. I can only
>>>> mount the filesystem as read-only, which prevents me to add a disk.
>>>
>>> A known problem: you get only one shot at fixing the filesystem, but
>>> that's
>>> not because of some damage but because the check whether the fs is in a
>>> shape is good enough to mount is oversimplistic.
>>>
>>> Here's a patch, if you apply it and recompile, you'll be able to mount
>>> degraded rw.
>>>
>>> Note that it removes a safety harness: here, the harness got tangled
>>> up and
>>> keeps you from recovering when it shouldn't, but it _has_ valid uses
>>> that.
>>>
>>> Meow!
>>
>> Greetings,
>>
>> Ok, that solution will solve my problem in the short run, i.e. getting
>> my raid1 up again.
>>
>> However, as a user, I am seeking for an easy, no maintenance raid
>> solution.  I wish that if a drive fails, the btrfs filesystem still
>> mounts rw and leaves the OS running, but warns the user of the failing
>> disk and easily allow the addition of a new drive to reintroduce
>> redundancy.  Are there any plans within the btrfs community to implement
>> such a feature?  In a year from now, when the other drive will fail,
>> will I hit again this problem, i.e. my OS failing to start, booting into
>> a terminal, and cannot reintroduce a new drive without recompiling the
>> kernel?
> Before I make any suggestions regarding this, I should point out that
> mounting read-write when a device is missing is what caused this issue
> in the first place.


How do you replace a device when the filesystem is mounted read-only?

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-01-28  9:17       ` Andrei Borzenkov
@ 2017-01-30 12:18         ` Austin S. Hemmelgarn
       [not found]         ` <YAvBcoM9EImXYYAvCcegSf@videotron.ca>
  1 sibling, 0 replies; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-30 12:18 UTC
  To: Hans Deragon, linux-btrfs; +Cc: Andrei Borzenkov, Adam Borowski

On 2017-01-28 04:17, Andrei Borzenkov wrote:
> 27.01.2017 23:03, Austin S. Hemmelgarn wrote:
>> On 2017-01-27 11:47, Hans Deragon wrote:
>>> On 2017-01-24 14:48, Adam Borowski wrote:
>>>
>>>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>>>
>>>>> If I remove 'ro' from the option, I cannot get the filesystem mounted
>>>>> because of the following error: BTRFS: missing devices(1) exceeds the
>>>>> limit(0), writeable mount is not allowed So I am stuck. I can only
>>>>> mount the filesystem as read-only, which prevents me to add a disk.
>>>>
>>>> A known problem: you get only one shot at fixing the filesystem, but
>>>> that's
>>>> not because of some damage but because the check whether the fs is in a
>>>> shape is good enough to mount is oversimplistic.
>>>>
>>>> Here's a patch, if you apply it and recompile, you'll be able to mount
>>>> degraded rw.
>>>>
>>>> Note that it removes a safety harness: here, the harness got tangled
>>>> up and
>>>> keeps you from recovering when it shouldn't, but it _has_ valid uses
>>>> that.
>>>>
>>>> Meow!
>>>
>>> Greetings,
>>>
>>> Ok, that solution will solve my problem in the short run, i.e. getting
>>> my raid1 up again.
>>>
>>> However, as a user, I am seeking for an easy, no maintenance raid
>>> solution.  I wish that if a drive fails, the btrfs filesystem still
>>> mounts rw and leaves the OS running, but warns the user of the failing
>>> disk and easily allow the addition of a new drive to reintroduce
>>> redundancy.  Are there any plans within the btrfs community to implement
>>> such a feature?  In a year from now, when the other drive will fail,
>>> will I hit again this problem, i.e. my OS failing to start, booting into
>>> a terminal, and cannot reintroduce a new drive without recompiling the
>>> kernel?
>> Before I make any suggestions regarding this, I should point out that
>> mounting read-write when a device is missing is what caused this issue
>> in the first place.
>
>
> How do you replace device when filesystem is mounted read-only?
>
I'm saying that the use case you're asking to have supported is the 
reason stuff like this happens.  If you're mounting read-write degraded 
and fixing the filesystem _immediately_ then it's not an issue, that's 
exactly what read-write degraded mounts are for.  If you're mounting 
read-write degraded and then having the system run as if nothing was 
wrong, then I have zero sympathy because that's _dangerous_, even with 
LVM, MD-RAID, or even hardware RAID (actually, especially with hardware 
RAID, LVM and MD are smart enough to automatically re-sync, most 
hardware RAID controllers aren't).

That said, as I mentioned further down in my initial reply, you 
absolutely should be monitoring the filesystem and not letting things 
get this bad if at all possible.  It's actually very rare that a storage 
device fails catastrophically with no warning (at least, on the scale 
that most end users are operating).  At a minimum, even if you're using 
ext4 on top of LVM, you should be monitoring SMART attributes on the 
storage devices (or whatever the SCSI equivalent is if you use 
SCSI/SAS/FC devices).  While not 100% reliable (they are getting better 
though), they're generally a pretty good way to tell if a disk is likely 
to fail in the near future.
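
For scripted SMART checks like the ones described above, smartctl's exit
status is a bit mask rather than a plain success/failure flag.  The
following Python sketch decodes it; the bit meanings are paraphrased
from the smartctl(8) man page, while the names SMARTCTL_EXIT_BITS and
decode_smartctl_status are hypothetical and chosen only for this
example.

```python
# Exit status bits of smartctl, paraphrased from smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed, or SMART data had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_smartctl_status(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [
        text
        for bit, text in SMARTCTL_EXIT_BITS.items()
        if status & (1 << bit)
    ]


def disk_needs_attention(status):
    """Treat any non-zero exit status as worth investigating."""
    return status != 0
```

A monitoring job could run `smartctl -H` on each whole-disk device, pass
the returned exit status to decode_smartctl_status(), and include the
resulting messages in its alert.  Treating every non-zero status as a
reason to investigate matches the advice earlier in the thread, but is a
policy choice rather than something smartctl mandates.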

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
       [not found]         ` <YAvBcoM9EImXYYAvCcegSf@videotron.ca>
@ 2017-02-01  2:51           ` Hans Deragon
  2017-02-01  5:23             ` Duncan
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Deragon @ 2017-02-01  2:51 UTC
  To: Austin S. Hemmelgarn, linux-btrfs; +Cc: Andrei Borzenkov, Adam Borowski


On 2017-01-30 07:18, Austin S. Hemmelgarn wrote:
> On 2017-01-28 04:17, Andrei Borzenkov wrote:
>> 27.01.2017 23:03, Austin S. Hemmelgarn wrote:
>>> On 2017-01-27 11:47, Hans Deragon wrote:
>>>> On 2017-01-24 14:48, Adam Borowski wrote:
>>>>
>>>>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>>>>
>>>>>> If I remove 'ro' from the option, I cannot get the filesystem mounted
>>>>>> because of the following error: BTRFS: missing devices(1) exceeds the
>>>>>> limit(0), writeable mount is not allowed So I am stuck. I can only
>>>>>> mount the filesystem as read-only, which prevents me to add a disk.
>>>>>
>>>>> A known problem: you get only one shot at fixing the filesystem, but
>>>>> that's
>>>>> not because of some damage but because the check whether the fs is
>>>>> in a
>>>>> shape is good enough to mount is oversimplistic.
>>>>>
>>>>> Here's a patch, if you apply it and recompile, you'll be able to mount
>>>>> degraded rw.
>>>>>
>>>>> Note that it removes a safety harness: here, the harness got tangled
>>>>> up and
>>>>> keeps you from recovering when it shouldn't, but it _has_ valid uses
>>>>> that.
>>>>>
>>>>> Meow!
>>>>
>>>> Greetings,
>>>>
>>>> Ok, that solution will solve my problem in the short run, i.e. getting
>>>> my raid1 up again.
>>>>
>>>> However, as a user, I am seeking for an easy, no maintenance raid
>>>> solution.  I wish that if a drive fails, the btrfs filesystem still
>>>> mounts rw and leaves the OS running, but warns the user of the failing
>>>> disk and easily allow the addition of a new drive to reintroduce
>>>> redundancy.  Are there any plans within the btrfs community to
>>>> implement
>>>> such a feature?  In a year from now, when the other drive will fail,
>>>> will I hit again this problem, i.e. my OS failing to start, booting
>>>> into
>>>> a terminal, and cannot reintroduce a new drive without recompiling the
>>>> kernel?
>>> Before I make any suggestions regarding this, I should point out that
>>> mounting read-write when a device is missing is what caused this issue
>>> in the first place.
>>
>>
>> How do you replace device when filesystem is mounted read-only?
>>
> I'm saying that the use case you're asking to have supported is the
> reason stuff like this happens.  If you're mounting read-write degraded
> and fixing the filesystem _immediately_ then it's not an issue, that's
> exactly what read-write degraded mounts are for.  If you're mounting
> read-write degraded and then having the system run as if nothing was
> wrong, then I have zero sympathy because that's _dangerous_, even with
> LVM, MD-RAID, or even hardware RAID (actually, especially with hardware
> RAID, LVM and MD are smart enough to automatically re-sync, most
> hardware RAID controllers aren't).
> 
> That said, as I mentioned further down in my initial reply, you
> absolutely should be monitoring the filesystem and not letting things
> get this bad if at all possible.  It's actually very rare that a storage
> device fails catastrophically with no warning (at least, on the scale
> that most end users are operating).  At a minimum, even if you're using
> ext4 on top of LVM, you should be monitoring SMART attributes on the
> storage devices (or whatever the SCSI equivalent is if you use
> SCSI/SAS/FC devices).  While not 100% reliable (they are getting better
> though), they're generally a pretty good way to tell if a disk is likely
> to fail in the near future.

Greetings,

I totally understand your concerns.  However, anybody using raid is a
grown-up, and tough for them if they do not understand this.  But the
current scenario makes it difficult for me to put redundancy back into
service!  How much time did I wait while I found the mailing list,
subscribed to it, posted my email and got an answer?  Wouldn't it be
better if the user could actually add the disk at any time, ideally ASAP?

And to fix this, I have to learn how to patch and compile the kernel.  I
have not done this since the beginning of the century.  More delays,
more risk added to the system (what if I compile the kernel with the
wrong parameters?).  Fortunately, my raid1 system is for my home system
and I do not need that data available right now.  The data is safe, but
I have no time to fiddle with this issue and put the raid1 in service by
compiling a new kernel.  I do have the extra drive sitting on my desk,
useless for the moment.

Which market is btrfs raid targeted at?  In the enterprise world, all
the proprietary raid solutions I know of alert admins when a disk has a
problem, but allow continuous, uninterrupted read-write service.  If
you have hundreds of employees depending upon a NAS, you do not want
them to twiddle their thumbs until the new drive is put in place.  In
SOHO, the admin is often outsourced, maybe attending to someone else's
problem when the drive failure occurs.  What should the business do
then?  Tell everybody to go home?

Does raid6 have the same problem?  If one drive dies, does the system go
down even though redundancy still remains?

Best regards,
Hans Deragon


* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-01  2:51           ` Hans Deragon
@ 2017-02-01  5:23             ` Duncan
  2017-02-01 11:55               ` Adam Borowski
  0 siblings, 1 reply; 19+ messages in thread
From: Duncan @ 2017-02-01  5:23 UTC
  To: linux-btrfs

Hans Deragon posted on Tue, 31 Jan 2017 21:51:22 -0500 as excerpted:

> But the current scenario makes it difficult for me to put redundancy
> back into service!  How much time did I waited until I find the mailing
> list, subscribe to it, post my email and get an answer?  Wouldn't it be
> better if the user could actually add the disk at anytime, mostly ASAP?
> 
> And to fix this, I have to learn how to patch and compile the kernel.  I
> have not done this since the beginning of the century.  More delays,
> more risk added to the system (what if I compile the kernel with the
> wrong parameters?).

Comes with the territory.  Note that nobody with any knowledge of btrfs 
is claiming it's fully stable and mature, as you seem to expect.  Rather, 
the state is explicitly stabilizing, NOT fully stable and mature, backups 
very strongly recommended as there's a real possibility you'll need to 
use them, running current kernels and keeping up with the list if you 
choose to run btrfs strongly recommended, etc.

The patch fixing the problem and making return from degraded not the one-
shot thing it tends to be now will eventually be merged, but known 
problems, with or without patches available to fix them, are just part of 
running a still stabilizing filesystem, and if you choose to do so and 
run into those problems, you have the choice of waiting for a fix to make 
its way to you, or if a patch is already available, rebuilding with that 
patch.

Otherwise, you simply mkfs and restore from the backup that was strongly 
recommended if the data isn't of only trivial value in the first place.  
If you didn't have that backup, and the data was stored on a still 
stabilizing btrfs, well then, you defined it as of only trivial value by 
the inaction of not having that backup while running a filesystem known 
to be still stabilizing, didn't you?

Otherwise, run a filesystem more appropriately stable and mature 
according to your needs, as btrfs in its current state apparently doesn't 
meet those needs.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-01  5:23             ` Duncan
@ 2017-02-01 11:55               ` Adam Borowski
  2017-02-01 22:48                 ` Duncan
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Borowski @ 2017-02-01 11:55 UTC (permalink / raw)
  To: linux-btrfs

On Wed, Feb 01, 2017 at 05:23:16AM +0000, Duncan wrote:
> Hans Deragon posted on Tue, 31 Jan 2017 21:51:22 -0500 as excerpted:
> > But the current scenario makes it difficult for me to put redundancy
> > back into service!  How much time did I waited until I find the mailing
> > list, subscribe to it, post my email and get an answer?  Wouldn't it be
> > better if the user could actually add the disk at anytime, mostly ASAP?
> > 
> > And to fix this, I have to learn how to patch and compile the kernel.  I
> > have not done this since the beginning of the century.  More delays,
> > more risk added to the system (what if I compile the kernel with the
> > wrong parameters?).
> 
> The patch fixing the problem and making return from degraded not the one-
> shot thing it tends to be now will eventually be merged

Not anything like the one I posted to this thread -- this one merely removes
a check that can't handle this particular (common) case of an otherwise
healthy RAID that lost one device then was mounted degraded twice.  We need
instead a better check, one that sees whether every block group is present.

This can be done quite easily, as far as I know, since the list of block
groups is at that moment fully present in memory, but someone actually has
to code that, and I for one don't know btrfs internals (yet?).
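
A minimal sketch of the difference between the two checks, written as a
Python model rather than kernel code.  The Chunk type, the function names
and the device ids are hypothetical, made up for illustration; the
tolerance table is assumed to mirror the per-profile limits btrfs uses
(0 for single/dup/raid0, 1 for raid1/raid10/raid5, 2 for raid6).

```python
from dataclasses import dataclass

# Assumed per-profile number of missing devices a chunk can survive.
TOLERATED_FAILURES = {
    "single": 0, "dup": 0, "raid0": 0,
    "raid1": 1, "raid10": 1, "raid5": 1,
    "raid6": 2,
}


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for one block group / chunk."""
    profile: str
    devids: tuple  # ids of the devices holding this chunk's stripes


def global_check_ok(chunks, all_devids, present_devids):
    """Coarse check: total missing devices vs. the weakest profile in use."""
    limit = min(TOLERATED_FAILURES[c.profile] for c in chunks)
    missing = len(set(all_devids) - set(present_devids))
    return missing <= limit


def per_chunk_check_ok(chunks, present_devids):
    """Finer check: every chunk must individually survive the missing devices."""
    present = set(present_devids)
    for chunk in chunks:
        missing = sum(1 for devid in chunk.devids if devid not in present)
        if missing > TOLERATED_FAILURES[chunk.profile]:
            return False
    return True


if __name__ == "__main__":
    # raid1 on devices 1 and 2; device 2 died; a degraded mount then
    # created a single chunk on the surviving device 1.
    chunks = [Chunk("raid1", (1, 2)), Chunk("single", (1,))]
    print(global_check_ok(chunks, [1, 2], [1]))
    print(per_chunk_check_ok(chunks, [1]))
```

In this model the coarse check refuses the writable mount (one missing
device against a limit of zero, because a single chunk exists), while the
per-chunk check allows it, since the single chunk lives entirely on the
device that is still present.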

> or if a patch is already available, rebuilding with that patch.

I've done smoke tests on this bandaid patch, and I _believe_ it is safe to
use it as long as you're certain the scenario of events is:
* you had a _redundant_ RAID, in good shape
* lost precisely one device
* mounted it degraded before (creating single blocks, but they're all on
  devices still present)
* no devices were added or removed since the original loss

> Otherwise, you simply mkfs and restore from the backup that was strongly 
> recommended if the data isn't of only trivial value in the first place.  
> If you didn't have that backup, and the data was stored on a still 
> stabilizing btrfs, well then, you defined it as of only trivial value

mkfs+restore is a lot of work, and it's likely to result in data loss,
bringing you back to the last backup.  Like, in the GitLab case yesterday
-- they lost "just" 6 hours of account database, but that was enough to give
them an article on every tech news site.

A degraded RAID with every bit of data present can still have that 6 hours
of data copied off it, but you need to organize temporary storage, lose
snapshot arrangements, etc.  It's so much better to just add a disk and
resync the RAID, like what you get in other RAID implementations.

In this case, you really want this quick hack of a patch, even though
it comes from a doofus with little knowledge about btrfs, as it lets you
quickly recover in-place without that restore from backups.

> Otherwise, run a filesystem more appropriately stable and mature 
> according to your needs, as btrfs in its current state apparently doesn't 
> meet those needs.

No, I'm not touching a silentdatalossfs with a very long pole if I can help
it, "silentdatalossfs" currently defined as "any filesystem other than btrfs
and zfs", and zfs on Linux has more problems than btrfs.  Of the last four
$disk failures I had recently (after a long stretch of luck), three had no
reported errors whatsoever, one had a single reported bad sector while scrub
found 3600 more.  Yeah, you're supposed to get warnings but somehow, on
three different computers from diverse manufacturers (one of these was a
RasPi with an SD card...), I got none.  Of these failures, those that were on
ext4 resulted in lots of woe and data loss, despite adequate backups going
months back -- _silent_ data loss means you keep backing up trash while good
backups expire; those on btrfs immediately told me that I should hit backups
and what exact files were lost.

And data checksums are just one of several upsides of btrfs.  So my decision
is to grit my teeth and deal with btrfs' downsides.

Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-01 11:55               ` Adam Borowski
@ 2017-02-01 22:48                 ` Duncan
  2017-02-02 12:49                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 19+ messages in thread
From: Duncan @ 2017-02-01 22:48 UTC (permalink / raw)
  To: linux-btrfs

Adam Borowski posted on Wed, 01 Feb 2017 12:55:30 +0100 as excerpted:

> On Wed, Feb 01, 2017 at 05:23:16AM +0000, Duncan wrote:
>> Hans Deragon posted on Tue, 31 Jan 2017 21:51:22 -0500 as excerpted:
>> > But the current scenario makes it difficult for me to put redundancy
>> > back into service!  How much time did I waited until I find the
>> > mailing list, subscribe to it, post my email and get an answer?
>> > Wouldn't it be better if the user could actually add the disk at
>> > anytime, mostly ASAP?
>> > 
>> > And to fix this, I have to learn how to patch and compile the kernel.
>> > I have not done this since the beginning of the century.  More
>> > delays,
>> > more risk added to the system (what if I compile the kernel with the
>> > wrong parameters?).
>> 
>> The patch fixing the problem and making return from degraded not the
>> one-
>> shot thing it tends to be now will eventually be merged
> 
> Not anything like the one I posted to this thread -- this one merely
> removes a check that can't handle this particular (common) case of an
> otherwise healthy RAID that lost one device then was mounted degraded
> twice.  We need instead a better check, one that sees whether every
> block group is present.
> 
> This can be done quite easily, as as far as I know, the list of block
> group is at that moment fully present in memory, but someone actually
> has to code that, and I for one don't know btrfs internals (yet?).

I didn't mention it because you spared me the trouble with your hack-
patch that did the current job, but FWIW, there's actually a patch that 
does per-chunk testing as you indicate, but it got merged into a longer 
running feature-add project (hot-spares, IIRC), and thus isn't likely to 
be mainline-merged until that project is merged.  

But who knows when that might be?  Could be years before it is considered 
ready.

Meanwhile, perhaps it's simply because I'm not a dev and am not 
appreciating the complexities of some detail or other.  But as 
demonstrated by the people who have local-merged that patch to get out of 
just this sort of jam, as well as the continuing saga of more and more 
people appearing here with the same problem, it is arguably a high 
priority fix on its own, and should have been reviewed and ultimately 
mainline-merged on its own merits, instead of being stuck in someone's 
feature-add project queue for potentially years, while more and more 
people who could have definitely used it have to either suffer without it 
or go and find and local-merge it themselves.  Even if this fix is 
critical to the longer term feature, merge of this little one now would 
make the final patch set for the longer term feature that much smaller.

But that's a btrfs-using sysadmin's viewpoint, not a developer viewpoint, 
and it's the developers doing the work, so they get to define when and 
how it gets merged, and we non-devs must either simply live with it, or 
if the circumstances allow, fund some dev to have our specific interests 
as their priority and take care of it for us.

Meanwhile, perhaps I should have bookmarked that patch at least as it 
appeared on-list, but I didn't, so while I know it exists, I too would 
have to go looking to actually find it, should I end up needing it.  In 
my defense, I /thought/ the immediate benefit was obvious enough that it 
would be mainline-merged by now, not hoovered-up into some long-term 
project with no real hint as to /when/ it might be merged.  Oh, well...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-01 22:48                 ` Duncan
@ 2017-02-02 12:49                   ` Austin S. Hemmelgarn
  2017-02-02 14:25                     ` Adam Borowski
  2017-02-03  9:35                     ` Duncan
  0 siblings, 2 replies; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2017-02-02 12:49 UTC (permalink / raw)
  To: linux-btrfs

On 2017-02-01 17:48, Duncan wrote:
> Adam Borowski posted on Wed, 01 Feb 2017 12:55:30 +0100 as excerpted:
>
>> On Wed, Feb 01, 2017 at 05:23:16AM +0000, Duncan wrote:
>>> Hans Deragon posted on Tue, 31 Jan 2017 21:51:22 -0500 as excerpted:
>>>> But the current scenario makes it difficult for me to put redundancy
>>>> back into service!  How much time did I waited until I find the
>>>> mailing list, subscribe to it, post my email and get an answer?
>>>> Wouldn't it be better if the user could actually add the disk at
>>>> anytime, mostly ASAP?
>>>>
>>>> And to fix this, I have to learn how to patch and compile the kernel.
>>>> I have not done this since the beginning of the century.  More
>>>> delays,
>>>> more risk added to the system (what if I compile the kernel with the
>>>> wrong parameters?).
>>>
>>> The patch fixing the problem and making return from degraded not the
>>> one-
>>> shot thing it tends to be now will eventually be merged
>>
>> Not anything like the one I posted to this thread -- this one merely
>> removes a check that can't handle this particular (common) case of an
>> otherwise healthy RAID that lost one device then was mounted degraded
>> twice.  We need instead a better check, one that sees whether every
>> block group is present.
>>
>> This can be done quite easily, as as far as I know, the list of block
>> group is at that moment fully present in memory, but someone actually
>> has to code that, and I for one don't know btrfs internals (yet?).
>
> I didn't mention it because you spared me the trouble with your hack-
> patch that did the current job, but FWIW, there's actually a patch that
> does per-chunk testing as you indicate, but it got merged into a longer
> running feature-add project (hot-spares, IIRC), and thus isn't likely to
> be mainline-merged until that project is merged.
>
> But who knows when that might be?  Could be years before it is considered
> ready.
>
> Meanwhile, perhaps it's simply because I'm not a dev and am not
> appreciating the complexities of some detail or other, but as
> demonstrated by the people who have local-merged that patch to get out of
> just this sort of jam, as well as the continuing saga of more and more
> people appearing here with the same problem, it could be an arguably high
> priority fix on its own, and should have been reviewed and ultimately
> mainline-merged on its own merits, instead of being stuck in someone's
> feature-add project queue for potentially years, while more and more
> people who could have definitely used it have to either suffer without it
> or go and find and local-merge it themselves.  Even if this feature is
> critical to the longer term feature, merge of this little one now would
> make the final patch set for the longer term feature that much smaller.
Agreed, it should have been mainlined.  I have no issue with the 
hot-spare patches depending on it, but it's a severe bug.
>
> But that's a btrfs-using sysadmin's viewpoint, not a developer viewpoint,
> and it's the developer's doing the work, so they get to define when and
> how it gets merged, and us non-devs must either simply live with it, or
> if the circumstances allow, fund some dev to have our specific interests
> as their priority and take care of it for us.
I don't care in this case if I draw some flak from the developers, but 
this particular developer viewpoint is wrong.  If this was a commercial 
software product, the person responsible would at least be facing some 
serious reprimand, and depending on the company, possibly would be out 
of a job.  This is a severe bug that makes a not all that uncommon 
(albeit bad) use case fail completely.  The fix had no dependencies 
itself and
>
> Meanwhile, perhaps I should have bookmarked that patch at least as it
> appeared on-list, but I didn't, so while I know it exists, I too would
> have to go looking to actually find it, should I end up needing it.  In
> my defense, I /thought/ the immediate benefit was obvious enough that it
> would be mainline-merged by now, not hoovered-up into some long-term
> project with no real hint as to /when/ it might be merged.  Oh, well...
I think (although I'm not sure about it) that this:
http://www.spinics.net/lists/linux-btrfs/msg47283.html
is the first posting of the patch series.

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-02 12:49                   ` Austin S. Hemmelgarn
@ 2017-02-02 14:25                     ` Adam Borowski
  2017-02-02 15:06                       ` Austin S. Hemmelgarn
       [not found]                       ` <ZIyPcL4cW36fIZIyQcB9Hs@videotron.ca>
  2017-02-03  9:35                     ` Duncan
  1 sibling, 2 replies; 19+ messages in thread
From: Adam Borowski @ 2017-02-02 14:25 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
> This is a severe bug that makes a not all that uncommon (albeit bad) use
> case fail completely.  The fix had no dependencies itself and

I don't see what's bad in mounting a RAID degraded.  Yeah, it provides no
redundancy but that's no worse than using a single disk from the start.
And most people not doing storage/server farm don't have a stack of spare
disks at hand, so getting a replacement might take a while.

Being able to continue to run when a disk fails is the whole point of RAID
-- despite what some folks think, RAIDs are not for backups but for uptime.
And if your uptime goes to hell because the moment a disk fails you need to
drop everything and replace the disk immediately, why would you use RAID?

> > I /thought/ the immediate benefit was obvious enough that it
> > would be mainline-merged by now, not hoovered-up into some long-term
> > project with no real hint as to /when/ it might be merged.  Oh, well...
> I think (although I'm not sure about it) that this:
> http://www.spinics.net/lists/linux-btrfs/msg47283.html
> is the first posting of the patch series.

Is there a more recent version somewhere?  Mechanically rebasing+resolving
conflicts doesn't work, I'd need to do a more involved refresh, which would
be a waste of time if it's already done by someone with an actual clue about
this code.


Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-02 14:25                     ` Adam Borowski
@ 2017-02-02 15:06                       ` Austin S. Hemmelgarn
       [not found]                       ` <ZIyPcL4cW36fIZIyQcB9Hs@videotron.ca>
  1 sibling, 0 replies; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2017-02-02 15:06 UTC (permalink / raw)
  To: Adam Borowski; +Cc: linux-btrfs

On 2017-02-02 09:25, Adam Borowski wrote:
> On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
>> This is a severe bug that makes a not all that uncommon (albeit bad) use
>> case fail completely.  The fix had no dependencies itself and
>
> I don't see what's bad in mounting a RAID degraded.  Yeah, it provides no
> redundancy but that's no worse than using a single disk from the start.
> And most people not doing storage/server farm don't have a stack of spare
> disks at hand, so getting a replacement might take a while.
Running degraded is bad. Period.  If you don't have a disk on hand to 
replace the failed one (and if you care about redundancy, you should 
have at least one spare on hand), you should be converting to a single 
disk, not continuing to run in degraded mode until you get a new disk. 
The moment you start talking about running degraded long enough that you 
will be _booting_ the system with the array degraded, you need to be 
converting to a single disk.  This is of course impractical for 
something like a hardware array or an LVM volume, but it's _trivial_ 
with BTRFS, and protects you from all kinds of bad situations that can't 
happen with a single disk but can completely destroy the filesystem if 
it's a degraded array.  Running a single disk is not exactly the same as 
running a degraded array, it's actually marginally safer (even if you 
aren't using dup profile for metadata) because there are fewer moving 
parts to go wrong.  It's also exponentially more efficient.
>
> Being able to continue to run when a disk fails is the whole point of RAID
> -- despite what some folks think, RAIDs are not for backups but for uptime.
> And if your uptime goes to hell because the moment a disk fails you need to
> drop everything and replace the disk immediately, why would you use RAID?
Because just replacing a disk and rebuilding the array is almost always 
much cheaper in terms of time than rebuilding the system from a backup. 
IOW, even if you have to drop everything and replace the disk 
immediately, it's still less time consuming than restoring from a 
backup.  It also has the advantage that you don't lose any data.
>
>>> I /thought/ the immediate benefit was obvious enough that it
>>> would be mainline-merged by now, not hoovered-up into some long-term
>>> project with no real hint as to /when/ it might be merged.  Oh, well...
>> I think (although I'm not sure about it) that this:
>> http://www.spinics.net/lists/linux-btrfs/msg47283.html
>> is the first posting of the patch series.
>
> Is there a more recent version somewhere?  Mechanically rebasing+resolving
> conflicts doesn't work, I'd need to do a more involved refresh, which would
> be a waste of time if it's already done by someone with an actual clue about
> this code.
There may be, but I haven't looked very far.  Qu would probably be the 
person to ask.

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-02 12:49                   ` Austin S. Hemmelgarn
  2017-02-02 14:25                     ` Adam Borowski
@ 2017-02-03  9:35                     ` Duncan
  1 sibling, 0 replies; 19+ messages in thread
From: Duncan @ 2017-02-03  9:35 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Thu, 02 Feb 2017 07:49:50 -0500 as
excerpted:

> I think (although I'm not sure about it) that this:
> http://www.spinics.net/lists/linux-btrfs/msg47283.html is the first
> posting of the patch series.

Yes.  That looks like it.  Thanks.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
       [not found]                       ` <ZIyPcL4cW36fIZIyQcB9Hs@videotron.ca>
@ 2017-02-08  3:21                         ` Hans Deragon
  2017-02-08 12:50                           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Deragon @ 2017-02-08  3:21 UTC (permalink / raw)
  To: linux-btrfs

Greetings,

On 2017-02-02 10:06, Austin S. Hemmelgarn wrote:
> On 2017-02-02 09:25, Adam Borowski wrote:
>> On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
>>> This is a severe bug that makes a not all that uncommon (albeit bad) use
>>> case fail completely.  The fix had no dependencies itself and
>>
>> I don't see what's bad in mounting a RAID degraded.  Yeah, it provides no
>> redundancy but that's no worse than using a single disk from the start.
>> And most people not doing storage/server farm don't have a stack of spare
>> disks at hand, so getting a replacement might take a while.
> Running degraded is bad. Period.  If you don't have a disk on hand to
> replace the failed one (and if you care about redundancy, you should
> have at least one spare on hand), you should be converting to a single
> disk, not continuing to run in degraded mode until you get a new disk.
> The moment you start talking about running degraded long enough that you
> will be _booting_ the system with the array degraded, you need to be
> converting to a single disk.  This is of course impractical for
> something like a hardware array or an LVM volume, but it's _trivial_
> with BTRFS, and protects you from all kinds of bad situations that can't
> happen with a single disk but can completely destroy the filesystem if
> it's a degraded array.  Running a single disk is not exactly the same as
> running a degraded array, it's actually marginally safer (even if you
> aren't using dup profile for metadata) because there are fewer moving
> parts to go wrong.  It's also exponentially more efficient.
>>
>> Being able to continue to run when a disk fails is the whole point of
>> RAID
>> -- despite what some folks think, RAIDs are not for backups but for
>> uptime.
>> And if your uptime goes to hell because the moment a disk fails you
>> need to
>> drop everything and replace the disk immediately, why would you use RAID?
> Because just replacing a disk and rebuilding the array is almost always
> much cheaper in terms of time than rebuilding the system from a backup.
> IOW, even if you have to drop everything and replace the disk
> immediately, it's still less time consuming than restoring from a
> backup.  It also has the advantage that you don't lose any data.

We disagree on letting people run degraded, which I support and you do
not.  I respect your opinion.  However, I have to ask: who decides these
rules?  Obviously not me, since I am a simple btrfs home user.

Since Oracle is funding btrfs development, is that Oracle's official
stance on how to handle a failed disk?  Who decides btrfs's roadmap?
I have no clue who is who on this mailing list and who influences the
features of btrfs.

Oracle is obviously using raid systems internally.  How do the operators
of these raid systems feel about this "not let the system run in
degraded mode"?

As a home user, I do not want to keep a spare disk always available.
That means paying a premium for a disk when the raid system can easily run
for two years without a disk failure.  I want to buy the new disk (asap,
of course) once one has died.  By that time, the cost of a drive will have
fallen drastically.  Yes, I can live with running my home system (which
has backups) for a day or two in degraded rw mode until I purchase and
can install a new disk.  Chances are low that both disks will quit at
around the same time.

Simply because I cannot run in degraded mode and cannot add a disk to my
current degraded raid1, despite having my replacement disk in my hands,
I must resort to switching to mdadm or zfs.

Having a policy that limits users' options on the grounds that they are
too stupid to understand the implications is wrong.  It's ok for
applications, but not at the operating system level; there should be a
way to force this.  A
--yes-i-know-what-i-am-doing-now-please-mount-rw-degraded-so-i-can-install-the-new-disk
parameter must be implemented.  Currently, it is like disallowing root
to run mkfs over an existing filesystem because people could erase data
by mistake.  Let people do what they want and let them live with the
consequences.

hdparm has a --yes-i-know-what-i-am-doing flag.  btrfs needs one.

Whoever decides about btrfs features to add, please consider this one.

Best regards,
Hans Deragon

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-08  3:21                         ` Hans Deragon
@ 2017-02-08 12:50                           ` Austin S. Hemmelgarn
  2017-02-08 13:46                             ` Tomasz Torcz
  0 siblings, 1 reply; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2017-02-08 12:50 UTC (permalink / raw)
  To: Hans Deragon, linux-btrfs

On 2017-02-07 22:21, Hans Deragon wrote:
> Greetings,
>
> On 2017-02-02 10:06, Austin S. Hemmelgarn wrote:
>> On 2017-02-02 09:25, Adam Borowski wrote:
>>> On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
>>>> This is a severe bug that makes a not all that uncommon (albeit bad) use
>>>> case fail completely.  The fix had no dependencies itself and
>>>
>>> I don't see what's bad in mounting a RAID degraded.  Yeah, it provides no
>>> redundancy but that's no worse than using a single disk from the start.
>>> And most people not doing storage/server farm don't have a stack of spare
>>> disks at hand, so getting a replacement might take a while.
>> Running degraded is bad. Period.  If you don't have a disk on hand to
>> replace the failed one (and if you care about redundancy, you should
>> have at least one spare on hand), you should be converting to a single
>> disk, not continuing to run in degraded mode until you get a new disk.
>> The moment you start talking about running degraded long enough that you
>> will be _booting_ the system with the array degraded, you need to be
>> converting to a single disk.  This is of course impractical for
>> something like a hardware array or an LVM volume, but it's _trivial_
>> with BTRFS, and protects you from all kinds of bad situations that can't
>> happen with a single disk but can completely destroy the filesystem if
>> it's a degraded array.  Running a single disk is not exactly the same as
>> running a degraded array, it's actually marginally safer (even if you
>> aren't using dup profile for metadata) because there are fewer moving
>> parts to go wrong.  It's also exponentially more efficient.
>>>
>>> Being able to continue to run when a disk fails is the whole point of
>>> RAID
>>> -- despite what some folks think, RAIDs are not for backups but for
>>> uptime.
>>> And if your uptime goes to hell because the moment a disk fails you
>>> need to
>>> drop everything and replace the disk immediately, why would you use RAID?
>> Because just replacing a disk and rebuilding the array is almost always
>> much cheaper in terms of time than rebuilding the system from a backup.
>> IOW, even if you have to drop everything and replace the disk
>> immediately, it's still less time consuming than restoring from a
>> backup.  It also has the advantage that you don't lose any data.
>
> We disagree on letting people run degraded, which I support, you not.  I
> respect your opinion.  However, I have to ask who decides these rules?
> Obviously, not me since I am a simple btrfs home user.
This is a pretty typical stance among seasoned system administrators. 
It's worth pointing out that I'm not saying you shouldn't run with a 
single disk for an extended period of time; I'm saying you should 
_convert_ to single disk profiles until you can get a replacement, and 
then convert back to raid profiles once you have the replacement.  It is 
exponentially safer in BTRFS to run single data single metadata than 
half raid1 data half raid1 metadata.  This is one of the big reasons 
that I've avoided MD over the years: it's functionally impossible to do 
this with MD arrays.
>
> Since Oracle is funding btrfs development, is that Oracle's official
> stand on how to handle a failed disk?  Who decides of btrfs's roadmap?
> I have no clue who is who on this mailing list and who influences the
> features of btrfs.
>
> Oracle is obviously using raid systems internally.  How do the operators
> of these raid systems feel about this "not let the system run in
> degraded mode"?
They replace the disks immediately, so it's irrelevant to them.  Oracle 
isn't the sole source of funding (I'm actually not even sure they are 
anymore; CLM works for Facebook now, last I knew), but you have to 
understand that it has been developed primarily as an _enterprise_ 
filesystem.  This means that certain perfectly reasonable assumptions 
are made about the conditions under which it will be used.
>
> As a home user, I do not want to have a disk always available.  This is
> paying a disk very expensively when the raid system can run easily for
> two years without disk failure.  I want to buy the new disk (asap, of
> course) once one died.  At that moment, the cost of a drive would have
> fallen drastically.  Yes, I can live with running my home system (which
> has backups) for a day or two, in degraded rw mode until I purchase and
> can install a new disk.  Chances are low that both disks will quit at
> around the same time.
You're missing my point.  I have zero issue with running with one disk 
when the other fails.  I take issue with not telling the FS that it 
won't have another disk for a while.  IOW, in that situation, I would run:
btrfs balance start -dconvert=single -mconvert=dup /whatever
to convert to profiles _designed_ for a single device, and then convert 
back to raid1 when I got another disk.  The issue you've stumbled across 
is only partial motivation for this; the bigger motivation is that 
running half a 2 disk array is more risky than running a single disk by 
itself.
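
A rough sketch of the full round trip described above, written as a small
Python helper that only builds the command lines and does not run them.
Only the first balance command comes from this message; the remaining
steps (dropping the missing device, adding the new one, converting back
to raid1) are an assumption about how the procedure would usually be
completed with btrfs-progs.  The function name, the mount point and the
device path are hypothetical placeholders.

```python
import shlex


def single_disk_round_trip(mountpoint, new_device):
    """Return (to_single, back_to_raid1) lists of btrfs command lines.

    Nothing is executed here; the caller decides whether and when to
    run each command (e.g. via subprocess.run) on a writable mount.
    """
    to_single = [
        # Convert to profiles designed for one device (from the message above).
        ["btrfs", "balance", "start",
         "-dconvert=single", "-mconvert=dup", mountpoint],
        # Assumed follow-up: forget the dead device so later mounts
        # no longer need the degraded option.
        ["btrfs", "device", "delete", "missing", mountpoint],
    ]
    back_to_raid1 = [
        # Assumed: add the replacement disk once it is available...
        ["btrfs", "device", "add", new_device, mountpoint],
        # ...then convert data and metadata back to raid1.
        ["btrfs", "balance", "start",
         "-dconvert=raid1", "-mconvert=raid1", mountpoint],
    ]
    return to_single, back_to_raid1


if __name__ == "__main__":
    # Placeholder paths; substitute the real mount point and device.
    now, later = single_disk_round_trip("/mnt/example", "/dev/sdX")
    for cmd in now + later:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Both conversions need the filesystem mounted read-write, so on an affected
kernel the first one has to happen while the degraded rw mount is still
possible.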
>
> Simply because I cannot run in degraded mode and cannot add a disk to my
> current degraded raid1, despite having my replacement disk in my hands,
> I must resort to switch to mdadm or zfs.
Both MDADM and ZFS still have the issue that it is more dangerous to run 
half a 2 disk RAID1 array than a single disk.  That doesn't change just 
because the software handles things a bit differently.
>
> Having a policy that limits user's options for the sake that they are
> too stupid to understand the implications is wrong.  Its ok for
> applications, but not at the operating system; there should be a way to
> force this.  A
> --yes-i-know-what-i-am-doing-now-please-mount-rw-degraded-so-i-can-install-the-new-disk
> parameter must be implemented.  Currently, it is like disallowing root
> to run mkfs over an existing filesystem because people could erase data
> by mistake.  Let people do what they want and let them live with the
> consequences.
A patch exists to fix the particular issue you encountered.  Trust me, I 
wish it had been merged like it should have been too, then we could just 
tell people in this situation to upgrade their kernel instead of telling 
them to rebuild it with a patch.
>
> hdparm has a --yes-i-know-what-i-am-doing flag.  btrfs needs one.
>
> Whoever decides about btrfs features to add, please consider this one.

* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-08 12:50                           ` Austin S. Hemmelgarn
@ 2017-02-08 13:46                             ` Tomasz Torcz
  2017-02-08 19:06                               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 19+ messages in thread
From: Tomasz Torcz @ 2017-02-08 13:46 UTC (permalink / raw)
  To: linux-btrfs

On Wed, Feb 08, 2017 at 07:50:22AM -0500, Austin S. Hemmelgarn wrote:
>  It is exponentially safer in BTRFS
> to run single data single metadata than half raid1 data half raid1 metadata.

  Why?
 
> To convert to profiles _designed_ for a single device and then convert back
> to raid1 when I got another disk.  The issue you've stumbled across is only
> partial motivation for this, the bigger motivation is that running half a 2
> disk array is more risky than running a single disk by itself.

  Again, why?  What's the difference?  What causes increased risk?

-- 
Tomasz Torcz                Only gods can safely risk perfection,
xmpp: zdzichubg@chrome.pl     it's a dangerous thing for a man.  -- Alia


* Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
  2017-02-08 13:46                             ` Tomasz Torcz
@ 2017-02-08 19:06                               ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2017-02-08 19:06 UTC (permalink / raw)
  To: Tomasz Torcz, linux-btrfs

On 2017-02-08 08:46, Tomasz Torcz wrote:
> On Wed, Feb 08, 2017 at 07:50:22AM -0500, Austin S. Hemmelgarn wrote:
>>  It is exponentially safer in BTRFS
>> to run single data single metadata than half raid1 data half raid1 metadata.
>
>   Why?
>
>> To convert to profiles _designed_ for a single device and then convert back
>> to raid1 when I got another disk.  The issue you've stumbled across is only
>> partial motivation for this, the bigger motivation is that running half a 2
>> disk array is more risky than running a single disk by itself.
>
>   Again, why?  What's the difference?  What causes increased risk?
Aside from bugs like the one that sparked this thread, that is?  Just off 
the top of my head:
* You're running with half a System chunk.  This is _very_ risky because 
almost any errors in the system chunk run the risk of nuking entire 
files and possibly the whole filesystem.  This is part of the reason 
that I explicitly listed -mconvert=dup instead of -mconvert=single.
* A single-device profile performs significantly better.  As odd as this 
sounds, this actually has an impact on safety.  Better overall performance 
reduces the size of the windows of time during which only part of the 
filesystem is committed.  The effect is smaller than when running a 
traditional filesystem on top of a traditional RAID array, but it is 
still there.
* Single-device operation is far better tested than running a degraded 
multi-device array.  IOW, you're less likely to hit obscure bugs by 
running a single profile instead of half a raid1 profile.
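
For illustration, the conversion described above (single data plus dup 
metadata while only one disk is present, then back to raid1 once a 
replacement is added) could look roughly like the sketch below.  It is a 
dry run that only prints the commands.  The mount point and device name 
are placeholders, the `btrfs device delete missing` step is an assumption 
about how the dead device gets dropped and was not spelled out in this 
message, and the exact sequence may differ between kernel and 
btrfs-progs versions.

```shell
#!/bin/sh
# Dry-run sketch: prints the btrfs commands instead of executing them.
# MNT and NEWDEV are hypothetical placeholders, not values from this thread.

# Step 1: with one disk left, convert to single-device profiles
# (single data, dup metadata), then drop the missing device.
# The "device delete missing" step is an assumption, see the text above.
to_single_cmds() {
    printf 'btrfs balance start -dconvert=single -mconvert=dup %s\n' "$1"
    printf 'btrfs device delete missing %s\n' "$1"
}

# Step 2: once a replacement disk is available, add it and
# convert data and metadata back to raid1.
to_raid1_cmds() {
    printf 'btrfs device add %s %s\n' "$2" "$1"
    printf 'btrfs balance start -dconvert=raid1 -mconvert=raid1 %s\n' "$1"
}

MNT=${MNT:-/mnt/example}
NEWDEV=${NEWDEV:-/dev/sdX}

to_single_cmds "$MNT"
to_raid1_cmds "$MNT" "$NEWDEV"
```

Review the printed commands against the btrfs-balance and btrfs-device 
man pages for your installed version before running any of them by hand, 
and only on a filesystem that is mounted read-write.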

Thread overview: 19+ messages
2017-01-24 18:57 raid1: cannot add disk to replace faulty because can only mount fs as read-only Hans Deragon
2017-01-24 19:48 ` Adam Borowski
     [not found] ` <W75Sc6PDCBok7W75TcCgc7@videotron.ca>
2017-01-27 16:47   ` Hans Deragon
2017-01-27 20:03     ` Austin S. Hemmelgarn
2017-01-27 20:28       ` Adam Borowski
2017-01-28  9:17       ` Andrei Borzenkov
2017-01-30 12:18         ` Austin S. Hemmelgarn
     [not found]         ` <YAvBcoM9EImXYYAvCcegSf@videotron.ca>
2017-02-01  2:51           ` Hans Deragon
2017-02-01  5:23             ` Duncan
2017-02-01 11:55               ` Adam Borowski
2017-02-01 22:48                 ` Duncan
2017-02-02 12:49                   ` Austin S. Hemmelgarn
2017-02-02 14:25                     ` Adam Borowski
2017-02-02 15:06                       ` Austin S. Hemmelgarn
     [not found]                       ` <ZIyPcL4cW36fIZIyQcB9Hs@videotron.ca>
2017-02-08  3:21                         ` Hans Deragon
2017-02-08 12:50                           ` Austin S. Hemmelgarn
2017-02-08 13:46                             ` Tomasz Torcz
2017-02-08 19:06                               ` Austin S. Hemmelgarn
2017-02-03  9:35                     ` Duncan
