* Extend BTRFS_IOC_DEVICES_READY for degraded RAID
@ 2015-01-05  9:46 Harald Hoyer
  2015-01-05 11:31 ` Lennart Poettering
  0 siblings, 1 reply; 6+ messages in thread
From: Harald Hoyer @ 2015-01-05  9:46 UTC (permalink / raw)
  To: linux-btrfs, Kay Sievers, Lennart Poettering

We have BTRFS_IOC_DEVICES_READY to report whether all devices are present,
so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.

I think we need a third state here for a degraded RAID, which can be
mounted, but only after a certain timeout or with a kernel command line
parameter.

We also have to rethink how to handle the udev DB update for state
changes: incomplete -> degraded -> complete
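A minimal userspace sketch of the proposed three-state model. The names
and the classification logic here are hypothetical illustrations, not the
actual ioctl interface or udev implementation:

```python
from enum import Enum

class ArrayState(Enum):
    INCOMPLETE = 0  # too few devices even for a degraded mount
    DEGRADED = 1    # mountable, but only with -o degraded
    COMPLETE = 2    # all devices present

def array_state(present, total, min_degraded):
    """Classify a multi-device filesystem.

    present      -- number of member devices discovered so far
    total        -- number of devices recorded in the superblock
    min_degraded -- fewest devices that still allow a degraded mount
                    (profile-dependent, e.g. total - 1 for RAID1)
    """
    if present >= total:
        return ArrayState.COMPLETE
    if present >= min_degraded:
        return ArrayState.DEGRADED
    return ArrayState.INCOMPLETE
```

The udev DB transition incomplete -> degraded -> complete would then
follow this classification as devices appear.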


* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05  9:46 Extend BTRFS_IOC_DEVICES_READY for degraded RAID Harald Hoyer
@ 2015-01-05 11:31 ` Lennart Poettering
  2015-01-05 12:08   ` Austin S Hemmelgarn
  2015-01-05 16:36   ` Goffredo Baroncelli
  0 siblings, 2 replies; 6+ messages in thread
From: Lennart Poettering @ 2015-01-05 11:31 UTC (permalink / raw)
  To: Harald Hoyer; +Cc: linux-btrfs, Kay Sievers

On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:

> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
> 
> I think we need a third state here for a degraded RAID, which can be
> mounted, but only after a certain timeout or with a kernel command line
> parameter.
> 
> We also have to rethink how to handle the udev DB update for state
> changes: incomplete -> degraded -> complete

I am not convinced that automatically booting degraded arrays would be
a good idea. Instead, requiring one manual step before booting a
degraded array sounds OK to me.

Lennart

-- 
Lennart Poettering, Red Hat


* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 11:31 ` Lennart Poettering
@ 2015-01-05 12:08   ` Austin S Hemmelgarn
  2015-01-05 16:36   ` Goffredo Baroncelli
  1 sibling, 0 replies; 6+ messages in thread
From: Austin S Hemmelgarn @ 2015-01-05 12:08 UTC (permalink / raw)
  To: Lennart Poettering, Harald Hoyer; +Cc: linux-btrfs, Kay Sievers


On 2015-01-05 06:31, Lennart Poettering wrote:
> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>
>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>>
>> I think we need a third state here for a degraded RAID, which can be
>> mounted, but only after a certain timeout or with a kernel command
>> line parameter.
>>
>> We also have to rethink how to handle the udev DB update for state
>> changes: incomplete -> degraded -> complete
>
> I am not convinced that automatically booting degraded arrays would be
> a good idea. Instead, requiring one manual step before booting a
> degraded array sounds OK to me.
>
> Lennart
>
I can think of half a dozen use cases where it is better to
automatically mount the filesystem degraded and send a notification
that this happened than to refuse to mount without manual intervention.




* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 11:31 ` Lennart Poettering
  2015-01-05 12:08   ` Austin S Hemmelgarn
@ 2015-01-05 16:36   ` Goffredo Baroncelli
  2015-01-05 17:02     ` Austin S Hemmelgarn
  1 sibling, 1 reply; 6+ messages in thread
From: Goffredo Baroncelli @ 2015-01-05 16:36 UTC (permalink / raw)
  To: Lennart Poettering, Harald Hoyer
  Cc: linux-btrfs, Kay Sievers, Chris Mason, David Sterba

On 2015-01-05 12:31, Lennart Poettering wrote:
> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
> 
>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>>
>> I think we need a third state here for a degraded RAID, which can be
>> mounted, but only after a certain timeout or with a kernel command
>> line parameter.
>>
>> We also have to rethink how to handle the udev DB update for state
>> changes: incomplete -> degraded -> complete
> 
> I am not convinced that automatically booting degraded arrays would be
> a good idea. Instead, requiring one manual step before booting a
> degraded array sounds OK to me.

I think a good use case is when the root filesystem is a RAID one.

However, I don't think that the current architecture is flexible enough
for this job:
- mounting a RAID filesystem in degraded mode is right for some setups
but not for all: a configuration parameter to select one behavior or
the other is needed;
- degraded mode should be allowed only if not all the devices have been
discovered AND a timeout has expired. This timeout is another variable
which (IMHO) should be configurable;
- there are different degrees of degraded mode: if the array is RAID6,
losing one device may be acceptable while losing two may not. Again
there is no simple answer; a configurable policy is needed;
- note that the current architecture has some flaws: if a device
disappears during device discovery, ID_BTRFS_READY reports OK even
though a device is missing.
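The first three points above could be combined into one small userspace
policy check. Everything in this sketch (the names, the per-profile
tolerances, the 90-second default) is illustrative, not an existing
btrfs, udev, or systemd interface:

```python
# Illustrative per-profile tolerance for missing devices; a real
# implementation would have to get this from the filesystem itself.
TOLERATED_MISSING = {"raid1": 1, "raid10": 1, "raid5": 1, "raid6": 2}

def may_mount_degraded(profile, missing, waited_s, timeout_s=90):
    """Allow a degraded mount only after the timeout has expired and
    only if the number of missing devices is within tolerance."""
    if missing == 0:
        return True               # complete array: always mountable
    if waited_s < timeout_s:
        return False              # keep waiting for devices to appear
    return missing <= TOLERATED_MISSING.get(profile, 0)
```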

I proposed a mount.btrfs helper[1], which (IMHO) is a good place for
this kind of logic. However, I have to point out that neither Chris nor
David is fully convinced by this solution. I hope they change their
minds.

In conclusion, I see some use cases for allowing a degraded btrfs
filesystem to be mounted; however, I don't see enhancing the
BTRFS_IOC_DEVICES_READY ioctl as viable. More logic is required.


> 
> Lennart
> 

BR
G.Baroncelli

[1] http://www.spinics.net/lists/linux-btrfs/msg39706.html


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 16:36   ` Goffredo Baroncelli
@ 2015-01-05 17:02     ` Austin S Hemmelgarn
  2015-01-05 17:57       ` Goffredo Baroncelli
  0 siblings, 1 reply; 6+ messages in thread
From: Austin S Hemmelgarn @ 2015-01-05 17:02 UTC (permalink / raw)
  To: kreijack, Lennart Poettering, Harald Hoyer
  Cc: linux-btrfs, Kay Sievers, Chris Mason, David Sterba


On 2015-01-05 11:36, Goffredo Baroncelli wrote:
> On 2015-01-05 12:31, Lennart Poettering wrote:
>> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>>
>>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>>> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>>>
>>> I think we need a third state here for a degraded RAID, which can be
>>> mounted, but only after a certain timeout or with a kernel command
>>> line parameter.
>>>
>>> We also have to rethink how to handle the udev DB update for state
>>> changes: incomplete -> degraded -> complete
>>
>> I am not convinced that automatically booting degraded arrays would be
>> a good idea. Instead, requiring one manual step before booting a
>> degraded array sounds OK to me.
>
> I think that a good use case is when the root filesystem is a raid one.
>
> However I don't think that the current architecture is enough flexible to
> perform this job:
> - mounting a raid filesystem in degraded mode is good for some setup
> but it is not the right solution for all: a configure
> parameter to allow one behavior or the other is needed:
> - the degraded mode should be allowed only if not all the devices are
> discovered AND a timeout is expired. This timeout is another variable which
> (IMHO) should be configurable;
These first two points can easily be handled with some simple logic in
userspace without needing a mount helper.
> - there are different degrees of degraded mode: if the raid is a RAID6,
> losing a device would be acceptable; loosing two devices may be
> unacceptable. Again there is no a simple answer; it is needed a
> configurable policy;
This can be solved by providing two new return values for the
BTRFS_IOC_DEVICES_READY ioctl (instead of just one): one for arrays
that are in such a state that losing another disk will almost certainly
cause data loss (i.e. a RAID6 with two missing devices, or a BTRFS
raid1/10 with one missing device), and one for an array that
(theoretically) won't lose any data if one more device drops out (i.e.
a RAID6 (or something with higher parity) with one missing disk), and
then provide a module parameter to allow forcing the kernel to report
one or the other.
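As a sketch, the proposed extra return values might be modelled like
this. The constant values, the helper, and the per-profile tolerances
are all hypothetical; the real ioctl today only reports ready/not ready:

```python
DEVICES_READY = 0     # all devices present
DEGRADED_SAFE = 2     # hypothetical: one more device loss is survivable
DEGRADED_AT_RISK = 3  # hypothetical: one more loss likely means data loss

# Illustrative redundancy per profile: how many missing devices the
# array can tolerate before data loss.
_TOLERANCE = {"raid1": 1, "raid10": 1, "raid5": 1, "raid6": 2}

def readiness(profile, missing):
    tolerance = _TOLERANCE.get(profile, 0)
    if missing == 0:
        return DEVICES_READY
    if missing < tolerance:
        return DEGRADED_SAFE      # e.g. RAID6 with one missing device
    return DEGRADED_AT_RISK       # e.g. RAID6 with two, raid1 with one
```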
> - pay attention that the current architecture has some flaws: if a device
> disappear during the device discovery, ID_BTRFS_READY returns OK
> even if a device is missing.
Point 4 would require some kind of continuous scanning/notification
(and therefore add more bulk, the lack of which is in my opinion one of
the biggest advantages of BTRFS over ZFS), and even then there will
always be the possibility that a device drops out between calling the
ioctl and trying to mount the filesystem.




* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 17:02     ` Austin S Hemmelgarn
@ 2015-01-05 17:57       ` Goffredo Baroncelli
  0 siblings, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli @ 2015-01-05 17:57 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Lennart Poettering, Harald Hoyer
  Cc: linux-btrfs, Kay Sievers, Chris Mason, David Sterba

On 2015-01-05 18:02, Austin S Hemmelgarn wrote:
> On 2015-01-05 11:36, Goffredo Baroncelli wrote:
>> On 2015-01-05 12:31, Lennart Poettering wrote:
>>> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>>> 
>>>> We have BTRFS_IOC_DEVICES_READY to report whether all devices
>>>> are present, so that a udev rule can set ID_BTRFS_READY and
>>>> SYSTEMD_READY.
>>>> 
>>>> I think we need a third state here for a degraded RAID, which
>>>> can be mounted, but only after a certain timeout or with a
>>>> kernel command line parameter.
>>>> 
>>>> We also have to rethink how to handle the udev DB update for
>>>> state changes: incomplete -> degraded -> complete
>>> 
>>> I am not convinced that automatically booting degraded arrays
>>> would be a good idea. Instead, requiring one manual step before
>>> booting a degraded array sounds OK to me.
>> 
>> I think a good use case is when the root filesystem is a RAID one.
>> 
>> However, I don't think that the current architecture is flexible
>> enough for this job:
>> - mounting a RAID filesystem in degraded mode is right for some
>> setups but not for all: a configuration parameter to select one
>> behavior or the other is needed;
>> - degraded mode should be allowed only if not all the devices have
>> been discovered AND a timeout has expired. This timeout is another
>> variable which (IMHO) should be configurable;
> These first 2 points can be easily handled with some simple logic in
> userspace without needing a mount helper.

If you implement it in mount.btrfs, you have this logic available for
all cases, not only for mounting the root fs.

>> - there are different degrees of degraded mode: if the array is
>> RAID6, losing one device may be acceptable while losing two may not.
>> Again there is no simple answer; a configurable policy is needed;

> This can be solved by providing two new return values for the
> BTRFS_IOC_DEVICES_READY ioctl (instead of just one): one for arrays
> that are in such a state that losing another disk will almost
> certainly cause data loss (i.e. a RAID6 with two missing devices, or
> a BTRFS raid1/10 with one missing device), and one for an array that
> (theoretically) won't lose any data if one more device drops out
> (i.e. a RAID6 (or something with higher parity) with one missing
> disk)

This is a detail; the point is that this policy needs to be implemented
somewhere. I am suggesting not to "spread" this logic across too many
subsystems (kernel, systemd, udev, scripts...).

BTRFS couples a filesystem with a device manager. This exposes a lot of
new problems and options. I am suggesting creating a "tool" to manage
all these new problems/options. This tool is (of course) btrfs
specific, and I am convinced that a good place to start is a
mount.btrfs helper.


>, and
> then provide a module parameter to allow forcing the kernel to report
> one or the other.

This policy should differ by mount point: if the machine is a remote
one, I can allow the root filesystem to be mounted even in degraded
mode to start some "recovery"; but a more conservative policy may be
applied to the other filesystems.

This is one of the reasons to keep the policy out of the kernel.
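A per-mount-point policy table, such as a mount.btrfs helper might
consult, could look like this. The table format, paths, and values are
entirely hypothetical, sketched only to show the idea:

```python
# Hypothetical policy table: which mount points may come up degraded,
# and how long to wait for missing devices first.
POLICY = {
    "/":     {"allow_degraded": True,  "timeout_s": 30},   # remote box: boot, then recover
    "/data": {"allow_degraded": False, "timeout_s": 300},  # conservative
}

DEFAULT = {"allow_degraded": False, "timeout_s": 0}

def degraded_allowed(mountpoint, waited_s):
    """Return True only if this mount point's policy permits a
    degraded mount and the configured wait has already elapsed."""
    p = POLICY.get(mountpoint, DEFAULT)
    return p["allow_degraded"] and waited_s >= p["timeout_s"]
```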

>> - note that the current architecture has some flaws: if a device
>> disappears during device discovery, ID_BTRFS_READY reports OK even
>> though a device is missing.

> Point 4 would require some kind of continuous scanning/notification
> (and therefore add more bulk, the lack of which is in my opinion one
> of the biggest advantages of BTRFS over ZFS), and even then there
> will always be the possibility that a device drops out between
> calling the ioctl and trying to mount the filesystem.

If you shorten the window, it is less likely to happen.



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

