From: pg@btrfs.list.sabi.co.UK (Peter Grandi)
To: Linux Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: raid1 degraded mount still produce single chunks, writeable mount not allowed
Date: Thu, 9 Mar 2017 09:49:31 +0000
Message-ID: <22721.9515.564892.221096@tree.ty.sabi.co.uk>
In-Reply-To: <0d731a6d-4677-1d58-9f79-a8d7d2bcac37@gmail.com>

>> Consider the common case of a 3-member volume with a 'raid1'
>> target profile: if the sysadm thinks that a drive should be
>> replaced, the goal is to take it out *without* converting every
>> chunk to 'single', because with 2-out-of-3 devices half of the
>> chunks will still be fully mirrored.

>> Also, removing the device to be replaced should really not be
>> the same thing as balancing the chunks, if there is space, to be
>> 'raid1' across remaining drives, because that's a completely
>> different operation.

> There is a command specifically for replacing devices.  It
> operates very differently from the add+delete or delete+add
> sequences. [ ... ]

Perhaps it was not clear that I was talking about removing a
device, as distinct from replacing it, and that I used "removed"
instead of "deleted" deliberately, to avoid the confusion with
the 'delete' command.

In the everyday practice of system administration it often
happens that a device should be removed first, and replaced
later, for example when it is suspected to be faulty, or is
intermittently faulty. The replacement can be done with
'replace' or 'add+delete' or 'delete+add', but that's a
different matter.

Perhaps I should not have used the generic verb "remove", but
written "make unavailable".

This brings up again the topic of some "confusion" in the
design of the Btrfs multidevice handling logic. At least
initially one could only expand the storage space of a
multidevice volume by 'add' of a new device, or shrink it by
'delete' of an existing one. I think that at Btrfs design time
it was not conceived that the storage space could be nominally
constant while a device (and the chunks on it) has a state of
"available" ("present", "online", "enabled") or "unavailable"
("absent", "offline", "disabled"), either because of events or
because of system administrator action.

The 'missing' pseudo-device designator was added later, and
'replace' later still, to avoid having to first expand and then
shrink (or vice versa) the storage space, with the related
copying.

My impression is that it would be less "confused" if the Btrfs
device handling logic were changed to allow for the state of
"member of the multidevice set but not actually available" and
the consequent state for the chunks that ought to be on it.
That would probably be essential to fixing the currently
confusing aspects of recovery in a multidevice set. It would be
very useful even if it required a change in the on-disk format
to distinguish membership from availability for devices, and to
mark chunks as available or not (chunks of course being possible
only on member devices).

It would also be nice to have the opposite state, "not member
of the multidevice set but actually available to it", that is a
spare device, and the related logic.
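
To make the distinction concrete, here is a minimal Python sketch
of membership and availability as two independent device flags,
with the state of a chunk derived from the devices holding its
copies. It is purely illustrative: the names 'Device',
'ChunkState' and 'chunk_state', and the device names, are
hypothetical and correspond to no existing Btrfs code or on-disk
format.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkState(Enum):
    FULLY_MIRRORED = "all copies reachable"
    DEGRADED = "some copies reachable"
    UNAVAILABLE = "no copy reachable"


@dataclass(frozen=True)
class Device:
    name: str
    member: bool     # part of the multidevice set
    available: bool  # currently usable by the filesystem

    @property
    def is_spare(self) -> bool:
        # Available to the set without being a member of it.
        return self.available and not self.member


def chunk_state(copies, devices):
    """Classify a chunk from the devices that hold its copies.

    copies: names of the devices holding one copy each.
    devices: mapping of device name to Device.
    """
    holders = [devices[name] for name in copies]
    if not all(d.member for d in holders):
        raise ValueError("chunks are only possible on member devices")
    reachable = sum(1 for d in holders if d.available)
    if reachable == len(holders):
        return ChunkState.FULLY_MIRRORED
    if reachable > 0:
        return ChunkState.DEGRADED
    return ChunkState.UNAVAILABLE


# Hypothetical 3-member 'raid1' set with one member made
# unavailable, plus one spare that is not a member.
devices = {
    d.name: d
    for d in (
        Device("sda", member=True, available=True),
        Device("sdb", member=True, available=True),
        Device("sdc", member=True, available=False),
        Device("sdd", member=False, available=True),
    )
}

print(chunk_state(("sda", "sdb"), devices).name)  # FULLY_MIRRORED
print(chunk_state(("sda", "sdc"), devices).name)  # DEGRADED
print(devices["sdd"].is_spare)                    # True
```

In this sketch making 'sdc' unavailable changes neither the
membership of the set nor the target profile: chunks with both
copies on the remaining devices stay fully mirrored, and only
those with a copy on 'sdc' are reported as degraded.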

Note: simply setting '/sys/block/$DEV/device/delete' is not a
good option, because that makes the device unavailable not just
to Btrfs, but also to the whole system. In the ordinary practice
of system administration it may well be useful to make a device
unavailable to Btrfs but still available to the system, for
example for testing, and anyhow they are logically distinct
states. That also means a member device might well be available
to the system, but marked as "not available" to Btrfs.
