All of lore.kernel.org
* ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
@ 2016-07-12 11:46 Tamas Baumgartner-Kis
  2016-07-13  7:21 ` Duncan
  0 siblings, 1 reply; 4+ messages in thread
From: Tamas Baumgartner-Kis @ 2016-07-12 11:46 UTC (permalink / raw)
  To: linux-btrfs


Hi,


I have a problem with the current BTRFS 4.6


I'm running an Arch Linux guest in KVM to test btrfs.

First I played with one device and subvolumes.

After that I added a second device to make a raid1.

# btrfs device add /dev/sdb /mnt
# btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt


As a stress test I removed the first device and tried to
boot, but unfortunately the system couldn't boot.

So I booted into a live system:

# uname -a
Linux archiso 4.6.3-1-ARCH #1 SMP PREEMPT Fri Jun 24 21:19:13 CEST 2016
x86_64 GNU/Linux

First I tried to mount the "leftover" device with the degraded option

# mount -o degraded /dev/sda /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sda,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.


Mounting works only if I also add the read-only option:

# mount -oro,degraded /dev/sda /mnt

When I then try to replace the missing device, I get an error:

# btrfs replace start -B 1 /dev/sdb /mnt
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system

Here is some additional info about the system:

#  btrfs --version
btrfs-progs v4.6



# btrfs fi show
Label: 'hdd0'  uuid: 97b5c51a-65d3-4a84-9382-9b99756ca4ab
	Total devices 2 FS bytes used 1.09GiB
	devid    2 size 10.00GiB used 3.56GiB path /dev/sda
	*** Some devices missing



# btrfs fi df /mnt
Data, RAID1: total=2.00GiB, used=1.04GiB
Data, single: total=1.00GiB, used=640.00KiB
System, RAID1: total=32.00MiB, used=16.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=256.00MiB, used=54.02MiB
Metadata, single: total=256.00MiB, used=256.00KiB
GlobalReserve, single: total=32.00MiB, used=0.00B



# dmesg
[ 9753.746858] BTRFS info (device sda): allowing degraded mounts
[ 9753.746863] BTRFS info (device sda): disk space caching is enabled
[ 9753.746865] BTRFS: has skinny extents
[ 9753.819035] BTRFS: missing devices(1) exceeds the limit(0), writeable
mount is not allowed
[ 9753.838758] BTRFS: open_ctree failed
[ 9800.077556] BTRFS info (device sda): allowing degraded mounts
[ 9800.077561] BTRFS info (device sda): disk space caching is enabled
[ 9800.077562] BTRFS: has skinny extents


Is this a mistake on my part, or is this somehow a bug?

Kind Regards
    Tamas Baumgartner-Kis

-- 
Tamás Baumgartner-Kis
Rechenzentrum der Universität Freiburg
Phone: +49 761 203 4605
E-Mail: Tamas.Baumgartner-Kis@rz.uni-freiburg.de




* Re: ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
  2016-07-12 11:46 ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system Tamas Baumgartner-Kis
@ 2016-07-13  7:21 ` Duncan
  0 siblings, 0 replies; 4+ messages in thread
From: Duncan @ 2016-07-13  7:21 UTC (permalink / raw)
  To: linux-btrfs

Tamas Baumgartner-Kis posted on Tue, 12 Jul 2016 13:46:56 +0200 as
excerpted:

> Hi,
> 
> 
> I have a problem with the current BTRFS 4.6
> 
> 
> I'm running an Arch Linux guest in KVM to test btrfs.
> 
> First I played with one device and subvolumes.
> 
> After that I added a second device to make a raid1.
> 
> # btrfs device add /dev/sdb /mnt
> # btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

So both data and metadata.  Thanks for specifying the
commands, as it's sometimes unclear whether the conversion
was done for both, or just one.

> As a stress test I removed the first device and tried to
> boot, but unfortunately the system couldn't boot.
> 
> So I booted into a live system:
> 
> # uname -a
> Linux archiso 4.6.3-1-ARCH #1 SMP PREEMPT Fri Jun 24 21:19:13 CEST 2016
> x86_64 GNU/Linux
> 
> First I tried to mount the "leftover" device with the degraded option
> 
> # mount -o degraded /dev/sda /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/sda,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> 
> 
> Mounting works only if I also add the read-only option:
> 
> # mount -oro,degraded /dev/sda /mnt
> 
> When I then try to replace the missing device, I get an error:
> 
> # btrfs replace start -B 1 /dev/sdb /mnt
> ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system

That's expected.  Adding/deleting/replacing a device requires a
writable filesystem.

> Here is some additional info about the system:
> 
> #  btrfs --version
> btrfs-progs v4.6
> 
> 
> 
> # btrfs fi show
> Label: 'hdd0'  uuid: 97b5c51a-65d3-4a84-9382-9b99756ca4ab
> 	Total devices 2 FS bytes used 1.09GiB
> 	devid    2 size 10.00GiB used 3.56GiB path /dev/sda
> 	*** Some devices missing
> 
> 
> 
> # btrfs fi df /mnt
> Data, RAID1: total=2.00GiB, used=1.04GiB
> Data, single: total=1.00GiB, used=640.00KiB
> System, RAID1: total=32.00MiB, used=16.00KiB
> System, single: total=32.00MiB, used=0.00B
> Metadata, RAID1: total=256.00MiB, used=54.02MiB
> Metadata, single: total=256.00MiB, used=256.00KiB
> GlobalReserve, single: total=32.00MiB, used=0.00B

This reveals the problem.  You have single chunks in addition
to the raid1 chunks.  Current btrfs will refuse to mount
writable with a device missing in such a case, in order
to prevent further damage.
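For illustration only (this is not how btrfs itself decides,
and the helper names are made up for the sketch), the mixed
profiles in a "btrfs fi df" report like the one above can be
spotted mechanically:

```python
# Hypothetical helper: parse "btrfs fi df" output and flag the
# mixed single/RAID1 profiles that block a degraded writable
# mount.  Illustration only, not the kernel's actual logic.

def profiles(fi_df_output):
    """Map each chunk type to the set of profiles it uses."""
    result = {}
    for line in fi_df_output.strip().splitlines():
        head = line.partition(":")[0]          # e.g. "Data, RAID1"
        ctype, _, profile = head.partition(", ")
        result.setdefault(ctype, set()).add(profile)
    return result

def blocks_degraded_rw(fi_df_output):
    """True if any chunk type mixes 'single' with another profile."""
    for ctype, profs in profiles(fi_df_output).items():
        if ctype == "GlobalReserve":
            continue  # always reported single; not on-disk redundancy
        if "single" in profs and len(profs) > 1:
            return True
    return False

sample = """\
Data, RAID1: total=2.00GiB, used=1.04GiB
Data, single: total=1.00GiB, used=640.00KiB
System, RAID1: total=32.00MiB, used=16.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=256.00MiB, used=54.02MiB
Metadata, single: total=256.00MiB, used=256.00KiB
GlobalReserve, single: total=32.00MiB, used=0.00B
"""
print(blocks_degraded_rw(sample))  # -> True: single chunks present
```

A report where every type is pure RAID1 (plus the GlobalReserve
line, which is always single) would come back False.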

Which is a problem, because btrfs raid1 requires at least
two devices in order to write new raid1 chunks.  So when a
two-device raid1 is degraded to a single device, btrfs can
no longer write raid1 chunks, and it starts writing single
mode chunks instead.

Which means as long as you repair the raid1 in that same
mount session, you're good.

But you only get that one chance.  If you don't repair it
in that first mount session after it starts writing to the
degraded raid1 and thus creates those single mode chunks,
you don't get a second chance, because once those single
mode chunks are there it will refuse to mount writable
with a missing device.  All you can do then is mount
degraded read-only, and copy your data off.


This is a known issue with *current* btrfs.  There are
actually two sets of patches in discussion to fix the
problem, but I don't believe (and your results support
this) that 4.6 got them.  I'm not actually sure of the
4.7 status, as I've not tracked it /that/ closely.

The first attempt at a fix was a patch set that had
btrfs check each chunk individually.  If all chunks
were accounted for, as they will be on an originally
two-device raid1 that had a device dropped and then
had single-mode chunks written to the remaining one,
it would still allow a degraded writable mount.  Only
if some chunks were unavailable, because they're on
the missing device, would the filesystem be limited
to degraded, read-only mounting.  This is referred
to as the per-chunk check patch set.
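The idea can be sketched in a few lines (a toy model with
made-up field names, nothing like the actual kernel code):

```python
# Toy model of the per-chunk check: a degraded writable mount is
# allowed only if every chunk still has at least one complete
# copy on the devices that are present.

def chunk_ok(chunk, present):
    """True if at least one complete copy of the chunk survives."""
    if chunk["profile"] == "raid1":
        # raid1 keeps two copies; one surviving device is enough
        return any(dev in present for dev in chunk["devices"])
    # single keeps one copy; its device must be present
    return all(dev in present for dev in chunk["devices"])

def allow_degraded_rw(chunks, present):
    return all(chunk_ok(c, present) for c in chunks)

# Two-device raid1 (devids 1 and 2) that wrote single-mode chunks
# to devid 2 while mounted degraded:
chunks = [
    {"profile": "raid1", "devices": [1, 2]},
    {"profile": "single", "devices": [2]},
]
print(allow_degraded_rw(chunks, present={2}))  # -> True (devid 1 missing)
print(allow_degraded_rw(chunks, present={1}))  # -> False (devid 2 missing)
```

With devid 1 missing, every chunk is still accounted for, so the
per-chunk check would permit a degraded writable mount, which is
exactly the case the profile-based check wrongly refuses.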

But while that strategy and patch set worked, further
discussion decided it was a work-around to the actual
problem: internally, btrfs tracks two minimum device
counts for a writable mount, one for full functionality
and one for degraded operation with everything still
available.  For raid1 full functionality the minimum
is obviously two devices, but the degraded minimum
should be just one device, of course also requiring
that no more than a single device is missing, since
btrfs raid1 keeps only two copies no matter the
number of devices (above 1).

The real bug, it was decided, was that for raid1
btrfs had both minimums set to two devices.  That
is why the forced switch to single-mode chunk
writing was added in the first place: as a
workaround to /this/ problem, instead of fixing
it by allowing writes to a single device with
the other copy missing, as long as degraded was
in the mount options.
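As a toy model of those two minimums (the numbers are from the
discussion above; the layout and names are invented for the
sketch):

```python
# Per-profile device minimums: (full rw mount, degraded rw mount).
BUGGY_LIMITS = {"raid1": (2, 2)}  # both set to 2: degraded rw impossible
FIXED_LIMITS = {"raid1": (2, 1)}  # proposed fix: one device is enough

def writable(profile, present_devices, degraded, limits):
    """Decide whether a writable mount should be allowed."""
    full_min, degraded_min = limits[profile]
    needed = degraded_min if degraded else full_min
    return present_devices >= needed

# One of two raid1 devices missing, mounted with -o degraded:
print(writable("raid1", 1, True, BUGGY_LIMITS))  # -> False (the bug)
print(writable("raid1", 1, True, FIXED_LIMITS))  # -> True  (the fix)
```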

However, by the time that decision was reached
and a patch created and in-testing to change
the raid1 mode degraded writable minimum, it was
already too late in the 4.6 cycle to get such
a big change in.

Meanwhile, the other problem was that the initial
per-chunk check patches were part of a patch set
that wasn't yet considered mature and thus wasn't
picked for early 4.6.  The delay was fortunate in
that it allowed the real problem to be discovered
and a patch created, but it is also why a fix may
not have made it into 4.7: if the patch set it
belongs to is still not considered mature, it
would not have been pulled for 4.7 either, and
the new patch fixing the real problem would still
be in limbo along with it.

Unless, of course, it was cherry-picked individually,
apart from the patch set as a whole.  As a user, not
a dev, I followed the discussion, but I haven't
followed development closely enough to know the
current status, or whether the second fix actually
made it into 4.7.

So in summary: it's a known problem, with an early
proposed patch that turned out to be a work-around
rather than a fix for the real problem, and a
second proposed patch now available, but I don't
know the status of its testing or whether it
reached mainline in time for 4.7.

But they /are/ aware of the problem and /are/ working
on it.  In the mean time, you have three choices.
You can:

1)  Try to be careful and actually do a replace
on the first degraded writable mount of a btrfs raid1,
because you know that's the only chance you'll get with
current code to repair it.

2) Find and apply one or the other patch set manually.

3) Just let the thing go read-only if it's going
to, and copy everything over to a different
filesystem from the read-only btrfs before blowing
it away, if it comes to that.


But meanwhile, while the above btrfs fi df reveals
the problem as we see it on the existing filesystem,
it says nothing about how it got that way.  Your
sequence above doesn't mention mounting the
degraded raid1 writable once, for it to create those
single-mode chunks that are now blocking writable
mount, but that's one way it could have happened.

Another way would be if the balance-conversion from
single mode to raid1 never properly completed in the
first place.  But I'm assuming it did and that you
had a full raid1 btrfs fi df report at one point.

A third way would be if some other bug triggered
btrfs to suddenly start writing single mode
chunks.  There were some bugs like that in the
past, but they've been fixed for some time.  But
perhaps there are similar newer bugs, or perhaps
you ran the filesystem on an old kernel with
that bug.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
  2016-07-13 10:24 Tamas Baumgartner-Kis
@ 2016-07-13 16:28 ` Chris Murphy
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Murphy @ 2016-07-13 16:28 UTC (permalink / raw)
  To: Tamas Baumgartner-Kis; +Cc: Btrfs BTRFS

On Wed, Jul 13, 2016 at 4:24 AM, Tamas Baumgartner-Kis
<Tamas.Baumgartner-Kis@rz.uni-freiburg.de> wrote:
> Hi Duncan,
>
> many thanks for your nice explanation and for pointing out
> what could have happened.
>
>> This reveals the problem.  You have single chunks in addition
>> to the raid1 chunks.  Current btrfs will refuse to mount
>> writable with a device missing in such a case, in order
>> to prevent further damage.
>
>
>> But meanwhile, while the above btrfs fi df reveals
>> the problem as we see it on the existing filesystem,
>> it says nothing about how it got that way.  Your
>> sequence above doesn't mention mounting the
>> degraded raid1 writable once, for it to create those
>> single-mode chunks that are now blocking writable
>> mount, but that's one way it could have happened.
>
>
> You're right: I first booted into the installed system on the hard
> disk and ended up in the rescue shell, because the "degraded" option
> was missing from the fstab. So I mounted the hard disk manually with
> the "degraded" option. But after that I decided to do the repair
> in a live system... I assume that is where the problem comes from,
> because in the live system I was no longer able to mount the hard
> disk with just the degraded option.
>
> So, as you mentioned: either you fix the missing hard disk while
> the system is still running, or after that you have one shot (for
> example in a live system); otherwise you have to copy everything
> off the read-only mounted hard disk.
>
>> Another way would be if the balance-conversion from
>> single mode to raid1 never properly completed in the
>> first place.  But I'm assuming it did and that you
>> had a full raid1 btrfs fi df report at one point.
>
>> A third way would be if some other bug triggered
>> btrfs to suddenly start writing single mode
>> chunks.  There were some bugs like that in the
>> past, but they've been fixed for some time.  But
>> perhaps there are similar newer bugs, or perhaps
>> you ran the filesystem on an old kernel with
>> that bug.


Yeah I've run into this several times.

The particularly vicious scenario: Drive A goes offline or is
unavailable, Drive B is mounted degraded and silently gets single
chunks to which data is written, and then Drive A is replaced, but
those single chunks still exist only on Drive B. If Drive B dies,
you have data loss on a volume that is ostensibly raid1.

The flaw is the allocation of single chunks when degraded; btrfs
should write only into raid1 chunks, existing or newly allocated.
It's data loss waiting to happen.
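Walking through that scenario with a toy chunk model (the device
names and layout are invented for the illustration):

```python
# After Drive A is replaced, raid1 chunks are rebuilt onto the
# new drive, but the single chunks written while degraded still
# live only on Drive B.
chunks = [
    {"profile": "raid1", "devices": {"A_new", "B"}},
    {"profile": "single", "devices": {"B"}},
]

def lost_if(dead, chunks):
    """Chunks left with no surviving copy once a device fails."""
    return [c for c in chunks if not (c["devices"] - {dead})]

print(lost_if("A_new", chunks))  # -> []: raid1 still has a copy on B
print(lost_if("B", chunks))      # -> the single chunk: data loss
```

As far as I know, running a balance with -dconvert=raid1,soft (the
soft filter skips chunks that are already raid1) after the replace
should re-replicate those leftover single chunks.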


-- 
Chris Murphy


* Re: ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
@ 2016-07-13 10:24 Tamas Baumgartner-Kis
  2016-07-13 16:28 ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Tamas Baumgartner-Kis @ 2016-07-13 10:24 UTC (permalink / raw)
  To: linux-btrfs


Hi Duncan,

many thanks for your nice explanation and for pointing out
what could have happened.

> This reveals the problem.  You have single chunks in addition
> to the raid1 chunks.  Current btrfs will refuse to mount
> writable with a device missing in such a case, in order
> to prevent further damage.


> But meanwhile, while the above btrfs fi df reveals
> the problem as we see it on the existing filesystem,
> it says nothing about how it got that way.  Your
> sequence above doesn't mention mounting the
> degraded raid1 writable once, for it to create those
> single-mode chunks that are now blocking writable
> mount, but that's one way it could have happened.


You're right: I first booted into the installed system on the hard
disk and ended up in the rescue shell, because the "degraded" option
was missing from the fstab. So I mounted the hard disk manually with
the "degraded" option. But after that I decided to do the repair
in a live system... I assume that is where the problem comes from,
because in the live system I was no longer able to mount the hard
disk with just the degraded option.

So, as you mentioned: either you fix the missing hard disk while
the system is still running, or after that you have one shot (for
example in a live system); otherwise you have to copy everything
off the read-only mounted hard disk.

> Another way would be if the balance-conversion from
> single mode to raid1 never properly completed in the
> first place.  But I'm assuming it did and that you
> had a full raid1 btrfs fi df report at one point.

> A third way would be if some other bug triggered
> btrfs to suddenly start writing single mode
> chunks.  There were some bugs like that in the
> past, but they've been fixed for some time.  But
> perhaps there are similar newer bugs, or perhaps
> you ran the filesystem on an old kernel with
> that bug.

Thank you and kind regards
    Tamas Baumgartner-Kis

-- 
Tamás Baumgartner-Kis
Rechenzentrum der Universität Freiburg
Phone: +49 761 203 4605
E-Mail: Tamas.Baumgartner-Kis@rz.uni-freiburg.de




