* paused balance convert from raid1 can no longer be a writeable mount
From: Chris Murphy @ 2015-02-04  7:02 UTC (permalink / raw)
  To: Btrfs BTRFS

The problem occurs with kernel 3.19.0-0.rc7.git0.1.fc22.x86_64. No
regression testing or attempt to reproduce has been done yet, but the
file system isn't particularly old.

Steps 1-5 occur with kernel 3.16 through 3.19 with no errors.

1. mkfs.btrfs -draid1 -mraid1 /dev/sd[bc]  ## btrfs-progs ~3.16 or 3.17
2. mount /dev/sdb /mnt/btr
3. copy some files to /mnt/btr
4. unmount /mnt/btr
5. Disconnect /dev/sdc

Steps 6-10 occur only with kernel 3.19

6. mount -odegraded /dev/sdb /mnt/btr
7. btrfs balance start -dconvert=single -mconvert=single -f /mnt/btr
8. In another shell, btrfs balance pause /mnt/btr
9. Wait for pause confirmation in 1st shell, then umount /mnt/btr
10. mount -odegraded /dev/sdb /mnt/btr

-mconvert=dup was disallowed, so I chose single:

[ 2029.715092] BTRFS error (device sdc): unable to start balance with
target metadata profile 32
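
The "profile 32" in the message above is a block-group flag value. As a minimal sketch, assuming the profile bit assignments from the btrfs on-disk format (RAID0 = 1<<3 through RAID6 = 1<<8), it can be decoded as follows. The function name `decode_profile` is hypothetical and used only for illustration.

```python
# Profile bits of a btrfs block-group flags value. The bit positions follow
# the btrfs on-disk format; the helper name below is hypothetical.
PROFILE_BITS = {
    1 << 3: "raid0",
    1 << 4: "raid1",
    1 << 5: "dup",
    1 << 6: "raid10",
    1 << 7: "raid5",
    1 << 8: "raid6",
}


def decode_profile(value):
    """Return the profile names whose bits are set in value.

    A value with none of these bits set is treated as the single profile.
    """
    names = [name for bit, name in sorted(PROFILE_BITS.items()) if value & bit]
    return names or ["single"]


print(decode_profile(32))
```

Under these assumptions 32 is 1<<5, the dup profile, which matches the rejected dup conversion target.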


Result when mounting:

[39691.150313] BTRFS info (device sdb): allowing degraded mounts
[39691.152501] BTRFS info (device sdb): disk space caching is enabled
[39693.756987] BTRFS: too many missing devices, writeable mount is not allowed
[39693.778349] BTRFS: open_ctree failed

I have no reason to think this is a regression, but haven't tried
older kernels yet.

Additional information:


[ 5719.840900] BTRFS info (device sdc): found 16 extents
[ 6097.761142] usb 1-1.4: USB disconnect, device number 4
[ 6097.774052] sd 3:0:0:0: [sdc] Synchronizing SCSI cache
[ 6097.783575] sd 3:0:0:0: [sdc] Synchronize Cache(10) failed: Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

5719 is about the time of the balance pause. I don't know the meaning
of the last two messages or their implication in possibly causing the
problem.


[root@f22s ~]# btrfs check /dev/sdb
warning, device 2 is missing
warning devid 2 not found already
Checking filesystem on /dev/sdb
UUID: 0f1c615f-30a0-4166-8a3c-987849551513
checking extents
checking free space cache
Error reading 476011409408, -1
failed to load free space cache for block group 476368076800
checking fs roots
checking csums
checking root refs
found 164679408219 bytes used err is 0
total csum bytes: 354762924
total tree bytes: 608239616
total fs tree bytes: 139395072
total extent tree bytes: 58785792
btree space waste bytes: 84024816
file data blocks allocated: 378008100864
 referenced 385864163328
Btrfs v3.18.2

No change with -orecovery,degraded; -oro,degraded does mount.

btrfs-image -c9 -t4 uses 100% CPU and hangs indefinitely, 353MB image here:
https://drive.google.com/file/d/0B_2Asp8DGjJ9b2p0aUpGUTVzVU0/view?pli=1

bug report for writeable mount fail is here, includes dmesg
https://bugzilla.kernel.org/show_bug.cgi?id=92641

separate bug report for btrfs-image hang (includes strace), here
https://bugzilla.kernel.org/show_bug.cgi?id=92651




-- 
Chris Murphy


* Re: paused balance convert from raid1 can no longer be a writeable mount
From: Chris Murphy @ 2015-02-04 20:53 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

This is completely reproducible with a brand new file system created
as raid1, using kernel 3.19 and btrfs-progs 3.18.

The conversion from raid1 to single, if paused, will apparently break
the file system's ability to be subsequently mounted writable. And
further, btrfs-image fails. I've updated the bug report.
https://bugzilla.kernel.org/show_bug.cgi?id=92641

First, the conversion from data/metadata raid1 should be faster than
fully reading and rewriting the file system. As this is a 2-device
raid1, each device is already effectively data/metadata single, so I'm
not sure why anything other than metadata needs rewriting.

Second, either what I'm doing should be disallowed (user can't force
conversion of a degraded array to single), or the file system
shouldn't break like this.

Third, the error message is confusing: "too many missing devices,
writeable mount is not allowed". The first part of that is definitely
not true. How can there be too many missing devices when it started
out as a 2 device volume and the remaining device isn't an ro or seed
device?


Chris Murphy


* Re: paused balance convert from raid1 can no longer be a writeable mount
From: Zygo Blaxell @ 2015-02-05  2:38 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


On Wed, Feb 04, 2015 at 01:53:09PM -0700, Chris Murphy wrote:
> This is completely reproducible with a brand new file system created
> as raid1, using kernel 3.19 and btrfs-progs 3.18.

I think you'll find it's reproducible with any kernel after 3.8-rc1
(circa October 2012).

> The conversion from raid1 to single, if paused, will apparently break
> the file system's ability to be subsequently mounted writable. 

Only if you remove a disk (or one fails).

> And
> further, btrfs-image fails. I've updated the bug report.
> https://bugzilla.kernel.org/show_bug.cgi?id=92641
> 
> First, the conversion from data/metadata raid1 should be faster than
> fully reading and rewriting the file system. As this is a 2-device
> raid1, each device is already effectively data/metadata single, so I'm
> not sure why anything other than metadata needs rewriting.
> 
> Second, either what I'm doing should be disallowed (user can't force
> conversion of a degraded array to single), or the file system
> shouldn't break like this.
> 
> Third, the error message is confusing: "too many missing devices,
> writeable mount is not allowed". The first part of that is definitely
> not true. How can there be too many missing devices when it started
> out as a 2 device volume and the remaining device isn't an ro or seed
> device?

I'd point out bug #60594, but it seems you've already been there.
I bumped into the same bug myself.

The problem is that one is more than the maximum number of missing devices
for the single profile, and you are missing one disk, so the filesystem
gives up.  It doesn't check that all the single chunks are on currently
present disks.
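
The difference between the two checks described above can be sketched roughly as follows. This is an illustrative model only, not kernel code: the function names, the chunk representation, and the tolerance table are assumptions made for the example (single, dup and raid0 tolerate no missing device; raid1, raid10 and raid5 tolerate one; raid6 tolerates two).

```python
# Illustrative model only; names and data layout are hypothetical.
TOLERATED_MISSING = {
    "single": 0, "dup": 0, "raid0": 0,
    "raid1": 1, "raid10": 1, "raid5": 1,
    "raid6": 2,
}


def fs_wide_check(profiles_in_use, missing_devices):
    """Filesystem-wide rule: the least tolerant profile present decides."""
    limit = min(TOLERATED_MISSING[p] for p in profiles_in_use)
    return missing_devices <= limit


def per_chunk_check(chunks, present_devids):
    """Per-chunk rule: look at where each chunk's stripes actually live."""
    for profile, devids in chunks:
        missing = sum(1 for d in devids if d not in present_devids)
        if missing > TOLERATED_MISSING[profile]:
            return False
    return True


# Degraded 2-device raid1 after a paused convert: some chunks are already
# single, and all of those were written to the one remaining device (devid 1).
chunks = [("raid1", [1, 2]), ("single", [1])]
print(fs_wide_check({"raid1", "single"}, missing_devices=1))
print(per_chunk_check(chunks, present_devids={1}))
```

In this model the filesystem-wide rule refuses the writeable mount as soon as a single chunk exists and one device is missing, while the per-chunk rule would allow it because every single chunk sits on the device that is still present.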

If you revert commit 292fd7fc39aa06668f3a8db546714e727120cb3e
you might be able to finish the balance and resume non-degraded read-write
operation.



* Re: paused balance convert from raid1 can no longer be a writeable mount
From: Chris Murphy @ 2015-02-05  3:00 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Chris Murphy, Btrfs BTRFS

On Wed, Feb 4, 2015 at 7:38 PM, Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
> On Wed, Feb 04, 2015 at 01:53:09PM -0700, Chris Murphy wrote:
>> This is completely reproducible with a brand new file system created
>> as raid1, using kernel 3.19 and btrfs-progs 3.18.
>
> I think you'll find it's reproducible with any kernel after 3.8-rc1
> (circa October 2012).
>
>> The conversion from raid1 to single, if paused, will apparently break
>> the file system's ability to be subsequently mounted writable.
>
> Only if you remove a disk (or one fails).

Conversion is done while degraded, so yes.

> The problem is that one is more than the maximum number of missing devices
> for the single profile, and you are missing one disk, so the filesystem
> gives up.  It doesn't check that all the single chunks are on currently
> present disks.
>
> If you revert commit 292fd7fc39aa06668f3a8db546714e727120cb3e
> you might be able to finish the balance and resume non-degraded read-write
> operation.

Good to know. In my case it's a throwaway file system. I guess the
current workaround is to not force conversion down to single unless
you're sure it won't be interrupted.

I haven't tested it, but hopefully conversion of a degraded raid1 to
raid10/5/6 can be done successfully. I can see someone with a raid1
saying, oh screw it, just add more drives and recover to raid5, rather
than doing a long raid1 rebuild followed by a raid5/6 conversion.


-- 
Chris Murphy
