* multi-device btrfs with single data mode and disk failure
@ 2016-09-15  7:44 Alexandre Poux
  2016-09-15 15:38 ` Chris Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-15  7:44 UTC (permalink / raw)
  To: linux-btrfs

I had a btrfs partition on a 6 disk array without raid (metadata in
raid10, but data in single), and one of the disks just died.

So I lost some of my data; OK, I knew that.

But two questions:

  * Is it possible to know (using metadata, I suppose) what data I have
    lost?

  * Is it possible to do some kind of a "btrfs delete missing" on this
    kind of setup, in order to recover read-write access to my other
    data, or must I copy all my data to a new partition?

I haven't been able to find any answer on Google or in the wiki, so I'm
sending an e-mail here, hoping this is the right place. Excuse me if I'm wrong.

Thank you for any help

(Sorry for my poor English)

uname -a:
Linux Grand-PC 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016
x86_64 GNU/Linux

btrfs --version:
btrfs-progs v4.7.1

btrfs fi show:
Label: 'Data'  uuid: 62db560b-a040-4c64-b613-6e7db033dc4d
    Total devices 6 FS bytes used 6.66TiB
    devid    1 size 2.53TiB used 2.12TiB path /dev/sdd6
    devid    7 size 2.53TiB used 2.12TiB path /dev/sdb6
    devid    9 size 262.57GiB used 0.00B path /dev/sde6
    devid   11 size 2.53TiB used 2.12TiB path /dev/sdc6
    devid   12 size 728.32GiB used 312.03GiB path /dev/sda6
    *** Some devices missing


mount -o recovery,ro,degraded /dev/sda6 /Data

relevant part of dmesg:
[ 1828.093704] BTRFS warning (device sda6): 'recovery' is deprecated,
use 'usebackuproot' instead
[ 1828.093708] BTRFS info (device sda6): trying to use backup root at
mount time
[ 1828.093718] BTRFS info (device sda6): allowing degraded mounts
[ 1828.093719] BTRFS info (device sda6): disk space caching is enabled
[ 1828.107763] BTRFS warning (device sda6): devid 8 uuid
950378c0-307c-413d-9805-ab2bb899aa78 missing

btrfs fi df /Data
Data, single: total=6.65TiB, used=6.65TiB
System, RAID1: total=32.00MiB, used=768.00KiB
Metadata, RAID1: total=13.00GiB, used=10.99GiB
GlobalReserve, single: total=512.00MiB, used=0.00B





* Re: multi-device btrfs with single data mode and disk failure
  2016-09-15  7:44 multi-device btrfs with single data mode and disk failure Alexandre Poux
@ 2016-09-15 15:38 ` Chris Murphy
  2016-09-15 16:30   ` Alexandre Poux
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-15 15:38 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Btrfs BTRFS

On Thu, Sep 15, 2016 at 1:44 AM, Alexandre Poux <pums974@gmail.com> wrote:
> I had a btrfs partition on a 6 disk array without raid (metadata in
> raid10, but data in single), and one of the disks just died.
>
> So I lost some of my data; OK, I knew that.
>
> But two questions:
>
>   * Is it possible to know (using metadata, I suppose) what data I have
>     lost?

The safest option is to remount read-only and do a read-only scrub.
That will spit out messages for corrupt (missing) metadata and data to
the kernel message buffer. The missing data will appear as corrupt
files, with full file paths, that can't be fixed. There will likely be
so many that dmesg will be useless, so you'll need to use journalctl
-fk to follow the scrub, or journalctl -bk after the fact (or even
-b-1 -k, -b-2 -k, etc.), or /var/log/messages; the output is probably
going to exceed the kernel message buffer, so dmesg won't help.
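For example, something like this (a sketch; the device and mount point
match the ones shown earlier in this thread):

mount -o ro,degraded /dev/sda6 /Data
btrfs scrub start -B -d -r /Data   # -r read-only scrub, -B foreground, -d per-device stats
journalctl -fk                     # follow the kernel log from a second terminal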




>   * Is it possible to do some kind of a "btrfs delete missing" on this
>     kind of setup, in order to recover read-write access to my other
>     data, or must I copy all my data to a new partition?

That *should* work :) Except that your file system with 6 drives is
too full to be shrunk to 5 drives. Btrfs will either refuse, or get
confused about how to shrink a nearly full 6-drive volume into 5.

So you'll have to do one of three things:

1. Add a 2+TB drive, then remove the missing one; OR
2. btrfs replace is faster and is raid10 reliable; OR
3. Read-only scrub to get a listing of bad files, then remount
read-write degraded and delete them all. Now you can maybe do a device
delete missing. But it's still a tight fit: it basically has to
balance things out to get it to fit on an odd number of drives, and it
may actually not work even though there seems to be enough total
space; there has to be enough space on FOUR drives.


I'd go with option 2.  And that should still spit out the paths to bad
files. If the replace works, I'm pretty sure you still need to delete
all of the files that are missing in order to get rid of the
corruption warnings on any subsequent scrub or balance.
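For reference, the commands for options 1 and 2 would look roughly like
this (a sketch; /dev/sdX stands in for the new drive, and devid 8 is
the missing device from the dmesg output above):

# option 1: add a new drive, then drop the missing one
btrfs device add /dev/sdX /Data
btrfs device delete missing /Data

# option 2: replace the missing devid directly with the new drive
btrfs replace start -B 8 /dev/sdX /Data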



>
> btrfs --version:
> btrfs-progs v4.7.1

You should upgrade to 4.7.2 or downgrade to 4.6.1 before doing btrfs
check. Not urgent so long as you don't actually do a repair with this
version.



-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-15 15:38 ` Chris Murphy
@ 2016-09-15 16:30   ` Alexandre Poux
  2016-09-15 16:54     ` Chris Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-15 16:30 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Thank you very much for your answers

On 15/09/2016 at 17:38, Chris Murphy wrote:
> On Thu, Sep 15, 2016 at 1:44 AM, Alexandre Poux <pums974@gmail.com> wrote:
>>   * Is it possible to do some kind of a "btrfs delete missing" on this
>>     kind of setup, in order to recover read-write access to my other
>>     data, or must I copy all my data to a new partition?
> That *should* work :) Except that your file system with 6 drives is
> too full to be shrunk to 5 drives. Btrfs will either refuse, or get
> confused about how to shrink a nearly full 6-drive volume into 5.
>
> So you'll have to do one of three things:
>
> 1. Add a 2+TB drive, then remove the missing one; OR
> 2. btrfs replace is faster and is raid10 reliable; OR
> 3. Read-only scrub to get a listing of bad files, then remount
> read-write degraded and delete them all. Now you can maybe do a device
> delete missing. But it's still a tight fit: it basically has to
> balance things out to get it to fit on an odd number of drives, and it
> may actually not work even though there seems to be enough total
> space; there has to be enough space on FOUR drives.
>
Are you sure you are talking about data in single mode?
I don't understand why you are talking about raid10,
or the fact that it will have to rebalance everything.

Moreover, even in degraded mode I cannot mount it read-write.
It tells me
"too many missing devices, writeable remount is not allowed"
due to the fact that I'm in single mode.

And as far as I know, btrfs replace and btrfs delete are not supposed
to work in read-only...

I would like to tell it to forget about the missing data and give me
back my partition.

In fact I'm pretty sure there was no data at all on the dead device,
only metadata in raid1.
I'm currently scrubbing to be absolutely sure.



* Re: multi-device btrfs with single data mode and disk failure
  2016-09-15 16:30   ` Alexandre Poux
@ 2016-09-15 16:54     ` Chris Murphy
       [not found]       ` <760be1b7-79b2-a25d-7c60-04ceac1b6e40@gmail.com>
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-15 16:54 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Chris Murphy, Btrfs BTRFS

On Thu, Sep 15, 2016 at 10:30 AM, Alexandre Poux <pums974@gmail.com> wrote:
> [...]
>
> Are you sure you are talking about data in single mode?
> I don't understand why you are talking about raid10,
> or the fact that it will have to rebalance everything.

Yeah, sorry, I got confused in that very last sentence. With single, it
will find space in 1GiB increments. Of course this fails because that
data doesn't exist anymore, but for the operation to start it needs to
be possible.


>
> Moreover, even in degraded mode I cannot mount it read-write.
> It tells me
> "too many missing devices, writeable remount is not allowed"
> due to the fact that I'm in single mode.

Oh, you're in that trap. Well, now you're stuck. I've had the case where
I could mount read-write degraded with metadata raid1 and data single,
but it was good for only one mount; then I got the same message you
get, and it was only possible to mount read-only. At that point you're
totally stuck unless you're adept at manipulating the file system with
a hex editor...

Someone might have a patch somewhere that drops this check and lets a
filesystem with too many missing devices mount anyway... I seem to
recall this. It'd be in the archives if it exists.



> And as far as I know, btrfs replace and btrfs delete are not supposed
> to work in read-only...

They don't. The filesystem must be mounted read-write.


>
> I would like to tell it to forget about the missing data and give me
> back my partition.

This feature doesn't exist yet. I really want to see it; it'd be
great for Ceph and Gluster if the volume could lose a drive, report
all the missing files to the cluster file system, and delete the device
and the file references; then the cluster would know that brick doesn't
have those files and could replicate them somewhere else, or even back
to the brick that had them.


-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
       [not found]       ` <760be1b7-79b2-a25d-7c60-04ceac1b6e40@gmail.com>
@ 2016-09-15 21:54         ` Chris Murphy
  2016-09-19 22:05           ` Alexandre Poux
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-15 21:54 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Btrfs BTRFS

On Thu, Sep 15, 2016 at 3:48 PM, Alexandre Poux <pums974@gmail.com> wrote:
>
> [...]
>
> So I found this patch: https://patchwork.kernel.org/patch/7014141/
>
> Does this seem OK?

No idea; I haven't tried it.

>
> So after patching my kernel with it,
> I should be able to mount my partition read-write, and thus
> I will be able to do a btrfs delete missing,
> which will just forget about the old disk, and everything should be
> fine afterward?

It will forget about the old disk but it will try to migrate all
metadata and data that was on that disk to the remaining drives; so
until you delete all files that are corrupt, you'll continue to get
corruption messages about them.

>
> Is this risky? Or not so much?

Probably. If you care about the data, mount read-only, back up what
you can, then see if you can fix it after that.

> The scrubbing is almost finished and, as I was expecting, I lost no
> data at all.

Well I'd guess the device delete should work then, but I still have no
idea if that patch will let you mount it degraded read-write. Worth a
shot though, it'll save time.

-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-15 21:54         ` Chris Murphy
@ 2016-09-19 22:05           ` Alexandre Poux
  2016-09-20 17:03             ` Alexandre Poux
  0 siblings, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-19 22:05 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 15/09/2016 at 23:54, Chris Murphy wrote:
> [...]
>
>> The scrubbing is almost finished and, as I was expecting, I lost no
>> data at all.
> Well I'd guess the device delete should work then, but I still have no
> idea if that patch will let you mount it degraded read-write. Worth a
> shot though, it'll save time.
>
OK, so I found some time to work on it.

I decided to do some tests in a VM (VirtualBox) with 3 disks:
after making an array with 3 disks, metadata in raid1 and data in
single, I removed one disk to reproduce my situation.

I tried the patch and, after updating it (nothing fancy),
I can indeed mount a degraded partition with data in single.

But I can't remove the device:
#btrfs device remove missing /mnt
ERROR: error removing device 'missing': Input/output error
or
#btrfs device remove 2 /mnt
ERROR: error removing devid 2: Input/output error

replace doesn't work either:
btrfs replace start -B 2 /dev/sdb /mnt
BTRFS error (device sda2): btrfs_scrub_dev(<missing disk>, 2, /dev/sdb)
failed -12
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Cannot allocate
memory, <illegal result value>

but scrub works, and I can add and remove another device if I want.
It's just the missing device that I can't get rid of.

Any other ideas?



* Re: multi-device btrfs with single data mode and disk failure
  2016-09-19 22:05           ` Alexandre Poux
@ 2016-09-20 17:03             ` Alexandre Poux
  2016-09-20 17:54               ` Chris Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 17:03 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 20/09/2016 at 00:05, Alexandre Poux wrote:
> [...]
>
> but scrub works, and I can add and remove another device if I want.
> It's just the missing device that I can't get rid of.
>
> Any other ideas?
If I wanted to try to edit my partitions with a hex editor, where would
I find info on how to do that?
I really don't want to go this way, but if it's relatively simple, it
may be worth a try.


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 17:03             ` Alexandre Poux
@ 2016-09-20 17:54               ` Chris Murphy
  2016-09-20 18:19                 ` Alexandre Poux
  2016-09-20 18:56                 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 17:54 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 11:03 AM, Alexandre Poux <pums974@gmail.com> wrote:

> If I wanted to try to edit my partitions with a hex editor, where would
> I find info on how to do that?
> I really don't want to go this way, but if it's relatively simple, it
> may be worth a try.

Simple is relative. First you'd need
https://btrfs.wiki.kernel.org/index.php/On-disk_Format to get some
understanding of where things are to edit, and then btrfs-map-logical
to convert btrfs logical addresses to a physical device and sector so
you know what to edit.

I'd call it distinctly non-trivial and very tedious.
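For example, resolving a single btrfs logical address to a device and
physical offset looks like this (a sketch; the logical address is made
up, and the output format varies between btrfs-progs versions):

btrfs-map-logical -l 30572544 /dev/sdd6
# prints something like:
#   mirror 1 logical 30572544 physical 39845888 device /dev/sdd6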

-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 17:54               ` Chris Murphy
@ 2016-09-20 18:19                 ` Alexandre Poux
  2016-09-20 18:38                   ` Chris Murphy
  2016-09-20 18:56                 ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 18:19 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 20/09/2016 at 19:54, Chris Murphy wrote:
> [...]
>
> I'd call it distinctly non-trivial and very tedious.
>
OK, another idea:
would it be possible to trick btrfs, with a manufactured file, into
believing the disk is present while it isn't?

I mean, looking for a few minutes at the hexdump of my trivial test
partition, the headers of the members of a btrfs array seem very much
alike. So maybe I can make a file which would have enough of a header
to make btrfs believe that this is my device, and then remove it as
usual... It looks like a long shot, but it doesn't hurt to ask...


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 18:19                 ` Alexandre Poux
@ 2016-09-20 18:38                   ` Chris Murphy
  2016-09-20 18:53                     ` Alexandre Poux
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 18:38 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 12:19 PM, Alexandre Poux <pums974@gmail.com> wrote:
>
> [...]
>
> OK, another idea:
> would it be possible to trick btrfs, with a manufactured file, into
> believing the disk is present while it isn't?
>
> I mean, looking for a few minutes at the hexdump of my trivial test
> partition, the headers of the members of a btrfs array seem very much
> alike. So maybe I can make a file which would have enough of a header
> to make btrfs believe that this is my device, and then remove it as
> usual... It looks like a long shot, but it doesn't hurt to ask...

There may be another test, one that applies to single profiles, that
disallows dropping a device. I think that's the place to look next.
The superblock is easy to copy, but you'll need the device-specific
UUID, which should be locatable with btrfs-show-super -f for each
devid. The bigger problem is that Btrfs at mount time doesn't just
look at the superblock and then mount. It actually reads parts of each
tree, the extent of which I don't know. And it's doing a bunch of
sanity tests as it reads those things, including transid (generation).
So I'm not sure how easily spoofable a fake device is going to be.

As a practical matter, migrating it to a new volume is faster and more
reliable. Unfortunately, the inability to mount it read-write is going
to prevent you from making read-only snapshots to use with btrfs
send/receive. What might work is to find out what on-disk modification
btrfstune does to make a device a read-only seed. Again, your volume
is missing a device, so btrfstune won't let you modify it. But if you
could force that to happen, it's probably a very minor change to the
metadata on each device; maybe it'll act like a seed device when you
next mount it, in which case you'll be able to add a device, remount
it read-write, and then delete the seed, causing migration of
everything that does remain on the volume over to the new device. I've
never tried anything like this, so I have no idea if it'll work. And
even in the best case, I haven't tried a multiple-device seed going to
a single-device sprout (is it even allowed when removing the seed?).
So... more questions than answers.
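For reference, on a healthy volume the seed/sprout dance goes roughly
like this (a sketch; /dev/sdX is a hypothetical new drive, and whether
the seed flag can be forced onto a degraded volume is exactly the open
question above):

btrfstune -S 1 /dev/sdd6            # set the seed flag; the volume becomes read-only
mount /dev/sdd6 /mnt                # a seed mounts read-only
btrfs device add /dev/sdX /mnt      # add a fresh writable device (the sprout)
mount -o remount,rw /mnt
btrfs device delete /dev/sdd6 /mnt  # migrate everything off the seed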


-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 18:38                   ` Chris Murphy
@ 2016-09-20 18:53                     ` Alexandre Poux
  2016-09-20 19:11                       ` Chris Murphy
                                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 18:53 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 20/09/2016 at 20:38, Chris Murphy wrote:
> [...]
>
> As a practical matter, migrating it to a new volume is faster and more
> reliable.
>
Sorry if I wasn't clear, but with the patch mentioned earlier, I can
get a read-write mount.
What I can't do is remove the device.
As for moving the data to another volume: since it's only data and
nothing fancy (no subvolumes or anything), a simple rsync would do the
trick.
My problem in this case is that I don't have enough available space
elsewhere to move my data.
That's why I'm trying this hard to recover the partition...


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 17:54               ` Chris Murphy
  2016-09-20 18:19                 ` Alexandre Poux
@ 2016-09-20 18:56                 ` Austin S. Hemmelgarn
  2016-09-20 19:06                   ` Alexandre Poux
  1 sibling, 1 reply; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-20 18:56 UTC (permalink / raw)
  To: Chris Murphy, Alexandre Poux; +Cc: Btrfs BTRFS

On 2016-09-20 13:54, Chris Murphy wrote:
> [...]
>
> Simple is relative. First you'd need
> https://btrfs.wiki.kernel.org/index.php/On-disk_Format to get some
> understanding of where things are to edit, and then btrfs-map-logical
> to convert btrfs logical addresses to a physical device and sector so
> you know what to edit.
>
> I'd call it distinctly non-trivial and very tedious.
>
It really is.  I've done this before, but I had a copy of the on-disk 
format documentation, a couple of working filesystems, a full copy of 
the current kernel sources for reference, and about 8 cups of green tea 
(my beverage of choice for staying awake and focused).  I got _really_ 
lucky and it was something that really was simple to fix once I found it 
(it amounted to about 64 bytes of changes, it took me maybe 5 minutes to 
actually correct the issue once I found where it was), but it took me a 
good couple of hours to figure out what to even look for, plus another 
hour just to find it, and I'm not sure I would be able to do it any 
faster if I had to again (unlike doing so for ext4, which is a walk in 
the park by comparison).

TBH, the only things I'd even try to fix with a hex editor in BTRFS are 
the super-blocks or system chunks, because they're pretty easy to find 
and usually not all that hard to fix.  In fact, if it hadn't been for 
the fact that I had no backup of the data I would lose by recreating 
that filesystem, and that I was _really_ bored that day, I probably 
wouldn't have even tried.


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 18:56                 ` Austin S. Hemmelgarn
@ 2016-09-20 19:06                   ` Alexandre Poux
  0 siblings, 0 replies; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 19:06 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy; +Cc: Btrfs BTRFS

On 20/09/2016 at 20:56, Austin S. Hemmelgarn wrote:
> [...]
OK, I'll forget about this.
Thank you.



* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 18:53                     ` Alexandre Poux
@ 2016-09-20 19:11                       ` Chris Murphy
       [not found]                         ` <4e7ec5eb-7fb6-2d19-f29d-82461e2d0bd2@gmail.com>
  2016-09-20 19:43                       ` Austin S. Hemmelgarn
  2016-09-20 20:59                       ` Graham Cobb
  2 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 19:11 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 12:53 PM, Alexandre Poux <pums974@gmail.com> wrote:
>
> [...]
>
> Sorry if I wasn't clear, but with the patch mentioned earlier, I can
> get a read-write mount.
> What I can't do is remove the device.
> As for moving the data to another volume: since it's only data and
> nothing fancy (no subvolumes or anything), a simple rsync would do the
> trick.
> My problem in this case is that I don't have enough available space
> elsewhere to move my data.
> That's why I'm trying this hard to recover the partition...

And no backup? Umm, I'd resolve that sooner than anything else. It
should be true that it'll tolerate a read-only mount indefinitely, but
read-write? Not sure. This sort of edge case isn't well tested at all,
seeing as it required changing the kernel to reduce safeguards. So
all bets are off; the whole thing could become unmountable, not even
read-only, and then it's a scraping job.

I think what you want to do here is reasonable; there's no missing
data on the missing device. If the device were present and you deleted
it, Btrfs would presumably have nothing to migrate; it'd just shrink
the fs, update all supers, wipe the signatures off the device being
removed, and that's it. So there's some safeguard in place that's
disallowing the remove missing in this case, even though there's no
data or metadata to migrate off the drive.

In another thread about clusters and planned data loss, I describe how
this functionality has a practical real-world benefit beyond your
particular situation. So it would be nice if it were possible, but I
can't tell you what the safeguard is that's preventing the removal,
or whether it's even just one safeguard.

What do you get for btrfs-debug-tree -t 3 <dev>?

That should show the chunk tree, and what I'm wondering is whether the
chunk tree has any references to chunks on the missing device. Even if
there are no extents on that device, if there are chunks, that might
be one of the safeguards.
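Roughly, the thing to look for (illustrative output only; the offsets
here are invented, and the exact layout varies between btrfs-progs
versions) is a stripe line naming the missing devid:

btrfs-debug-tree -t 3 /dev/sdd6
#   item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 161061273600) ...
#       chunk length 1073741824 owner 2 type DATA num_stripes 1
#           stripe 0 devid 8 offset 35651584   <- would pin missing devid 8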


-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 18:53                     ` Alexandre Poux
  2016-09-20 19:11                       ` Chris Murphy
@ 2016-09-20 19:43                       ` Austin S. Hemmelgarn
  2016-09-20 19:54                         ` Alexandre Poux
  2016-09-20 19:55                         ` Chris Murphy
  2016-09-20 20:59                       ` Graham Cobb
  2 siblings, 2 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-20 19:43 UTC (permalink / raw)
  To: Alexandre Poux, Chris Murphy; +Cc: Btrfs BTRFS

On 2016-09-20 14:53, Alexandre Poux wrote:
> [...]
>
> Sorry if I wasn't clear, but with the patch mentioned earlier, I can
> get a read-write mount.
> What I can't do is remove the device.
> As for moving the data to another volume: since it's only data and
> nothing fancy (no subvolumes or anything), a simple rsync would do the
> trick.
> My problem in this case is that I don't have enough available space
> elsewhere to move my data.
> That's why I'm trying this hard to recover the partition...
First off, as Chris said, if you can read the data and don't already 
have a backup, that should be your first priority.  This really is an 
edge case that's not well tested, and the kernel technically doesn't 
officially support it.

Now, beyond that and his suggestions, there's another option, but it's 
risky, so I wouldn't even think about trying it without a backup (unless 
of course you can trivially regenerate the data).  Multiple-device 
support and online resizing allow for a rather neat trick to regenerate 
a filesystem in place.  The process is pretty simple:
1. Shrink the existing filesystem down to the minimum size possible.
2. Create a new partition in the free space, and format it as a 
temporary BTRFS filesystem.  Ideally, this FS should be mixed-mode and 
single profile.  If you don't have much free space, you can use a flash 
drive to start this temporary filesystem instead.
3. Start copying files from the old filesystem to the temporary one.
4. Once the temporary filesystem is about 95% full, stop copying, shrink 
the old filesystem again, create a new partition, and add that partition 
to the temporary filesystem.
5. Repeat steps 3-4 until you have everything off of the old filesystem.
6. Re-format the remaining portion of the old filesystem using the 
parameters you want for the replacement filesystem.
7. Start copying files from the temporary filesystem to the new filesystem.
8. As you empty out each temporary partition, remove it from the 
temporary filesystem, delete the partition, and expand the new filesystem.

This takes a while, and is only safe if you have reliable hardware, but 
I've done it before and it works reliably as long as you don't have many 
big files on the old filesystem (things can get complicated if you do). 
The other negative aspect is that if you aren't careful, it's possible 
to get stuck half-way, but in such a case, adding a flash drive to the 
temporary filesystem can usually give you enough extra space to get 
things unstuck.
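One round of steps 1-4 might look roughly like this (a sketch with
hypothetical partition names and sizes; after shrinking the filesystem
you also have to shrink its partition with fdisk/parted before the
freed space can be repartitioned):

# 1. shrink the old filesystem on devid 1 by 100G
btrfs filesystem resize 1:-100G /mnt/old
# 2. create /dev/sdd7 in the freed space and make the temporary fs
mkfs.btrfs --mixed /dev/sdd7
mount /dev/sdd7 /mnt/tmp
# 3. copy until the temporary fs is ~95% full
rsync -aHAX /mnt/old/somedir/ /mnt/tmp/somedir/
# 4. shrink again, carve out /dev/sdd8, and grow the temporary fs
btrfs filesystem resize 1:-100G /mnt/old
btrfs device add /dev/sdd8 /mnt/tmp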



* Re: multi-device btrfs with single data mode and disk failure
       [not found]                         ` <4e7ec5eb-7fb6-2d19-f29d-82461e2d0bd2@gmail.com>
@ 2016-09-20 19:46                           ` Chris Murphy
  2016-09-20 20:18                             ` Alexandre Poux
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 19:46 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux <pums974@gmail.com> wrote:
>
>
> On 20/09/2016 at 21:11, Chris Murphy wrote:

>> And no backup? Umm, I'd resolve that sooner than anything else.
> Yeah, you are absolutely right; this was a temporary solution which
> turned out to be not that temporary.
> And I regret it already...

Well, on the bright side, if this were LVM or an mdadm linear/concat
array, the whole thing would be toast, because any other file system
would have lost too much fs metadata on the missing device.

>>  It
>> should be true that it'll tolerate a read-only mount indefinitely, but
>> read-write? Not sure. This sort of edge case isn't well tested at all,
>> seeing as it required changing the kernel to reduce safeguards. So
>> all bets are off; the whole thing could become unmountable, not even
>> read-only, and then it's a scraping job.
> I'm not that crazy: I tried the patch inside a virtual machine on
> virtual drives...
> And since it's only virtual, it may not work on the real partition...

Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
device? That might be what pinned it in that case.

You could try some sort of overlay for your remaining drives.
Something like this:
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

Make sure you understand the gotcha about cloning which applies here:
https://btrfs.wiki.kernel.org/index.php/Gotchas

I think it's safe to use blockdev --setro on every real device you're
trying to protect from changes. And when mounting you'll at least need
to use the device= mount option to explicitly mount each of the overlay
devices. Based on the wiki, I'm wincing; I don't really know for sure
if the device= mount option is enough to compel Btrfs to only use those
devices and not go off the rails and still use one of the real
devices, but at least if they're set read-only it won't matter (the
mount will just fail somehow due to write failures).

So now you can try removing the missing device... and see what
happens. You could inspect the overlay files and see what changes were
made.
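Along the lines of the raid.wiki recipe, one overlay per remaining
device might look like this (a sketch with hypothetical names and
sizes; the snapshot target syntax is "start length snapshot origin
cow-dev P chunksize"):

blockdev --setro /dev/sdd6                  # protect the real device
truncate -s 2G /tmp/ov-sdd6                 # sparse file to absorb writes
LOOP=$(losetup -f --show /tmp/ov-sdd6)
SIZE=$(blockdev --getsz /dev/sdd6)          # size in 512-byte sectors
dmsetup create ov-sdd6 --table "0 $SIZE snapshot /dev/sdd6 $LOOP P 8"
# repeat for each remaining device, then mount only the overlays:
mount -o degraded,device=/dev/mapper/ov-sdd6,device=/dev/mapper/ov-sdb6 \
      /dev/mapper/ov-sdd6 /mnt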

>> What do you get for btrfs-debug-tree -t 3 <dev>?
>>
>> That should show the chunk tree, and what I'm wondering is whether the
>> chunk tree has any references to chunks on the missing device. Even if
>> there are no extents on that device, if there are chunks, that might
>> be one of the safeguards.
>>
> You'll find it attached.
> The missing device is devid 8 (since it's the only one missing in
> btrfs fi show).
> I found it only once, at line 63.

Yeah, bummer. Not used for system, data, or metadata chunks at all.


-- 
Chris Murphy


* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 19:43                       ` Austin S. Hemmelgarn
@ 2016-09-20 19:54                         ` Alexandre Poux
  2016-09-20 20:02                           ` Chris Murphy
  2016-09-20 19:55                         ` Chris Murphy
  1 sibling, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 19:54 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy; +Cc: Btrfs BTRFS



On 20/09/2016 at 21:43, Austin S. Hemmelgarn wrote:
> On 2016-09-20 14:53, Alexandre Poux wrote:
>>
>>
>> On 20/09/2016 at 20:38, Chris Murphy wrote:
>>> On Tue, Sep 20, 2016 at 12:19 PM, Alexandre Poux <pums974@gmail.com>
>>> wrote:
>>>>
>>>> On 20/09/2016 at 19:54, Chris Murphy wrote:
>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Alexandre Poux
>>>>> <pums974@gmail.com> wrote:
>>>>>
>>>>>> If I wanted to try to edit my partitions with an hex editor,
>>>>>> where would
>>>>>> I find infos on how to do that ?
>>>>>> I really don't want to go this way, but if this is relatively
>>>>>> simple, it
>>>>>> may be worth to try.
>>>>> Simple is relative. First you'd need
>>>>> https://btrfs.wiki.kernel.org/index.php/On-disk_Format to get some
>>>>> understanding of where things are to edit, and then btrfs-map-logical
>>>>> to convert btrfs logical addresses to physical device and sector to
>>>>> know what to edit.
>>>>>
>>>>> I'd call it distinctly non-trivial and very tedious.
>>>>>
>>>> OK, another idea:
>>>> would it be possible to trick btrfs with a manufactured file that the
>>>> disk is present while it isn't ?
>>>>
>>>> I mean, looking for a few minutes on the hexdump of my trivial test
>>>> partition, header of members of btrfs array seems very alike.
>>>> So maybe, I can make a file wich would have enough header to make
>>>> btrfs
>>>> believe that this is my device, and then remove it as usual....
>>>> looks like a long shot, but it doesn't hurt to ask....
>>> There may be another test that applies to single profiles, that
>>> disallows dropping a device. I think that's the place to look next.
>>> The superblock is easy to copy, but you'll need the device specific
>>> UUID which should be locatable with btrfs-show-super -f for each
>>> devid. The bigger problem is that Btrfs at mount time doesn't just
>>> look at the superblock and then mount. It actually reads parts of each
>>> tree, the extent of which I don't know. And it's doing a bunch of
>>> sanity tests as it reads those things, including transid (generation).
>>> So I'm not sure how easily spoofable a fake device is going to be.
>>> As a practical matter, migrate it to a new volume is faster and more
>>> reliable. Unfortunately, the inability to mount it read write is going
>>> to prevent you from making read only snapshots to use with btrfs
>>> send/receive. What might work, is find out what on-disk modification
>>> btrfstune does to make a device a read-only seed. Again your volume
>>> is missing a device so btrfstune won't let you modify it. But if you
>>> could force that to happen, it's probably a very minor change to
>>> metadata on each device, maybe it'll act like a seed device when you
>>> next mount it, in which case you'll be able to add a device and
>>> remount it read write and then delete the seed causing migration of
>>> everything that does remain on the volume over to the new device. I've
>>> never tried anything like this so I have no idea if it'll work. And
>>> even in the best case I haven't tried a multiple device seed going to
>>> a single device sprout (is it even allowed when removing the seed?).
>>> So...more questions than answers.
>>>
>> Sorry if I wasn't clear, but with the patch mentioned earlier, I can
>> get a read write mount.
>> What I can't do is remove the device.
>> As for moving data to another volume, since it's only data and
>> nothing fancy (no subvolume or anything), a simple rsync would do the
>> trick.
>> My problem in this case is that I don't have enough available space
>> elsewhere to move my data.
>> That's why I'm trying this hard to recover the partition...
> First off, as Chris said, if you can read the data and don't already
> have a backup, that should be your first priority.  This really is an
> edge case that's not well tested, and the kernel technically doesn't
> officially support it.
>
> Now, beyond that and his suggestions, there's another option, but it's
> risky, so I wouldn't even think about trying it without a backup
> (unless of course you can trivially regenerate the data).  Multiple
> devices support and online resizing allows for a rather neat trick to
> regenerate a filesystem in place.  The process is pretty simple:
> 1. Shrink the existing filesystem down to the minimum size possible.
> 2. Create a new partition in the free space, and format it as a
> temporary BTRFS filesystem.  Ideally, this FS should be mixed mode,
> and ideally single profile.  If you don't have much free space, you
> can use a flash drive to start this temporary filesystem instead.
> 3. Start copying files from the old filesystem to the temporary one.
> 4. Once the new filesystem is about 95% full, stop copying, shrink the
> old filesystem again, create a new partition, and add that partition
> to the temporary filesystem.
> 5. Repeat steps 3-4 until you have everything off of the old filesystem.
> 6. Re-format the remaining portion of the old filesystem using the
> parameters you want for the replacement filesystem.
> 7. Start copying files from the temporary filesystem to the new
> filesystem.
> 8. As you empty out each temporary partition, remove it from the
> temporary filesystem, delete the partition, and expand the new
> filesystem.
>
> This takes a while, and is only safe if you have reliable hardware,
> but I've done it before and it works reliably as long as you don't
> have many big files on the old filesystem (things can get complicated
> if you do). The other negative aspect is that if you aren't careful,
> it's possible to get stuck half-way, but in such a case, adding a
> flash drive to the temporary filesystem can usually give you enough
> extra space to get things unstuck.
>
OK, good idea, but to be able to do that, I have to use the patch that
allows me to mount the partition read-write; otherwise I won't be able
to shrink it, I suppose.
And even with the patch, I'm not sure I won't get an I/O error the same
way I do when I try to remove the device.
I will try it on my virtual machine.
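
If I understand Austin's steps correctly, one iteration would look
roughly like this (an untested sketch; partition names and sizes are
made up):

btrfs filesystem resize 1:-200G /Data    # step 1: shrink one device of the old fs
# shrink its partition and create a new one in the freed space, then:
mkfs.btrfs --mixed /dev/sdd7             # step 2: temporary fs, mixed mode
mount /dev/sdd7 /mnt/tmp
rsync -a /Data/somedir/ /mnt/tmp/        # step 3: copy a batch of files
btrfs device add /dev/sdb7 /mnt/tmp      # step 4: grow the temporary fs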


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 19:43                       ` Austin S. Hemmelgarn
  2016-09-20 19:54                         ` Alexandre Poux
@ 2016-09-20 19:55                         ` Chris Murphy
  2016-09-21 11:07                           ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 19:55 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Alexandre Poux, Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 1:43 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

>> First off, as Chris said, if you can read the data and don't already have a
> backup, that should be your first priority.  This really is an edge case
> that's not well tested, and the kernel technically doesn't officially
> support it.
>
> Now, beyond that and his suggestions, there's another option, but it's
> risky, so I wouldn't even think about trying it without a backup (unless of
> course you can trivially regenerate the data).  Multiple devices support and
> online resizing allows for a rather neat trick to regenerate a filesystem in
> place.  The process is pretty simple:
> 1. Shrink the existing filesystem down to the minimum size possible.
> 2. Create a new partition in the free space, and format it as a temporary
> BTRFS filesystem.  Ideally, this FS should be mixed mode, and ideally single
> profile.  If you don't have much free space, you can use a flash drive to
> start this temporary filesystem instead.
> 3. Start copying files from the old filesystem to the temporary one.
> 4. Once the new filesystem is about 95% full, stop copying, shrink the old
> filesystem again, create a new partition, and add that partition to the
> temporary filesystem.
> 5. Repeat steps 3-4 until you have everything off of the old filesystem.
> 6. Re-format the remaining portion of the old filesystem using the
> parameters you want for the replacement filesystem.
> 7. Start copying files from the temporary filesystem to the new filesystem.
> 8. As you empty out each temporary partition, remove it from the temporary
> filesystem, delete the partition, and expand the new filesystem.
>
> This takes a while, and is only safe if you have reliable hardware, but I've
> done it before and it works reliably as long as you don't have many big
> files on the old filesystem (things can get complicated if you do). The
> other negative aspect is that if you aren't careful, it's possible to get
> stuck half-way, but in such a case, adding a flash drive to the temporary
> filesystem can usually give you enough extra space to get things unstuck.

Yes I thought of this also.

The gotcha is that he'll need to apply the patch that allows degraded
rw mounts with a device missing on the actual computer with these
drives; so far he has only tested that patch in a VM with virtual
devices.

What might be easier is just 'btrfs device delete /dev/sda6' because
that one has the least data on it:

devid   12 size 728.32GiB used 312.03GiB path /dev/sda6

which should fit on the remaining devices. But does Btrfs get pissed at
some point about this missing device it might want to write to? I have
no idea to what degree this patched kernel permits degraded writing.

The other quandary is that the file system will do online shrink, but
the kernel can sometimes get pissy about partition map changes on
devices with active volumes, even when using partprobe to update the
kernel's idea of the partition map.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 19:54                         ` Alexandre Poux
@ 2016-09-20 20:02                           ` Chris Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 20:02 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 1:54 PM, Alexandre Poux <pums974@gmail.com> wrote:
>
> OK, good idea, but to be able to do that, I have to use the patch that
> allow me to mount the partition in rw, otherwise I won't be able to
> shrink it I suppose..
> And even with the patch I'm not sure that I won't get an IO error the
> same way I get it when I try to remove the device.
> I will try it on my virtual machine.

The shrink itself is pretty trivial in that it's just moving block
groups around if necessary; it's part of the balance code. There's not
much metadata being changed: just CoW the block groups, then update the
chunk tree and supers. It is trickier when it comes to partition map
changes while the fs is still mounted, or to doing it the way I was
describing, deleting one of the present devices, in which case you can
then just use that now-empty partition as the starter for a new file
system.

It's a catch-22 either way.

Note that by default, if you don't specify a devid for shrink, it only
resizes devid 1.
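
For example (a sketch, assuming the fs is mounted at /Data):

btrfs filesystem resize 2:-100G /Data    # shrink devid 2 by 100GiB
btrfs filesystem resize -100G /Data      # no devid prefix: acts on devid 1 only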



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 19:46                           ` Chris Murphy
@ 2016-09-20 20:18                             ` Alexandre Poux
  2016-09-20 21:05                               ` Alexandre Poux
  2016-09-20 21:15                               ` Chris Murphy
  0 siblings, 2 replies; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 20:18 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 20/09/2016 at 21:46, Chris Murphy wrote:
> On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>
>> On 20/09/2016 at 21:11, Chris Murphy wrote:
>>> And no backup? Umm, I'd resolve that sooner than anything else.
>> Yeah you are absolutely right, this was a temporary solution which came
>> to be not that temporary.
>> And I regret it already...
> Well on the bright side, if this were LVM or mdadm linear/concat
> array, the whole thing would be toast because any other file system
> would have lost too much fs metadata on the missing device.
>
>>>  It
>>> should be true that it'll tolerate a read only mount indefinitely, but
>>> read write? Not sure. This sort of edge case isn't well tested at all
>>> seeing as it required changing the kernel to reduce safe guards. So
>>> all bets are off the whole thing could become unmountable, not even
>>> read only, and then it's a scraping job.
>> I'm not that crazy, I tried the patch inside a virtual machine on
>> virtual drives...
>> And since it's only virtual, it may not work on the real partition...
> Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
> device? That might be what pinned it in that case.
In fact, in my virtual setup there were more chunks missing (1 metadata,
1 system, and 1 data).
I will try to do a setup closer to my real one.
> You could try some sort of overlay for your remaining drives.
> Something like this:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>
> Make sure you understand the gotcha about cloning which applies here:
> https://btrfs.wiki.kernel.org/index.php/Gotchas
>
> I think it's safe to use blockdev --setro on every real device  you're
> trying to protect from changes. And when mounting you'll at least need
> to use device= mount option to explicitly mount each of the overlay
> devices. Based on the wiki, I'm wincing, I don't really know for sure
> if device mount option is enough to compel Btrfs to only use those
> devices and not go off the rails and still use one of the real
> devices, but at least if they're setro it won't matter (the mount will
> just fail somehow due to write failures).
>
> So now you can try removing the missing device... and see what
> happens. You could inspect the overlay files and see what changes were
> made.
Wow, that looks nice.
So if it works, and if we find a way to fix the filesystem inside the
VM, I can use this over the real partition to check that it works
before trying the fix for real.
Nice idea.
>>> What do you get for btrfs-debug-tree -t 3 <dev>
>>>
>>> That should show the chunk tree, and what I'm wondering is if the
>>> chunk tree has any references to chunks on the missing device. Even if
>>> there are no extents on that device, if there are chunks, that might
>>> be one of the safeguards.
>>>
>> You'll find it attached.
>> The missing device is the devid 8 (since it's the only one missing in
>> btrfs fi show)
>> I found it only once, at line 63
> Yeah bummer. Not used for system, data, or metadata chunks at all.
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 18:53                     ` Alexandre Poux
  2016-09-20 19:11                       ` Chris Murphy
  2016-09-20 19:43                       ` Austin S. Hemmelgarn
@ 2016-09-20 20:59                       ` Graham Cobb
  2 siblings, 0 replies; 26+ messages in thread
From: Graham Cobb @ 2016-09-20 20:59 UTC (permalink / raw)
  Cc: Btrfs BTRFS

On 20/09/16 19:53, Alexandre Poux wrote:
> As for moving data to another volume, since it's only data and
> nothing fancy (no subvolume or anything), a simple rsync would do the trick.
> My problem in this case is that I don't have enough available space
> elsewhere to move my data.
> That's why I'm trying this hard to recover the partition...

I am sure you have already thought about this, but... it might be
easier, and maybe even faster, to back up the data to a cloud server,
then recreate the filesystem and download everything again.

Backblaze B2 is very cheap for upload and storage (don't know about
download charges, though).  And rclone works well to handle rsync-style
copies (although you might want to use tar or dar if you need to
preserve file attributes).

And if that works, rclone + B2 might make a reasonable offsite backup
solution for the future!

Graham

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 20:18                             ` Alexandre Poux
@ 2016-09-20 21:05                               ` Alexandre Poux
  2016-09-20 21:15                               ` Chris Murphy
  1 sibling, 0 replies; 26+ messages in thread
From: Alexandre Poux @ 2016-09-20 21:05 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 20/09/2016 at 22:18, Alexandre Poux wrote:
>
>> On 20/09/2016 at 21:46, Chris Murphy wrote:
>> On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>> On 20/09/2016 at 21:11, Chris Murphy wrote:
>>>> And no backup? Umm, I'd resolve that sooner than anything else.
>>> Yeah you are absolutely right, this was a temporary solution which came
>>> to be not that temporary.
>>> And I regret it already...
>> Well on the bright side, if this were LVM or mdadm linear/concat
>> array, the whole thing would be toast because any other file system
>> would have lost too much fs metadata on the missing device.
>>
>>>>  It
>>>> should be true that it'll tolerate a read only mount indefinitely, but
>>>> read write? Not sure. This sort of edge case isn't well tested at all
>>>> seeing as it required changing the kernel to reduce safe guards. So
>>>> all bets are off the whole thing could become unmountable, not even
>>>> read only, and then it's a scraping job.
>>> I'm not that crazy, I tried the patch inside a virtual machine on
>>> virtual drives...
>>> And since it's only virtual, it may not work on the real partition...
>> Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
>> device? That might be what pinned it in that case.
> In fact in my virtual setup there was more chunk missing (1 metadata 1
> System and 1 Data).
> I will try to do a setup closer to my real one.
Good news: I made a test where, in my virtual setup, no chunks at all
were missing, and in this case it had no problem removing the device!
What I did is:
- make an array with 6 disks (data single, metadata raid1)
- dd if=/dev/zero of=/mnt/somefile bs=64M count=16 # make a 1G file
- use btrfs-debug-tree to identify which device was not used
- shut down the vm, remove this virtual device, and restart the vm
- mount the array degraded but read-write, thanks to the patched kernel
- btrfs device delete missing
- and voilà !
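In terms of commands, the test was roughly this (a sketch; the virtual
disk names will differ on your setup):

mkfs.btrfs -d single -m raid1 /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf /dev/vdg
mount /dev/vdb /mnt
dd if=/dev/zero of=/mnt/somefile bs=64M count=16
btrfs-debug-tree -t 3 /dev/vdb    # find a devid with no chunks at all
# shut down, detach that unused disk, boot the patched kernel, then:
mount -o degraded /dev/vdb /mnt
btrfs device delete missing /mnt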
I will try with something else than /dev/zero, but this is very encouraging.
Do you think my test is too trivial?
Should I try something else before trying on the real partition with the
overlay?

>> You could try some sort of overlay for your remaining drives.
>> Something like this:
>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>>
>> Make sure you understand the gotcha about cloning which applies here:
>> https://btrfs.wiki.kernel.org/index.php/Gotchas
>>
>> I think it's safe to use blockdev --setro on every real device  you're
>> trying to protect from changes. And when mounting you'll at least need
>> to use device= mount option to explicitly mount each of the overlay
>> devices. Based on the wiki, I'm wincing, I don't really know for sure
>> if device mount option is enough to compel Btrfs to only use those
>> devices and not go off the rails and still use one of the real
>> devices, but at least if they're setro it won't matter (the mount will
>> just fail somehow due to write failures).
>>
>> So now you can try removing the missing device... and see what
>> happens. You could inspect the overlay files and see what changes were
>> made.
> Wow that looks like nice.
> So, if it work, and if we find a way to fix the filesystem inside the vm,
> I can use this over the real partion to check if it works before trying
> the fix for real.
> Nice idea.
>>>> What do you get for btrfs-debug-tree -t 3 <dev>
>>>>
>>>> That should show the chunk tree, and what I'm wondering is if the
>>>> chunk tree has any references to chunks on the missing device. Even if
>>>> there are no extents on that device, if there are chunks, that might
>>>> be one of the safeguards.
>>>>
>>> You'll find it attached.
>>> The missing device is the devid 8 (since it's the only one missing in
>>> btrfs fi show)
>>> I found it only once, at line 63
>> Yeah bummer. Not used for system, data, or metadata chunks at all.
>>
>>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 20:18                             ` Alexandre Poux
  2016-09-20 21:05                               ` Alexandre Poux
@ 2016-09-20 21:15                               ` Chris Murphy
  2016-09-29 12:55                                 ` Alexandre Poux
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2016-09-20 21:15 UTC (permalink / raw)
  To: Alexandre Poux; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Sep 20, 2016 at 2:18 PM, Alexandre Poux <pums974@gmail.com> wrote:
>
>
> On 20/09/2016 at 21:46, Chris Murphy wrote:
>> On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>>
>>> On 20/09/2016 at 21:11, Chris Murphy wrote:
>>>> And no backup? Umm, I'd resolve that sooner than anything else.
>>> Yeah you are absolutely right, this was a temporary solution which came
>>> to be not that temporary.
>>> And I regret it already...
>> Well on the bright side, if this were LVM or mdadm linear/concat
>> array, the whole thing would be toast because any other file system
>> would have lost too much fs metadata on the missing device.
>>
>>>>  It
>>>> should be true that it'll tolerate a read only mount indefinitely, but
>>>> read write? Not sure. This sort of edge case isn't well tested at all
>>>> seeing as it required changing the kernel to reduce safe guards. So
>>>> all bets are off the whole thing could become unmountable, not even
>>>> read only, and then it's a scraping job.
>>> I'm not that crazy, I tried the patch inside a virtual machine on
>>> virtual drives...
>>> And since it's only virtual, it may not work on the real partition...
>> Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
>> device? That might be what pinned it in that case.
> In fact in my virtual setup there was more chunk missing (1 metadata 1
> System and 1 Data).
> I will try to do a setup closer to my real one.

Probably the reason why that missing device has no used chunks is
because it's so small. Btrfs allocates block groups to devices with
the most unallocated space first. Only once the unallocated space is
even (approximately) on all devices would it allocate a block group to
the small device.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 19:55                         ` Chris Murphy
@ 2016-09-21 11:07                           ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-21 11:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Alexandre Poux, Btrfs BTRFS

On 2016-09-20 15:55, Chris Murphy wrote:
> On Tue, Sep 20, 2016 at 1:43 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>>> First off, as Chris said, if you can read the data and don't already have a
>> backup, that should be your first priority.  This really is an edge case
>> that's not well tested, and the kernel technically doesn't officially
>> support it.
>>
>> Now, beyond that and his suggestions, there's another option, but it's
>> risky, so I wouldn't even think about trying it without a backup (unless of
>> course you can trivially regenerate the data).  Multiple devices support and
>> online resizing allows for a rather neat trick to regenerate a filesystem in
>> place.  The process is pretty simple:
>> 1. Shrink the existing filesystem down to the minimum size possible.
>> 2. Create a new partition in the free space, and format it as a temporary
>> BTRFS filesystem.  Ideally, this FS should be mixed mode, and ideally single
>> profile.  If you don't have much free space, you can use a flash drive to
>> start this temporary filesystem instead.
>> 3. Start copying files from the old filesystem to the temporary one.
>> 4. Once the new filesystem is about 95% full, stop copying, shrink the old
>> filesystem again, create a new partition, and add that partition to the
>> temporary filesystem.
>> 5. Repeat steps 3-4 until you have everything off of the old filesystem.
>> 6. Re-format the remaining portion of the old filesystem using the
>> parameters you want for the replacement filesystem.
>> 7. Start copying files from the temporary filesystem to the new filesystem.
>> 8. As you empty out each temporary partition, remove it from the temporary
>> filesystem, delete the partition, and expand the new filesystem.
>>
>> This takes a while, and is only safe if you have reliable hardware, but I've
>> done it before and it works reliably as long as you don't have many big
>> files on the old filesystem (things can get complicated if you do). The
>> other negative aspect is that if you aren't careful, it's possible to get
>> stuck half-way, but in such a case, adding a flash drive to the temporary
>> filesystem can usually give you enough extra space to get things unstuck.
>
> Yes I thought of this also.
>
> Gotcha is that he'll need to apply the patch that allows degraded rw
> mounts with a device missing on the actual computer with these drives.
> He tested that patch in a VM with virtual devices.
>
> What might be easier is just 'btrfs dev rm /dev/sda6' because that one
> has the least amount of data on it:
>
> devid   12 size 728.32GiB used 312.03GiB path /dev/sda6
>
> which should fit on all remaining devices. But, does Btrfs get pissed
> at some point that there's this missing device it might want to write
> to? I have no idea to what degree this patched kernel permits a lot of
> degraded writing.
>
> The other quandary is the file system will do online shrink, but the
> kernel can sometimes get pissy about partition map changes on devices
> with active volumes, even when using partprobe to update the kernel's
> idea of the partition map.
>
Excellent point that I forgot to mention.  In my experience, the trick 
is to unmount filesystems from every partition that changed before 
running partprobe, and things usually work (I say usually because if 
your root filesystem is on the device you're re-partitioning, the kernel 
gets a whole lot pickier about changes).
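
In other words, something like this (a sketch; the mount point and
device names are assumed):

umount /Data                # unmount everything on the affected device first
# edit the partition table with fdisk/parted, then:
partprobe /dev/sdd
mount /dev/sdd6 /Data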

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-20 21:15                               ` Chris Murphy
@ 2016-09-29 12:55                                 ` Alexandre Poux
  2016-09-30 23:46                                   ` Alexandre Poux
  0 siblings, 1 reply; 26+ messages in thread
From: Alexandre Poux @ 2016-09-29 12:55 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hi,

I finally did it: patched the kernel and removed the device.
As expected, it did not scream, since there was nothing at all on the device.
Now I'm checking that everything is fine:
scrub (in read only)
check (in read only)
but I think that everything will be OK.
If not, I will rebuild the array from scratch (I did manage to save my
data).

Thank you both for your guidance.
I think a warning should be put in the wiki so that other users
don't make the same mistake I did:
never ever use single mode.

I will try to do it soon.
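
In the meantime, for anyone else sitting on a multi-device array with
single data: I believe the data profile can be converted with a
rebalance, something like this (a sketch; it needs enough unallocated
space for the extra copies):

btrfs balance start -dconvert=raid1 /Data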

Again thank you

On 20/09/2016 at 23:15, Chris Murphy wrote:
> On Tue, Sep 20, 2016 at 2:18 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>
>> On 20/09/2016 at 21:46, Chris Murphy wrote:
>>> On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>>> On 20/09/2016 at 21:11, Chris Murphy wrote:
>>>>> And no backup? Umm, I'd resolve that sooner than anything else.
>>>> Yeah you are absolutely right, this was a temporary solution which came
>>>> to be not that temporary.
>>>> And I regret it already...
>>> Well on the bright side, if this were LVM or mdadm linear/concat
>>> array, the whole thing would be toast because any other file system
>>> would have lost too much fs metadata on the missing device.
>>>
>>>>>  It
>>>>> should be true that it'll tolerate a read only mount indefinitely, but
>>>>> read write? Not sure. This sort of edge case isn't well tested at all
>>>>> seeing as it required changing the kernel to reduce safe guards. So
>>>>> all bets are off the whole thing could become unmountable, not even
>>>>> read only, and then it's a scraping job.
>>>> I'm not that crazy, I tried the patch inside a virtual machine on
>>>> virtual drives...
>>>> And since it's only virtual, it may not work on the real partition...
>>> Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
>>> device? That might be what pinned it in that case.
>> In fact in my virtual setup there was more chunk missing (1 metadata 1
>> System and 1 Data).
>> I will try to do a setup closer to my real one.
> Probably the reason why that missing device has no used chunks is
> because it's so small. Btrfs allocates block groups to devices with
> the most unallocated space first. Only once the unallocated space is
> even (approximately) on all devices would it allocate a block group to
> the small device.
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: multi-device btrfs with single data mode and disk failure
  2016-09-29 12:55                                 ` Alexandre Poux
@ 2016-09-30 23:46                                   ` Alexandre Poux
  0 siblings, 0 replies; 26+ messages in thread
From: Alexandre Poux @ 2016-09-30 23:46 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hello again,

Just a quick question.

I did a full scrub and got no errors at all.

And a full check gave me this:

#> btrfs check --check-data-csum -p /dev/sde6

Checking filesystem on /dev/sde6
UUID: 62db560b-a040-4c64-b613-6e7db033dc4d
checking extents [o]
checking free space cache [o]
checking fs roots [.]
checking csums
checking root refs
checking quota groups
Counts for qgroup id: 0/5 are different
our:        referenced 7239132803072 referenced compressed 7239132803072
disk:        referenced 7238982733824 referenced compressed 7238982733824
diff:        referenced 150069248 referenced compressed 150069248
our:        exclusive 7239132803072 exclusive compressed 7239132803072
disk:        exclusive 7238982733824 exclusive compressed 7238982733824
diff:        exclusive 150069248 exclusive compressed 150069248
found 7323422314496 bytes used err is 0
total csum bytes: 7020314688
total tree bytes: 11797741568
total fs tree bytes: 2904932352
total extent tree bytes: 656654336
btree space waste bytes: 1560529439
file data blocks allocated: 297363385454592
 referenced 6628544720896

I'm guessing that's not important, but I found nothing about this,
so I don't really know what it's about.

Can you just confirm that everything seems OK?

Can you think of another test I should do before starting to use my
array again?
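
If this is just stale quota accounting, would something like this
recompute it? (a guess on my part, I have not tried it yet)

btrfs quota rescan -w /Data    # recompute qgroup accounting and wait for it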

On 29/09/2016 at 14:55, Alexandre Poux wrote:
> Hi,
>
> I finally did it: patched the kernel and removed the device.
> As expected, it did not scream, since there was nothing at all on the device.
> Now I'm checking that everything is fine:
> scrub (in read only)
> check (in read only)
> but I think that everything will be OK.
> If not, I will rebuild the array from scratch (I did manage to save my
> data).
>
> Thank you both for your guidance.
> I think a warning should be put in the wiki so that other users
> don't make the same mistake I did:
> never ever use single mode.
>
> I will try to do it soon
>
> Again thank you
>
> On 20/09/2016 at 23:15, Chris Murphy wrote:
>> On Tue, Sep 20, 2016 at 2:18 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>> On 20/09/2016 at 21:46, Chris Murphy wrote:
>>>> On Tue, Sep 20, 2016 at 1:31 PM, Alexandre Poux <pums974@gmail.com> wrote:
>>>> On 20/09/2016 at 21:11, Chris Murphy wrote:
>>>>>> And no backup? Umm, I'd resolve that sooner than anything else.
>>>>> Yeah you are absolutely right, this was a temporary solution which came
>>>>> to be not that temporary.
>>>>> And I regret it already...
>>>> Well on the bright side, if this were LVM or mdadm linear/concat
>>>> array, the whole thing would be toast because any other file system
>>>> would have lost too much fs metadata on the missing device.
>>>>
>>>>>>  It
>>>>>> should be true that it'll tolerate a read only mount indefinitely, but
>>>>>> read write? Not sure. This sort of edge case isn't well tested at all
>>>>>> seeing as it required changing the kernel to reduce safe guards. So
>>>>>> all bets are off the whole thing could become unmountable, not even
>>>>>> read only, and then it's a scraping job.
>>>>> I'm not that crazy, I tried the patch inside a virtual machine on
>>>>> virtual drives...
>>>>> And since it's only virtual, it may not work on the real partition...
>>>> Are you sure the virtual setup lacked a CHUNK_ITEM on the missing
>>>> device? That might be what pinned it in that case.
>>> In fact in my virtual setup there was more chunk missing (1 metadata 1
>>> System and 1 Data).
>>> I will try to do a setup closer to my real one.
>> Probably the reason why that missing device has no used chunks is
>> because it's so small. Btrfs allocates block groups to devices with
>> the most unallocated space first. Only once the unallocated space is
>> even (approximately) on all devices would it allocate a block group to
>> the small device.
>>
>>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-09-30 23:46 UTC | newest]

Thread overview: 26+ messages
2016-09-15  7:44 multi-device btrfs with single data mode and disk failure Alexandre Poux
2016-09-15 15:38 ` Chris Murphy
2016-09-15 16:30   ` Alexandre Poux
2016-09-15 16:54     ` Chris Murphy
     [not found]       ` <760be1b7-79b2-a25d-7c60-04ceac1b6e40@gmail.com>
2016-09-15 21:54         ` Chris Murphy
2016-09-19 22:05           ` Alexandre Poux
2016-09-20 17:03             ` Alexandre Poux
2016-09-20 17:54               ` Chris Murphy
2016-09-20 18:19                 ` Alexandre Poux
2016-09-20 18:38                   ` Chris Murphy
2016-09-20 18:53                     ` Alexandre Poux
2016-09-20 19:11                       ` Chris Murphy
     [not found]                         ` <4e7ec5eb-7fb6-2d19-f29d-82461e2d0bd2@gmail.com>
2016-09-20 19:46                           ` Chris Murphy
2016-09-20 20:18                             ` Alexandre Poux
2016-09-20 21:05                               ` Alexandre Poux
2016-09-20 21:15                               ` Chris Murphy
2016-09-29 12:55                                 ` Alexandre Poux
2016-09-30 23:46                                   ` Alexandre Poux
2016-09-20 19:43                       ` Austin S. Hemmelgarn
2016-09-20 19:54                         ` Alexandre Poux
2016-09-20 20:02                           ` Chris Murphy
2016-09-20 19:55                         ` Chris Murphy
2016-09-21 11:07                           ` Austin S. Hemmelgarn
2016-09-20 20:59                       ` Graham Cobb
2016-09-20 18:56                 ` Austin S. Hemmelgarn
2016-09-20 19:06                   ` Alexandre Poux
