* RAID Assembly with Missing Empty Drive
@ 2016-03-22 20:19 John Marrett
  2016-03-22 21:18 ` Henk Slager
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-22 20:19 UTC (permalink / raw)
  To: linux-btrfs

I recently had a drive failure in a file server running btrfs. The
failed drive was completely non-functional. I added a new drive to the
filesystem successfully, but when I attempted to remove the failed
drive I encountered an error. I discovered that I had actually
experienced a dual drive failure; the second drive only exhibited
failures when btrfs tried to write to the drives in the filesystem as
I removed the disk.

I shut down the array and imaged the failed drive using GNU ddrescue;
I was able to recover all but a few KB from the drive. Unfortunately,
when I imaged the drive I overwrote the drive that I had successfully
added to the filesystem.

This brings me to my current state; I now have two devices missing:

 - the completely failed drive
 - the empty drive that I overwrote with the second failed disk's image

Consequently I can't start the filesystem. I've discussed the issue in
the past with Ke and other people on the #btrfs channel; the
consensus, as I understood it, is that with the right patch it should
be possible either to mount the array with the empty drive absent or
to create a new btrfs filesystem on an empty drive and then manipulate
its UUIDs so that it carries the missing UUID from the existing
btrfs filesystem.

Here's the info showing the current state of the filesystem:

ubuntu@ubuntu:~$ sudo btrfs filesystem show
warning, device 6 is missing
warning devid 6 not found already
warning devid 7 not found already
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 7 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
    devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
    devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
    *** Some devices missing
btrfs-progs v4.0
ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sda3,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
ubuntu@ubuntu:~$ dmesg
[...]
[  749.322385] BTRFS info (device sde1): allowing degraded mounts
[  749.322404] BTRFS info (device sde1): disk space caching is enabled
[  749.323571] BTRFS warning (device sde1): devid 6 uuid
f41bcb72-e88a-432f-9961-01307ec291a9 is missing
[  749.335543] BTRFS warning (device sde1): devid 7 uuid
17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
[  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
flush 0, corrupt 0, gen 0
[  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
corrupt 0, gen 0
[  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
[  774.804053] BTRFS: open_ctree failed

Thank you in advance for your help,

-JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-22 20:19 RAID Assembly with Missing Empty Drive John Marrett
@ 2016-03-22 21:18 ` Henk Slager
  2016-03-22 21:23   ` John Marrett
  2016-03-22 22:08   ` John Marrett
  0 siblings, 2 replies; 16+ messages in thread
From: Henk Slager @ 2016-03-22 21:18 UTC (permalink / raw)
  To: John Marrett; +Cc: linux-btrfs

On Tue, Mar 22, 2016 at 9:19 PM, John Marrett <johnf@zioncluster.ca> wrote:
> I recently had a drive failure in a file server running btrfs. The
> failed drive was completely non-functional. I added a new drive to the

I assume you did     btrfs device add  ?
Or did you do this with    btrfs replace  ?

> filesystem successfully, but when I attempted to remove the failed
> drive I encountered an error. I discovered that I had actually
> experienced a dual drive failure; the second drive only exhibited
> failures when btrfs tried to write to the drives in the filesystem as
> I removed the disk.
>
> I shut down the array and imaged the failed drive using GNU ddrescue;
> I was able to recover all but a few KB from the drive. Unfortunately,
> when I imaged the drive I overwrote the drive that I had successfully
> added to the filesystem.
>
> This brings me to my current state; I now have two devices missing:
>
>  - the completely failed drive
>  - the empty drive that I overwrote with the second failed disk's image
>
> Consequently I can't start the filesystem. I've discussed the issue in
> the past with Ke and other people on the #btrfs channel; the
> consensus, as I understood it, is that with the right patch it should
> be possible either to mount the array with the empty drive absent or
> to create a new btrfs filesystem on an empty drive and then manipulate
> its UUIDs so that it carries the missing UUID from the existing
> btrfs filesystem.
>
> Here's the info showing the current state of the filesystem:
>
> ubuntu@ubuntu:~$ sudo btrfs filesystem show
> warning, device 6 is missing
> warning devid 6 not found already
> warning devid 7 not found already
> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>     Total devices 7 FS bytes used 5.47TiB
>     devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
>     devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
>     devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
>     devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
>     devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
>     *** Some devices missing
> btrfs-progs v4.0

The used kernel version might also give people some hints.

Also, you have not stated what raid type the fs is; likely not raid6,
but rather raid1, raid10 or raid5.
btrfs filesystem usage   will report this.

If it is raid6, you could still fix the issue in theory. AFAIK there
are no patches to fix a dual error in case it is another raid type or
single. The only option then is to use   btrfs rescue   on the
unmounted array and hope to copy as much as possible off the damaged fs
to other storage.
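
A minimal sketch of that salvage path, assuming /dev/sda3 is one of
the member devices and /srv/recovery is spare storage to copy into
(btrfs restore is the subcommand that copies files out of an unmounted
filesystem; -v lists files as they are recovered, -i ignores errors):

sudo btrfs restore -v -i /dev/sda3 /srv/recovery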

> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> ubuntu@ubuntu:~$ dmesg
> [...]
> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
> flush 0, corrupt 0, gen 0
> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
> corrupt 0, gen 0
> [  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
> [  774.804053] BTRFS: open_ctree failed
>
> Thank you in advance for your help,
>
> -JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-22 21:18 ` Henk Slager
@ 2016-03-22 21:23   ` John Marrett
  2016-03-22 22:08   ` John Marrett
  1 sibling, 0 replies; 16+ messages in thread
From: John Marrett @ 2016-03-22 21:23 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

After further discussion in #btrfs:

I left out the raid level; it's raid1:

ubuntu@ubuntu:~$ sudo btrfs filesystem df /mnt
Data, RAID1: total=6.04TiB, used=5.46TiB
System, RAID1: total=32.00MiB, used=880.00KiB
Metadata, RAID1: total=14.00GiB, used=11.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

It is possible to mount the filesystem with -o recover,ro

It may be possible to comment out this check:

https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770

And then to mount read/write, remove the failed drive, and add a new
drive. If no more interesting suggestions are forthcoming I will
try it, though to test I'll want to overlay the underlying devices and
then export them using iSCSI, AoE or NBD in order to avoid further
damage to my filesystem.

Unfortunately I don't have nearly enough disk space available to make
a complete copy of the data and rebuild the filesystem.

-JohnF

On Tue, Mar 22, 2016 at 5:18 PM, Henk Slager <eye1tm@gmail.com> wrote:
> On Tue, Mar 22, 2016 at 9:19 PM, John Marrett <johnf@zioncluster.ca> wrote:
>> I recently had a drive failure in a file server running btrfs. The
>> failed drive was completely non-functional. I added a new drive to the
>
> I assume you did     btrfs device add  ?
> Or did you do this with    btrfs replace  ?
>
>> filesystem successfully, but when I attempted to remove the failed
>> drive I encountered an error. I discovered that I had actually
>> experienced a dual drive failure; the second drive only exhibited
>> failures when btrfs tried to write to the drives in the filesystem as
>> I removed the disk.
>>
>> I shut down the array and imaged the failed drive using GNU ddrescue;
>> I was able to recover all but a few KB from the drive. Unfortunately,
>> when I imaged the drive I overwrote the drive that I had successfully
>> added to the filesystem.
>>
>> This brings me to my current state; I now have two devices missing:
>>
>>  - the completely failed drive
>>  - the empty drive that I overwrote with the second failed disk's image
>>
>> Consequently I can't start the filesystem. I've discussed the issue in
>> the past with Ke and other people on the #btrfs channel; the
>> consensus, as I understood it, is that with the right patch it should
>> be possible either to mount the array with the empty drive absent or
>> to create a new btrfs filesystem on an empty drive and then manipulate
>> its UUIDs so that it carries the missing UUID from the existing
>> btrfs filesystem.
>>
>> Here's the info showing the current state of the filesystem:
>>
>> ubuntu@ubuntu:~$ sudo btrfs filesystem show
>> warning, device 6 is missing
>> warning devid 6 not found already
>> warning devid 7 not found already
>> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>>     Total devices 7 FS bytes used 5.47TiB
>>     devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
>>     devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
>>     devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
>>     devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
>>     devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
>>     *** Some devices missing
>> btrfs-progs v4.0
>
> The used kernel version might also give people some hints.
>
> Also, you have not stated what raid type the fs is; likely not raid6,
> but rather raid1, raid10 or raid5.
> btrfs filesystem usage   will report this.
>
> If it is raid6, you could still fix the issue in theory. AFAIK there
> are no patches to fix a dual error in case it is another raid type or
> single. The only option then is to use   btrfs rescue   on the
> unmounted array and hope to copy as much as possible off the damaged fs
> to other storage.
>
>> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
>> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>>        missing codepage or helper program, or other error
>>
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so.
>> ubuntu@ubuntu:~$ dmesg
>> [...]
>> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
>> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
>> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
>> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
>> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
>> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
>> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
>> flush 0, corrupt 0, gen 0
>> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
>> corrupt 0, gen 0
>> [  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
>> [  774.804053] BTRFS: open_ctree failed
>>
>> Thank you in advance for your help,
>>
>> -JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-22 21:18 ` Henk Slager
  2016-03-22 21:23   ` John Marrett
@ 2016-03-22 22:08   ` John Marrett
  2016-03-25 22:31     ` John Marrett
  1 sibling, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-22 22:08 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

Henk,

> I assume you did     btrfs device add  ?
> Or did you do this with    btrfs replace  ?

Just realised I missed this question, sorry; I performed an add
followed by a (failed) delete.

-JohnF

>
>> filesystem successfully, but when I attempted to remove the failed
>> drive I encountered an error. I discovered that I had actually
>> experienced a dual drive failure; the second drive only exhibited
>> failures when btrfs tried to write to the drives in the filesystem as
>> I removed the disk.
>>
>> I shut down the array and imaged the failed drive using GNU ddrescue;
>> I was able to recover all but a few KB from the drive. Unfortunately,
>> when I imaged the drive I overwrote the drive that I had successfully
>> added to the filesystem.
>>
>> This brings me to my current state; I now have two devices missing:
>>
>>  - the completely failed drive
>>  - the empty drive that I overwrote with the second failed disk's image
>>
>> Consequently I can't start the filesystem. I've discussed the issue in
>> the past with Ke and other people on the #btrfs channel; the
>> consensus, as I understood it, is that with the right patch it should
>> be possible either to mount the array with the empty drive absent or
>> to create a new btrfs filesystem on an empty drive and then manipulate
>> its UUIDs so that it carries the missing UUID from the existing
>> btrfs filesystem.
>>
>> Here's the info showing the current state of the filesystem:
>>
>> ubuntu@ubuntu:~$ sudo btrfs filesystem show
>> warning, device 6 is missing
>> warning devid 6 not found already
>> warning devid 7 not found already
>> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>>     Total devices 7 FS bytes used 5.47TiB
>>     devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
>>     devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
>>     devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
>>     devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
>>     devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
>>     *** Some devices missing
>> btrfs-progs v4.0
>
> The used kernel version might also give people some hints.
>
> Also, you have not stated what raid type the fs is; likely not raid6,
> but rather raid1, raid10 or raid5.
> btrfs filesystem usage   will report this.
>
> If it is raid6, you could still fix the issue in theory. AFAIK there
> are no patches to fix a dual error in case it is another raid type or
> single. The only option then is to use   btrfs rescue   on the
> unmounted array and hope to copy as much as possible off the damaged fs
> to other storage.
>
>> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
>> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>>        missing codepage or helper program, or other error
>>
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so.
>> ubuntu@ubuntu:~$ dmesg
>> [...]
>> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
>> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
>> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
>> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
>> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
>> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
>> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
>> flush 0, corrupt 0, gen 0
>> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
>> corrupt 0, gen 0
>> [  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
>> [  774.804053] BTRFS: open_ctree failed
>>
>> Thank you in advance for your help,
>>
>> -JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-22 22:08   ` John Marrett
@ 2016-03-25 22:31     ` John Marrett
  2016-03-26  0:49       ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-25 22:31 UTC (permalink / raw)
  To: linux-btrfs

Continuing with my recovery efforts I've built overlay mounts of each
of the block devices supporting my btrfs filesystem as well as the new
disk I'm trying to introduce. I have patched the kernel to disable the
check for multiple missing devices. I then exported the overlayed
devices using iSCSI to a second system to attempt the recovery.

I am able to mount the filesystem rw, and then I can remove missing
devices, which removes the missing empty disk. I can add a new device
to the filesystem and then attempt to remove the second missing disk
(which has 2.7 TB of content on it).

Unfortunately this removal fails as follows:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
ERROR: error removing the device 'missing' - Input/output error

The kernel shows:

[ 2772.000680] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[ 2772.000724] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802
[ 2772.000736] BTRFS warning (device sdd): csum failed ino 257 off
695746560 csum 2566472073 expected csum 3360772439
[ 2772.000742] BTRFS warning (device sdd): csum failed ino 257 off
695750656 csum 2566472073 expected csum 1205516886
[...]

Can anyone offer any advice as to how I should proceed from here?

One safe option is recreating the array. Now that I have discovered I
can mount the filesystem in degraded,ro mode, I could purchase another
new disk; this would give me enough free disk space to copy all the
data off this array and onto a new non-redundant array. I could then
add all the drives into the new array and convert it back to RAID1.
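
The conversion step at the end would presumably be a balance with
convert filters, along these lines (assuming the data was first copied
onto the new array as single profile, mounted at /mnt):

sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt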

Here's a full breakdown of the commands that I ran in the process I
describe above; my patch only allows a remount with missing devices,
so it's not very significant:

ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt

Here we see the two missing devices:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 7 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sde
    devid    2 size 1.81TiB used 1.71TiB path /dev/sda
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
    devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
    devid    6 size 2.73TiB used 2.62TiB path
    devid    7 size 2.73TiB used 0.00 path

I remove the first missing device:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt

The unused missing device is removed:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 6 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sde
    devid    2 size 1.81TiB used 1.71TiB path /dev/sda
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
    devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
    devid    6 size 2.73TiB used 2.62TiB path

I add a new device:

ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sdb /mnt
ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 7 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sde
    devid    2 size 1.81TiB used 1.71TiB path /dev/sda
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
    devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
    devid    6 size 2.73TiB used 2.62TiB path
    devid    7 size 2.73TiB used 0.00 path /dev/sdb

Here are some more details on the techniques necessary to get to this
point, in the hope that others can benefit from them. I will also
update the apparently broken parallel scripts on the mdadm wiki.

To create overlay mounts, use the following script. It will create an
overlay for each device in the device list, backed by a sparse overlay
file located at /home/ubuntu/$device-overlay; each overlay uses a
512 MB file (the size passed to truncate).

for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
  dev=/dev/$device
  ovl=/home/ubuntu/$device-overlay
  truncate -s512M $ovl
  newdevname=$device
  size=$(blockdev --getsize "$dev")
  loop=$(losetup -f --show -- "$ovl")
  echo "Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname"
  printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done
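
To confirm the snapshot targets were created, a quick check (not part
of the original script) is:

sudo dmsetup ls --target snapshot
sudo dmsetup status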

I used iscsitarget to export the block devices from the server; the
configuration files are as follows (on Ubuntu):

Install:

sudo apt install iscsitarget

Enable:

/etc/default/iscsitarget
ISCSITARGET_ENABLE=true

Exports:

/etc/iet/ietd.conf

Target iqn.2001-04.com.example:storage.lun1
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sda3,Type=fileio
        Alias LUN1

Target iqn.2001-04.com.example:storage.lun2
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sdb3,Type=fileio
        Alias LUN2

Target iqn.2001-04.com.example:storage.lun3
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sdc1,Type=fileio
        Alias LUN3

Target iqn.2001-04.com.example:storage.lun4
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sdd1,Type=fileio
        Alias LUN4

Target iqn.2001-04.com.example:storage.lun5
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sde1,Type=fileio
        Alias LUN5

Target iqn.2001-04.com.example:storage.lun6
        IncomingUser
        OutgoingUser
        Lun 0 Path=/dev/mapper/sdf1,Type=fileio
        Alias LUN6

Start the service:

/etc/init.d/iscsitarget start

On the client I accessed these exports using open-iscsi as follows:

Install:

sudo apt install open-iscsi

Discover LUNs (on host carbon):

sudo iscsiadm -m discovery -t st -p carbon

List the discovered node records:

sudo iscsiadm -m node
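
Listing the records doesn't attach anything by itself; the actual
login happens when the open-iscsi service is started, or it can be
done explicitly for all discovered targets:

sudo iscsiadm -m node --login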

The exported disks will appear as new block devices under /dev/sd*:

ubuntu@btrfs-recovery:~$ ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Mar 25 21:18 /dev/sda
brw-rw---- 1 root disk 8, 16 Mar 25 21:21 /dev/sdb
brw-rw---- 1 root disk 8, 32 Mar 25 21:18 /dev/sdc
brw-rw---- 1 root disk 8, 48 Mar 25 21:18 /dev/sdd
brw-rw---- 1 root disk 8, 64 Mar 25 21:18 /dev/sde
brw-rw---- 1 root disk 8, 80 Mar 25 21:18 /dev/sdf


* Re: RAID Assembly with Missing Empty Drive
  2016-03-25 22:31     ` John Marrett
@ 2016-03-26  0:49       ` Chris Murphy
  2016-03-26  1:21         ` John Marrett
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-26  0:49 UTC (permalink / raw)
  To: John Marrett; +Cc: linux-btrfs

On Fri, Mar 25, 2016 at 4:31 PM, John Marrett <johnf@zioncluster.ca> wrote:
> Continuing with my recovery efforts I've built overlay mounts of each
> of the block devices supporting my btrfs filesystem as well as the new
> disk I'm trying to introduce. I have patched the kernel to disable the
> check for multiple missing devices. I then exported the overlayed
> devices using iSCSI to a second system to attempt the recovery.
>
> I am able to mount the filesystem rw, and then I can remove missing
> devices, which removes the missing empty disk. I can add a new device
> to the filesystem and then attempt to remove the second missing disk
> (which has 2.7 TB of content on it).
>
> Unfortunately this removal fails as follows:
>
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> ERROR: error removing the device 'missing' - Input/output error


Quite honestly I don't understand how a Btrfs raid1 volume with two
missing devices even permits you to mount it degraded,rw in the first
place. That's rather mystifying considering the other thread, where
there's a 4-disk raid10 with one missing device and a rw,degraded
mount is allowed only once; after that, further attempts to mount it
rw,degraded are disallowed.

Anyway, maybe it's possible there are no dually-missing metadata
chunks, although I find it hard to believe. But OK, maybe it works for
a while and you can copy some stuff off the drives where there's at
least one data copy. If there are dual missing data copies but still
at least one metadata copy, then the file system will just spit out
noisy error messages. But if there ends up being dual missing
metadata, I expect a crash, or the file system goes read-only, or
maybe even unmounts; I'm not sure. But once there are zero copies of
metadata I don't see how the file system can correct for that.

Because there are two devices missing, I doubt this matters, but I
think you're better off using 'btrfs replace' for this rather than
'device add' followed by 'device remove'. There are two catches with
replace: the replacement device must be as big as or bigger than the
one being replaced, and you have to do a resize on the replacement
device, using 'fi resize devid:max', to use all the space if the new
one is bigger than the old device. But I suspect either the first or
second replacement will fail as well; it's too many missing devices.
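
For a missing device the command would presumably take a form like
this, assuming devid 6 is the missing one (per the earlier 'fi show'
output) and /dev/sdg is a hypothetical new disk:

sudo btrfs replace start 6 /dev/sdg /mnt
sudo btrfs replace status /mnt
sudo btrfs filesystem resize 6:max /mnt

The resize is only needed if the new disk is bigger than the old one.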

So what can happen, if there are zero copies of metadata, is that you
might not get everything off the drives before you hit the zero-copies
problem and the ensuing face plant. In that case you might have to
depend on btrfs restore. It could be really tedious to find out what
can be scraped. But I still think you're better off than with any
other file system in this case, because they wouldn't even mount if
two mirrors were lost.


-- 
Chris Murphy


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26  0:49       ` Chris Murphy
@ 2016-03-26  1:21         ` John Marrett
  2016-03-26  1:41           ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-26  1:21 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs

Chris,

> Quite honestly I don't understand how a Btrfs raid1 volume with two
> missing devices even permits you to mount it degraded,rw in the first
> place.

I think you missed my previous post. It's simple: I patched the kernel
to bypass the check for missing devices with rw mounts. I did this
because one of my missing devices has no data on it, which is actually
confirmed by my mounting, as you can see here:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 7 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sde
    devid    2 size 1.81TiB used 1.71TiB path /dev/sda
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
    devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
    devid    6 size 2.73TiB used 2.62TiB path
    devid    7 size 2.73TiB used 0.00 path

> Anyway, maybe it's possible there are no dually-missing metadata
> chunks, although I find it hard to believe.

Considering the above, do you still think that I may have missing metadata?

> Because there are two devices missing, I doubt this matters, but I
> think you're better off using 'btrfs replace' for this rather than
> 'device add' followed by 'device remove'. There are two catches with

I'll try btrfs replace for the second device (with data) after
removing the first.

Do you think my chances are better moving data off the array in read-only mode?

-JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26  1:21         ` John Marrett
@ 2016-03-26  1:41           ` Chris Murphy
  2016-03-26 12:15             ` John Marrett
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-26  1:41 UTC (permalink / raw)
  To: John Marrett; +Cc: Chris Murphy, linux-btrfs

[let me try keeping the list cc'd]

On Fri, Mar 25, 2016 at 7:21 PM, John Marrett <johnf@zioncluster.ca> wrote:
> Chris,
>
>> Quite honestly I don't understand how a Btrfs raid1 volume with two
>> missing devices even permits you to mount it degraded,rw in the first
>> place.
>
> I think you missed my previous post. It's simple: I patched the kernel
> to bypass the check for missing devices with rw mounts. I did this
> because one of my missing devices has no data on it, which is actually
> confirmed by my mounting, as you can see here:
>

Yeah too many emails today, and I'm skimming too much.



>
> ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>     Total devices 7 FS bytes used 5.47TiB
>     devid    1 size 1.81TiB used 1.71TiB path /dev/sde
>     devid    2 size 1.81TiB used 1.71TiB path /dev/sda
>     devid    3 size 1.82TiB used 1.72TiB path /dev/sdc
>     devid    4 size 1.82TiB used 1.72TiB path /dev/sdd
>     devid    5 size 2.73TiB used 2.62TiB path /dev/sdf
>     devid    6 size 2.73TiB used 2.62TiB path
>     devid    7 size 2.73TiB used 0.00 path
>
>> Anyway, maybe it's possible there are no dually-missing metadata
>> chunks, although I find it hard to believe.
>
> Considering the above, do you still think that I may have missing metadata?

Post 'btrfs fi usage' for the filesystem. That may give some insight
into what's expected to be on all the missing drives.

>
>> Because there are two devices missing, I doubt this matters, but I
>> think you're better off using 'btrfs replace' for this rather than
>> 'device add' followed by 'device remove'. There are two catches with
>
> I'll try btrfs replace for the second device (with data) after
> removing the first.
>
> Do you think my chances are better moving data off the array in read-only mode?

My expectation is that whether copying everything or using replace, if
either process arrives at no metadata copies found, it's going to stop
whatever it's doing. Question is only how that manifests.


-- 
Chris Murphy


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26  1:41           ` Chris Murphy
@ 2016-03-26 12:15             ` John Marrett
  2016-03-26 20:42               ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-26 12:15 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs

Chris,

> Post 'btrfs fi usage' for the filesystem. That may give some insight
> into what's expected to be on all the missing drives.

Here's the information. I believe that the missing we see in most
entries is the failed and absent drive; only the unallocated section
shows two missing entries, and the 2.73TiB one is the missing but
empty device. I don't know if there's a way to prove it, however.

ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
Overall:
    Device size:          15.45TiB
    Device allocated:          12.12TiB
    Device unallocated:           3.33TiB
    Device missing:           5.46TiB
    Used:              10.93TiB
    Free (estimated):           2.25TiB    (min: 2.25TiB)
    Data ratio:                  2.00
    Metadata ratio:              2.00
    Global reserve:         512.00MiB    (used: 0.00B)

Data,RAID1: Size:6.04TiB, Used:5.46TiB
   /dev/sda       2.61TiB
   /dev/sdb       1.71TiB
   /dev/sdc       1.72TiB
   /dev/sdd       1.72TiB
   /dev/sdf       1.71TiB
   missing       2.61TiB

Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
   /dev/sda       8.00GiB
   /dev/sdb       2.00GiB
   /dev/sdc       3.00GiB
   /dev/sdd       4.00GiB
   /dev/sdf       3.00GiB
   missing       8.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sda      32.00MiB
   missing      32.00MiB

Unallocated:
   /dev/sda     111.49GiB
   /dev/sdb      98.02GiB
   /dev/sdc      98.02GiB
   /dev/sdd      98.02GiB
   /dev/sdf      98.02GiB
   missing     111.49GiB
   missing       2.73TiB

I tried to remove the missing devices; the first 'remove missing'
only removes the 2.73TiB missing entry seen above. All the other
missing entries remain.

I can't "replace", it's not a valid command on my btrfs tools version;
I upgraded btrfs this morning in order to have the btrfs fi usage
command.

ubuntu@btrfs-recovery:~$ sudo btrfs version
btrfs-progs v4.0
ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
ii  btrfs-tools                        4.0-2
 amd64        Checksumming Copy on Write Filesystem utilities

For those interested in my recovery techniques, here's how I rebuild
the overlay loop devices. Be careful: these scripts make certain
assumptions that may not be accurate for your system:

On Client:

sudo umount /mnt
sudo /etc/init.d/open-iscsi stop

On Server:

/etc/init.d/iscsitarget stop
loop_devices=$(losetup -a | grep overlay | tr ":" " " | awk ' { printf $1 " " } END { print "" } ')
for fn in /dev/mapper/sd??; do dmsetup remove $fn; done
for ln in $loop_devices; do losetup -d $ln; done
cd /home/ubuntu
rm sd*overlay

for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
  dev=/dev/$device
  ovl=/home/ubuntu/$device-overlay
  truncate -s512M $ovl
  newdevname=$device
  size=$(blockdev --getsize "$dev")
  loop=$(losetup -f --show -- "$ovl")
  echo "Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname"
  printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done

Start the targets:

/etc/init.d/iscsitarget start

On Client:

sudo /etc/init.d/open-iscsi start

-JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26 12:15             ` John Marrett
@ 2016-03-26 20:42               ` Chris Murphy
  2016-03-26 21:01                 ` John Marrett
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-26 20:42 UTC (permalink / raw)
  To: John Marrett; +Cc: Chris Murphy, linux-btrfs

On Sat, Mar 26, 2016 at 6:15 AM, John Marrett <johnf@zioncluster.ca> wrote:
> Chris,
>
>> Post 'btrfs fi usage' for the filesystem. That may give some insight
>> into what's expected to be on all the missing drives.
>
> Here's the information. I believe that the missing we see in most
> entries is the failed and absent drive; only the unallocated section
> shows two missing entries, and the 2.73TiB one is the missing but
> empty device. I don't know if there's a way to prove it, however.
>
> ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
> Overall:
>     Device size:          15.45TiB
>     Device allocated:          12.12TiB
>     Device unallocated:           3.33TiB
>     Device missing:           5.46TiB
>     Used:              10.93TiB
>     Free (estimated):           2.25TiB    (min: 2.25TiB)
>     Data ratio:                  2.00
>     Metadata ratio:              2.00
>     Global reserve:         512.00MiB    (used: 0.00B)
>
> Data,RAID1: Size:6.04TiB, Used:5.46TiB
>    /dev/sda       2.61TiB
>    /dev/sdb       1.71TiB
>    /dev/sdc       1.72TiB
>    /dev/sdd       1.72TiB
>    /dev/sdf       1.71TiB
>    missing       2.61TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
>    /dev/sda       8.00GiB
>    /dev/sdb       2.00GiB
>    /dev/sdc       3.00GiB
>    /dev/sdd       4.00GiB
>    /dev/sdf       3.00GiB
>    missing       8.00GiB
>
> System,RAID1: Size:32.00MiB, Used:880.00KiB
>    /dev/sda      32.00MiB
>    missing      32.00MiB
>
> Unallocated:
>    /dev/sda     111.49GiB
>    /dev/sdb      98.02GiB
>    /dev/sdc      98.02GiB
>    /dev/sdd      98.02GiB
>    /dev/sdf      98.02GiB
>    missing     111.49GiB
>    missing       2.73TiB
>
> I tried to remove the missing devices; the first 'remove missing'
> only removes the 2.73TiB missing entry seen above. All the other
> missing entries remain.

Well, offhand it seems like the missing 2.73TiB device has nothing on
it at all, and doesn't need to be counted as missing. The other
missing device is counted, and should have all of its data replicated
elsewhere. But then you're running into csum errors. So something
still isn't right; we just don't understand what it is.


> I can't "replace", it's not a valid command on my btrfs tools version;
> I upgraded btrfs this morning in order to have the btrfs fi usage
> command.

Btrfs replace has been around for a while; see 'man btrfs replace'.
The command takes the form 'btrfs replace start' plus three required
pieces of information. You should be able to infer the missing devid
using 'btrfs fi show'; it looks like it's 6.



> ubuntu@btrfs-recovery:~$ sudo btrfs version
> btrfs-progs v4.0
> ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
> ii  btrfs-tools                        4.0-2
>  amd64        Checksumming Copy on Write Filesystem utilities

I would use something newer, but btrfs replace is in 4.0. But I also
don't see in this thread what kernel version you're using.



-- 
Chris Murphy


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26 20:42               ` Chris Murphy
@ 2016-03-26 21:01                 ` John Marrett
  2016-03-26 21:15                   ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-26 21:01 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs

> Well, offhand it seems like the missing 2.73TiB device has nothing on
> it at all, and doesn't need to be counted as missing. The other
> missing device is counted, and should have all of its data replicated
> elsewhere. But then you're running into csum errors. So something
> still isn't right; we just don't understand what it is.

I'm not sure what we can do to get a better understanding of these
errors; that said, it may not be necessary if replace helps, more
below.

> Btrfs replace has been around for a while; see 'man btrfs replace'.
> The command takes the form 'btrfs replace start' plus three required
> pieces of information. You should be able to infer the missing devid
> using 'btrfs fi show'; it looks like it's 6.

I was looking under btrfs device, sorry about that. I do have the
command. I tried replace and it seemed more promising than the last
attempt; it wrote enough data to the new drive to overflow and break
my overlay. I'm trying it without the overlay on the destination
device, and I'll report back later with the results.

I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove
this check:

https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770

I can switch to whatever kernel is desired, though. Would you prefer a
mainline Ubuntu-packaged kernel? Straight from kernel.org?

-JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26 21:01                 ` John Marrett
@ 2016-03-26 21:15                   ` Chris Murphy
  2016-03-27 12:15                     ` John Marrett
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-26 21:15 UTC (permalink / raw)
  To: John Marrett; +Cc: Chris Murphy, linux-btrfs

On Sat, Mar 26, 2016 at 3:01 PM, John Marrett <johnf@zioncluster.ca> wrote:
>> Well, offhand it seems like the missing 2.73TiB device has nothing on
>> it at all, and doesn't need to be counted as missing. The other
>> missing device is counted, and should have all of its data replicated
>> elsewhere. But then you're running into csum errors. So something
>> still isn't right; we just don't understand what it is.
>
> I'm not sure what we can do to get a better understanding of these
> errors; that said, it may not be necessary if replace helps, more
> below.
>
>> Btrfs replace has been around for a while; see 'man btrfs replace'.
>> The command takes the form 'btrfs replace start' plus three required
>> pieces of information. You should be able to infer the missing devid
>> using 'btrfs fi show'; it looks like it's 6.
>
> I was looking under btrfs device, sorry about that. I do have the
> command. I tried replace and it seemed more promising than the last
> attempt; it wrote enough data to the new drive to overflow and break
> my overlay. I'm trying it without the overlay on the destination
> device, and I'll report back later with the results.
>
> I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove
> this check:
>
> https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770
>
> I can switch to whatever kernel is desired, though. Would you prefer a
> mainline Ubuntu-packaged kernel? Straight from kernel.org?

Things are a lot more deterministic for developers and testers if
you're using something current. It might not matter in this case that
you're using 4.2, but all you have to do is look at the git pulls in
the list archives to see many hundreds, often over 1000, btrfs changes
per kernel cycle. So, lots and lots of fixes have happened since 4.2.
And any bugs found in 4.2 don't really matter, because you'd have to
reproduce them in 4.4.6 or 4.5, and then the fix would go into 4.6
before it'd get backported, and 4.2 won't be getting backports
done by upstream. That's why list folks always suggest using something
recent. Again, in this case it might not matter; I don't read or
understand every single commit.

If you do want to use a newer one, I'd build against kernel.org, just
because the developers only use that base. And use 4.4.6 or 4.5.

It's reasonable to keep the overlay on the existing devices, but
remove the overlay for the replacement so that you're directly writing
to it. If that blows up with 4.2 you can still start over with a newer
kernel. *shrug*


-- 
Chris Murphy


* Re: RAID Assembly with Missing Empty Drive
  2016-03-26 21:15                   ` Chris Murphy
@ 2016-03-27 12:15                     ` John Marrett
  2016-03-27 14:59                       ` John Marrett
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-27 12:15 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs

>> I was looking under btrfs device, sorry about that. I do have the
>> command. I tried replace and it seemed more promising than the last
>> attempt; it wrote enough data to the new drive to overflow and break
>> my overlay. I'm trying it without the overlay on the destination
>> device, and I'll report back later with the results.

It looks like replace worked!

I got the following final output:

ubuntu@btrfs-recovery:~$ sudo btrfs replace status /mnt
Started on 26.Mar 20:59:12, finished on 27.Mar 05:20:01, 0 write errs,
0 uncorr. read errs

The filesystem appears to be in good health, with no more missing devices:

ubuntu@btrfs-recovery:~$ sudo btrfs filesystem show
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
    Total devices 6 FS bytes used 5.47TiB
    devid    1 size 1.81TiB used 1.71TiB path /dev/sdb
    devid    2 size 1.81TiB used 1.71TiB path /dev/sde
    devid    3 size 1.82TiB used 1.72TiB path /dev/sdd
    devid    4 size 1.82TiB used 1.72TiB path /dev/sdc
    devid    5 size 2.73TiB used 2.62TiB path /dev/sda
    devid    6 size 2.73TiB used 2.62TiB path /dev/sdf

btrfs-progs v4.0

However, the dmesg output shows some errors despite the 0 uncorr. read
errs reported above:

[112178.006315] BTRFS: checksum error at logical 8576298061824 on dev /dev/sda, sector 4333289864, root 259, inode 10017264, offset 32444416, length 4096, links 1 (path: mythtv/store/4663_20150809180500.mpg)
[112178.006327] btrfs_dev_stat_print_on_error: 5 callbacks suppressed
[112178.006330] BTRFS: bdev /dev/sda errs: wr 0, rd 5002, flush 0, corrupt 16, gen 0

And the underlying file does appear to be damaged:

ubuntu@btrfs-recovery:/mnt/@home/mythtv$ dd if=store/4663_20150809180500.mpg of=/dev/null
dd: error reading ‘store/4663_20150809180500.mpg’: Input/output error
63368+0 records in
63368+0 records out
32444416 bytes (32 MB) copied, 1.08476 s, 29.9 MB/s

Here's some dmesg output when accessing a damaged file:

[140789.642357] BTRFS warning (device sdc): csum failed ino 10017264
off 32854016 csum 2566472073 expected csum 1193787476
[140789.642503] BTRFS warning (device sdc): csum failed ino 10017264
off 32919552 csum 2566472073 expected csum 2825707817
[140789.645768] BTRFS warning (device sdc): csum failed ino 10017264
off 32509952 csum 2566472073 expected csum 834024150

I can also see that one device has had a few errors; this is the
device that was ddrescued, and it had recorded some errors before the
rescue:

[/dev/sda].write_io_errs   0
[/dev/sda].read_io_errs    5002
[/dev/sda].flush_io_errs   0
[/dev/sda].corruption_errs 153
[/dev/sda].generation_errs 0
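
Once the recovery is finished these counters can be zeroed so that any
new errors stand out; the -z flag of 'btrfs device stats' resets the
statistics after printing them:

sudo btrfs device stats -z /dev/sda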

> If you do want to use a newer one, I'd build against kernel.org, just
> because the developers only use that base. And use 4.4.6 or 4.5.

At this point I could remove the overlays and recover the filesystem
permanently; however, I'm also deeply indebted to the btrfs community
and want to give anything I can back. I've built (but not installed ;)
) a straight kernel.org 4.5 with my missing-device check patch applied.
Is there any interest or value in attempting to switch to this kernel,
add/delete a device, and see if I experience the same errors as before
I tried replace? What information should I gather if I do this?

-JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-27 12:15                     ` John Marrett
@ 2016-03-27 14:59                       ` John Marrett
  2016-03-27 16:08                         ` Henk Slager
  0 siblings, 1 reply; 16+ messages in thread
From: John Marrett @ 2016-03-27 14:59 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs

>> If you do want to use a newer one, I'd build against kernel.org, just
>> because the developers only use that base. And use 4.4.6 or 4.5.
>
> At this point I could remove the overlays and recover the filesystem
> permanently; however, I'm also deeply indebted to the btrfs community
> and want to give anything I can back. I've built (but not installed ;)
> ) a straight kernel.org 4.5 with my missing-device check patch applied.
> Is there any interest or value in attempting to switch to this kernel,
> add/delete a device, and see if I experience the same errors as before
> I tried replace? What information should I gather if I do this?

I've built and installed a 4.5 straight from kernel.org with my patch.

I encountered the same errors in recovery when I use add/delete
instead of replace. Here's the sequence of commands:

ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt
# Remove first empty device
ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
# Add blank drive
ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sde /mnt
# Remove second missing device with data
ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt

And the resulting error:

ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
ERROR: error removing the device 'missing' - Input/output error

Here's what we see in dmesg after deleting the missing device:

[  588.231341] BTRFS info (device sdd): relocating block group
10560347308032 flags 17
[  664.306122] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[  664.306164] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802
[  664.306182] BTRFS warning (device sdd): csum failed ino 257 off
695746560 csum 2566472073 expected csum 3360772439
[  664.306191] BTRFS warning (device sdd): csum failed ino 257 off
695750656 csum 2566472073 expected csum 1205516886
[  664.344179] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[  664.344213] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802
[  664.344224] BTRFS warning (device sdd): csum failed ino 257 off
695746560 csum 2566472073 expected csum 3360772439
[  664.344233] BTRFS warning (device sdd): csum failed ino 257 off
695750656 csum 2566472073 expected csum 1205516886
[  664.344684] BTRFS warning (device sdd): csum failed ino 257 off
695730176 csum 2566472073 expected csum 2706136415
[  664.344693] BTRFS warning (device sdd): csum failed ino 257 off
695734272 csum 2566472073 expected csum 2558511802

Is there anything of value I can do here to help address this possible
issue in btrfs itself, or should I remove the overlays, replace the
device and move on?

Please let me know,

-JohnF


* Re: RAID Assembly with Missing Empty Drive
  2016-03-27 14:59                       ` John Marrett
@ 2016-03-27 16:08                         ` Henk Slager
  2016-03-29 11:55                           ` John Marrett
  0 siblings, 1 reply; 16+ messages in thread
From: Henk Slager @ 2016-03-27 16:08 UTC (permalink / raw)
  To: John Marrett; +Cc: Chris Murphy, linux-btrfs

On Sun, Mar 27, 2016 at 4:59 PM, John Marrett <johnf@zioncluster.ca> wrote:
>>> If you do want to use a newer one, I'd build against kernel.org, just
>>> because the developers only use that base. And use 4.4.6 or 4.5.
>>
>> At this point I could remove the overlays and recover the filesystem
>> permanently; however, I'm also deeply indebted to the btrfs community
>> and want to give anything I can back. I've built (but not installed ;)
>> ) a straight kernel.org 4.5 with my missing-device check patch applied.
>> Is there any interest or value in attempting to switch to this kernel,
>> add/delete a device, and see if I experience the same errors as before
>> I tried replace? What information should I gather if I do this?
>
> I've built and installed a 4.5 straight from kernel.org with my patch.
>
> I encountered the same errors in recovery when I use add/delete
> instead of replace. Here's the sequence of commands:
>
> ubuntu@btrfs-recovery:~$ sudo mount -o degraded,ro /dev/sda /mnt
> ubuntu@btrfs-recovery:~$ sudo mount -o remount,rw /mnt
> # Remove first empty device
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> # Add blank drive
> ubuntu@btrfs-recovery:~$ sudo btrfs device add /dev/sde /mnt
> # Remove second missing device with data
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
>
> And the resulting error:
>
> ubuntu@btrfs-recovery:~$ sudo btrfs device delete missing /mnt
> ERROR: error removing the device 'missing' - Input/output error
>
> Here's what we see in dmesg after deleting the missing device:
>
> [  588.231341] BTRFS info (device sdd): relocating block group
> 10560347308032 flags 17
> [  664.306122] BTRFS warning (device sdd): csum failed ino 257 off
> 695730176 csum 2566472073 expected csum 2706136415
> [  664.306164] BTRFS warning (device sdd): csum failed ino 257 off
> 695734272 csum 2566472073 expected csum 2558511802
> [  664.306182] BTRFS warning (device sdd): csum failed ino 257 off
> 695746560 csum 2566472073 expected csum 3360772439
> [  664.306191] BTRFS warning (device sdd): csum failed ino 257 off
> 695750656 csum 2566472073 expected csum 1205516886
> [  664.344179] BTRFS warning (device sdd): csum failed ino 257 off
> 695730176 csum 2566472073 expected csum 2706136415
> [  664.344213] BTRFS warning (device sdd): csum failed ino 257 off
> 695734272 csum 2566472073 expected csum 2558511802
> [  664.344224] BTRFS warning (device sdd): csum failed ino 257 off
> 695746560 csum 2566472073 expected csum 3360772439
> [  664.344233] BTRFS warning (device sdd): csum failed ino 257 off
> 695750656 csum 2566472073 expected csum 1205516886
> [  664.344684] BTRFS warning (device sdd): csum failed ino 257 off
> 695730176 csum 2566472073 expected csum 2706136415
> [  664.344693] BTRFS warning (device sdd): csum failed ino 257 off
> 695734272 csum 2566472073 expected csum 2558511802
>
> Is there anything of value I can do here to help address this possible
> issue in btrfs itself, or should I remove the overlays, replace the
> device and move on?
>
> Please let me know,

I think it is great that with your local patch you managed to get into
a writable situation.
In theory, with a new spare disk already attached and on standby (the
hot-spare patchset and so on), a direct replace of the failing disk,
either internally or manually with btrfs replace, would have prevented
the few csum and other small errors. It could be that the errors have
another cause than the completely failed hard disk, but that won't be
easy to track down conclusively. The ddrescue action and the local
patch also make tracking back difficult, and the recovery was based on
an outdated kernel and tools.

I think it is best that you just repeat the fix on the real disks and
make sure you have an up-to-date kernel and tools when fixing the few
damaged files.
With   btrfs inspect-internal inode-resolve 257 <path>
you can see which file(s) are damaged.


* Re: RAID Assembly with Missing Empty Drive
  2016-03-27 16:08                         ` Henk Slager
@ 2016-03-29 11:55                           ` John Marrett
  0 siblings, 0 replies; 16+ messages in thread
From: John Marrett @ 2016-03-29 11:55 UTC (permalink / raw)
  To: Henk Slager; +Cc: Chris Murphy, linux-btrfs

> I think it is best that you just repeat the fix on the real disks and
> make sure you have an up-to-date kernel and tools when fixing the few
> damaged files.
> With   btrfs inspect-internal inode-resolve 257 <path>
> you can see which file(s) are damaged.

I inspected the damaged files; they are base directories on the two
filesystems, which definitely don't have the issues seen by btrfs replace:

ubuntu@btrfs-recovery:~$ sudo btrfs inspect-internal inode-resolve 257
/mnt/@home
/mnt/@home/aidan
ubuntu@btrfs-recovery:~$ sudo btrfs inspect-internal inode-resolve 257 /mnt/@
/mnt/@/home

I've completed recovery using the stock 4.5 kernel.org kernel. I'm
running a scrub now and it's going well so far.
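
For reference, a scrub of the mounted filesystem is started and
monitored with:

sudo btrfs scrub start /mnt
sudo btrfs scrub status /mnt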

Once someone authorizes my wiki account request I will update the wiki
with information on using replace instead of add/delete as well as
setting up overlay devices for filesystem recovery testing.

Thanks to everyone on the list and irc for their help with my problems.

-JohnF

